I have a project where I am testing a device that is very sensitive to noise (electromagnetic, radio, etc.). The device generates 5-6 bytes per second of binary data (it looks like gibberish to the untrained eye) based on audio input.
Depending on the noise, the device will sometimes skip characters, sometimes insert random characters, and sometimes several of them at once.
I wrote an application that lets the user see, on the fly, the errors being produced compared to a reference file (i.e., what the device should output under ideal conditions). My algorithm basically takes every byte in the live data and compares it with the byte at the same position in the known reference file. If the bytes do not match, I search a window of 10 characters in both directions from the current position for a match nearby. If a match is found (plus a check or two), I visually mark the location in the user interface and log an error.
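To make that concrete, here is a simplified sketch of the comparison in Python (just for illustration; the function name, return values, and the "check or two" are not my real code):

```python
WINDOW = 10  # search radius around the current position

def classify_mismatch(live: bytes, reference: bytes, pos: int) -> str:
    """Compare the live byte at `pos` with the reference byte at the same index.

    Returns 'ok' when they match, 'nearby' when the live byte appears within
    +/-WINDOW bytes of `pos` in the reference (likely a skip or an inserted
    character), and 'error' otherwise.
    """
    if pos < len(reference) and live[pos] == reference[pos]:
        return "ok"
    lo = max(0, pos - WINDOW)
    hi = min(len(reference), pos + WINDOW + 1)
    if pos < len(live) and live[pos] in reference[lo:hi]:
        return "nearby"  # mark the location in the UI and log an error
    return "error"
```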
This approach works quite well, and given the speed of the incoming data it runs in real time. However, I feel that what I am doing is not optimal, and that the approach would fall apart if the data arrived at a faster rate.
Are there other approaches? Are there known algorithms for this type of thing? I read many years ago that NASA's data transmissions (for example, to spacecraft at the Moon / Mars) had a data loss of only 0.00001%, despite the huge interference in space.
Any ideas?