netstring
A relatively simple format such as netstring may be adequate for this application.
For example, the text "hello world!" is encoded as:
12:hello world!,
An empty string is encoded as three characters:
0:,
which can be represented as a series of bytes
'0' ':' ','
The word 0x1234abcd packed into one netstring (using network byte order), followed by the word 0xabcd3f56 packed into another netstring, is encoded as the series of bytes
'\n' '4' ':' 0x12 0x34 0xab 0xcd ',' '\n' '\n' '4' ':' 0xab 0xcd 0x3f 0x56 ',' '\n'
(the newline character '\n' before and after each netstring is optional, but it makes testing and debugging easier).
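The encoding rule above can be sketched in C as follows. This is only an illustration: netstring_encode is a hypothetical helper name, and the optional newlines are left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode `len` raw bytes as a netstring: "<decimal length>:<data>,".
 * Returns the number of bytes written to `out`, or 0 if `out` is too small.
 * netstring_encode is a hypothetical name used for illustration only. */
size_t netstring_encode(const unsigned char *data, size_t len,
                        unsigned char *out, size_t out_size)
{
    char header[32];
    int header_len;

    assert(out != NULL);
    assert(data != NULL || len == 0);

    /* The header is the payload length in decimal, followed by ':'. */
    header_len = snprintf(header, sizeof header, "%lu:", (unsigned long)len);
    if (header_len < 0 || (size_t)header_len >= sizeof header)
        return 0;
    if ((size_t)header_len + len + 1 > out_size)
        return 0;

    memcpy(out, header, (size_t)header_len);
    if (len > 0)
        memcpy(out + header_len, data, len);   /* raw bytes, copied as-is */
    out[(size_t)header_len + len] = ',';       /* trailing footer byte */
    return (size_t)header_len + len + 1;
}
```

Passing the 12 bytes of "hello world!" produces the 16 bytes 12:hello world!, and passing the four bytes 0x12 0x34 0xab 0xcd produces the 7-byte sequence shown above without the newlines.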
frame synchronization
How can I make sure that the device does not start reading in the wrong place?
A common solution to the frame synchronization problem is to read into a temporary buffer, hoping we started reading in the right place. Later we run some consistency checks on the message in the buffer. If the message fails a check, something has gone wrong, so we throw away the data in the buffer and start over. (If it was an important message, we hope the transmitter will re-send it.)
For example, if the serial cable is plugged in halfway through the first netstring, the receiver sees the byte string:
0xab 0xcd ',' '\n' '\n' '4' ':' 0xab 0xcd 0x3f 0x56 ',' '\n'
Because the receiver is smart enough to wait for the ":" before expecting the following bytes to be valid data, it can ignore the first partial message and then receive the second message correctly.
In some cases, you know in advance what the valid message length(s) will be; that makes it even easier for the receiver to detect that it started reading in the wrong place.
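As a rough sketch of that behaviour, here is a byte-at-a-time receiver that assumes every valid message carries exactly 4 data bytes. The names rx, rx_init and rx_feed are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

enum rx_state { RX_LENGTH, RX_COLON, RX_DATA, RX_COMMA };

struct rx {
    enum rx_state state;
    unsigned char payload[4];   /* assumption: all messages are 4 data bytes */
    size_t have;                /* data bytes collected so far */
};

void rx_init(struct rx *r)
{
    assert(r != NULL);
    r->state = RX_LENGTH;
    r->have = 0;
}

/* Feed one received byte. Returns 1 when a complete, well-formed
 * message has just been received into r->payload, otherwise 0. */
int rx_feed(struct rx *r, unsigned char b)
{
    assert(r != NULL);
    switch (r->state) {
    case RX_LENGTH:                 /* skip anything that is not the header */
        if (b == '4')
            r->state = RX_COLON;
        break;
    case RX_COLON:
        if (b == ':') {
            r->state = RX_DATA;
            r->have = 0;
        } else {
            /* not a header after all; this byte may itself start one */
            r->state = (b == '4') ? RX_COLON : RX_LENGTH;
        }
        break;
    case RX_DATA:                   /* data bytes are counted, not inspected */
        r->payload[r->have++] = b;
        if (r->have == 4)
            r->state = RX_COMMA;
        break;
    case RX_COMMA:
        r->state = RX_LENGTH;
        return b == ',';
    }
    return 0;
}
```

Feeding it the byte string above, one byte at a time, skips the leftover tail of the first message and reports exactly one valid message with payload 0xab 0xcd 0x3f 0x56.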
sending a start-of-message marker as data
I thought of using a marker to start the message, but what if I want to send, as data, the same number I chose as the marker?
After sending the netstring header, the transmitter sends the raw data as is - even if it looks like a message start marker.
In the normal case, the receiver already has frame synchronization. The netstring parser has already read the "length" and ":" header, so it puts the raw data bytes directly into the right place in the buffer, even if those data bytes happen to look like the ":" header byte or the "," footer byte.
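A small sketch of this, assuming the receiver is already in sync and the whole message is sitting in a buffer. netstring_decode is a hypothetical name, and the cap on the length field is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Parse one netstring starting at buf[0].
 * On success, sets *payload and *payload_len and returns the total number
 * of bytes consumed; returns 0 if the message is malformed or incomplete. */
size_t netstring_decode(const unsigned char *buf, size_t buf_len,
                        const unsigned char **payload, size_t *payload_len)
{
    size_t i = 0, len = 0;

    assert(buf != NULL && payload != NULL && payload_len != NULL);

    /* Header: one or more decimal digits. */
    if (buf_len == 0 || buf[0] < '0' || buf[0] > '9')
        return 0;
    while (i < buf_len && buf[i] >= '0' && buf[i] <= '9') {
        if (len > 1000)             /* arbitrary sanity cap (assumption) */
            return 0;
        len = len * 10 + (size_t)(buf[i] - '0');
        i++;
    }
    if (i >= buf_len || buf[i] != ':')
        return 0;
    i++;

    /* Payload: skipped by count, never inspected, so ':' and ','
     * inside the data cannot confuse the parser. */
    if (len >= buf_len - i)         /* need len data bytes plus the ',' */
        return 0;
    if (buf[i + len] != ',')
        return 0;

    *payload = buf + i;
    *payload_len = len;
    return i + len + 1;
}
```

For instance, the 7 bytes 4::,4:, decode to the 4-byte payload :,4: even though every data byte looks like part of a header or footer.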
pseudo code
    // netstring parser for receiver
    // WARNING: untested pseudocode
    // 2012-06-23: David Cary releases this pseudocode as public domain.
    #include <stdio.h>

    // inWaiting(), read_serial() and do_other_main_loop_stuff() stand for
    // whatever your platform and application provide.
    extern int inWaiting(void);
    extern unsigned char read_serial(int count);
    extern void do_other_main_loop_stuff(void);

    enum {
        WAITING_FOR_LENGTH, WAITING_FOR_COLON, WAITING_FOR_DATA,
        WAITING_FOR_COMMA, NEW_VALID_MESSAGE
    };

    enum { max_message_length = 9 };
    // unsigned, so that data bytes >= 0x80 are not sign-extended
    unsigned char buffer[1 + max_message_length]; // do we need room for a trailing NULL ?
    long int latest_commanded_speed = 0;
    int data_bytes_read = 0;
    int bytes_read = 0;
    int state = WAITING_FOR_LENGTH;

    void reset_buffer(void)
    {
        bytes_read = 0; // reset buffer index to start-of-buffer
        state = WAITING_FOR_LENGTH;
    }

    void check_for_incoming_byte(void)
    {
        if (!inWaiting()) // Has a new byte come into the UART?
            return;
        // If so, then deal with this new byte.
        if (NEW_VALID_MESSAGE == state) {
            // oh dear. We had an unhandled valid message,
            // and now another byte has come in.
            reset_buffer();
        }
        unsigned char newbyte = read_serial(1); // pull out 1 new byte.
        buffer[bytes_read++] = newbyte; // and store it in the buffer.
        if (max_message_length < bytes_read) {
            reset_buffer(); // reset: avoid buffer overflow
            return;
        }
        switch (state) {
        case WAITING_FOR_LENGTH:
            // FIXME: currently only handles messages of 4 data bytes
            if ('4' != newbyte)
                reset_buffer(); // doesn't look like a valid header.
            else // otherwise, it looks good -- move to next state
                state = WAITING_FOR_COLON;
            break;
        case WAITING_FOR_COLON:
            if (':' != newbyte) {
                reset_buffer(); // doesn't look like a valid header.
            } else { // otherwise, it looks good -- move to next state
                state = WAITING_FOR_DATA;
                data_bytes_read = 0;
            }
            break;
        case WAITING_FOR_DATA:
            // FIXME: currently only handles messages of 4 data bytes
            data_bytes_read++;
            if (4 <= data_bytes_read) // all 4 data bytes are in
                state = WAITING_FOR_COMMA;
            break;
        case WAITING_FOR_COMMA:
            if (',' != newbyte)
                reset_buffer(); // doesn't look like a valid message.
            else // otherwise, it looks good -- move to next state
                state = NEW_VALID_MESSAGE;
            break;
        }
    }

    void handle_message(void)
    {
        // FIXME: currently only handles messages of 4 data bytes
        // buffer[0..1] hold the "4:" header; the data is in buffer[2..5],
        // most significant byte first (network byte order).
        unsigned long temp = 0;
        temp = (temp << 8) | buffer[2];
        temp = (temp << 8) | buffer[3];
        temp = (temp << 8) | buffer[4];
        temp = (temp << 8) | buffer[5];
        reset_buffer();
        latest_commanded_speed = (long)temp;
        printf("commanded speed has been set to: %ld\n", latest_commanded_speed);
    }

    void loop(void) // main loop, repeated forever
    {
        // check to see if a byte has arrived yet
        check_for_incoming_byte();
        if (NEW_VALID_MESSAGE == state)
            handle_message();
        // While we're waiting for bytes to come in, do other main loop stuff.
        do_other_main_loop_stuff();
    }
more tips
When defining a serial communication protocol, I find that testing and debugging are much easier if the protocol always uses human-readable ASCII text characters rather than arbitrary binary values.
frame synchronization (again)
I thought of using a marker to start the message, but what if I want to send, as data, the same number I chose as the marker?
We have already covered the case where the receiver already has frame synchronization. The case where the receiver does not yet have frame synchronization is messier.
The simplest solution is for the transmitter to send a series of harmless bytes (perhaps newlines or space characters), as long as the longest possible valid message, as a preamble just before each netstring. No matter what state the receiver is in when the serial cable is plugged in, those harmless bytes eventually drive the receiver into the WAITING_FOR_LENGTH state. Then, when the transmitter sends the packet header (the length followed by ":"), the receiver correctly recognizes it as a packet header and has recovered frame synchronization.
(The transmitter does not need to send that preamble before every packet. Perhaps it could send the preamble before 1 out of every 20 packets; then the receiver is guaranteed to recover frame synchronization within 20 packets (usually fewer) after the serial cable is plugged in.)
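On the transmitter side, that idea might look something like the sketch below. It assumes the fixed 4-data-byte messages used earlier, newline as the harmless byte, and a preamble before every 20th packet; build_frame and the constants are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    PAYLOAD_LEN = 4,
    FRAME_LEN = 7,          /* "4:" + 4 data bytes + "," */
    PREAMBLE_PERIOD = 20    /* assumption: preamble before 1 packet in 20 */
};

/* Build the bytes to transmit for packet number `packet_index`.
 * Returns the number of bytes written to `out`, or 0 if `out` is too small. */
size_t build_frame(unsigned long packet_index,
                   const unsigned char payload[PAYLOAD_LEN],
                   unsigned char *out, size_t out_size)
{
    size_t n = 0;

    assert(payload != NULL && out != NULL);
    if (out_size < 2 * (size_t)FRAME_LEN)
        return 0;

    if (packet_index % PREAMBLE_PERIOD == 0) {
        /* Preamble: as many harmless bytes as the longest valid message. */
        memset(out, '\n', FRAME_LEN);
        n = FRAME_LEN;
    }

    out[n++] = '4';                         /* header: length, then ':' */
    out[n++] = ':';
    memcpy(out + n, payload, PAYLOAD_LEN);  /* raw data bytes, unchanged */
    n += PAYLOAD_LEN;
    out[n++] = ',';                         /* footer */
    return n;
}
```

Packet 0 goes out as 7 newlines followed by the 7-byte netstring; packets 1 through 19 go out as the bare 7-byte netstring.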
other protocols
Other systems use a simple Fletcher-32 checksum or something more complicated to detect many kinds of errors that the netstring format cannot detect, and can even synchronize without a preamble.
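For reference, a straightforward (unoptimized) Fletcher-32 might be sketched like this. The way bytes are paired into 16-bit words (low byte first, odd trailing byte padded with zero) is an assumption of this example; both ends of a link must agree on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fletcher-32 over a byte buffer: two running sums of 16-bit words,
 * each kept modulo 65535. */
uint32_t fletcher32(const unsigned char *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i += 2) {
        uint32_t word = data[i];                 /* low byte */
        if (i + 1 < len)
            word |= (uint32_t)data[i + 1] << 8;  /* high byte, if present */
        sum1 = (sum1 + word) % 65535u;
        sum2 = (sum2 + sum1) % 65535u;
    }
    return (sum2 << 16) | sum1;
}
```

With this word layout, the 5 bytes of "abcde" give 0xF04FC729. Unlike a plain sum, the second running sum makes the result depend on the order of the words, so swapped words are detected.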
Many protocols use a special "start of packet" marker and use various "escaping" techniques to avoid ever sending a literal "start of packet" byte in the transmitted data, even when the real data we want to send happens to contain that value. (Consistent Overhead Byte Stuffing, bit stuffing, quoted-printable and other kinds of binary-to-text encoding, etc.)
These protocols have the advantage that the receiver can be sure that when it sees the "start of packet" marker, it really is the start of a packet (and not some data byte that happens to have the same value). This makes it much easier to handle loss of synchronization: simply discard bytes until the next "start of packet" marker.
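A minimal sketch of one such escaping scheme follows. The marker, escape and XOR values are assumptions borrowed from HDLC-style framing, and stuff_frame is a hypothetical name; other protocols pick different values.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FRAME_START = 0x7E,   /* assumed "start of packet" marker */
    FRAME_ESC   = 0x7D,   /* assumed escape byte */
    FRAME_XOR   = 0x20    /* escaped bytes are sent XORed with this */
};

/* Write FRAME_START followed by the escaped data into `out`.
 * Returns the number of bytes written, or 0 if `out` could be too small
 * for the worst case (every data byte needing an escape). */
size_t stuff_frame(const unsigned char *data, size_t len,
                   unsigned char *out, size_t out_size)
{
    size_t i, n = 0;

    assert(out != NULL);
    assert(data != NULL || len == 0);
    if (out_size < 1 || len > (out_size - 1) / 2)
        return 0;

    out[n++] = FRAME_START;
    for (i = 0; i < len; i++) {
        if (data[i] == FRAME_START || data[i] == FRAME_ESC) {
            out[n++] = FRAME_ESC;                            /* escape it */
            out[n++] = (unsigned char)(data[i] ^ FRAME_XOR);
        } else {
            out[n++] = data[i];                              /* send as-is */
        }
    }
    return n;
}
```

The data bytes 0x12 0x7E 0x7D 0x34 go out as 0x7E 0x12 0x7D 0x5E 0x7D 0x5D 0x34, so the marker value never appears after the first byte. The cost is that, in the worst case, the escaped data is twice as long as the original.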
Many other formats, including the netstring format, allow any possible byte value to be transmitted as data. So receivers have to be smarter about handling a start-of-header byte that might be an actual start-of-header or might be a data byte. But at least they don't have to deal with "escaping", or with the surprisingly large buffer required, in the worst case, to hold a "fixed 64 byte data message" after escaping.
Neither approach is really simpler than the other; choosing one just pushes the complexity somewhere else, as waterbed theory predicts.
Would you mind adding a discussion of the various ways of handling the start-of-header byte, including these two, to the Serial Programming Wikibook, and editing that book to make it better?