What could cause my packet byte order to become partially scrambled?

I am sending packets over a TCP socket between a CentOS 4 Linux machine and a Windows XP machine running Interix with Gentoo. When Interix receives a packet, about 10% of the characters are scrambled, always at the same offsets from the beginning of the packet. On the Linux sending side, the packet has this correct content:

    -----BEGIN PUBLIC KEY-----
    MIIBojCCARcGByqGSM4+AgEwggEKAoGBAP//////////yQ/aoiFowjTExmKLgNwc
                                            ^   ^^^^^^^^^^^^^
    0SkCTgiKZ8x0Agu+pjsTmyJRSgh5jjQE3e+VGbPNOkMbMCsKbfJfFDdP4TVtbVHC
    ^^^^^^^^
    ReSFtXZiXn7G9ExC6aY37WsL/1y29Aa37e44a/taiZ+lrp8kEXxLH+ZJKGZR7OZT
    gf//////////AgECAoGAf//////////kh+1RELRhGmJjMUXAbg5olIEnBEUz5joB
    Bd9THYnNkSilBDzHGgJu98qM2eadIY2YFYU2+S+KG6fwmra2qOEi8kLauzEvP2N6
    JiF00xv2tYX/rlt6A1v29xw1/a1Ez9LXT5IIviWP8ySUMyj2cynA//////////8D
    gYQAAoGAKcjWmS+h/a6xY6HfNeVBk+vU4ZQoi4ROBT8NXdiFQUeLwT/WpE/8oAxn
    KCOssVcoF54bF8JlEL0McWjQUzMrqoQedizALRRdH7kTUM/yqZZdxLgRFmiFDUXT
    XxsFFB5hlLpMqy9lqpNMN8+e5m9ISgu8zHMlTBQXsnwds0VkbeU=
    -----END PUBLIC KEY-----

But on Interix, the received packet is partly scrambled (most of it is correct):

    -----BEGIN PUBLIC KEY-----
    MIIBojCCARcGByqGSM4+AgEwggEKAoGBAP//////y////iFowjTExQ/aomKLgNwc
                                            ^   ^^^^^^^^^^^^^
    KigTCkS0Z8x0Agu+pjsTmyJRSgh5jjQE3e+VGbPNOkMbMCsKbfJfFDdP4TVtbVHC
    ^^^^^^^^
    ReSFtXZiXn7G9ExC6aY37WsL/1y29Aa37e44a/taiZ+lrp8kEXxLH+ZJKGZR7OZT
    gf//////////AgECAoGAf//////////kh+1RELRhGmJjMUXAbg5olIEnBEUz5joB
    Bd9THYnNkSilBDzHGgJu98qM2eadIY2YFYU2+S+KG6fwmra2qOEi8kLauzEvP2N6
    JiF00xv2tYX/rlt6A1v29xw1/a1Ez9LXT5IIviWP8ySUMyj2cynA//////////8D
    gYQAAoGAKcjWmS+h/a6xY6HfNeVBk+vU4ZQoi4ROBT8NXdiFQUeLwT/WpE/8oAxn
    KCOssVcoF54bF8JlEL0McWjQUzMrqoQedizALRRdH7kTUM/yqZZdxLgRFmiFDUXT
    XxsFFB5hlLpMqy9lqpNMN8+e5m9ISgu8zHMlTBQXsnwds0VkbeU=
    -----END PUBLIC KEY-----

I have marked the differences with ^ characters. There may be a few more displaced characters around the y, because the run of repeated / characters could hide others that were moved in this section.

This code works fine between several pairs of machines:

  • Linux and Linux
  • Linux and BSD
  • Linux and Cygwin

Could this be a bug in Interix or the Gentoo code? I am running Windows XP with Interix v3.5. All the correct characters are present, but their order is scrambled: some runs are reversed, others are cut out and reinserted elsewhere. On the receiving side, the packet is read with ::read() on the TCP socket file descriptor. There is a lot of code involved, so I'm not sure which parts are most relevant, but I will add more if specific parts are requested.

    const int fd;         // Passed in by caller.
    char *buf;            // Passed in by caller.
    size_t want = count;  // This value is 625 for the packet in question.
    size_t got = 0;       // Adjusted as ::read() is called, until the whole packet is read.
    bool eof = false;
    while (got < want) {
        // We call ::select() to ensure bytes are available before calling ::read().
        ssize_t result = ::read(fd, buf, want - got);
        if (result < 0) {
            // Handle error (not getting called, so omitted).
        } else if (result != 0) {
            // We get here on the first try, with result == 625, the amount we want.
            // Not an error: advance the byte counter 'got' and the read pointer 'buf'.
            got += result;
            buf += result;
        } else {
            // A zero result from ::read() means EOF: the peer closed the connection.
            eof = true;
            break;
        }
    }
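The loop above already advances buf and got correctly after a partial read. A slightly more defensive variant also retries when ::read() is interrupted by a signal (EINTR). This is a minimal sketch assuming a blocking POSIX file descriptor; readFully is a hypothetical name, not part of the original code:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>   // ::read, ::write, ::pipe, ::close (POSIX)

// Hypothetical helper: read exactly `want` bytes from `fd` into `buf`.
// Returns the number of bytes read, which is less than `want` only if the
// peer closed the connection (EOF) first. Returns -1 on a real error.
ssize_t readFully(int fd, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t result = ::read(fd, buf + got, want - got);
        if (result < 0) {
            if (errno == EINTR)
                continue;          // interrupted by a signal: retry
            return -1;             // real error; caller inspects errno
        }
        if (result == 0)
            break;                 // EOF: peer closed the connection
        got += static_cast<size_t>(result);
    }
    return static_cast<ssize_t>(got);
}
```

Because a short return value signals EOF, the caller can distinguish "connection closed mid-packet" from a complete packet without a separate eof flag.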

What experiments can I run to narrow down where the error comes from?
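One experiment that separates transport problems from application problems is to log a checksum of the raw bytes on both sides: on the sender just before ::write(), and on the receiver right after the last ::read() returns, before any conversion. If the two checksums match, TCP delivered the bytes intact and the scrambling happens later in your own processing. A minimal sketch using 32-bit FNV-1a; fnv1a and logChecksum are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical debugging helper: 32-bit FNV-1a hash of a byte buffer.
std::uint32_t fnv1a(const void *data, std::size_t len)
{
    const unsigned char *p = static_cast<const unsigned char *>(data);
    std::uint32_t hash = 2166136261u;   // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= p[i];
        hash *= 16777619u;              // FNV prime
    }
    return hash;
}

// Print the length and hash of a buffer, tagged with where it was taken.
void logChecksum(const char *where, const void *data, std::size_t len)
{
    std::fprintf(stderr, "%s: len=%lu fnv1a=%08x\n", where,
                 static_cast<unsigned long>(len),
                 static_cast<unsigned>(fnv1a(data, len)));
}
```

Logging the same hash again after each conversion step on the receiver shows exactly which step first changes the bytes.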

2 answers

The mystery is solved! The problem was that off_t was 32 bits wide on the Windows XP machine and 64 bits wide on the CentOS machine. Before a packet is sent, its in-memory layout, which includes some off_t members, is converted from host byte order to network byte order (little-endian to big-endian); the Windows machine then converts the received packet from network order back to host order. Because the two machines disagreed on the layout, each side swapped bytes across different field boundaries, which produced the scrambling seen above.
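A small sketch shows why mismatched field widths shuffle bytes instead of merely corrupting a number. It models host-to-network conversion on a little-endian host as reversing the bytes of each integer field; swapFields is a hypothetical helper, not the poster's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Reverse the byte order of each `width`-byte field in `buf`, the way a
// host<->network conversion treats integer fields on a little-endian host.
// Assumes buf.size() is a multiple of `width`.
std::string swapFields(std::string buf, std::size_t width)
{
    for (std::size_t i = 0; i + width <= buf.size(); i += width)
        std::reverse(buf.begin() + i, buf.begin() + i + width);
    return buf;
}
```

Converting "ABCDEFGH" as one 8-byte field on the sender and back as two 4-byte fields on the receiver, swapFields(swapFields("ABCDEFGH", 8), 4), yields "EFGHABCD": the halves are exchanged. Using the same width on both sides restores the original. Once one side's layout is shifted relative to the other, later fields are swapped across the wrong boundaries as well.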

I solved the problem by using my own 64-bit type, soff_t, everywhere.
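A fixed-width type helps only if both ends also agree on the byte order of each 64-bit field. The answer does not show how soff_t is defined, so this minimal sketch assumes it is a typedef for std::int64_t; putSoff and getSoff are hypothetical helpers that encode to network (big-endian) order with shifts, so they behave identically on any host:

```cpp
#include <cassert>
#include <cstdint>

// Assumed definition: a signed offset type that is 64 bits on every
// platform, regardless of the width of off_t.
typedef std::int64_t soff_t;
static_assert(sizeof(soff_t) == 8, "soff_t must be 8 bytes");

// Write `v` as 8 big-endian bytes, most significant byte first.
void putSoff(unsigned char out[8], soff_t v)
{
    std::uint64_t u = static_cast<std::uint64_t>(v);
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(u & 0xFF);
        u >>= 8;
    }
}

// Read 8 big-endian bytes back into a soff_t.
soff_t getSoff(const unsigned char in[8])
{
    std::uint64_t u = 0;
    for (int i = 0; i < 8; ++i)
        u = (u << 8) | in[i];
    return static_cast<soff_t>(u);   // two's-complement round trip
}
```

Because the shifts work on values rather than on memory, this avoids needing a 64-bit counterpart of htonl(), which older platforms may not provide.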

However, I then ran into another problem: the compiler did not pack the structure the same way on both machines. On Windows it inserted 4 bytes of padding to align the 8-byte long long, while on CentOS it did not:

    typedef struct Option {
        char _otherStuff[56];
        int _cpuFreq;
        int _bufSize;
        soff_t _fileSize;   // Original bug fixed by forcing these 8 bytes wide.
        soff_t _seekTo;     // Original bug fixed by forcing these 8 bytes wide.
        int _optionBits;
        int _padding;       // Added these 4 bytes to fix the next bug.
        long long _mtime;
        long long _mode;
    } __attribute__ ((aligned(1), packed)) Option;

I used __attribute__ ((aligned(1), packed)) to make the packing consistent and dense, but on Windows XP this either was not honored or could not be done. I worked around it by adding _padding, so that the following 8-byte member is 8-byte aligned on CentOS as well, which matches the layout Windows XP produces.
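Instead of relying on the compiler-specific packed attribute, the layout can be pinned down and checked at compile time. This sketch assumes the int and long long members are replaced by std::int32_t and std::int64_t (the answer uses int, long long and soff_t), and the static_assert checks are an addition, not part of the original fix:

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>

// Every 8-byte member already sits at a multiple of 8 and the total size
// is a multiple of 8, so a compiler has no reason to insert padding.
struct Option {
    char         _otherStuff[56];
    std::int32_t _cpuFreq;
    std::int32_t _bufSize;
    std::int64_t _fileSize;
    std::int64_t _seekTo;
    std::int32_t _optionBits;
    std::int32_t _padding;     // explicit: keeps _mtime 8-byte aligned
    std::int64_t _mtime;
    std::int64_t _mode;
};

// Fail the build on any platform whose layout differs from the expected one.
static_assert(offsetof(Option, _fileSize) == 64, "unexpected layout");
static_assert(offsetof(Option, _seekTo)   == 72, "unexpected layout");
static_assert(offsetof(Option, _mtime)    == 88, "unexpected layout");
static_assert(offsetof(Option, _mode)     == 96, "unexpected layout");
static_assert(sizeof(Option) == 104, "unexpected size");
```

If a platform inserts padding anywhere, the build fails immediately rather than producing scrambled packets at runtime. The byte order of each integer field still has to be converted separately.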


I would guess that you have a concurrency bug involving buf, or perhaps a double free() or a use after free().

