Does anyone know this unusual data format?

Has anyone seen this data format? I was given a huge number of records to import from a flat file that contains numeric fields in some kind of packed binary format. I know from context that they represent numbers, and I have some existing translations / decodings, enough to tell me a little about how to convert them. The low-order byte represents the least significant digit and may also encode the sign. Below are the decoded digit, the encoded byte, and the corresponding bit pattern.

digit  byte  bits
0      0c    0000 1100
1      1c    0001 1100
2      b1    1011 0001
3      14    0001 0100
4      3c    0011 1100
5      2a    0010 1010
6      25    0010 0101
7      40    0100 0000
8      d0    1101 0000
9      91    1001 0001

The bytes before that last one each seem to pack two digits; it looks like there are 100 mappings, from 00 to 99. I will show a few here: first the decoded pair of digits, then the hexadecimal byte and its bit pattern.

pair  byte  bits
00    00    0000 0000
01    01    0000 0001
02    02    0000 0010
03    03    0000 0011
04    dc    1101 1100
05    09    0000 1001
06    c3    1100 0011
07    7f    0111 1111
08    ca    1100 1010
09    b2    1011 0010
10    10    0001 0000
11    11    0001 0001
12    12    0001 0010
13    13    0001 0011
14    db    1101 1011
15    da    1101 1010
16    08    0000 1000
17    c1    1100 0001
18    18    0001 1000
19    19    0001 1001
20    c4    1100 0100
21    b3    1011 0011
22    c0    1100 0000
23    d9    1101 1001
24    bf    1011 1111

If I come across 000125, the result should be 16. 000000c90c converts to 350. If I find 000000000000000f, it should convert to 0, but I don't see how 0000ec ends up as -8.
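To make the scheme above concrete, here is a minimal Python sketch of the table lookup as I understand it so far. The names `LOW_DIGIT`, `PAIR_VALUE`, and `decode_field` are my own, the pair table only holds the 25 entries listed above, and the function simply gives up (returns `None`) on any byte it has no mapping for, which includes the negative and `0f` cases I cannot explain yet.

```python
# Low-order byte -> least significant digit, transcribed from the first table.
LOW_DIGIT = {0x0c: 0, 0x1c: 1, 0xb1: 2, 0x14: 3, 0x3c: 4,
             0x2a: 5, 0x25: 6, 0x40: 7, 0xd0: 8, 0x91: 9}

# Encoded bytes for the digit pairs 00..24, in order, from the second table.
# The full table would have 100 entries; only these are known here.
PAIR_CODES = [0x00, 0x01, 0x02, 0x03, 0xdc, 0x09, 0xc3, 0x7f, 0xca, 0xb2,
              0x10, 0x11, 0x12, 0x13, 0xdb, 0xda, 0x08, 0xc1, 0x18, 0x19,
              0xc4, 0xb3, 0xc0, 0xd9, 0xbf]
PAIR_VALUE = {code: value for value, code in enumerate(PAIR_CODES)}


def decode_field(raw, pair_value=PAIR_VALUE, low_digit=LOW_DIGIT):
    """Decode a non-negative field, or return None if any byte is unmapped."""
    if not raw:
        return None
    total = 0
    # Every byte except the last contributes two decimal digits.
    for byte in raw[:-1]:
        if byte not in pair_value:
            return None
        total = total * 100 + pair_value[byte]
    # The last byte contributes the single least significant digit.
    last = raw[-1]
    if last not in low_digit:
        return None
    return total * 10 + low_digit[last]


print(decode_field(bytes.fromhex("000125")))      # the 16 example
print(decode_field(bytes.fromhex("000000c90c")))  # None: c9 is not in the partial table
```

If the scheme is right, the 350 example implies that c9 stands for the pair 35; passing a copy of `PAIR_VALUE` with that one inferred entry added makes `decode_field` return 350 for 000000c90c.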

There are enough repeating patterns here to make me suspect that this is some kind of encoding. What I have now is enough to decode a lot of positive numbers, but not all of them. I don't know how to handle negative values, and I'm not sure whether information is being lost in my mapping (I'm thinking of something like IEEE floating-point forms).

Any ideas? Thanks!

1 answer

Since it does not use any of the traditional mainframe formats, nor a parity / error-correction scheme (I counted the set bits), I can only assume that this is not something that has been in common use in recent history. Perhaps some kind of XOR operation was applied to one of those older formats, but if so, it does not match any pattern I can detect.
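For anyone who wants to repeat those two checks, here is a rough Python sketch using only the sample bytes from the question. The helper names are mine, and packed BCD is just my assumption for which "old format" to compare against. A consistent parity scheme would give a single parity across all bytes, and a fixed XOR mask would give a single mask value across all pairs.

```python
# Sample data transcribed from the question: digit -> low-order byte.
LOW_BYTES = {0: 0x0c, 1: 0x1c, 2: 0xb1, 3: 0x14, 4: 0x3c,
             5: 0x2a, 6: 0x25, 7: 0x40, 8: 0xd0, 9: 0x91}

# Encoded bytes for the digit pairs 00..24, in order.
PAIR_CODES = [0x00, 0x01, 0x02, 0x03, 0xdc, 0x09, 0xc3, 0x7f, 0xca, 0xb2,
              0x10, 0x11, 0x12, 0x13, 0xdb, 0xda, 0x08, 0xc1, 0x18, 0x19,
              0xc4, 0xb3, 0xc0, 0xd9, 0xbf]


def popcount(byte):
    """Number of set bits in a byte."""
    return bin(byte).count("1")


def packed_bcd(value):
    """Packed BCD byte for a two-digit value 0..99 (assumed reference format)."""
    return ((value // 10) << 4) | (value % 10)


# Parity check: the set of parities seen across the low-order bytes.
parities = {popcount(b) % 2 for b in LOW_BYTES.values()}

# XOR check: the mask needed to turn each pair's packed BCD into its encoded byte.
masks = {code ^ packed_bcd(value) for value, code in enumerate(PAIR_CODES)}

print("parities seen:", sorted(parities))
print("distinct XOR masks:", len(masks))
```

On the samples, both even and odd bit counts occur, and there is no single XOR mask. Several pairs (00 to 03, 10 to 13, 18, 19) are already identical to their packed BCD byte, while the rest differ in ways that share no common mask.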

Given that no one has recognized this format or knows an algorithm to decode it, I can only assume that it was a half-hearted attempt to encrypt the numbers. If I can find the time, I will write some code to analyze all 100 million values and see whether anything useful turns up, but for now I will just wait and see whether the data's creators can provide an answer. Or a hint.

I am going to mark this as answered because I do not want to torment people with an unsolvable mystery. Sorry if anyone was frustrated; I only hoped that it was something obscure that someone here might have seen before.
