The JustBasic code, complete with a text input file, can be found here.
BASIC Files archive only - forum post
EBPE by TomC 02/2014 - an extension of Byte Pair Encoding (BPE)
EBPE uses two main processes on top of byte pair encoding:
1. Compressing the dictionary (believed to be new)
Each dictionary entry is 3 bytes:
- AA: the two characters (the byte pair) to be replaced
- 1: the single token that replaces them (tokens are otherwise unused symbols)
So the entry "AA1" tells the decoder that every time it sees "1" in the data file, it should replace it with "AA".
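As a minimal sketch of how a decoder could use the plain (uncompressed) dictionary, the Python below reads 3-byte entries and expands tokens in the data. The function names are hypothetical, and it assumes, as in ordinary BPE, that a pair may itself contain an earlier token, so expansion is recursive.

```python
def parse_dictionary(raw):
    """Split an uncompressed dictionary into {token: pair} using 3-byte entries."""
    return {raw[i + 2]: raw[i:i + 2] for i in range(0, len(raw), 3)}


def decode(data, table):
    """Replace every token with its byte pair until only literal bytes remain."""
    out = bytearray()
    stack = list(reversed(data))  # process bytes left to right
    while stack:
        b = stack.pop()
        if b in table:
            # Assumption: a pair may contain other tokens, so push it back for expansion.
            stack.extend(reversed(table[b]))
        else:
            out.append(b)
    return bytes(out)


table = parse_dictionary(b"AA1BB3")
print(decode(b"x1y3", table))
```

Using an explicit stack keeps nested expansion from hitting Python's recursion limit on deeply chained tokens.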
Long runs of consecutive tokens are possible. As an example, consider these 8 entries:
AA1BB3CC4DD5EE6FF7GG8HH9
Length: 24 bytes (8 entries * 3 bytes)
Token 2 does not appear, which tells us it was not available as a token; in other words, the byte 2 occurs in the source data.
The last 7 tokens (3, 4, 5, 6, 7, 8, 9) are sequential. Whenever there is a run of 4 or more sequential tokens, we can rewrite the dictionary like this:
AA1BB3<255>CCDDEEFFGGHH<255>
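Here the first <255> says that the tokens for the following byte pairs are implied: each is 1 more than the previous one, starting from the last explicit token (3). The decoder keeps incrementing by one until it sees the next <255>, which ends the run. The two markers cost 2 bytes and every entry after the first saves its 1-byte token, so a run of n tokens saves n - 3 bytes; this is why only runs of 4 or more pay off.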
- Original dictionary: 24 bytes
- Compressed dictionary: 20 bytes
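The run scheme above can be sketched in Python as a pair of functions that write and read the compressed dictionary. This is a minimal sketch, not the original JustBasic code: the names are hypothetical, entries are assumed to be kept in dictionary order as (pair, token), and byte 255 is assumed never to start a byte pair, so the reader can tell a run marker from an entry.

```python
RUN = 255  # run marker; assumed never to start a byte pair in the dictionary


def compress_dictionary(entries):
    """entries: list of (pair, token), pair a 2-byte bytes object, token an int."""
    out = bytearray()
    i = 0
    while i < len(entries):
        # Extend j to the end of the run of consecutive tokens starting at i.
        j = i + 1
        while j < len(entries) and entries[j][1] == entries[j - 1][1] + 1:
            j += 1
        pair, token = entries[i]
        out += pair + bytes([token])  # first entry of a run is always explicit
        if j - i >= 4:
            # Run of 4+ tokens: the remaining pairs are stored without tokens.
            out.append(RUN)
            for pair, _ in entries[i + 1:j]:
                out += pair
            out.append(RUN)
        else:
            # Short run: n - 3 <= 0 bytes saved, so write entries explicitly.
            for pair, token in entries[i + 1:j]:
                out += pair + bytes([token])
        i = j
    return bytes(out)


def expand_dictionary(data):
    """Inverse of compress_dictionary: rebuild the list of (pair, token)."""
    entries = []
    i = 0
    last = None
    while i < len(data):
        if data[i] == RUN:
            i += 1
            while data[i] != RUN:
                last += 1  # implied token: previous token + 1
                entries.append((data[i:i + 2], last))
                i += 2
            i += 1  # skip the closing marker
        else:
            pair, last = data[i:i + 2], data[i + 2]
            entries.append((pair, last))
            i += 3
    return entries


entries = list(zip([b"AA", b"BB", b"CC", b"DD", b"EE", b"FF", b"GG", b"HH"],
                   b"13456789"))
packed = compress_dictionary(entries)
print(packed)       # b'AA1BB3\xffCCDDEEFFGGHH\xff'
print(len(packed))  # 20
```

Keeping the first entry of each run explicit, as in the example, gives the decoder a starting token without needing any extra header byte.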
Using this extension I saved 175 bytes on a text file, where tokens 128 to 254 tend to be sequential, as do other runs in general, such as those created by the lowercase preprocessing.
2. Compressing the data file
The reuse of rarely used characters as tokens is not new.
Once all the free characters have been used as tokens (except <255>), we scan the file for a rarely used character, for example "j", and let it do double duty:
- "<255>j" now means a literal "j"
- a bare "j" is now a token for further compression.
If "j" occurs once in the data file, we must add one <255> plus a 3-byte dictionary entry, so the extra BPE pass has to save more than 4 bytes to be worth it.
If "j" occurs 6 times, we need six <255> bytes plus the 3-byte entry, so the pass has to save more than 9 bytes.
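To make the break-even rule concrete, here is a minimal Python sketch with hypothetical names. It assumes the saving comes from replacing the single most frequent adjacent pair, where each replaced occurrence turns 2 bytes into 1; overlapping occurrences (as in "aaa") are counted naively, so the saving is an upper bound.

```python
from collections import Counter


def escape_cost(data, ch):
    """Bytes added by reusing ch as a token: one <255> per literal ch plus a 3-byte entry."""
    return data.count(ch) + 3


def best_pair_saving(data):
    """Upper bound on bytes saved by replacing the most frequent pair with one token."""
    counts = Counter(zip(data, data[1:]))
    return max(counts.values(), default=0)


def worth_reusing(data, ch):
    """Reuse ch only if the extra BPE pass saves more than the escaping costs."""
    return best_pair_saving(data) > escape_cost(data, ch)


print(escape_cost(b"j", ord("j")))       # 4
print(escape_cost(b"jjjjjj", ord("j")))  # 9
```

The two printed values match the 1-occurrence and 6-occurrence cases described above.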
Depending on how much further compression is possible and how many byte pairs remain in the file, this post-process saved more than 100 bytes in test runs.
Note: when decompressing, do not expand every "j". Check the previous character first and expand "j" only if it is not <255>. Finally, after decompression, delete the <255> markers to recreate the original file.
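A minimal Python sketch of the escape step and the matching decode, with hypothetical names and a single reused character. It is a one-pass variant: instead of expanding first and deleting all <255> bytes afterwards, it drops each <255> as soon as it is consumed. That is equivalent only under the assumption that <255> never appears except directly before an escaped "j". The BPE pass itself is simulated here by a plain `bytes.replace`.

```python
ESC = 255  # escape byte, reserved as described above


def escape_literal(data, ch):
    """Before the extra BPE pass: write every literal ch as <255>ch."""
    out = bytearray()
    for b in data:
        if b == ch:
            out.append(ESC)  # mark this ch as a literal
        out.append(b)
    return bytes(out)


def expand_token(data, ch, pair):
    """Decoding: expand a bare ch to pair, but turn <255>ch back into a literal ch."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and data[i + 1] == ch:
            out.append(ch)  # escaped literal: drop the <255>, keep ch
            i += 2
        elif data[i] == ch:
            out += pair  # previous byte was not <255>, so ch is a token
            i += 1
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


original = b"jam and the jar"
escaped = escape_literal(original, ord("j"))
packed = escaped.replace(b"th", b"j")  # stand-in for the extra BPE pass
restored = expand_token(packed, ord("j"), b"th")
print(restored == original)  # True
```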
3. What comes next for EBPE?
Unknown at present.