Here is a simple algorithm:
Note that you do not need to pass the alphabet used for coding. In addition, you do not use (and do not pass) the probabilities of the input characters, as in standard compressions, so we just re-encode the data somehow.
In this case, we can assume that the input data in the number are presented with a base equal to the power of the input alphabet. We just need to change its presentation to another base, this is a simple task.
EDITED example:
input alpabet: ABC , output alphabet: 0123456789
message ABAC will be transferred to 0102 in base 3, i.e. 11 (9 + 2) in base 10.
11 to the base 10: 11
We may have a problem decrypting it, because we do not know how many 0-es to use at the beginning of the decoded result, so we should use one of the modifications:
1) somehow encodes the size of the compressed data in the stream.
2) use dummy 1 at the beginning of the stream: this way our example will be:
10102 (base 3) = 81 + 9 + 2 = 92 (base 10).
Now, after decoding, we just need to ignore the first 1 (this also provides basic error detection).
The main problem with this approach is that in most cases (GCD == 1), each new coded character will completely change the result. It will be very difficult and difficult to implement. As a result, we get arithmetic coding as the best solution (in fact, this is a simplified version).
ruslik
source share