So, I'm implementing a continued fraction library for handling a subset of quadratic integers and rational numbers. Continued fraction terms are represented by unsigned integers. I've noticed the following general patterns when working with continued fractions:
- Most terms are small single digits, with 1 being the most common.
- Some terms can be enormous; the largest possible for my application is 366 bits, but these are exceedingly rare.
- A large term indicates an especially good approximation, which means a fraction with a large term usually has fewer terms overall.
- The worst possible continued fraction is the golden ratio, and a rational approximation of it with 366 bits of precision corresponds to a run of roughly 525 1s (see the sketch after this list).
- Random rational numbers usually don't have long runs of the same term, but runs of two to four can occur, with 1 again being the most common.
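For concreteness, here's a quick Python sketch of what I mean by "terms", plus a check of the golden-ratio figure; `cf_terms` is just a throwaway helper for this post, not my library's API:

```python
from fractions import Fraction

def cf_terms(x):
    """Continued fraction terms of a rational, via the Euclidean algorithm."""
    terms = []
    p, q = x.numerator, x.denominator
    while q:
        a, r = divmod(p, q)
        terms.append(a)
        p, q = q, r
    return terms

print(cf_terms(Fraction(355, 113)))  # [3, 7, 16]: a few, mostly small, terms

# Golden-ratio worst case: the convergents of [1; 1, 1, ...] are ratios of
# consecutive Fibonacci numbers, so count how many 1-terms it takes before
# the denominator reaches 366 bits.
n, a, b = 1, 1, 1
while b.bit_length() < 366:
    a, b = b, a + b
    n += 1
print(n)  # roughly 527, in line with the ~525 figure above
```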
So I have a bound on both the number of terms and the size of each term, and the number of terms is roughly inversely proportional to their size. Storing these terms in machine words or even bytes is therefore usually very wasteful of space, and I still need to handle multi-word arithmetic in the worst case. Given the roughly inverse relationship between term size and term count (both of which also depend on the size of the fraction's numerator and denominator), I've been trying to find or come up with a good compression scheme so that I don't waste so much space storing the integer terms.
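To put a rough number on the waste, here's a quick sketch; the sample of 1000 random rationals and the 64-bit word size are arbitrary choices for illustration (and a single 64-bit word couldn't even hold the rare oversized terms):

```python
import random

random.seed(1)
terms = []
for _ in range(1000):
    q = random.getrandbits(366) | 1      # random denominator up to 366 bits
    p = random.randrange(1, q)
    while q:                             # Euclidean algorithm -> CF terms
        a, p, q = p // q, q, p % q
        terms.append(a)

fixed = 64 * len(terms)                       # one 64-bit word per term
minimal = sum(t.bit_length() for t in terms)  # info-theoretic floor, not a real code
print(len(terms), fixed / minimal)            # most of each word goes unused
```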
I've considered Huffman encoding, since encoding and decoding speed is good, but I'm not sure how to come up with the probabilities for the code values. I have a vague intuition that Stern-Brocot trees might offer a hint, since they are binary trees with a direct connection to continued fractions.
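One way I can think of to get probabilities is to just count term frequencies empirically. Here's a minimal Huffman sketch over the `terms` sample from the previous snippet; the `CUTOFF` and the `ESCAPE` symbol for oversized terms are my own assumptions, since a code table can't have an entry for every possible 366-bit value:

```python
import heapq
import itertools
from collections import Counter

ESCAPE = "ESC"   # placeholder symbol for any term above the cutoff
CUTOFF = 255     # arbitrary: larger terms get ESCAPE plus an explicit encoding

def huffman_code(freqs):
    """Build a {symbol: bit-string} prefix code from a {symbol: count} table."""
    tie = itertools.count()  # tiebreaker so equal counts never compare symbols
    heap = [(n, next(tie), [sym]) for sym, n in freqs.items()]
    heapq.heapify(heap)
    code = {sym: "" for sym in freqs}
    while len(heap) > 1:
        n1, _, group1 = heapq.heappop(heap)   # two least frequent groups
        n2, _, group2 = heapq.heappop(heap)
        for s in group1:
            code[s] = "0" + code[s]           # prepend the branch bit
        for s in group2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (n1 + n2, next(tie), group1 + group2))
    return code

# Frequencies counted from the terms gathered in the previous sketch.
counts = Counter(t if t <= CUTOFF else ESCAPE for t in terms)
code = huffman_code(counts)
print(len(code[1]), len(code[ESCAPE]))  # 1 should get one of the shortest codes
```

The drawback of counting per input is that a frequency table would have to be stored alongside every encoded fraction, which is why a fixed distribution would be preferable.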
Does anyone know of a good compression approach for handling lots of small numbers with occasional huge ones, where runs of the same number are typically short (but might be long in rare cases)? In particular, I need to be able to encode and decode fairly fast (say, O(n*lg(n)) is the absolute worst speed I can tolerate, with O(n) or better preferred) and to get at the position of individual terms, so that I know which term I'm working with (the fourth, the fifth, etc.).
Also, I'm not interested in using any third-party real-number or continued fraction libraries. I've looked at a few, and they are either insufficient or overkill for my needs, and I'd like the experience of coding this myself, for my own edification.
Update
I've also come across the Gauss-Kuzmin distribution, which gives the probability that a particular continued fraction term equals k for a random number uniformly distributed between 0 and 1. It is

P(k) = -lg(1.0 - 1.0/((k + 1.0)**2))

in Python pseudocode, where lg is the base-2 logarithm. This is the limiting distribution (it holds for the n-th term as n approaches infinity), so my case is somewhat more restricted (though 2**366 is still huge). "The Entropy of Continued Fractions" by Linas Vepstas gives the (information) entropy of the Gauss-Kuzmin distribution as approximately 3.43 bits. My maximum denominator is large enough that the information entropy of my continued fractions is probably close to that limit, though the graph in the linked article shows the limit is approached fairly slowly, so the entropy differs for relatively small denominators.
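A minimal sanity check of the formula and the quoted entropy figure (the truncation at 10**6 is an arbitrary choice of mine; the tail beyond it contributes negligibly):

```python
import math

def gauss_kuzmin(k):
    """P(term == k) for a random real uniform on (0, 1), in the limit."""
    return -math.log2(1.0 - 1.0 / ((k + 1.0) ** 2))

print(gauss_kuzmin(1))  # ~0.415: a term is 1 about 41.5% of the time
print(gauss_kuzmin(2))  # ~0.170
print(sum(gauss_kuzmin(k) for k in range(1, 5)))  # terms 1-4 cover ~74%

# Entropy H = -sum P(k) * lg P(k), truncated at k = 10**6.
entropy = 0.0
for k in range(1, 10**6):
    p = gauss_kuzmin(k)
    entropy -= p * math.log2(p)
print(entropy)  # ~3.43 bits, matching the figure from Vepstas
```

If I go the Huffman route from above, these probabilities would presumably supply the missing code-value weights: the ideal code length for term k is -lg P(k), which is about 1.3 bits for a 1.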