You can preprocess the data losslessly to make it more compressible. Leave the first value as is. Replace each subsequent value with the difference between it and the previous value, minus one. As long as the line numbers are strictly increasing, those differences are non-negative. Now encode each integer as a variable-length integer using a sequence of bytes. For instance, the values 0..127 take one byte. When decoding, if the high bit of the first byte is set (128..255), take its low seven bits as the low seven bits of the integer and get the next byte. Each following byte supplies the next more significant bits: the entire byte if its high bit is zero, or only its low seven bits if its high bit is one. Continue until you reach a byte with a high bit of zero, which marks the end of the integer.
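The scheme above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation; the function names `encode` and `decode` are made up for this example, and it assumes the input is a list of non-negative, strictly increasing integers.

```python
def encode(line_numbers):
    """Store the first value as is, later values as (gap - 1), each as a varint."""
    out = bytearray()
    prev = None
    for n in line_numbers:
        d = n if prev is None else n - prev - 1
        if d < 0:
            raise ValueError("values must be non-negative and strictly increasing")
        prev = n
        while d > 0x7F:
            out.append(0x80 | (d & 0x7F))  # high bit set: more bytes follow
            d >>= 7
        out.append(d)                      # high bit clear: end of this integer
    return bytes(out)


def decode(data):
    """Inverse of encode(): rebuild the original strictly increasing values."""
    values = []
    prev = None
    acc = 0      # integer being assembled
    shift = 0    # bit position for the next seven bits
    for b in data:
        acc |= (b & 0x7F) << shift
        if b & 0x80:           # continuation byte: keep reading
            shift += 7
            continue
        n = acc if prev is None else prev + acc + 1
        values.append(n)
        prev = n
        acc = 0
        shift = 0
    if shift:
        raise ValueError("input ends in the middle of an integer")
    return values


numbers = [3, 4, 5, 300]
packed = encode(numbers)
print(packed.hex())            # 030000a602
print(decode(packed))          # [3, 4, 5, 300]
```

Here 4 and 5 each become a single zero byte because they directly follow their predecessor, and the gap of 294 before 300 needs two bytes.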
Now you have the integers encoded as a sequence of bytes, likely somewhat shorter than storing each source integer in, say, four or eight bytes. In addition, you can now apply any standard compression technique that operates on a sequence of bytes, and can potentially expect some gain from it. For instance, if runs of consecutive line numbers are common, you get runs of zero bytes, which compress very well.
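To see why runs of consecutive line numbers help, here is a small sketch using Python's standard `zlib` module. The data (5000 consecutive line numbers starting at 1000) is hypothetical and chosen only to make the effect obvious.

```python
import zlib

# Hypothetical data: 5000 consecutive line numbers, so every gap is 1.
run = list(range(1000, 6000))
deltas = [run[0]] + [b - a - 1 for a, b in zip(run, run[1:])]

# Every delta after the first is zero, so each one is a single zero byte.
# The first value, 1000, needs two bytes: its low seven bits with the high
# bit set, then the remaining bits.
payload = bytes([0x80 | (deltas[0] & 0x7F), deltas[0] >> 7]) + bytes(deltas[1:])

packed = zlib.compress(payload, 1)  # level 1 is zlib's fastest setting
print(len(payload), len(packed))
```

The payload is 5001 bytes, nearly all zeros, and the compressed form should come out at a few dozen bytes.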
For maximum compression and decompression speed, at the cost of reduced compression, look at lz4. If you don't need it to be that fast, look at zlib, where you can trade speed against compression effectiveness with the compression level.
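As a rough illustration of the zlib compression level, the sketch below compresses the same byte string at levels 1, 6 and 9 with Python's standard `zlib` module. The input is made-up skewed data, not real line numbers, and lz4 is left out because it is a third-party package.

```python
import random
import zlib

# Made-up, skewed byte stream standing in for encoded differences.
rng = random.Random(0)
data = bytes(rng.choice(b"\x00\x00\x00\x01\x02\x05") for _ in range(20_000))

sizes = {}
for level in (1, 6, 9):          # 1 = fastest, 9 = best compression
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data   # lossless at every level
    sizes[level] = len(packed)

print(len(data), sizes)
```

Higher levels spend more time searching for matches, so the gain depends on how much repetition the data actually contains.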
For your examples, a random selection of 1500 out of 100,000 results in about 1720 bytes uncompressed and 1600 bytes compressed. A random selection of 50,000 out of 100,000 results in almost exactly 50,000 bytes uncompressed and 18,600 bytes compressed. The compressions were done with the fastest zlib compression, level 1.
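An experiment of this kind can be approximated with the sketch below. It assumes line numbers in the range 0..99,999 and a fixed random seed; `varint_deltas` and `trial` are names invented for this example. Exact sizes depend on the random sample and the zlib build, so treat the printed numbers as indicative only.

```python
import random
import zlib


def varint_deltas(sorted_numbers):
    """Encode strictly increasing non-negative integers as (gap - 1) varints."""
    out = bytearray()
    prev = -1  # makes the first value come out as is: n - (-1) - 1 == n
    for n in sorted_numbers:
        d = n - prev - 1
        prev = n
        while d > 0x7F:
            out.append(0x80 | (d & 0x7F))  # continuation byte
            d >>= 7
        out.append(d)                      # final byte, high bit clear
    return bytes(out)


def trial(k, total=100_000, seed=0):
    """Return (uncompressed size, zlib level 1 size) for k random line numbers."""
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(total), k))
    raw = varint_deltas(picked)
    return len(raw), len(zlib.compress(raw, 1))


for k in (1_500, 50_000):
    raw_size, packed_size = trial(k)
    print(k, raw_size, packed_size)
```

With 50,000 of 100,000 selected, practically every difference fits in one byte, which is why the uncompressed size lands at about one byte per line number.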
Note that in the latter case, where half of the line numbers are used, it would be more efficient to use a bit array, which would be 12,500 bytes uncompressed. In that case the data cannot be compressed, since the bitmap appears random (half of the bits are set, half are not). With more or fewer line numbers, for example 25,000 or 75,000, both result in bitmaps that compress to about 10,500 bytes.
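A bit array along these lines could look like the sketch below. It assumes line numbers in the range 0..99,999, one bit per possible line number, and a fixed random seed; `bitmap` and `bitmap_sizes` are hypothetical helper names. The compressed sizes will vary somewhat with the sample and the zlib build.

```python
import random
import zlib


def bitmap(line_numbers, total):
    """One bit per possible line number; bit n is set if n is present."""
    bits = bytearray((total + 7) // 8)
    for n in line_numbers:
        bits[n >> 3] |= 1 << (n & 7)
    return bytes(bits)


def bitmap_sizes(k, total=100_000, seed=0):
    """Return (uncompressed size, zlib level 1 size) for k random line numbers."""
    rng = random.Random(seed)
    picked = rng.sample(range(total), k)
    raw = bitmap(picked, total)
    return len(raw), len(zlib.compress(raw, 1))


for k in (25_000, 50_000, 75_000):
    print(k, *bitmap_sizes(k))
```

The uncompressed bitmap is always 12,500 bytes for 100,000 possible line numbers, regardless of how many are set; only the compressed size changes with the density.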
Compressed bitmaps are smaller for about 12,500 line numbers and up, while the variable-length integers of differences are smaller for fewer than 12,500 line numbers. That cutoff is the point at which both approaches have about the same uncompressed size of 12,500 bytes.