Compress URL text (not shortening) and store it in MySQL

I have a URL table in MySQL with only two fields: an id and a VARCHAR(255) for the URL. It currently holds over 50 million URLs, and my boss has just made it clear that we are expanding our current project, which will add more URLs to this table. The expected number is about 150 million by the middle of next year.

Currently the database size is about 6 GB, so I can say with confidence that if everything is left as it is, it will cross 20 GB, which is not good. So I am looking for a solution that reduces the storage space the URLs take.

I also want to make it clear that this is not a busy table and I run very few queries against it, so for now I just want to save disk space. More importantly, I want to learn new ideas for compressing short text and storing it in MySQL.

BUT this table may well be accessed more heavily in the future, so it is better to optimize it long before that time comes.

I did a bit of work on converting each URL to a number and storing it as a BIGINT, but since BIGINT is limited to 64 bits, it did not work out well enough. The same applies to the BIT data type, which also imposes a 64-bit limit.

My idea of converting to numeric form is basically this: an 8-byte BIGINT stores 19 decimal digits, so if each digit stood for one character of the character set, it could hold 19 characters in 8 bytes, provided the set had only 10 characters. In the real world, though, there are 52 English letters (upper and lower case), 10 digits and several symbols, so the set has about 100 characters. Thus, in the worst case, a BIGINT can still hold about 6 characters. This is not the final verdict; it still needs some working out to know exactly how many digits each character takes, depending on whether the set has 10+, 30+ or 80+ characters, but you get the idea of what I'm thinking.
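To make the arithmetic concrete, here is a minimal Python sketch of the idea, assuming a plain positional (base-N) encoding where N is the size of the character set. The function names and the 62-character alphabet are my own choices for illustration, not something from an existing library. Under this particular scheme an unsigned 64-bit value holds 19 characters for a 10-symbol set, 10 for a 62-symbol set and 9 for a 100-symbol set, so a single BIGINT still covers only a small part of a 255-character URL.

```python
import string

# Hypothetical 62-symbol alphabet: A-Z, a-z, 0-9.
ALPHABET = string.ascii_letters + string.digits


def chars_per_bigint(alphabet_size: int, bits: int = 64) -> int:
    """Return the largest n such that alphabet_size ** n <= 2 ** bits."""
    n = 0
    while alphabet_size ** (n + 1) <= 2 ** bits:
        n += 1
    return n


def pack(text: str, alphabet: str = ALPHABET) -> int:
    """Encode text as one integer, treating each character as a base-N digit."""
    base = len(alphabet)
    value = 0
    for ch in text:
        value = value * base + alphabet.index(ch)
    return value


def unpack(value: int, length: int, alphabet: str = ALPHABET) -> str:
    """Decode an integer produced by pack(); the original length must be known."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        value, digit = divmod(value, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    for size in (10, 62, 100):
        print(size, "symbols:", chars_per_bigint(size), "characters per 64-bit value")
    packed = pack("Url42")
    print(packed, unpack(packed, 5))
```

Note that the length has to be stored separately (or the text padded), because leading characters that map to digit 0 would otherwise be lost when decoding.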

, , url , URL-, .

, smaz huffman compression algo, , , .

, , , varchars .


128- , (16) 16 - . 64 (512 ), , . BIT, .

, URL- , , URL-, , AZ az 0-9 , 62 X 62 X 62.

, , URL .


- . , (http, https, ftp - ), , , "wwww", , , ".com", ". org", ".edu" - . , , , .

URL- , , , - ( , ). , URL-, , URL- , . , URL-, (, "http://stackoverflow.com/questions" ). , , . , , , , .

