Store a large dictionary to a file in Python

I have a dictionary with many entries and a huge vector as the value of each entry. These vectors can have 60,000 dimensions, and I have about 60,000 entries in the dictionary. To save time, I want to store the dictionary after computing it. However, using pickle produced a huge file. I tried saving it as JSON, but the file remains very large (for example, 10.5 MB for a sample of 50 entries with fewer dimensions). I also read about sparse matrices. Since most entries will be 0, this is a possibility. Will it reduce the file size? Is there any other way to store this information? Or am I just out of luck?

Update:

Thanks to everyone for the answers. I want to store this data because it holds word counts. For example, for a given sentence, I store the number of times word 0 (at position 0 in the array) appears in that sentence. There are obviously more words across all sentences than in any one sentence, hence the many zeros. I then want to use this array to train at least three, maybe six classifiers. It seemed easier to build the arrays of word counts first and then run the classifiers overnight for training and testing. I use sklearn for this. The format was chosen to be consistent with other feature vector formats, which is why I am approaching the problem this way. If this is not the way to go, please let me know. I am well aware that I have a lot to learn about coding!

I also started implementing sparse matrices. The file is even larger now (testing with a sample set of 300 sentences).

Update 2: Thanks everyone for the tips. John Mi was right that I do not need to store the data. Both he and Mike McKearns told me to use sparse matrices, which sped up the calculations significantly! So thanks for your input. Now I have a new tool in my arsenal!

+4
2 answers

See my answer to a very closely related question, fooobar.com/questions/574189/..., if you are okay with pickling to several files instead of a single file.

See also: fooobar.com/questions/1571933/... and here: fooobar.com/questions/1571934/....

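If you are using numpy arrays, this can be very efficient, since both klepto and joblib know how to store an array compactly. If most elements of your arrays are zeros, then convert them to sparse matrices, and you should see large savings in storage size.

As the links above discuss, you could use klepto, which lets you store dictionaries on disk or in a database through a common API. klepto also lets you choose the storage format (pickle, json, etc.), with HDF5 support planned. It can use specialized pickle formats (such as numpy's) and compression (if you care about size rather than speed).

klepto gives you the option of storing the dictionary as an "all-in-one" file or as a "one-entry-per" file, and it can also use multiprocessing or multithreading, so you can save and load dictionary entries in parallel.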
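To illustrate why the sparse-matrix advice mentioned in the question's second update shrinks the stored data, here is a minimal, dependency-free sketch. It is not klepto or scipy code: the helper names `to_sparse` and `to_dense`, the toy sizes, and the `sentence_N` keys are all assumptions made for the illustration. With sklearn you would normally keep the counts in a scipy.sparse matrix; plain dicts are used here only to show the effect of storing nonzero counts alone.

```python
import pickle
import random


def to_sparse(counts):
    """Keep only the nonzero entries as {position: count} (hypothetical helper)."""
    return {i: c for i, c in enumerate(counts) if c}


def to_dense(sparse_counts, size):
    """Rebuild the full count vector from the sparse mapping (hypothetical helper)."""
    dense = [0] * size
    for i, c in sparse_counts.items():
        dense[i] = c
    return dense


# Assumed toy data: 300 sentences over a 5,000-word vocabulary,
# each sentence having exactly 10 nonzero word counts.
random.seed(0)
VOCAB_SIZE = 5000
sentences = {}
for s in range(300):
    vec = [0] * VOCAB_SIZE
    for pos in random.sample(range(VOCAB_SIZE), 10):
        vec[pos] = random.randint(1, 4)
    sentences[f"sentence_{s}"] = vec

# Pickle the dense dictionary and the sparse one, then compare sizes.
dense_bytes = pickle.dumps(sentences, protocol=pickle.HIGHEST_PROTOCOL)
sparse_dict = {key: to_sparse(vec) for key, vec in sentences.items()}
sparse_bytes = pickle.dumps(sparse_dict, protocol=pickle.HIGHEST_PROTOCOL)

print("dense pickle: ", len(dense_bytes), "bytes")
print("sparse pickle:", len(sparse_bytes), "bytes")
```

The saving depends entirely on how many entries are zero. If a sparse file comes out larger than the dense one, it is worth checking that the zeros are really being dropped before the data is written.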

0

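By 60,000 dimensions, do you mean 60,000 elements? If so, and the numbers are in the range 1..10, a reasonably compact approach is a dictionary of Python array.array objects with 1 byte per element (type code 'B').

That comes to 60,000 x 60,000 bytes, about 3.35 GB of data.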
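The array.array suggestion with type code 'B' can be sketched as follows. The vector length `N` and the key `word_0` are assumptions chosen to keep the example small, and the values must fit in 0..255 for this type code to work.

```python
import pickle
from array import array

N = 1000  # assumed small stand-in for the 60,000 elements in the question

# Type code 'B' stores each element as one unsigned byte (0..255).
vec = array("B", [0]) * N
vec[3] = 7  # e.g. the word at position 3 occurs 7 times

table = {"word_0": vec}  # hypothetical key

# Pickling the array costs roughly one byte per element plus a small header.
blob = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# Size estimate for the full 60,000 x 60,000 case at one byte per element.
full_size_bytes = 60_000 * 60_000
print("bytes per element:", vec.itemsize)
print("pickled size for N elements:", len(blob))
print("full data set: %.2f GiB" % (full_size_bytes / 2**30))
```

This keeps the dense layout, so it pays for every zero. It is mainly useful when the vectors are not sparse enough for a sparse format to win.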

0