Python data structure: SQL, XML, or .py file

What is the best way to store a large amount of data in Python: one (or two) dictionaries of 500,000+ words, used to search an undirected graph?

I have considered several options, such as storing the data in XML format:

<key name="a"> <value data="1" /> <value data="2" /> </key> <key name="b"> ... 

or in a python file for direct access:

 db = {"a": [1, 2], "b": ...} 

or in a SQL database? I think this would be a better solution, but would I then have to rely on SQL rather than Python for the computation?

+4
6 answers

Using Python source is absolutely the right technique.

XML is slow to parse and relatively hard for people to read. That's why companies like Altova are in business — XML is not pleasant to edit by hand.

Python source like db = {"a": [1, 2], "b": ...} is:

  • Fast to parse.

  • Easy for people to read.

If you have programs that read and write giant dictionaries, use pprint when writing so you get nicely formatted output — something easier to read.
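A minimal sketch of that idea (the file name graph_db.py and the sample data are made up for illustration): pprint.pformat renders the dictionary with consistent formatting, and writing it after a "db = " prefix produces an importable Python module.

```python
import pprint

# Sample adjacency dictionary; in practice this would hold 500k+ entries.
db = {"a": [1, 2], "b": [2, 3], "c": [1]}

# Write it out as Python source with readable formatting.
with open("graph_db.py", "w") as f:
    f.write("db = ")
    f.write(pprint.pformat(db))
    f.write("\n")
```

The resulting file is valid Python, so the reading program can simply import it.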

If you're worried about portability, consider YAML (or JSON) for serializing the object. Both parse quickly and are much easier to read than XML.

+6

I would consider using one of the many graph libraries available for Python (e.g. python-graph).

+2

You need to clarify the problem a bit. I will make a couple of assumptions: 1) your data is static and you just want to look things up in it, and 2) you have enough memory to store it.

If the application's launch speed is not critical, the data format is up to you, as long as you can get it into Python memory. Use simple types (dicts, lists, strings) to hold the data, not an XML graph, if you want fast access. You might consider writing your own lightweight class to represent nodes, storing references to the neighbouring nodes in a dict or list.
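Such a lightweight node class might look like this (a sketch; the class and method names are my own, not from the question):

```python
class Node:
    """Minimal graph node: a name plus direct references to its neighbours."""
    __slots__ = ("name", "neighbours")  # avoids a per-instance __dict__, saving memory

    def __init__(self, name):
        self.name = name
        self.neighbours = []

    def connect(self, other):
        # Undirected edge: link in both directions.
        self.neighbours.append(other)
        other.neighbours.append(self)

a, b = Node("a"), Node("b")
a.connect(b)
```

Because neighbours are held as direct object references, traversal needs no lookups at all.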

If application startup time is critical, consider building your data in one Python program and pickling it to a file; the production application can then unpickle the ready-made data structure, which should be very fast.
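A sketch of that build-once, load-fast split (file name is illustrative):

```python
import pickle

# Build step: construct the structure once, however slowly...
db = {"a": [1, 2], "b": [2, 3]}

# ...and pickle it to disk in the most compact protocol available.
with open("graph.pickle", "wb") as f:
    pickle.dump(db, f, protocol=pickle.HIGHEST_PROTOCOL)

# Production step: unpickling restores the whole structure quickly.
with open("graph.pickle", "rb") as f:
    loaded = pickle.load(f)
```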

If, on the other hand, your data is too large to fit in memory, or you want to modify it continually, you could use SQL for storage (an external server or an SQLite database) or ZODB (a Python object database).
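For the SQLite route, one possible schema (my own sketch, not from the answer) is a single edge table, with neighbours fetched per query so the whole graph never has to sit in memory:

```python
import sqlite3

# Use a file path instead of ":memory:" for a persistent, larger-than-RAM graph.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "1"), ("a", "2"), ("b", "2")],
)
conn.commit()

def neighbours(node):
    # Undirected graph: the node may appear on either end of an edge.
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? "
        "UNION SELECT src FROM edges WHERE dst = ?",
        (node, node),
    )
    return sorted(r[0] for r in rows)
```

An index on each column would keep lookups fast at 500k+ rows.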

+1

If you store your data in an XML file, it will be easier to modify by hand (i.e. with Notepad...), but you should take into account that reading and parsing that volume of data from an XML file is slow. Using an SQL database (possibly PostgreSQL) would be the more mature choice: a DBMS is far more optimized for this than reading and parsing files yourself. If you keep all your data as a Python structure in a separate file, you can take advantage of the compiled bytecode (.pyc), which does not help with the computation itself but does speed up loading — which is exactly what you want. I would choose the latter.
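To make the .pyc advantage concrete, here is a sketch (module name graph_data is hypothetical): importing a data module compiles and caches its bytecode, so every import after the first skips the parse step.

```python
import importlib
import pathlib
import sys

# Pretend this data module was generated by a build step.
pathlib.Path("graph_data.py").write_text('db = {"a": [1, 2], "b": [3]}\n')

sys.path.insert(0, ".")  # make the current directory importable
graph_data = importlib.import_module("graph_data")

# On import, CPython writes cached bytecode under __pycache__/;
# subsequent startups load that .pyc instead of re-parsing the source.
```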

0

XML is really tree-oriented and very verbose. You could look at RDF for ways to describe a graph in XML, but it still has drawbacks, e.g. the time to read, parse and instantiate 500k+ objects, and the amount of file space used.

SQL is really focused on describing rows in tables. You can certainly store graphs in it, but you will pay a performance penalty for the translation.

I would try pickling first to see if it fits your needs. It will probably be the most compact, and the fastest to read and instantiate all the objects from.

Really, the only reason to use another format is if you need something it offers, e.g. transactions in SQL or cross-language processing with XML.

0

The Python file approach will undoubtedly be the fastest, as long as you have a way to maintain the file.

0
