Many-to-Many Data Structure in Python

Question


I have a set of books and authors, with a many-to-many relationship between them.

There are about 10^6 books and 10^5 authors, with an average of 10 authors per book.

I need to perform a series of operations on the data set, for example counting the number of books by each author, or removing all books by a certain author from the set.

What would be a good data structure to allow fast processing?

I hope for some ready-made module that can provide methods in accordance with:

    obj.books.add(book1)

    # linking
    obj.books[n].author = author1
    obj.authors[m].author = book1

    # deleting
    obj.remove(author1)  # should automatically remove all links to the books by author1, but not the linked books

I should clarify that I would prefer not to use a database for this, but to do it all in memory.

Thanks.

+6

python data-structures many-to-many

Gj. Aug 21 '10 at 17:27


2 answers

"I hope for some ready-made module that can provide methods in accordance with:"

If that works, what more do you need?

You have a Book and an Author class definition. You also have a Book-Author association for the relationships. The methods needed to manage add / change / delete are just a few lines of code.

Create big old dictionaries of Author objects, Book objects, and Book-Author association objects.

Use shelve to save everything.

Done.
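As a rough illustration of the "dictionaries plus an association" idea, here is a minimal sketch. It assumes books and authors can be identified by hashable keys, and it stores the association as two dictionaries of sets rather than as separate association objects; the `Library` class and its method names are hypothetical, not from any existing module.

```python
from collections import defaultdict


class Library:
    """In-memory many-to-many index between book keys and author keys."""

    def __init__(self):
        self.authors_of = defaultdict(set)  # book key -> set of author keys
        self.books_of = defaultdict(set)    # author key -> set of book keys

    def link(self, book, author):
        """Record that `author` wrote `book`, in both directions."""
        self.authors_of[book].add(author)
        self.books_of[author].add(book)

    def book_counts(self):
        """Number of books for each author that still has an entry."""
        return {author: len(books) for author, books in self.books_of.items()}

    def remove_author(self, author):
        """Drop the author and every link to them; the books themselves stay."""
        for book in self.books_of.pop(author, set()):
            self.authors_of[book].discard(author)

    def remove_books_by(self, author):
        """Drop every book by `author`, along with all links of those books."""
        for book in self.books_of.pop(author, set()):
            for other in self.authors_of.pop(book, set()):
                if other != author:
                    self.books_of[other].discard(book)


lib = Library()
lib.link("book1", "author1")
lib.link("book1", "author2")
lib.link("book2", "author1")
print(lib.book_counts())  # {'author1': 2, 'author2': 1}

lib.remove_author("author1")
print(sorted(lib.authors_of["book1"]))  # ['author2']
```

Keeping the index in both directions doubles the bookkeeping, but it makes counting and removal proportional to the number of links touched rather than to the size of the whole data set.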

+2

S. Lott Aug 21 '10 at 18:10


Alex Martelli · Accepted Answer · 2010-08-21T17:33:54+0000

sqlite3 (or any other good relational database, but sqlite comes with Python and is handier for a data set this small) seems to be the right approach for your task. If you would rather not learn SQL, SQLAlchemy is a popular "wrapper" over relational databases, so to speak, that lets you deal with them at any of several different levels of abstraction of your choice.

And "doing all this in memory" is no problem at all (it's silly, mind you, since you will needlessly pay the overhead of reading all the data in from somewhere more persistent on each and every run of your program, whereas keeping the database on disk would save you that overhead, but that's another issue ;-). Just open your sqlite database as ':memory:' and there you are: a fresh, new relational database that lives entirely in memory (only for the duration of your process), with no disk involved at all. So why not? ;-)
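A minimal sketch of opening such an in-memory database with the standard-library `sqlite3` module follows. The table and the sample row are purely illustrative assumptions.

```python
import sqlite3

# ':memory:' tells sqlite3 to keep the whole database in RAM;
# it vanishes when the connection is closed or the process exits.
conn = sqlite3.connect(":memory:")

# Illustrative table and row (assumed names, not from the question).
conn.execute("CREATE TABLE Books (BookID INTEGER PRIMARY KEY, Title TEXT)")
conn.execute("INSERT INTO Books (Title) VALUES (?)", ("Example title",))

rows = conn.execute("SELECT BookID, Title FROM Books").fetchall()
print(rows)  # [(1, 'Example title')]
```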

Personally, I would use SQL directly for this task: it gives me excellent control over what is happening, and makes it easy to add or remove indexes to tune performance, etc. You would use three tables: a Books table (primary key ID, other fields such as Title, &c), an Authors table (primary key ID, other fields such as Name, &c), and a many-to-many relationship table, say BookAuthors, with just two fields, BookID and AuthorID, and one record per author-book connection.
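The three-table layout could look roughly like this. It is a sketch only: the column types, the extra index, and the sample rows are assumptions added for illustration, and the query shows one way to count the books by each author.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books   (BookID   INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, Name  TEXT);
    CREATE TABLE BookAuthors (
        BookID   INTEGER NOT NULL,
        AuthorID INTEGER NOT NULL,
        PRIMARY KEY (BookID, AuthorID)
    );
    -- Assumed tuning index: makes lookups by author as cheap as by book.
    CREATE INDEX BookAuthorsByAuthor ON BookAuthors (AuthorID, BookID);
""")

# Tiny made-up data set: book 1 has two authors, book 2 has one.
conn.executemany("INSERT INTO Books VALUES (?, ?)", [(1, "B1"), (2, "B2")])
conn.executemany("INSERT INTO Authors VALUES (?, ?)", [(1, "A1"), (2, "A2")])
conn.executemany("INSERT INTO BookAuthors VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# Count the number of books by each author (authors with no books get 0).
counts = conn.execute("""
    SELECT a.Name, COUNT(ba.BookID)
    FROM Authors AS a
    LEFT JOIN BookAuthors AS ba ON ba.AuthorID = a.AuthorID
    GROUP BY a.AuthorID, a.Name
    ORDER BY a.AuthorID
""").fetchall()
print(counts)  # [('A1', 2), ('A2', 1)]
```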

The two fields of the BookAuthors table are what are called "foreign keys", referring respectively to the ID fields of Books and Authors, and you can define them with ON DELETE CASCADE so that records referring to a book or author that gets deleted are automatically dropped in turn. That is an example of the high semantic level at which even bare SQL lets you work, and which no other existing data structure can come close to matching.
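Here is a sketch of the cascade in action with `sqlite3`. Note that SQLite enforces foreign keys (and therefore cascades) only when `PRAGMA foreign_keys = ON` has been issued on the connection. The sample rows and column types are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Without this pragma SQLite parses the constraints but does not enforce them.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Books   (BookID   INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, Name  TEXT);
    CREATE TABLE BookAuthors (
        BookID   INTEGER NOT NULL
                 REFERENCES Books(BookID) ON DELETE CASCADE,
        AuthorID INTEGER NOT NULL
                 REFERENCES Authors(AuthorID) ON DELETE CASCADE,
        PRIMARY KEY (BookID, AuthorID)
    );
""")

# Made-up data: author 1 wrote books 1 and 2, author 2 co-wrote book 1.
conn.executemany("INSERT INTO Books VALUES (?, ?)", [(1, "B1"), (2, "B2")])
conn.executemany("INSERT INTO Authors VALUES (?, ?)", [(1, "A1"), (2, "A2")])
conn.executemany("INSERT INTO BookAuthors VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# Deleting the author drops their link records, but not the linked books.
conn.execute("DELETE FROM Authors WHERE AuthorID = ?", (1,))

links = conn.execute(
    "SELECT BookID, AuthorID FROM BookAuthors ORDER BY BookID, AuthorID"
).fetchall()
book_count = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
print(links)       # [(1, 2)]
print(book_count)  # 2
```

This matches the behaviour asked for in the question: removing an author removes all links to that author's books, while the books themselves remain.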
