To avoid the O(n^2) behavior, you need to build an index for each query you want to make. Once the machine can look up any value in constant time, your O(n^2) turns into O(n) trivially. And you can build all of the indexes in O(n) as well.
Assuming each of your values has the same fields, it will look like this:
from collections import defaultdict

# Map each field to a dict from value -> set of row indexes.
indices = defaultdict(lambda: defaultdict(set))
for i, row in enumerate(data):
    for field in 'id', 'name', 'price', 'url':
        key = row[field]
        indices[field][key].add(i)
Now, finding all rows with a specific value is simple:
def search(field, key):
    return (data[index] for index in indices[field][key])
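For concreteness, a hypothetical usage, assuming data looked something like this before the indexes were built (the sample rows are mine, not from the original):

data = [
    {'id': 1, 'name': 'spam', 'price': 10.0, 'url': 'http://example.com'},
    {'id': 2, 'name': 'eggs', 'price': 12.5, 'url': 'http://example.org'},
]

# After building indices as above:
list(search('name', 'spam'))
# -> [{'id': 1, 'name': 'spam', 'price': 10.0, 'url': 'http://example.com'}]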
To find a group of values OR'ed together (a disjunction), just search each one individually and set.union the resulting index sets, for example:
from functools import reduce

def search_disj(factors):
    # factors is an iterable of (field, key) pairs to OR together.
    sets = (indices[field][key] for field, key in factors)
    return (data[index] for index in reduce(set.union, sets))
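For example, with the sample data above:

results = list(search_disj([('price', 10.0), ('url', 'http://example.org')]))
# Both rows match: the first by price, the second by url.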
And to find disjunction groups AND'ed together (a conjunction of disjunctions), do the same for each group, then set.intersection all of the resulting sets together.
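A minimal sketch of that conjunction-of-disjunctions search (the name search_conj and its argument disjs, an iterable of factor lists, one per OR group, are my naming, not from the original):

def search_conj(disjs):
    # One index set per OR group: the union of matches for its factors.
    sets = (reduce(set.union, (indices[field][key] for field, key in disj))
            for disj in disjs)
    return (data[index] for index in reduce(set.intersection, sets))

For example, search_conj([[('name', 'spam')], [('price', 10.0), ('url', 'http://example.org')]]) matches name = 'spam' AND (price = 10.0 OR url = 'http://example.org'), which is just the first sample row.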
Depending on your data, it may be more efficient to look up just the first index and then linearly filter those results against the other factors. You can optimize this by reordering the factors so that you start with the one with the smallest len(indices[field][key]). (Or, in the conjunction-of-disjunctions case, with the smallest sum(len(indices[field][key]) for field, key in disj).)
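A sketch of that lookup-then-filter strategy for a plain conjunction of factors, using the reordering described above (the function name is mine):

def search_conj_filtered(factors):
    # Sort so the smallest candidate set is looked up first.
    ordered = sorted(factors, key=lambda fk: len(indices[fk[0]][fk[1]]))
    first_field, first_key = ordered[0]
    return (data[i] for i in indices[first_field][first_key]
            if all(data[i][field] == key for field, key in ordered[1:]))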
If you can have arbitrarily nested conjunctions of disjunctions of conjunctions ... all the way down to single factors, you just make the two functions mutually recursive (with a base case for plain factors). You can even extend this to full boolean search (although you will also need a not operation, universe - indices[field][key], where universe = set(range(len(data)))).
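A minimal sketch of such an evaluator, folding the mutual recursion into one recursive function for brevity (the query-tree encoding, with ('and', ...), ('or', ...), and ('not', ...) operator tuples and (field, key) leaves, is my assumption):

def evaluate(node):
    # Assumes no field is literally named 'and', 'or', or 'not'.
    op = node[0]
    if op == 'and':
        return reduce(set.intersection, (evaluate(c) for c in node[1:]))
    if op == 'or':
        return reduce(set.union, (evaluate(c) for c in node[1:]))
    if op == 'not':
        universe = set(range(len(data)))
        return universe - evaluate(node[1])
    field, key = node  # base case: a single factor
    return indices[field][key]

For example, evaluate(('and', ('name', 'spam'), ('not', ('price', 12.5)))) returns the set of matching row indexes.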
If your data is very large, you may not be able to store all of the indexes in memory.
Or, even if you can store all of the indexes in memory, cache and paging effects can make a hash table less than ideal, in which case you probably want to consider something B-tree-based (e.g. blist.sorteddict ) instead of dict. That also gives you the advantage that you can search for ranges of values, order your results, and so on. The disadvantage is that all those O(n) times become O(n log n), but if you need the functionality, or if you are getting two orders of magnitude of locality benefit in exchange for a log(n, base) factor that works out to be only about 7, it may be worth it.
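As a sketch of the range-query advantage, here is a sorted index on one field using sortedcontainers.SortedDict (a maintained alternative to blist.sorteddict; the substitution and the names are mine):

from sortedcontainers import SortedDict

price_index = SortedDict()
for i, row in enumerate(data):
    price_index.setdefault(row['price'], set()).add(i)

def search_price_range(lo, hi):
    # irange walks only the keys in [lo, hi]; a plain dict cannot do this.
    return (data[i]
            for key in price_index.irange(lo, hi)
            for i in price_index[key])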
Or, alternatively, use some kind of disk-backed key-value store, for example anydbm (called dbm in Python 3).
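A tiny sketch of what a disk-backed index could look like with the Python 3 dbm module (the 'field:value' key encoding and the comma-separated index lists are my assumptions; dbm stores only bytes):

import dbm

with dbm.open('indices', 'c') as idx:
    for i, row in enumerate(data):
        for field in 'id', 'name', 'price', 'url':
            key = '{}:{}'.format(field, row[field])
            try:
                old = idx[key] + b','
            except KeyError:
                old = b''
            idx[key] = old + str(i).encode()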
Really, though, what you are building here is a relational database with a single relation (table). In many cases you would be better off just using an off-the-shelf relational database, such as the sqlite3 module that Python comes with built-in. Then all the code needed to create an index is:
db.execute('CREATE INDEX id_idx ON data (id)')
... and you can just make queries, and they will magically use the right indexes in the best way:
curs = db.execute('SELECT * FROM data WHERE name = ? AND (price = ? OR url = ?)', filters)
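Putting that together, a minimal end-to-end sketch of the sqlite3 approach (the table schema, the sample rows, and the filters tuple are my assumptions):

import sqlite3

db = sqlite3.connect(':memory:')  # or a filename for on-disk storage
db.execute('CREATE TABLE data (id INTEGER, name TEXT, price REAL, url TEXT)')
db.executemany('INSERT INTO data VALUES (?, ?, ?, ?)',
               [(1, 'spam', 10.0, 'http://example.com'),
                (2, 'eggs', 12.5, 'http://example.org')])
db.execute('CREATE INDEX id_idx ON data (id)')

filters = ('spam', 10.0, 'http://example.org')
curs = db.execute('SELECT * FROM data WHERE name = ? AND (price = ? OR url = ?)',
                  filters)
print(curs.fetchall())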