The fastest way to do a string search?

Assuming we have a fixed set of possible strings:

possible_strings_list = ['foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs'] 

and we receive new strings, each of which is known to be one of them. We want to assign an integer to each new string, for example:

    if new_string == 'foo':
        return 0
    elif new_string == 'bar':
        return 1
    ...

What is the fastest way to do this in Python 3.6? I tried several approaches, and using a dictionary is the fastest:

    list_index       2.7494255019701086
    dictionary       0.9412809460191056
    if_elif_else     2.10705983400112
    lambda_function  2.6321219780365936
    tupple_index     2.751029207953252
    ternary          1.931659944995772
    np_where         15.610908019007184

However, I am more or less new to Python, and I am wondering whether there are other, faster solutions. Do you have any suggestions?

My full testing code:

    import timeit
    import random
    import numpy as np


    def list_index(i):
        return possible_strings_list.index(i)


    def dictionary(i):
        return possible_strings_dict[i]


    def tupple_index(i):
        return possible_strings_tup.index(i)


    def if_elif_else(i):
        # Consistent 0-based indices to match possible_strings_dict
        if i == 'foo':
            return 0
        elif i == 'bar':
            return 1
        elif i == 'baz':
            return 2
        elif i == 'qux':
            return 3
        elif i == 'spam':
            return 4
        elif i == 'ham':
            return 5
        elif i == 'eggs':
            return 6


    def ternary(i):
        # Same mapping, expressed as chained conditional expressions
        return (0 if i == 'foo' else 1 if i == 'bar' else 2 if i == 'baz'
                else 3 if i == 'qux' else 4 if i == 'spam' else 5 if i == 'ham' else 6)


    n = lambda i: (0 if i == 'foo' else 1 if i == 'bar' else 2 if i == 'baz'
                   else 3 if i == 'qux' else 4 if i == 'spam' else 5 if i == 'ham' else 6)


    def lambda_function(i):
        return n(i)


    def np_where(i):
        return np.where(possible_strings_array == i)[0][0]


    def check(function):
        for i in testlist:
            function(i)


    possible_strings_list = ['foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs']
    testlist = [random.choice(possible_strings_list) for i in range(1000)]
    possible_strings_dict = {'foo': 0, 'bar': 1, 'baz': 2, 'qux': 3, 'spam': 4, 'ham': 5, 'eggs': 6}
    possible_strings_tup = ('foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs')
    possible_strings_array = np.array(possible_strings_list)  # needed by np_where

    allfunctions = [list_index, dictionary, if_elif_else, lambda_function,
                    tupple_index, ternary, np_where]

    for function in allfunctions:
        t = timeit.Timer(lambda: check(function))
        print(function.__name__, t.timeit(number=10000))
Tags: optimization, python, string
1 answer

Dictionary lookup is the fastest way to perform this kind of search. For this kind of analysis, you usually compare the time complexity of each approach.

For a dictionary lookup, the time complexity is constant time, O(1). While that usually means some fixed number of steps regardless of input size, in this case it is literally a single step: one hashed access.
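
For instance, the following minimal sketch (the 10,000-entry dictionary is only an illustrative assumption, not part of the question's benchmark) shows that the lookup time barely depends on the size of the mapping:

    import timeit

    # Dictionary lookup is a single hashed access, so the cost stays roughly
    # constant whether the mapping holds 7 entries or 10,000.
    small = {'foo': 0, 'bar': 1, 'baz': 2, 'qux': 3, 'spam': 4, 'ham': 5, 'eggs': 6}
    large = {'s%d' % k: k for k in range(10000)}
    large.update(small)

    print(timeit.timeit(lambda: small['eggs'], number=1000000))
    print(timeit.timeit(lambda: large['eggs'], number=1000000))
    # Both timings should be of the same order of magnitude.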

The other methods require iteration (or, in the case of an if/elif chain, walking through the branches), which amounts to essentially the same thing. They range from having to look at all the values, O(n), to having to look at only some of them, O(log n).
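
As an illustration of the O(log n) case, a binary search over a sorted copy of the possible strings could be used instead of list.index(); this sketch uses the standard bisect module (the sorted_keys/sorted_values helpers are assumptions for illustration, not part of the question's code):

    import bisect

    possible_strings_list = ['foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs']

    # Binary search needs sorted keys; keep each string's original index alongside it.
    pairs = sorted((s, idx) for idx, s in enumerate(possible_strings_list))
    sorted_keys = [s for s, _ in pairs]
    sorted_values = [idx for _, idx in pairs]

    def bisect_index(i):
        # O(log n) comparisons instead of the O(n) scan done by list.index().
        pos = bisect.bisect_left(sorted_keys, i)
        if pos < len(sorted_keys) and sorted_keys[pos] == i:
            return sorted_values[pos]
        raise ValueError(i)

    print(bisect_index('spam'))  # 4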

Here n is the size of the set of possible strings; as the set grows, the spread between the results grows as well, while the dictionary consistently outperforms the other options shown.

There is no way to be faster than O(1). The only drawback of this approach is that a larger set requires more memory; this is the space complexity of the algorithm. However, since we only need to store one value per element of the set, the space complexity is O(n), which is negligible here.
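
Concretely, that O(n) space is just one dictionary entry per possible string, and the mapping can be built once up front from the list (a small sketch, assuming the list order defines the integers, as in the question):

    possible_strings_list = ['foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs']

    # One entry per possible string: O(n) space, paid once when the dict is built.
    possible_strings_dict = {s: idx for idx, s in enumerate(possible_strings_list)}
    print(possible_strings_dict['ham'])  # 5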

When thinking about optimization in general, it is important to consider the complexity of the current solution and how important it is to improve it. Worthwhile improvements move you between complexity classes, for example from O(n) to O(log n), or from O(log n) to O(1).

[Time complexity comparison chart; image courtesy of http://bigocheatsheet.com/]

Micro-optimizations are optimizations made within the same complexity class, and on their own they are often not very productive.
