What is the most efficient algorithm for finding indices of multiple characters within a string (javascript)?

I am looking for the fastest method that I can use to search text for text for multiple character indices.

For instance:

searchString = 'abcdefabcdef'; searchChars = ['a','b']; // returns {'a':[0,6], 'b':[1,7]} 
+4
source share
3 answers

After selecting several single-pass algorithms and Guffa regex, I ended up with this:

 function findIndexesMultiPass(str,find) { var x, output = {}; for (var i = 0; i < find.length; i++) { output[find[i]] = []; x = 0; while ((x = str.indexOf(find[i], x)) > -1) { output[find[i]].push(x++); } } return output; } var searchString = "abcd abcd abcd"; var searchChars = ['a', 'b']; var result = findIndexesMultiPass(searchString, searchChars); // {'a':[0,5,10], 'b':[1,6,11]} 

This turned out to be pretty slow:

 function findIndexesOnePass(str,find) { var output = {}; for (var i = 0; i < find.length; i++) { output[find[i]] = []; } for (var i = 0; i < str.length; i++) { var currentChar = str.charAt(i); if (output[currentChar] !== undefined) { output[currentChar].push(i); } } return output; } var searchString = "abcd abcd abcd"; var searchChars = ['a', 'b']; var result = findIndexesOnePass(searchString, searchChars); // {'a':[0,5,10], 'b':[1,6,11]} 

Rough times (indices of 3 characters)

 Google Chrome (Mac) findIndexesMultiPass: 44ms findIndexesOnePass: 799ms findIndexesRegEx: 95ms Safari findIndexesMultiPass: 48ms findIndexesOnePass: 325ms findIndexesRegEx: 293ms Firefox findIndexesMultiPass: 56ms findIndexesOnePass: 369ms findIndexesRegEx: 786ms 
0
source

You should be able to use regular expression to search for all events of each character. Sort of:

 function findIndexes(find, str) { var output = {}; for (var i = 0; i < find.length; i++) { var m = []; var r = new RegExp('.*?' + find[i], 'g'); var ofs = -1; while ((x = r.exec(str)) != null) { ofs += x[0].length; m.push(ofs); } output[find[i]] = m; } return output; } 

Edit:

Made some changes and now it works. :) However, since Javascript does not have a match method to get all matches at once, this is not really an improvement using indexOf ...: P

Edit 2:

However, you can use the regular expression to search for any of the characters, so you only need to loop the string once, and not once for each character. :)

 function findIndexes(find, str) { var output = {}; for (var i = 0; i < find.length; i++) output[find[i]] = []; var r = new RegExp('.*?[' + find.join('') + ']', 'g'); var ofs = -1; while ((x = r.exec(str)) != null) { ofs += x[0].length; output[x[0].substr(x[0].length-1,1)].push(ofs); } return output; } 
+2
source

Assuming that a search requires several letters to search and many letters to search (i.e. a small number of letters, long lines), the latter is most effective as you go through only one line once and then check each letter.

The other goes through the line as many times as there are letters to search.

0
source

All Articles