Finding an array string with a binary search substring

Question

Finding an array string with a binary search substring

I have a .txt file containing about 200,000 entries.

The format of each entry is 123456-99-Text. 123456 are unique account numbers, 99 is the location code I need (it changes from 01 to 99), and the text does not matter. These account numbers are sorted in order and with a line break in the file for each AC (111111, 111112, 111113, etc.).

I made a visual studio text panel and a search button so that someone would look for an account number. The account number is actually 11 digits, but only the first 6 questions. I wrote this as a string actnum = textbox1.text.substring(0,6)

I wrote a foreach (string x in file.readline('file.txt')) with an if (x.contains(actnum)) then string code = x.substring(8,2)) .

The program works well, but since there are so many entries, if someone is looking for an account number that does not exist, or a number at the bottom of the list, the program blocks for 10 seconds before moving to the “number” no “else statement” is found or forever find this last entry.

My question is:

Reading about binary searches I tried to try one without much success. I can't seem to get an array or file to act like a legit binary search. Is there a way to take the 6-digit actnum from textbox1, compare it with the substring of the array with the 6-digit account number, and then grab the substring 99 code from this particular string?

Binary search will help a lot! I could take 555-555 and compare it with the top or bottom half of the recording file, and then continue searching until I get the desired line, grab the entire line, and then fine-tune 99. The problem is that I cannot get the correct integer conversion of the file, because it contains both numbers AND text, so I can not use <,>, = signs correctly.

Any help on this would be greatly appreciated. The program that I am actually working on now is incredibly slow from time to time.

+7

arrays c # binary-search

Shoto Dec 15 '14 at 20:22

source share

4 answers

As one of the possible solutions (not necessarily the best), you can add your record identifiers to Dictionary<string, int> (or even Dictionary<long, int> if all record identifiers are numeric), where each key is an identifier of one line, and each value is the row index. When you need to find a specific entry, just look in the dictionary (it will be an effective search for you) and give you the line number. If an element is missing (non-existent identifier), you will not find it in the dictionary.

At this point, if the record identifier exists in the file, you have a line number - you can either load the entire file into memory (if it is not too large), or simply search for the desired line and read in the data line.

To do this, you need to go through at least one file and collect all the record identifiers from all the lines and add them to the dictionary. You will not need to perform a binary search - the dictionary will internally perform a search for you.

Edit

If you don’t need all the data from a specific line, just one bit (for example, the location code you specified), you don’t even need to store the line number (since you do not need to return to the line in the file) - just save the location data as a value in dictionary.

I personally would keep the line index, because, in my experience, such projects start small, but ultimately collect functions, and there will be a point where you will need to have everything in the file. If you expect this to happen over time, just analyze the data from each row in the data structure and save it in a dictionary - it will simplify your future life. If you are sure that you no longer need more data than one bit of information, you can simply copy the data into the dictionary.

Here is a simple example (assuming your post IDs can be parsed in long ):

 public class LineData { public int LineIndex { get; set; } public string LocationCode { get; set; } // other data from the line that you need } // ... // declare your map private Dictionary<long, LineData> _dataMap = new Dictionary<long, LineData> (); // ... // Read file, parse lines into LineData objects and put them in dictionary // ...

To find out if a record identifier exists, you simply call TryGetValue() :

 LineData lineData; if ( _dataMap.TryGetValue ( recordID, out lineData ) ) { // record ID was found }

This approach essentially saves the entire file in memory, but all data is analyzed only once (at the beginning, when building the dictionary). If this approach uses too much memory, just save the line index in the dictionary, and then go back to the file if you find the entry and parse the line on the fly.

+5

xxbbcc Dec 15 '14 at 20:28

source share

Assuming the file does not change often, you can simply load the entire file into memory using a structure that processes the search faster. If the file can change, you will need to decide the mechanism for reloading the file, whether it is restarting the program or a more complex process.

It looks like you are looking for exact matches (a 123456 search yields only one entry, designated 123456). If so, you can use Dictionary . Please note that in order to use the Dictionary you need to define the types of keys and values. It looks like in your case they will be string .

+1

Guvante Dec 15 '14 at 20:28

source share

You cannot perform a binary search in file.ReadLine because you must have access to the strings in a different order. Instead, you should read the entire file in memory (the .ReadAllLines file would be an option)

Assuming your file is sorted by substring, you can create a new class that implements IComparer

 public class SubstringComparer : IComparer<string> { public int Compare(string x, string y) { return x.Substring(0, 6).CompareTo(y.Substring(0, 6)); } }

and then your binary search will look like this:

 int returnedValue = foundStrings.BinarySearch(searchValue, new SubstringComparer());

+1

jtm001 Dec 15 '14 at 21:10

source share

Shoto · Accepted Answer · 2015-01-06T18:23:49+0000

Until I found a way to do a better type of search, I managed to find out about the embedded resources, which greatly accelerated the program. Scanning the entire file takes a split second, not 5-10 seconds. Posting the following code:

  string searchfor = textBox1.Text Assembly assm = Assembly.GetExecutingAssembly(); using (Stream datastream = assm.GetManifestResourceStream("WindowsFormsApplication2.Resources.file1.txt")) using (StreamReader reader = new StreamReader(datastream)) { string lines; while ((lines = reader.ReadLine()) != null) { if (lines.StartsWith(searchfor)) { label1.Text = "Found"; break; } else { label1.Text = "Not found"; } } }

Finding an array string with a binary search substring

More articles: