Need help with an algorithm

I need help with the algorithm. I randomly generated 6-digit numbers. How;

123654 109431

There are about 1 million of them stored in a file line by line. I have to filter them according to the rule that I am trying to describe below.

Take a number, compare it with all other digits by digit. If a number appears in the number with a value greater than one compared number, delete it. Let me show it using numbers.

Our number: 123456 Increase the first digit with 1, so the number becomes: 223456. Delete all files from the file 223456. Increase the second digit by 1, the number becomes: 133456. Delete all 133456s from the file, and so on ...

I can do it the way I describe, but I need it to be "FAST".

So can anyone help me with this?

Thanks.

+6
c # algorithm
source share
10 answers

This algorithm will store many numbers in memory, but it will process the file one number at a time, so you do not need to read all this at all. You only need to put an IEnumerable<int> for it to work.

  public static IEnumerable<int> FilterInts(IEnumerable<int> ints) { var removed = new HashSet<int>(); foreach (var i in ints) { var iStr = i.ToString("000000").ToCharArray(); for (int j = 0; j < iStr.Length; j++) { var c = iStr[j]; if (c == '9') iStr[j] = '0'; else iStr[j] = (char)(c + 1); removed.Add(int.Parse(new string(iStr))); iStr[j] = c; } if (!removed.Contains(i)) yield return i; } } 

You can use this method to create an IEnumerable<int> from a file:

  public static IEnumerable<int> ReadIntsFrom(string path) { using (var reader = File.OpenText(path)) { string line; while ((line = reader.ReadLine()) != null) yield return int.Parse(line); } } 
+1
source share

First of all, since it is about 1 million, you better run the algorithm in RAM rather than on the disk, that is, first load the contents into an array and then modify the array and then paste the results back into the file.

I would suggest the following algorithm - simple. Pre-calculate all the target numbers, in this case 223456, 133456, 124456, 123556, 123466, 123457. Now pass the array, and if the number is not one of them, write it to another array. Alternatively, if this is one of these numbers, delete it (recommended if your data structure has O (1) delete)

+5
source share

Take all the numbers from the file into an array of List, and then:

take the number of threads as the number of digits

add the first digit to the number in the first stream, the second in the second stream, and then compare it with the rest of the numbers,

It will be fast, as it will go hand in hand with processing ...

+1
source share

All sentences (so far) require six comparisons per input string, which is optional. Numbers come in as strings, so use string comparisons.

Start with @Armen Tsirunyan's idea:

Pre-calculate all target numbers, in this case 223456, 133456, 124456, 123556, 123466, 123457.

But instead of single comparisons, do this in a line:

  string arg = "223456 133456 124456 123556 123466 123457"; 

Then read the input (either from a file or into memory). Pseudocode:

  foreach (string s in theBigListOfNumbers) if (arg.indexOf(s) == -1) print s; 

This is just one comparison for each line of input, without dictionaries, maps, iterators, etc.

Edited to add:

On processors with the x86 instruction set (and not just on the Intel brand), finding a substring like this is very fast. For example, finding a character inside a string is just one machine instruction.

I have to ask others to weigh on alternative architectures.

+1
source share

First, I would just read all the numbers in the array.

When you're finally done, rewrite the file.

0
source share

It seems that the rule you are describing is for the target number abdcef that you want to find for all numbers containing + 1, b + 1, c + 1, d + 1, e + 1 or f + 1 in the appropriate place. You can do this in O (n) by going through the lines in the file and comparing each of the six digits with the digit of the target number, if the numbers do not match, write the number in the output file.

0
source share

This sounds like a potential case for a multidimensional array and possibly also insecure C # code so that you can use pointer math to iterate through so many numbers.

I would have to delve further into it, but I would also probably use the Dictionary for non-linear searches if you are comparing numbers that are not in sequence.

0
source share

Read all your numbers from the file and save them on the map, where the number is the key and the logical one is the value, meaning that this value has not been deleted. (Truth exists, false means are removed).

Then iterate over your keys. For each key, set the card to false for the values ​​that you remove from the list.

Repeat your list again and get all the keys where the value is true. This is a list of the remaining numbers.

 public List<int> FilterNumbers(string fileName) { StreamReader sr = File.OpenTest(fileName); string s = ""; Dictionary<int, bool> numbers = new Dictionary<int, bool>(); while((s = sr.ReadLine()) != null) { int number = Int32.Parse(s); numbers.Add(number,true); } foreach(int number in numbers.Keys) { if(numbers[number]) { if(numbers.ContainsKey(100000+number)) numbers[100000+number]=false; if(numbers.ContainsKey(10000+number)) numbers[10000+number]=false; if(numbers.ContainsKey(1000+number)) numbers[1000+number]=false; if(numbers.ContainsKey(100+number)) numbers[100+number]=false; if(numbers.ContainsKey(10+number)) numbers[10+number]=false; if(numbers.ContainsKey(1+number)) numbers[1+number]=false; } } List<int> validNumbers = new List<int>(); foreach(int number in numbers.Keys) { validNumbers.Add(number); } return validNumbers; } 

Perhaps this needs to be tested since I do not have a C # compiler on this computer and I am a bit rusty. The algorithm takes a bit of memory bit, which will run in linear mode.

** EDIT ** This leads to problems when one of the numbers is 9. I will update the code later.

0
source share

How about this one. You process numbers one by one. Numbers will be stored in the NumbersOK and NumbersNotOK hash tables.

  • Take one number
  • If it is not in NumbersNotOK , put it in the NumbersOK hash
  • Get deviations from the unit numbers in the hash from it - NumbersNotOK .
  • Remove all NumbersOK members if they match any of the variances.
  • Repeat from 1 to end of file
  • Save NumbersOK to a file.

This way, you will only list once. The hash table is intended only for such purposes, and it will be very fast (there are no expensive comparison methods).

This algorithm is not fully implemented, since it is not processed when certain numbers are repeated, but it can be processed with some tuning ...

0
source share

There is still a question about homework ... the fastest sort for a million numbers will be n log (n), which is 1,000,000log (1,000,000), which is 6 * 1,000,000, which is the same as matching 6 numbers with each of the millions of numbers. Therefore, direct comparisons will be faster than sorting and deleting, because after sorting you still have to compare with deleting. Unless, of course, my calculations completely missed the target.

Something else comes to mind. When you pick up the number, read it as hex, not base 10. then maybe some bitwise operators might help you somehow. Still thinking about what can be done with this. Will be updated if it works.

EDIT: We are currently thinking about lines of gray code. 123456 (our original number) and 223456 or 133456 will be disabled by only one digit, and a converter with a gray code will quickly catch it. It's late night here, so if someone finds it helpful and can give a solution ...

0
source share

All Articles