Extremely Slow Random String Generator

I came up with the code below to generate 10,0001 random strings. Lines must be unique. However, the code below takes several hours to complete this work. Can someone let me know how I can optimize it and why is it so slow?

string getRandomString(int length) { static string charset = "abcdefghijklmnopqrstuvwxyz"; string result; result.resize(length); for (int i = 0; i < length; i++) { result[i] = charset[rand() % charset.length()]; } return result; } void main(){ srand(time(NULL)); vector<string> storeUnigrams; int numUnigram = 100001; string temp = ""; int minLen = 3; int maxLen = 26; int range = maxLen - minLen + 1; int i =0; while(i < numUnigram){ int lenOfRanString = rand()%range + minLen; temp = getRandomString(lenOfRanString); bool doesithave = false; for(int j =0 ; j < storeUnigrams.size() ; j++){ if(temp.compare(storeUnigrams[j]) == 0){ doesithave = true; break; } if(temp.compare(storeUnigrams[j]) < 0){ break; } } if(!doesithave){ storeUnigrams.push_back(temp); sort(storeUnigrams.begin(),storeUnigrams.end()); i++; } } 
+4
source share
5 answers

There are two factors that slow down your code:

  • Linearly checking if string exists - O (n)
  • Sorting a vector in each iteration - O (n log n)

Use for example. a set for storing strings - it is sorted automatically, and checking for existence is quick:

 int main(){ srand(time(NULL)); set<string> storeUnigrams; int numUnigram = 100001; int minLen = 3; int maxLen = 26; int range = maxLen - minLen + 1; while(storeUnigrams.size() < numUnigram){ int lenOfRanString = rand()%range + minLen; storeUnigrams.insert(getRandomString(lenOfRanString)); } } 
+9
source

Philip's answer is fine. Another approach would be to use a Self-balancing Binary Search Tree , such as Red Black Tree instead of Vector. You can search and insert in log (n) time. If the search is empty, insert an item.

0
source

This code generates a unique random number only once and stores it in random_once[i] .

In the first for loop, an ad is created in which a random number is stored.

The second for loop is used to get pre-rendered random numbers stored in the random_once[i] array.

Yes, generating 100001 random numbers will take several hours, if not days.

 #include <ctime> #include <iostream> using namespace std; int main() { int numUnigram = 3001; int size=numUnigram; int random_once[100001]; cout<<"Please wait: Generatng "<<numUnigram<<" random numbers "; std::cout << '-' << std::flush; srand(time(0)); for (int i=0;i<size;i++) { //This code generates a unique random number only once //and stores it in random_once[i] random_once[i]=rand() % size; for(int j=0;j<i;j++) if (random_once[j]==random_once[i]) i--; //loading animation std::cout << "\b\\" << std::flush; std::cout << "\b|" << std::flush; std::cout << "\b/" << std::flush; std::cout << "\b-" << std::flush; } cout<<" \n"; // this code dispays unique random numbers stored in random_once[i] for ( i=0;i<size;i++) cout<<" "<<random_once[i]<<"\t"; cout<<" \n"; return 0; } 
0
source

Define variables outside the while loop - because they are redefined at each iteration

 int lenOfRanString = rand()%range + minLen; ; bool doesithave = false; 

Update

Thought that he advised in many books, in practice with all new compilers this will not significantly improve performance

-1
source

Use char arrays instead of strings (a string class does a lot of things behind the scenes)

-2
source

All Articles