Create random strings

I want to generate random strings of the form ABCDE1234E, that is, each string contains 5 uppercase letters, then 4 digits, then 1 uppercase letter.

I figured out a way to create this using the following code.

    library(random)
    string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
                                        loweralpha=FALSE, unique=TRUE, check=TRUE))
    number_4 <- as.vector(randomNumbers(n=5000, min=1111, max=9999, col=5, base=10, check=TRUE))
    string_1 <- as.vector(randomStrings(n=5000, len=1, digits=FALSE, upperalpha=TRUE,
                                        loweralpha=FALSE, unique=FALSE, check=TRUE))
    PAN.Number <- paste(string_5, number_4, string_1, sep = "")

But these functions take a lot of time, and the random library requires a network connection.

    > system.time(string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
    +                                   loweralpha=FALSE, unique=TRUE, check=TRUE)))
       user  system elapsed
       0.07    0.00    3.18

Is there a way to shorten the execution time? I also tried using sample(), but I could not figure out how to build the strings with it.

4 answers

Using "stringi" as suggested by @akrun will be faster, but the following is also very fast and does not require additional packages:

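    myFun <- function(n = 5000) {
      # five columns of n random uppercase letters, pasted together row-wise
      a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
      # append a zero-padded 4-digit number and one more uppercase letter
      paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
    }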

Example output:

    myFun(10)
    ##  [1] "BZHOF3737P" "EPOWI0674X" "YYWEB2825M" "HQIXJ5187K" "IYIMB2578R"
    ##  [6] "YSGBG6609I" "OBLBL6409Q" "PUMAL5632D" "ABRAT4481L" "FNVEN7870Q"
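The question's randomStrings() call asked for unique=TRUE, which sampling with replacement does not guarantee. One possible way to get distinct strings is to keep drawing until enough unique values are collected. This is only a sketch: the helper name uniqueStrings is hypothetical, and myFun is repeated so the block runs on its own.

```r
# Same generator as in the answer above, repeated so the block is self-contained.
myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

# Hypothetical helper: draw the missing number of strings, drop duplicates,
# and repeat until n distinct strings have been collected.
uniqueStrings <- function(n) {
  out <- character(0)
  while (length(out) < n) {
    out <- unique(c(out, myFun(n - length(out))))
  }
  out
}

x <- uniqueStrings(5000)
length(x)         # 5000
anyDuplicated(x)  # 0 means there are no duplicates
```

With roughly 26^6 * 9999 possible strings, collisions among 5000 draws are rare, so the loop should almost always finish after a single pass.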

We can use stri_rand_strings from stringi

    library(stringi)
    sprintf("%s%s%s",
            stri_rand_strings(5, 5, '[A-Z]'),
            stri_rand_strings(5, 4, '[0-9]'),
            stri_rand_strings(5, 1, '[A-Z]'))

Or more compact

    do.call(paste0, Map(stri_rand_strings, n=5, length=c(5, 4, 1),
                        pattern = c('[A-Z]', '[0-9]', '[A-Z]')))

Benchmarks

    system.time({
      do.call(paste0, Map(stri_rand_strings, n=5000, length=c(5, 4, 1),
                          pattern = c('[A-Z]', '[0-9]', '[A-Z]')))
    })
    #   user  system elapsed
    #      0       0       0

For comparison, here is the timing for just one part of the expected output using the OP's method:

    system.time(string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
                                                    loweralpha=FALSE, unique=TRUE, check=TRUE)))
    #   user  system elapsed
    #   0.86    0.24    5.52

You can do this directly with sample(): draw 5 random uppercase letters, then 4 random digits, then 1 random uppercase letter, and paste them together.

    digits = 0:9
    createRandString <- function() {
      v = c(sample(LETTERS, 5, replace = TRUE),
            sample(digits, 4, replace = TRUE),
            sample(LETTERS, 1, replace = TRUE))
      return(paste0(v, collapse = ""))
    }

This is easier to control and does not take much time.
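Note that createRandString() returns a single string per call. To get the 5000 strings from the question, one option is to call it repeatedly with replicate(). A minimal sketch, with the function repeated so the block is self-contained:

```r
digits <- 0:9

# Same single-string generator as above.
createRandString <- function() {
  v <- c(sample(LETTERS, 5, replace = TRUE),
         sample(digits, 4, replace = TRUE),
         sample(LETTERS, 1, replace = TRUE))
  paste0(v, collapse = "")
}

# replicate() calls the function 5000 times and returns a character vector.
PAN.Number <- replicate(5000, createRandString())
head(PAN.Number, 3)
```

Because this loops at the R level, it is likely slower than a fully vectorized approach, but it needs no network access and should still finish quickly for a few thousand strings.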


The performance problem comes from using the random package in the first place. It is understandable that someone might find random::randomStrings() through a web search and assume it is a good way to generate random strings in a program, but the random package is not intended for general-purpose programming. It works by querying the RANDOM.ORG server, which is inherently slower than R's built-in pseudo-random number generators.

From one of the vignettes of the random package:

There are a number of situations in which it is desirable to use non-deterministic random numbers. Examples include:
- seeding distributed computations on different nodes with truly independent seeds;
- obtaining portable initializations for RNGs that are independent of a specific operating system or hardware features;
- checking simulation results using non-deterministic random numbers;
- providing non-deterministic seeds used for lottery games or games ...

Note that most of these examples are about seeding or initializing (the two terms are synonyms) R's built-in pseudo-random number generators, rather than replacing them ...
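To illustrate the seeding point: R's built-in generator is initialized with set.seed(), after which sample() produces a repeatable sequence. A minimal sketch; the seed value 42 is an arbitrary choice for illustration:

```r
# Seeding the built-in PRNG makes subsequent draws repeatable.
set.seed(42)
first <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")

# Resetting the same seed restarts the same sequence.
set.seed(42)
second <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")

identical(first, second)  # TRUE: same seed, same draws
```

A non-deterministic seed, for example one obtained from RANDOM.ORG, could be passed to set.seed() in place of the constant, while the strings themselves are still generated locally.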
