R Algorithm for generating all possible factorization of numbers

For example, consider number 96. It can be written in the following ways:

1. 96 2. 48 * 2 3. 24 * 2 * 2 4. 12 * 2 * 2 * 2 5. 6 * 2 * 2 * 2 * 2 6. 3 * 2 * 2 * 2 * 2 * 2 7. 4 * 3 * 2 * 2 * 2 8. 8 * 3 * 2 * 2 9. 6 * 4 * 2 * 2 10. 16 * 3 * 2 11. 4 * 4 * 3 * 2 12. 12 * 4 * 2 13. 8 * 6 * 2 14. 32 * 3 15. 8 * 4 * 3 16. 24 * 4 17. 6 * 4 * 4 18. 16 * 6 19. 12 * 8 

I know this is related to partitions, since any number written as cardinality, n , of one prime, p , is just the number of ways you can write <i> n. For example, to find all factorizations 2 ^ 5, we have to find all the ways of recording 5. They are:

  • 1 + 1 + 1 + 1 + 1 == β†’ 2 ^ 1 * 2 ^ 1 * 2 ^ 1 * 2 ^ 1 * 2 ^ 1
  • 1 + 1 + 1 + 2 == β†’ 2 ^ 1 * 2 ^ 1 * 2 ^ 1 * 2 ^ 2
  • 1 + 1 + 3 == β†’ 2 ^ 1 * 2 ^ 1 * 2 ^ 3
  • 1 + 2 + 2 == β†’ 2 ^ 1 * 2 ^ 2 * 2 ^ 2
  • 1 + 4 == β†’ 2 ^ 1 * 2 ^ 4
  • 2 + 3 == β†’ 2 ^ 2 * 2 ^ 3
  • 5 == β†’ 2 ^ 5

I found a great article by Jerome Keller on partition generation algorithms here . I adapted one of its python algorithms to R. The code below:

 library(partitions) ## using P(n) to determine number of partitions of an integer IntegerPartitions <- function(n) { a <- 0L:n k <- 2L a[2L] <- n MyParts <- vector("list", length=P(n)) count <- 0L while (!(k==1L)) { x <- a[k-1L]+1L y <- a[k]-1L k <- k-1L while (x<=y) {a[k] <- x; y <- yx; k <- k+1L} a[k] <- x+y count <- count+1L MyParts[[count]] <- a[1L:k] } MyParts } 

I tried to extend this method to numbers with more than one simple factor, but my code became very awkward. After a long struggle with this idea, I decided to try a different route. My new algorithm does not use partition generation at all. It is rather a β€œreverse lookup” algorithm that takes advantage of factorizations that have already been generated. Code below:

 FactorRepresentations <- function(n) { MyFacts <- EfficientFactorList(n) MyReps <- lapply(1:n, function(x) x) for (k in 4:n) { if (isprime(k)) {next} myset <- MyFacts[[k]] mylist <- vector("list") mylist[[1]] <- k count <- 1L for (j in 2:ceiling(length(myset)/2)) { count <- count+1L temp <- as.integer(k/myset[j]) myvec <- sort(c(myset[j], temp), decreasing=TRUE) mylist[[count]] <- myvec MyTempRep <- MyReps[[temp]] if (isprime(temp) || temp==k) {next} if (length(MyTempRep)>1) { for (i in 1:length(MyTempRep)) { count <- count+1L myvec <- sort(c(myset[j], MyTempRep[[i]]), decreasing=TRUE) mylist[[count]] <- myvec } } } MyReps[[k]] <- unique(mylist) } MyReps } 

The first function in the code above is just a function that generates all the factors. Here is the code if you're interested:

 EfficientFactorList <- function(n) { MyFactsList <- lapply(1:n, function(x) 1) for (j in 2:n) { for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)} } MyFactsList } 

My algorithm is fine if you are only interested in numbers less than 10,000 (it generates all factorizations for each number <= 10,000 in about 17 seconds), but it definitely doesn't scale. I would like to find an algorithm that has the same prerequisite for generating a list of all factorizations for each number less than or equal to n , since some of the applications that I mean will refer to this factor once, so the list should be faster than generating it on the fly every time (I know that memory is here).

+7
algorithm r factorization
source share
2 answers

Your EfficientFactorList function does a good job of effectively capturing the set of all factors for each number from 1 to n, so all that remains is a set of all factorizations. As you think, using factorization of smaller values ​​to calculate factorization for larger values ​​seems to be effective.

Consider the number k with coefficients k_1, k_2, ..., k_n. A naive approach would be to combine the factorizations k / k_1, k / k_2, ..., k / k_n by adding k_i to each factorization k / k_i to obtain the factorization k. As a processed example, we consider the calculation of factorizations 16 (which has non-trivial coefficients 2, 4, and 8). 2 has the factorization {2}, 4 has the factorization {4, 2 * 2}, and 8 has the factorization {8, 4 * 2, 2 * 2 * 2}, so we calculated the complete set of factorization by the first calculation {2 * 8, 4 * 4, 2 * 2 * 4, 8 * 2, 4 * 2 * 2, 2 * 2 * 2 * 2}, and then, taking unique factorizations, {8 * 2, 4 * 4, 4 * 2 * 2, 2 * 2 * 2 * 2}. Appendix 16 gives the final answer.

A more efficient approach is to note that we do not need to add k_i to all factorizations of k / k_i. For example, we did not need to add 2 * 2 * 4 from factorization 4, because it is already included from factorization 8. Similarly, we did not need to add 2 * 8 * from factorization 4, because it is already included from factorization 8. In the general case we need to enable factorization from k / k_i if all the values ​​in the factorization are k_i or more.

In code:

 library(gmp) all.fact <- function(n) { facts <- EfficientFactorList(n) facts[[1]] <- list(1) for (x in 2:n) { if (length(facts[[x]]) == 2) { facts[[x]] <- list(x) # Prime number } else { x.facts <- facts[[x]][facts[[x]] != 1 & facts[[x]] <= (x^0.5+0.001)] allSmaller <- lapply(x.facts, function(pf) lapply(facts[[x/pf]], function(y) { if (all(y >= pf)) { return(c(pf, y)) } else { return(NULL) } })) allSmaller <- do.call(c, allSmaller) facts[[x]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))]) } } return(facts) } 

This is much faster than the published code:

 system.time(f1 <- FactorRepresentations(10000)) # user system elapsed # 13.470 0.159 13.765 system.time(f2 <- all.fact(10000)) # user system elapsed # 1.602 0.028 1.641 

As a health check, it also returns the same number of factorizations for each number:

 lf1 <- sapply(f1, length) lf2 <- sapply(f2, length) all.equal(lf1, lf2) # [1] TRUE 
+5
source share

In case anyone is interested in creating multiplicative sections for one number n, below are two algorithms that will do just that (the IntegerPartition function comes from the question above):

 library(gmp) library(partitions) get_Factorizations1 <- function(MyN) { pfs <- function (x1) { n1 <- length(x1) y1 <- x1[-1L] != x1[-n1] i <- c(which(y1), n1) list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L) } if (MyN==1L) return(MyN) else { pfacs <- pfs(as.integer(factorize(MyN))) unip <- pfacs$values pv <- pfacs$lengths n <- pfacs$uni mySort <- order(pv, decreasing = TRUE) pv <- pv[mySort] unip <- unip[mySort] myReps <- lapply(IntegerPartitions(pv[1L]), function(y) unip[1L]^y) if (n > 1L) { mySet <- unlist(lapply(2L:n, function(x) rep(unip[x],pv[x]))) for (p in mySet) { myReps <- unique(do.call(c, lapply(myReps, function(j) { dupJ <- duplicated(j) nDupJ <- !dupJ SetJ <- j[which(nDupJ)] lenJ <- sum(nDupJ) if (any(dupJ)) {v1 <- j[which(dupJ)]} else {v1 <- vector(mode="integer")} tList <- vector("list", length=lenJ+1L) tList[[1L]] <- sort(c(j,p)) if (lenJ > 1L) {c2 <- 1L for (a in 1:lenJ) {tList[[c2 <- c2+1L]] <- sort(c(v1,SetJ[-a],SetJ[a]*p))} } else { tList[[2L]] <- sort(c(v1,p*SetJ)) } tList } ))) } } } myReps } 

Below is the josliber code above, which is processed to handle one case. The MyFactors function comes from this message (it returns all factors of a given number).

 library(gmp) get_Factorizations2 <- function(n) { myFacts <- as.integer(MyFactors(n)) facts <- lapply(myFacts, function(x) 1L) numFacs <- length(myFacts) facts[[numFacs]] <- myFacts names(facts) <- facts[[numFacs]] for (j in 2L:numFacs) { x <- myFacts[j] if (isprime(x)>0L) { facts[[j]] <- list(x) } else { facts[[j]] <- myFacts[which(x%%myFacts[myFacts <= x]==0L)] x.facts <- facts[[j]][facts[[j]] != 1 & facts[[j]] <= (x^0.5+0.001)] allSmaller <- lapply(x.facts, function(pf) lapply(facts[[which(names(facts)==(x/pf))]], function(y) { if (all(y >= pf)) { return(c(pf, y)) } else { return(NULL) } })) allSmaller <- do.call(c, allSmaller) facts[[j]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))]) } } facts[[numFacs]] } 

Here are a few steps:

 set.seed(101) samp <- sample(10^7, 10^4) library(rbenchmark) benchmark(getFacs1=sapply(samp, get_Factorizations), getFacs2=sapply(samp, get_Factorizations2), replications=5, columns = c("test", "replications", "elapsed", "relative"), order = "relative") test replications elapsed relative 1 getFacs1 5 117.68 1.000 2 getFacs2 5 216.39 1.839 system.time(t2 <- get_Factorizations(25401600)) user system elapsed 10.89 0.03 10.97 system.time(t2 <- get_Factorizations2(25401600)) user system elapsed 21.08 0.00 21.12 length(t1)==length(t2) [1] TRUE object.size(t1) 28552768 bytes object.size(t2) 20908768 bytes 

Even if get_Factorizations1 is faster, the second method is more intuitive (see josliber's explanation above) and it creates a smaller object. For the interested reader, here is a really good article about the subject.

0
source share

All Articles