How to create a list of random integer vectors whose sum is x

Creating a random vector whose sum is X (for example, X = 1000) is quite simple:

    import random

    def RunFloat():
        Scalar = 1000
        VectorSize = 30
        RandomVector = [random.random() for i in range(VectorSize)]
        RandomVectorSum = sum(RandomVector)
        RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
        return RandomVector

    RunFloat()

The code above creates a vector whose values are floats and whose sum is 1000.

What I find difficult is writing a simple function that creates a vector whose values are integers and whose sum is X (for example, X = 1000 * 30):

    import random

    def RunInt():
        LowerBound = 600
        UpperBound = 1200
        VectorSize = 30
        RandomVector = [random.randint(LowerBound, UpperBound) for i in range(VectorSize)]
        RandomVectorSum = 1000*30
        #Sanity check that our RandomVectorSum is sensible/feasible
        if LowerBound*VectorSize <= RandomVectorSum and RandomVectorSum <= UpperBound*VectorSize:
            if sum(RandomVector) == RandomVectorSum:
                return RandomVector
            else:
                RunInt()

Does anyone have any suggestions for improving on this idea? As written, my code may never exit, or it may run into recursion depth problems.

Edit (July 9, 2012)

Thanks to Oliver, mgilson and Dougal for their contributions. My solution is shown below.

  • Oliver had great insight with the multinomial distribution idea.
  • Simply put, (1) the multinomial distribution is likely to produce some solutions more often than others. Dougal demonstrated with a simple law-of-large-numbers test that the multinomial distribution over the solution space is neither uniform nor normal. Dougal also suggested using numpy's multinomial function, which saves me a lot of trouble, pain, and headaches.
  • To overcome issue (2) with the output, I use RunFloat() to produce what appears (I have not tested this, so it is only a superficial impression) to be a more uniform distribution. How much more uniform is it compared to (1)? I really don't know yet, but it is good enough for my use.
  • Thanks again to mgilson for the alternative method that does not use numpy.

Here is the code I made for this edit:

Edit #2 (July 11, 2012)

I realized that the normal distribution was not implemented correctly; I have since changed it to the following:

    import random
    from numpy.random import multinomial
    import math

    def RandFloats(Size):
        Scalar = 1.0
        VectorSize = Size
        RandomVector = [random.random() for i in range(VectorSize)]
        RandomVectorSum = sum(RandomVector)
        RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
        return RandomVector

    def RandIntVec(ListSize, ListSumValue, Distribution='Normal'):
        """
        Inputs:
        ListSize = the size of the list to return
        ListSumValue = the sum of the list values
        Distribution = can be 'uniform' for a uniform distribution,
                       'normal' for a normal distribution ~ N(0,1) truncated at +/- 3 sigma (default),
                       or a list of size 'ListSize' or 'ListSize - 1' for an empirical (arbitrary)
                       distribution. The probabilities of the p different outcomes should sum to 1
                       (however, the last element is always assumed to account for the remaining
                       probability, as long as sum(pvals[:-1]) <= 1).
        Output:
        A list of random integers of length 'ListSize' whose sum is 'ListSumValue'.
        """
        if type(Distribution) == list:
            DistributionSize = len(Distribution)
            if ListSize == DistributionSize or (ListSize - 1) == DistributionSize:
                Values = multinomial(ListSumValue, Distribution, size=1)
                OutputValue = Values[0]
            else:
                raise ValueError('Cannot create desired vector')
        elif Distribution.lower() == 'uniform':
            #I do not recommend this!!!! I see that it is not as random (at least on my computer) as I had hoped
            UniformDistro = [1.0/ListSize for i in range(ListSize)]  #1.0, not 1, to avoid integer division
            Values = multinomial(ListSumValue, UniformDistro, size=1)
            OutputValue = Values[0]
        elif Distribution.lower() == 'normal':
            #Normal distribution construction....it is very flexible and hideous.
            #Assume a +/-3 sigma range. Warning: this may or may not be a suitable range for your
            #implementation! If you wish to explore a different range, change LowSigma and HighSigma.
            LowSigma = -3   #-3 sigma
            HighSigma = 3   #+3 sigma
            StepSize = 1/(float(ListSize) - 1)
            ZValues = [(LowSigma*(1 - i*StepSize) + (i*StepSize)*HighSigma) for i in range(int(ListSize))]
            #Construction parameters for N(Mean, Variance) - default is N(0,1)
            Mean = 0
            Var = 1
            #NormalDistro = [self.NormalDistributionFunction(Mean, Var, x) for x in ZValues]
            NormalDistro = list()
            for i in range(len(ZValues)):
                if i == 0:
                    ERFCVAL = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                    NormalDistro.append(ERFCVAL)
                elif i == len(ZValues) - 1:
                    ERFCVAL = NormalDistro[0]
                    NormalDistro.append(ERFCVAL)
                else:
                    ERFCVAL1 = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                    ERFCVAL2 = 0.5 * math.erfc(-ZValues[i-1]/math.sqrt(2))
                    ERFCVAL = ERFCVAL1 - ERFCVAL2
                    NormalDistro.append(ERFCVAL)
            #print "Normal Distribution sum = %f"%sum(NormalDistro)
            Values = multinomial(ListSumValue, NormalDistro, size=1)
            OutputValue = Values[0]
        else:
            raise ValueError('Cannot create desired vector')
        return OutputValue

    #Some examples
    ListSize = 4
    ListSumValue = 12
    for i in range(100):
        print RandIntVec(ListSize, ListSumValue, Distribution=RandFloats(ListSize))

The code above can be found on github. It is part of a class I built for school. user1149913 also posted a nice explanation of the problem in an answer below.

7 answers

I would suggest not doing this recursively:

If you sample the values sequentially/recursively, the value at the first index has a much larger possible range, while the values at later indexes are constrained by whatever came before. The result resembles an exponential distribution.

Instead, I would recommend sampling from the multinomial distribution. This treats every index uniformly, constrains the sum, forces all values to be integers, and samples uniformly from all configurations that satisfy these rules (note: configurations that can occur in several ways are weighted by the number of ways they can occur).

To map your question onto multinomial notation: the total is n (an integer), and each of the k values (one per index, also integers) must be between 0 and n. Then follow the recipe here.

(Or use numpy.random.multinomial, as @Dougal recommended.)
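For concreteness, here is a minimal sketch of the numpy route (my illustration, not part of the answer), using the question's length-30 vector with a target sum of 1000; a sum of 30000 works the same way:

    import numpy as np

    # 30 non-negative integers that always sum to exactly 1000,
    # with every index weighted equally
    vec = np.random.multinomial(1000, [1.0 / 30] * 30)
    print(vec.sum())   # -> 1000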


I just ran both @Oliver's multinomial approach and @mgilson's code a million times each, for length-3 vectors summing to 10, and looked at how often each possible result occurred. Both are extremely non-uniform:

[histogram of result frequencies for the two approaches]

(The indexing approach is shown further below.)

Does this matter? It depends on whether you want “an arbitrary vector with this property, which is usually different each time,” or whether every valid vector should be equally likely.

In the multinomial approach, of course, 3 3 4 is going to be much more likely than 0 0 10 (4200 times more likely, as it turns out). mgilson's biases are less obvious to me, but 0 0 10 and its permutations were by far the least likely (only ~750 times each out of a million); the most common were 1 4 5 and its permutations; I don't know why, but they were certainly the most common, followed by 1 3 6. It typically starts with a sum that is too high in this configuration (expectation 15), though I'm not sure why the reduction works out that way.

One way to get a uniform output over the possible vectors would be a rejection scheme. To get a vector of length K with sum N:

  • Sample a vector of length K whose integer elements are chosen uniformly and independently between 0 and N
  • Repeat until the sum of the vector is N

Obviously, this will be very slow for anything but small K and N.
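A minimal sketch of that rejection scheme (my illustration, not code from the answer), just to make the idea concrete:

    import random

    def rejection_sample(K, N):
        # keep drawing independent uniform vectors until one happens to sum to N
        while True:
            vec = [random.randint(0, N) for _ in range(K)]
            if sum(vec) == N:
                return vec

    print(rejection_sample(3, 10))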

Another approach would be to assign a numbering to all possible vectors; there are (N + K - 1) choose (K - 1) such vectors, so just pick a random integer in that range to decide which one you want. One reasonable way to number them is lexicographic ordering: (0, 0, 10), (0, 1, 9), (0, 2, 8), (0, 3, 7), ...

Note that the last (Kth) element of the vector is uniquely determined by the sum of the first K-1.

I'm sure there is a nice way to jump directly to any given index in this list, but I can't think of it right now.... Enumerating the possible results and stepping through them will work, but is probably slower than necessary. Here is the code for that (though note that we actually use reverse lexicographic ordering here...).

    from itertools import islice, combinations_with_replacement
    from functools import reduce
    from math import factorial
    from operator import mul
    import random

    def _enum_cands(total, length):
        # get all possible ways of distributing `total` units among our indices;
        # for example, the first combination might be (0,) * total,
        # meaning we picked index 0 `total` times, for [total, 0, 0, ...]
        for t in combinations_with_replacement(range(length), total):
            cand = [0] * length
            for i in t:
                cand[i] += 1
            yield tuple(cand)

    def int_vec_with_sum(total, length):
        # (N + K - 1) choose (K - 1) possible outcomes
        num_outcomes = reduce(mul, range(total + 1, total + length)) // factorial(length - 1)
        # that's integer division, even though SO highlights it as a comment :)
        idx = random.choice(range(num_outcomes))
        return next(islice(_enum_cands(total, length), idx, None))

As shown in the histogram above, this is essentially uniform over the possible results. It also adapts easily to upper/lower bounds on any individual element; just add the condition to _enum_cands.

It is slower than the other answers, though: for sum 10 and length 3 I get

  • 14.7 µs using np.random.multinomial,
  • 33.9 µs using mgilson's,
  • 88.1 µs with this approach

I expect the difference to worsen as the number of possible outcomes increases.

If someone comes up with a nice formula for indexing into these vectors, this would be much better...
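For what it's worth, here is a sketch of one standard way to jump straight to the idx-th vector in the lexicographic order described above (my own illustration, not part of the answer; the names _ncomps and unrank are made up):

    from math import factorial
    import random

    def _ncomps(total, length):
        # number of length-`length` non-negative integer vectors summing to `total`:
        # (total + length - 1) choose (length - 1)
        return factorial(total + length - 1) // (factorial(total) * factorial(length - 1))

    def unrank(idx, total, length):
        # walk through the positions; at each one, find the smallest value whose
        # block of possible completions contains idx
        vec = []
        for pos in range(length - 1):
            v = 0
            while True:
                block = _ncomps(total - v, length - pos - 1)
                if idx < block:
                    break
                idx -= block
                v += 1
            vec.append(v)
            total -= v
        vec.append(total)  # the last element is forced
        return vec

    # uniform over all length-3 vectors summing to 10
    print(unrank(random.randrange(_ncomps(10, 3)), 10, 3))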


Here is a pretty simple implementation.

    import random
    import math

    def randvec(vecsum, N, maxval, minval):
        if N*minval > vecsum or N*maxval < vecsum:
            raise ValueError('Cannot create desired vector')
        indices = list(range(N))
        vec = [random.randint(minval, maxval) for i in indices]
        diff = sum(vec) - vecsum  # we were off by this amount.
        #Iterate through, incrementing/decrementing a random index
        #by 1 for each unit we were off.
        while diff != 0:
            addthis = 1 if diff > 0 else -1  # +/- 1 depending on whether we were above or below the target.
            diff -= addthis
            ### IMPLEMENTATION 1 ###
            idx = random.choice(indices)  #Pick a random index to modify; check whether it's OK to modify
            while not (minval <= (vec[idx] - addthis) <= maxval):  #operator chaining. If you don't know it, look it up. It's pretty cool.
                idx = random.choice(indices)  #Not OK to modify. Pick another.
            vec[idx] -= addthis  #Update that index.
            ### IMPLEMENTATION 2 ###
            # random.shuffle(indices)
            # for idx in indices:
            #     if minval <= (vec[idx] - addthis) <= maxval:
            #         vec[idx] -= addthis
            #         break
            #
            # in situations where (based on the choices of N, minval, maxval and vecsum)
            # many of the values in vec MUST BE minval or maxval, Implementation 2
            # may be superior.
        return vec

    a = randvec(1000, 20, 100, 1)
    print sum(a)

The most efficient way to sample uniformly from the set of partitions of N elements into K bins is to use a dynamic programming algorithm, which is O(KN). There are a multichoose number of possibilities (http://mathworld.wolfram.com/Multichoose.html), so enumerating them will be very slow. Rejection sampling and other Monte Carlo methods will likely also be very slow.

Other methods people have suggested, such as sampling from a multinomial, do not draw samples from a uniform distribution.

Let T(n,k) be the number of partitions of n elements into k bins; then we can compute the recurrence

    T(n,1) = 1  for all n >= 0
    T(n,k) = sum_{m<=n} T(n-m, k-1)

To sample K elements that sum to N, sample from K multinomial distributions, going “backwards” in the recurrence. Edit: the T's in each multinomial below should be normalized to sum to one before drawing each sample.

    n1 = multinomial([T(N,K-1), T(N-1,K-1), ..., T(0,K-1)])
    n2 = multinomial([T(N-n1,K-1), T(N-n1-1,K-1), ..., T(0,K-1)])
    ...
    nK = multinomial([T(N-sum([n1,...,n_{K-1}]),1), T(N-sum([n1,...,n_{K-1}])-1,1), ..., T(0,1)])

Note: I allow values of 0 to be drawn.
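A minimal sketch of that backward sampling (my illustration, not code from the answer), using a memoized T and integer weights rather than normalizing probabilities for numpy's multinomial:

    import random

    _memo = {}
    def T(n, k):
        # number of ways to write n as an ordered sum of k non-negative integers
        if k == 1:
            return 1
        if (n, k) not in _memo:
            _memo[(n, k)] = sum(T(n - m, k - 1) for m in range(n + 1))
        return _memo[(n, k)]

    def sample_uniform_vector(N, K):
        # draw one length-K vector of non-negative integers summing to N,
        # uniformly over all such vectors
        vec = []
        remaining = N
        for k in range(K, 1, -1):
            # weight for putting value j at this position = number of ways to finish
            weights = [T(remaining - j, k - 1) for j in range(remaining + 1)]
            r = random.randrange(sum(weights))
            j = 0
            while r >= weights[j]:
                r -= weights[j]
                j += 1
            vec.append(j)
            remaining -= j
        vec.append(remaining)  # the last element is forced
        return vec

    print(sample_uniform_vector(10, 3))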

This procedure is similar to sampling a set of latent states from a segmental semi-Markov model (http://www.gatsby.ucl.ac.uk/%7Echuwei/paper/icml103.pdf).


This version will give an even distribution:

    from random import randint

    def RunInt(VectorSize, Sum):
        x = [randint(0, Sum) for _ in range(1, VectorSize)]
        x.extend([0, Sum])
        x.sort()
        return [x[i+1] - x[i] for i in range(VectorSize)]

To give you a different approach: implement partition_function(X), randomly select an index between 0 and len(partition_function(1000)) - 1, and you have your vector. Now you just need an efficient way to compute the partition function. These links may help:

http://code.activestate.com/recipes/218332-generator-for-integer-partitions/

http://oeis.org/A000041

EDIT: Here is some simple code:

    import itertools
    import random

    all_partitions = {0: set([(0,)]), 1: set([(1,)])}

    def partition_merge(a, b):
        c = set()
        for t in itertools.product(a, b):
            c.add(tuple(sorted(list(t[0] + t[1]))))
        return c

    def my_partition(n):
        if all_partitions.has_key(n):
            return all_partitions[n]
        a = set([(n,)])
        for i in xrange(1, n/2 + 1):
            a = partition_merge(my_partition(i), my_partition(n-i)).union(a)
        all_partitions[n] = a
        return a

    if __name__ == '__main__':
        n = 30
        # if you have a few years to wait, uncomment the next line
        # n = 1000
        a = my_partition(n)
        i = random.randint(0, len(a) - 1)
        print(list(a)[i])

What about:

    import numpy as np

    def RunInt(VectorSize, Sum):
        a = np.array([np.random.rand(VectorSize)])
        b = np.floor(a/np.sum(a)*Sum)
        for i in range(int(Sum - np.sum(b))):
            b[0][np.random.randint(len(b[0]))] += 1
        return b[0]
