Can I use more than 26 letters in `numpy.einsum`?

I use np.einsum to multiply probability tables, for example:

 np.einsum('ijk,jklm->ijklm', A, B) 

The problem is that, in general, I am dealing with more than 26 random variables (axes), so if I assign a letter to each random variable, I run out of letters. Is there another way to specify the operation above that avoids this problem, without resorting to a mess of np.sum and np.dot?

3 answers

Short answer: you can use any of the 52 letters (upper and lower case). That is every letter in the English alphabet. Any other axis names have to be mapped onto these 52 letters, or onto an equivalent set of numbers. Practically speaking, you will want to use only a fraction of these 52 in any one einsum call.


@kennytm suggests using the alternative input syntax. A few sample runs show that this is not a solution: 26 is still the practical limit (despite the suspicious error messages).

    In [258]: np.einsum(np.ones((2,3)),[0,20],np.ones((3,4)),[20,2],[0,2])
    Out[258]:
    array([[ 3.,  3.,  3.,  3.],
           [ 3.,  3.,  3.,  3.]])

    In [259]: np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-259-ea61c9e50d6a> in <module>()
    ----> 1 np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])

    ValueError: invalid subscript '|' in einstein sum subscripts string, subscripts must be letters

    In [260]: np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-260-ebd9b4889388> in <module>()
    ----> 1 np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])

    ValueError: subscript is not within the valid range [0, 52]

I'm not quite sure why you need more than 52 letters (upper and lower case), but if you do, you will need some sort of mapping. You do not want to write an einsum string that uses more than 52 axes at once anyway: the resulting iterator would be too large (in memory or in time).
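To put a rough number on "too large", here is a back-of-the-envelope calculation. The specifics are assumptions chosen for illustration: 52 binary random variables, one axis each, and a float64 result that keeps every axis.

```python
# Assumptions for illustration only: 52 binary variables, one axis each,
# and a float64 result that keeps all 52 axes.
n_axes = 52
axis_size = 2
bytes_per_item = 8  # float64

n_elements = axis_size ** n_axes
n_bytes = n_elements * bytes_per_item

print(n_elements)         # 4503599627370496
print(n_bytes / 2 ** 50)  # 32.0, i.e. 32 PiB
```

Even with the smallest useful axis size, a result over all 52 axes is far beyond what fits in memory, which is why any one call should involve only a fraction of the letters.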

I am picturing some sort of mapping function that could be used as:

    astr = foo(A.names, B.names)
    # foo(['i','j','k'],['j','k','l','m'])
    # foo(['a1','a2','a3'],['a2','a3','b4','b5'])
    np.einsum(astr, A, B)

https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py

is a Python version of einsum. Roughly speaking, einsum parses the subscripts string, creating an op_axes list that can be used in np.nditer to set up the required sum-of-products calculation. With this code, I can look at how the translation is done:

From the example in the __name__ block:

    label_str, op_axes = parse_subscripts('ik,kj->ij', Labels([A.ndim,B.ndim]))
    print(op_axes)
    # [[0, -1, 1], [-1, 1, 0], [0, 1, -1]]  fine
    # map (4,newaxis,3)(newaxis,3,2)->(4,2,newaxis)
    print(sum_of_prod([A,B],op_axes))

For your example, the full diagnostic output is:

    In [275]: einsum_py.parse_subscripts('ijk,jklm->ijklm',einsum_py.Labels([3,4]))
    jklm
    {'counts': {105: 1, 106: 2, 107: 2, 108: 1, 109: 1},
     'strides': [],
     'num_labels': 5,
     'min_label': 105,
     'nop': 2,
     'ndims': [3, 4],
     'ndim_broadcast': 0,
     'shapes': [],
     'max_label': 109}
    [('ijk', [105, 106, 107], 'NONE'), ('jklm', [106, 107, 108, 109], 'NONE')]
    ('ijklm', [105, 106, 107, 108, 109], 'NONE')
    iter labels: [105, 106, 107, 108, 109],'ijklm'
    op_axes [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]]
    Out[275]:
    (<einsum_py.Labels at 0xb4f80cac>,
     [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]])

Using 'ajk,jkzZ->ajkzZ' changes the labels, but results in the same op_axes.


Here is a first draft of a translation function. It should work for any list of lists (of hashable items):

    def translate(ll):
        # collect the distinct index names used in all the lists
        mset=set()
        for i in ll:
            mset.update(i)
        # map each name to a number, and each number to a letter ('a' is chr(97))
        dd={k:v for v,k in enumerate(mset)}
        x=[''.join([chr(dd[i]+97) for i in l]) for l in ll]
        # ['cdb', 'dbea', 'cdbea']
        # the last list is the output (right-hand side)
        y=','.join(x[:-1])+'->'+x[-1]
        # 'cdb,dbea->cdbea'
        return y

    In [377]: A=np.ones((3,1,2),int)
    In [378]: B=np.ones((1,2,4,3),int)
    In [380]: ll=[list(i) for i in ['ijk','jklm','ijklm']]
    In [381]: y=translate(ll)
    In [382]: y
    Out[382]: 'cdb,dbea->cdbea'
    In [383]: np.einsum(y,A,B).shape
    Out[383]: (3, 1, 2, 4, 3)

Using a set to map the index objects means that the resulting index characters are assigned in no particular order. As long as the RHS is specified, this should not be a problem. I also ignored ellipsis.
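If a deterministic result is preferred, one possible variant assigns letters in order of first appearance and draws on all 52 letters. This is only a sketch under the same assumptions as the draft above (a list of lists of hashable names, with the last list taken as the output); the name `names_to_subscripts` is hypothetical and not part of NumPy.

```python
import string

# The 52 characters accepted in an einsum subscripts string: a-z, then A-Z.
LETTERS = string.ascii_lowercase + string.ascii_uppercase


def names_to_subscripts(name_lists):
    """Map lists of hashable axis names to an einsum subscripts string.

    The last list is taken as the output (right-hand side).
    Hypothetical helper for illustration, not part of NumPy.
    """
    mapping = {}
    for names in name_lists:
        for name in names:
            if name not in mapping:
                if len(mapping) == len(LETTERS):
                    raise ValueError("more than 52 distinct axis names in one call")
                # Letters are handed out in order of first appearance.
                mapping[name] = LETTERS[len(mapping)]
    terms = ["".join(mapping[name] for name in names) for name_lists_entry in [None] for names in name_lists]
    return ",".join(terms[:-1]) + "->" + terms[-1]


subscripts = names_to_subscripts(
    [["a1", "a2", "a3"], ["a2", "a3", "b4", "b5"], ["a1", "a2", "a3", "b4", "b5"]]
)
print(subscripts)  # abc,bcde->abcde
```

Because the mapping is rebuilt for every call, the total number of random variables can exceed 52 as long as no single call involves more than 52 distinct names.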

==================

The list-input version of einsum is converted to the subscripts-string version in einsum_list_to_subscripts() (in numpy/core/src/multiarray/multiarraymodule.c). It replaces Ellipsis with "...". It raises the [0, 52] error message if ( s < 0 || s > 2*26 ), where s is a number in one of those sublists. And it converts s to a character with

    if (s < 26) {
        subscripts[subindex++] = 'A' + s;
    }
    else {
        subscripts[subindex++] = 'a' + s;
    }

But the second case does not seem to work; I get errors, for example, for 26.

 ValueError: invalid subscript '{' in einstein sum subscripts string, subscripts must be letters 

The expression 'a'+s produces non-letters when s >= 26:

    In [424]: ''.join([chr(ord('A')+i) for i in range(0,26)])
    Out[424]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    In [425]: ''.join([chr(ord('a')+i) for i in range(0,26)])
    Out[425]: 'abcdefghijklmnopqrstuvwxyz'

    In [435]: ''.join([chr(ord('a')+i) for i in range(26,52)])
    Out[435]: '{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94'

So 'a'+s is wrong; it should be 'a'+s-26:

    In [436]: ''.join([chr(ord('a')+i-26) for i in range(26,52)])
    Out[436]: 'abcdefghijklmnopqrstuvwxyz'
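Put together, the intended mapping from a sublist number to a letter can be sketched in Python as follows. This mirrors the C snippet with the offset corrected; it is not the actual NumPy patch, and the function name `sublist_index_to_letter` is hypothetical.

```python
import string


def sublist_index_to_letter(s):
    """Intended mapping: 0-25 -> 'A'-'Z', 26-51 -> 'a'-'z'.

    Hypothetical Python mirror of the C code, with the 'a' offset corrected.
    """
    if not 0 <= s < 52:
        raise ValueError("subscript is not within the valid range [0, 52)")
    if s < 26:
        return chr(ord('A') + s)
    return chr(ord('a') + s - 26)


letters = ''.join(sublist_index_to_letter(s) for s in range(52))
print(letters == string.ascii_uppercase + string.ascii_lowercase)  # True
```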

I reported this as https://github.com/numpy/numpy/issues/7741

The fact that this bug has survived all this time indicates that the sublist format is not commonly used, and that using large numbers in those lists is even less common.


You can use the einsum(op0, sublist0, op1, sublist1, ..., [sublistout]) form instead of the i,j,ik->ijk string form; the API of this form is not limited to 52 axes*. How this verbose form corresponds to the ijk form is shown in the documentation.

The OP's

 np.einsum('ijk,jklm->ijklm', A, B) 

would be written as

 np.einsum(A, [0,1,2], B, [1,2,3,4], [0,1,2,3,4]) 

(* Note: the implementation is still limited to 26 axes. See @hpaulj's answer and his bug report for an explanation.)
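A quick check that the two forms agree for the OP's case. The array shapes and contents below are arbitrary, chosen only for illustration:

```python
import numpy as np

# Arbitrary small operands whose shared axes (j, k) have matching sizes.
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(3 * 4 * 5 * 6).reshape(3, 4, 5, 6)

string_form = np.einsum('ijk,jklm->ijklm', A, B)
sublist_form = np.einsum(A, [0, 1, 2], B, [1, 2, 3, 4], [0, 1, 2, 3, 4])

print(string_form.shape)                          # (2, 3, 4, 5, 6)
print(np.array_equal(string_form, sublist_form))  # True
```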


Equivalents from numpy examples:

    >>> np.einsum('ii', a)
    >>> np.einsum(a, [0,0])

    >>> np.einsum('ii->i', a)
    >>> np.einsum(a, [0,0], [0])

    >>> np.einsum('ij,j', a, b)
    >>> np.einsum(a, [0,1], b, [1])

    >>> np.einsum('ji', c)
    >>> np.einsum(c, [1,0])

    >>> np.einsum('..., ...', 3, c)
    >>> np.einsum(3, [...], c, [...])

    >>> np.einsum('i,i', b, b)
    >>> np.einsum(b, [0], b, [0])

    >>> np.einsum('i,j', np.arange(2)+1, b)
    >>> np.einsum(np.arange(2)+1, [0], b, [1])

    >>> np.einsum('i...->...', a)
    >>> np.einsum(a, [0, ...], [...])

    >>> np.einsum('ijk,jil->kl', a, b)
    >>> np.einsum(a, [0,1,2], b, [1,0,3], [2,3])

If you are talking about letters such as ijk in your example, and you have more axes than there are alphabetic characters, then no, you cannot.

The NumPy einsum code checks each subscript character one by one using isalpha (in two places), and there seems to be no way to create names with more than 1 character.

You may be able to use capital letters, but the main answer to the question is that you cannot have names for axes with more than 1 character.
