Short answer: you can use any of the 52 letters (upper and lower case). That is every letter in the English alphabet. Any fancier axis names have to be mapped onto those 52, or onto an equivalent set of numbers. Practically speaking, you will want to use only a fraction of those 52 in any one einsum call.
@kennytm suggests using the alternative input syntax. A few sample runs show that this is not a solution. 26 is still the practical limit (despite the suspicious error messages).
In [258]: np.einsum(np.ones((2,3)),[0,20],np.ones((3,4)),[20,2],[0,2])
Out[258]:
array([[ 3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.]])

In [259]: np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-259-ea61c9e50d6a> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])

ValueError: invalid subscript '|' in einstein sum subscripts string, subscripts must be letters

In [260]: np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-260-ebd9b4889388> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])

ValueError: subscript is not within the valid range [0, 52]
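For readers who have not used the alternative (sublist) input syntax, here is a minimal sketch of how it relates to the usual subscripts string. The array shapes and the variable names `a`, `b`, `by_string` and `by_sublist` are illustrative choices of mine; only small label numbers are used, so the limit discussed here does not come into play.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)

# String form: labels are letters.
by_string = np.einsum('ij,jk->ik', a, b)

# Sublist form: each operand is followed by a list of integer labels,
# and the final list names the output axes.
by_sublist = np.einsum(a, [0, 1], b, [1, 2], [0, 2])

# Both forms describe the same matrix product.
assert np.array_equal(by_string, by_sublist)
assert np.allclose(by_string, np.dot(a, b))
```

The integers play the same role as the letters: a label shared between operands and missing from the output list is summed over.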
I'm not entirely sure why you need more than 52 letters (upper and lower case), but I'm sure you need to do some sort of mapping. You do not want to write an einsum string that uses more than 52 axes at once; the resulting iterator would be too large (in memory or in time).
I'm picturing some sort of mapping function that can be used as:
astr = foo(A.names, B.names)
https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py
is a Python version of einsum. Roughly speaking, einsum parses the subscripts string and creates an op_axes list, which can be used in np.nditer to set up the required sum-of-products calculation. With this code, I can see how the translation is done:
From the example in the __name__ block:
label_str, op_axes = parse_subscripts('ik,kj->ij', Labels([A.ndim,B.ndim]))
print(op_axes)
Your example, with full diagnostic output, is:
In [275]: einsum_py.parse_subscripts('ijk,jklm->ijklm',einsum_py.Labels([3,4]))
jklm
{'counts': {105: 1, 106: 2, 107: 2, 108: 1, 109: 1}, 'strides': [], 'num_labels': 5, 'min_label': 105, 'nop': 2, 'ndims': [3, 4], 'ndim_broadcast': 0, 'shapes': [], 'max_label': 109}
[('ijk', [105, 106, 107], 'NONE'), ('jklm', [106, 107, 108, 109], 'NONE')]
('ijklm', [105, 106, 107, 108, 109], 'NONE')
iter labels: [105, 106, 107, 108, 109],'ijklm'
op_axes [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]]
Out[275]:
(<einsum_py.Labels at 0xb4f80cac>,
 [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]])
Using 'ajk,jkzZ->ajkzZ' changes the labels, but results in the same op_axes.
Here is a first draft of the translation function. It should work for any list of lists (of hashable items):
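To make the point about labels versus op_axes concrete, here is a rough pure-Python sketch that derives an op_axes-style list from an explicit subscripts string. It is not the code from einsum_py.py: `op_axes_sketch` is a hypothetical helper of mine, and it assumes an explicit `->` output, no ellipsis, and no label repeated within a single operand.

```python
def op_axes_sketch(subscripts):
    """Hypothetical helper: build an op_axes-style list from 'lhs->rhs'."""
    lhs, rhs = subscripts.split('->')
    operands = lhs.split(',')
    # Iteration axes: output labels first, then any summed labels.
    summed = sorted(set(''.join(operands)) - set(rhs))
    iter_labels = list(rhs) + summed
    axes = []
    for term in operands + [rhs]:
        # For each iteration axis, record which axis of this term it
        # corresponds to, or -1 if the term does not have that label.
        axes.append([term.index(c) if c in term else -1 for c in iter_labels])
    return axes


first = op_axes_sketch('ijk,jklm->ijklm')
second = op_axes_sketch('ajk,jkzZ->ajkzZ')
# Different letters, same axis structure.
assert first == second
```

Only the positions of the labels matter, so any consistent renaming of the axes gives the same result.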
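def translate(ll):
    # collect every distinct index object across all the sublists
    mset = set()
    for i in ll:
        mset.update(i)
    # number the objects, then map each number to a lowercase letter
    # (97 == ord('a'), so this covers at most 26 distinct objects)
    dd = {k: v for v, k in enumerate(mset)}
    x = [''.join([chr(dd[i] + 97) for i in l]) for l in ll]
    return x
Using a set to map the index objects means that the final indexing characters are unordered. As long as the RHS is specified, this should not be a problem. I also ignored ellipsis.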
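As a variation on the draft above, here is a sketch that assigns letters in order of first appearance, so the result is reproducible, and that draws on all 52 letters. The name `names_to_subscripts`, its two-argument signature and the `LETTERS` constant are assumptions made for illustration; ellipsis is still ignored.

```python
import string

# The 52 labels that an einsum subscripts string accepts.
LETTERS = string.ascii_lowercase + string.ascii_uppercase


def names_to_subscripts(inputs, output):
    """Hypothetical helper: map hashable axis names to an einsum string.

    inputs is a list of name lists, one per operand; output is the
    name list of the result.
    """
    table = {}
    for names in list(inputs) + [list(output)]:
        for name in names:
            if name not in table:
                if len(table) == len(LETTERS):
                    raise ValueError('more than 52 distinct axis names')
                # Letters are handed out in order of first appearance.
                table[name] = LETTERS[len(table)]
    terms = [''.join(table[n] for n in names) for names in inputs]
    return ','.join(terms) + '->' + ''.join(table[n] for n in output)


astr = names_to_subscripts([['row', 'inner'], ['inner', 'col']], ['row', 'col'])
assert astr == 'ab,bc->ac'
```

The resulting string can be passed to np.einsum together with the arrays, in the same order as the name lists.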
==================
The list version of the einsum input is converted to the subscripts-string version in einsum_list_to_subscripts() (in numpy/core/src/multiarray/multiarraymodule.c). It replaces Ellipsis with "...". It raises the [0, 52] error message if ( s < 0 || s > 2*26 ), where s is a number in one of those sublists. And it converts s to a character with
if (s < 26) {
    subscripts[subindex++] = 'A' + s;
}
else {
    subscripts[subindex++] = 'a' + s;
}
But the second case does not seem to work; I get errors like this one, for 26:
ValueError: invalid subscript '{' in einstein sum subscripts string, subscripts must be letters
That 'a'+s is wrong whenever s >= 26:
In [424]: ''.join([chr(ord('A')+i) for i in range(0,26)])
Out[424]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [425]: ''.join([chr(ord('a')+i) for i in range(0,26)])
Out[425]: 'abcdefghijklmnopqrstuvwxyz'

In [435]: ''.join([chr(ord('a')+i) for i in range(26,52)])
Out[435]: '{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94'
So the expression should be 'a'+s-26:
In [436]: ''.join([chr(ord('a')+i-26) for i in range(26,52)])
Out[436]: 'abcdefghijklmnopqrstuvwxyz'
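Put together, the intended number-to-letter conversion can be sketched in Python as follows. This is an emulation for illustration, not the NumPy source: `sublist_index_to_letter` is a hypothetical name, and the strict upper bound of 52 is my assumption about the intended range.

```python
import string


def sublist_index_to_letter(s):
    """Hypothetical emulation of the corrected conversion."""
    if not 0 <= s < 52:
        raise ValueError('subscript is not within the valid range')
    if s < 26:
        # 0..25 map to 'A'..'Z'
        return chr(ord('A') + s)
    # 26..51 map to 'a'..'z'; subtracting 26 is the missing step
    return chr(ord('a') + s - 26)


all_labels = ''.join(sublist_index_to_letter(s) for s in range(52))
assert all_labels == string.ascii_uppercase + string.ascii_lowercase
```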
I posted https://github.com/numpy/numpy/issues/7741
The fact that this bug has survived for so long suggests that the sublist format is not commonly used, and that large numbers in those lists are rarer still.