Arm NEON and poly8_t and poly16_t

I recently studied neon optimization using inline functions, and I came across poly8_t and poly16_t data types. Then I do not care what it is.

I searched the entire network, but still could not find ANY explanation of what they are.

Can someone explain them to me?

Edit : Thanks for the answers, but why, if it's just another way of multiplying, etc., does it have a completely different data type?

+6
source share
4 answers

Left = Regular Multiplication, Right = Bassless Multiplication

1 1 0 1 1 1 0 1 * 1 0 0 1 1 0 0 1 ------------ --> -------------- (1)1 1 0 1 <-- (1) is carry 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 + 1 1 0 1 + GF(2) or XOR ------------- --------------- 1 1 1 0 1 0 1 1 1 0 0 1 0 1 

Each 1 or 0 in a diagonally descending matrix is ​​a partial product of one source bit from the vector "1101" and one bit of the source from another vector "1001".

The correct applications are found in CRC (ECC) error correction code calculations (Reed Solomon, BCH) and cryptography (elliptic curves, internal AES elements).

Illustrating the connection with polynomial multiplication, the operation described above is summarized as

  1101 == x^3 + x^2 + 0 + 1; 1001 == x^3 + 0 + 0 + 1; 

Regular polynomial multiplication: p (x) * (x ^ 3 + 1) == p (x) * x ^ 3 + p (x) ==

  (x^3 + x^2 + 1)(x^3 + 1) == x^6+x^5+x^3 + x^3+x^2+1 == 1x^6 + 1x^5 + 0x^4 + 2x^3 + 1^x2 + 0x + 1 == "1102101" 

In GF (2), each coefficient is simply calculated modulo 2, making 1100101b .

The data type in GF looks the same as uint8_t, uint16_t, or possibly up to 128_t in that the data type for GF (2 ^ 8) contains 256 unique bitpatterns. However, for example, the bitpattern '00010001', for example. does not have a traditional interpretation. (This is not 17 decimal, but perhaps the 123rd degree of β€œunity” modulo another polynomial.) Multiplying this number by the same β€œunity” modulo the generator polynomial g (x), leads to the 124th degree, etc. d. Then the properties (identities) of the final fields have only interesting applications - such that you can (remotely) easily calculate which 32-bit number to add to the file so that it corresponds to the 32-bit CRC; or you can use properties to parallelize the calculation of crc or to implement multiplication of a signal with a Fourier-like transformation in finite fields (theoretical number of numbers).

+7
source

These types are used for intransitive multiplication. It is useful for cryptographic algorithms and CRC hashes. Here are some application docs (they study the x86 PCLMULQDQ instruction, but the same ideas apply to lossless migration on ARM processors):

+6
source

You have not described PMUL vs PMULL.

As I understand it (perhaps incorrectly) each per PMUL element works on two 8-bit source elements and generates one 8-bit result element.

Each of the PMUL elements generates 8 partial products and each PP is shifted to XORed, respectively. So, from lsb of the first PP to msb of the last PP. It seems that the result is 15 bits. PMUL can only store 8-bit results.

Are the most significant 7-bit 15-bit results discarded?

+2
source

As a reference, this is from the Cortex-A (v4) Series Programmer's Guide, chapter 7.2.2:

Polynomial arithmetic is useful when implementing specific cryptography or data integrity algorithms.

Adding two polynomials over {0,1} is the same as a bitwise OR exception. Polynomial complementary results in different meanings traditional complement.

Multiplication of two polynomials over {0,1} involves first determining partial products, as is done in ordinary multiplication, then partial products are exclusive ORed, and are not added traditionally. Polynomial multiplication leads to different values ​​for ordinary multiplication, since it requires polynomial addition of partial products.

+1
source

All Articles