Why isn't Intel developing its SIMD ISAs in a more compatible or universal way?

Intel has several SIMD ISAs, such as SSE, AVX, AVX2, AVX-512, and IMCI on Xeon Phi. These ISAs are supported on different processors. For example, AVX-512 BW, AVX-512 DQ and AVX-512 VL are supported only on the Skylake Xeon processors, not on Xeon Phi. AVX-512F and AVX-512 CDI are supported on both Skylake and Xeon Phi, while AVX-512 ERI and AVX-512 PFI are supported only on Xeon Phi.
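Because support differs between processors, code that uses these subsets normally has to check for them at run time. A minimal sketch, assuming GCC or Clang on an x86 target: __builtin_cpu_supports is a real compiler builtin, while avx512_subsets() and the HAS_* flags are hypothetical names chosen for this example. The Xeon Phi-only ER and PF subsets are left out because recent GCC and Clang releases have deprecated Xeon Phi support.

```c
/* Hypothetical bit flags for the AVX-512 subsets checked below. */
enum {
    HAS_AVX512F  = 1 << 0,
    HAS_AVX512CD = 1 << 1,
    HAS_AVX512BW = 1 << 2,
    HAS_AVX512DQ = 1 << 3,
    HAS_AVX512VL = 1 << 4
};

/* Returns a bitmask of the AVX-512 subsets the running CPU reports.
   __builtin_cpu_supports only exists for x86 targets in GCC/Clang, so on
   any other target or compiler this simply reports no AVX-512 support. */
int avx512_subsets(void) {
    int mask = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* harmless here; required only before constructors run */
    if (__builtin_cpu_supports("avx512f"))  mask |= HAS_AVX512F;
    if (__builtin_cpu_supports("avx512cd")) mask |= HAS_AVX512CD;
    if (__builtin_cpu_supports("avx512bw")) mask |= HAS_AVX512BW;
    if (__builtin_cpu_supports("avx512dq")) mask |= HAS_AVX512DQ;
    if (__builtin_cpu_supports("avx512vl")) mask |= HAS_AVX512VL;
#endif
    return mask;
}
```

A program would then pick a code path per subset (for example, a BW path only when HAS_AVX512BW is set), which is exactly the per-processor fragmentation the question is about.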

Why isn’t Intel developing a more versatile SIMD ISA that can run on all of its modern processors?

In addition, Intel removes some features and adds new ones as it develops each ISA, so there are many different flavors. For example, some operate on packed 8-bit values and some on packed 64-bit values, and some flavors are not widely supported: Xeon Phi cannot handle packed 8-bit values, whereas Skylake can.

Why is Intel changing its SIMD extensions in such an inconsistent manner?

If the SIMD ISAs were more compatible with each other, existing AVX code could be ported to AVX-512 with much less effort.

2 answers

I can see three reasons for this.

(1) When Intel originally designed MMX, they had very little die area to work with, so it was made as simple as possible. They also designed it to be fully compatible with the existing x86 ISA (precise interrupts, plus state preserved across context switches). They did not anticipate that they would keep widening the SIMD registers and adding so many instructions. With each generation, as wider SIMD registers and more complex instructions were added, the old ISA had to be maintained for compatibility.

(2) The oddity you see with AVX-512 is that Intel is trying to merge two separate product lines. Skylake comes from Intel's PC/server line, whose path can be traced as MMX → SSE/SSE2/SSE3/SSE4 → AVX → AVX2 → AVX-512. Xeon Phi grew out of Larrabee, an x86-compatible graphics card that used its own, different vector instruction set; this is discussed on the Intel forums.

I would like to see a model that can be upgraded without recompiling your code every time. For example, instead of the ISA defining the AVX register width as 512 bits, the width would be a parameter stored in the microarchitecture that the programmer retrieves at run time. The program asks "what is the maximum SIMD width available on this machine?", the architecture returns XYZ, and the program has a single generic control flow that handles whatever XYZ turns out to be. This would be much cleaner and more scalable than the current approach, which requires a separate version of the same function for every possible SIMD extension. :-/
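A minimal C sketch of the width-agnostic model described above, not an existing Intel API: query_max_simd_lanes() is a hypothetical stand-in for the run-time width query (here it just returns a fixed value so the sketch runs anywhere), and vadd_chunk() is a scalar stand-in for a single vector instruction whose width is not fixed by the ISA. The point is that one loop handles any width, including a partial final chunk.

```c
#include <stddef.h>

/* Hypothetical: in the proposed model the hardware would report its
   vector width at run time. Here it returns a fixed lane count
   (e.g. 512 bits / 32-bit floats = 16) purely for illustration. */
static size_t query_max_simd_lanes(void) {
    return 16;
}

/* Stand-in for one width-agnostic vector add over `n` lanes (n <= width). */
static void vadd_chunk(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* One control flow for any width: strip-mine the loop into chunks of
   `lanes` elements; the last chunk may be shorter (the tail). */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    const size_t lanes = query_max_simd_lanes();
    for (size_t i = 0; i < n; i += lanes) {
        size_t chunk = (n - i < lanes) ? (n - i) : lanes;
        vadd_chunk(dst + i, a + i, b + i, chunk);
    }
}
```

Because the width is only read at run time, the same binary would, under this model, use wider vectors on a future processor that reports a larger value, with no per-extension function versions.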


There is a convergence of the SIMD ISAs between Xeon and Xeon Phi, and they may eventually become identical. I doubt you will ever get the same SIMD ISA across Intel's entire processor line; keep in mind that it extends from the tiny Quark SoC up to Xeon Phi. If something like AVX-1024 ever appears on Xeon Phi, it will take a long time, possibly forever, before it is ported to Quark or to a low-power Atom processor.

To get better portability across different processor families, including future ones, I advise you to use higher-level abstractions than raw SIMD instructions or intrinsics: OpenCL, OpenMP, Cilk Plus, C++ AMP, and compiler auto-vectorization. Quite often these will generate good SIMD instructions for the target platform.
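As a small illustration of the OpenMP route, the loop below is plain scalar C; the #pragma omp simd directive (OpenMP 4.0 and later) asks the compiler to vectorize it for whatever SIMD extension it is targeting. The function name saxpy and the operation itself are only illustrative choices.

```c
#include <stddef.h>

/* y = a*x + y, written as an ordinary loop. An OpenMP-aware compiler
   may vectorize it for SSE, AVX2, AVX-512, etc., depending on the target
   flags; a compiler that ignores the pragma still produces correct
   scalar code. */
void saxpy(size_t n, float a, const float *x, float *y) {
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With GCC or Clang, for example, compiling with -fopenmp-simd (or -fopenmp) plus a target flag such as -mavx2 or -march=native lets the same source produce code for different SIMD extensions without rewriting it.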

