ARM guns for high-performance computing with its new vector instruction set

Intel and (at times) AMD have jointly owned the vast majority of the server and high-performance computing (HPC) markets for nearly 20 years, but ARM is gunning for them in HPC. At Hot Chips today, the CPU design firm unveiled plans for a new kind of scalable vector instruction set, dubbed the Scalable Vector Extension (SVE).

Vector instruction sets are nothing new. SIMD (Single Instruction, Multiple Data) instruction sets like SSE, AVX, AltiVec, and ARM’s own NEON all allow a processor to apply one instruction to a one-dimensional array of data, whereas conventional scalar instructions operate on a single data item at a time. Intel’s own Xeon Phi has focused on improving vector performance by implementing large, specialized vector processing units (VPUs) in hardware, with support for Intel’s AVX-512 instruction set (meaning each vector instruction can operate on up to 512 bits of data at a time).
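To make the scalar-versus-SIMD distinction concrete, here is a toy model in plain Python. It runs on no vector hardware at all; it only counts how many "instructions" each approach would need to add two arrays. The function names, the 32-bit element size, and the 64-element test arrays are assumptions made for this sketch.

```python
# Toy model only: counts notional instructions, no real SIMD hardware involved.

def scalar_add(a, b):
    """Add two lists one element pair at a time (one 'instruction' per pair)."""
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a, b, vector_bits=512, element_bits=32):
    """Add two lists one vector at a time (one 'instruction' per group of lanes)."""
    lanes = vector_bits // element_bits  # e.g. 512 / 32 = 16 elements per vector
    out = []
    instructions = 0
    for i in range(0, len(a), lanes):
        # A single notional vector instruction handles up to `lanes` elements.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = list(range(64))
b = [10] * 64
scalar_result, scalar_count = scalar_add(a, b)
simd_result, simd_count = simd_add(a, b)
print(scalar_count, simd_count)
```

Both functions produce the same sums; the difference is that the 512-bit model gets there in a sixteenth of the instruction count for 32-bit elements. Real hardware throughput depends on far more than instruction count, so treat this purely as an illustration of the idea.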

What distinguishes this ARM effort from other products is that it’s explicitly designed to scale across hardware with SVE registers as narrow as 128 bits and as wide as 2048 bits. To understand how this works, compare it with current x86 consumer processors. Both AMD and Intel support the AVX instruction set, but AMD’s Steamroller CPU (Kaveri) has 128-bit registers while Intel implemented 256-bit registers for AVX. Steamroller can execute one 256-bit AVX operation or 2×128-bit operations per cycle. Up until Skylake, Intel chips couldn’t execute 2×128-bit AVX instructions simultaneously, though Intel has added this capability in its latest CPU generation.

What ARM is describing is a hardware scheduling ability that’s much more flexible than what AMD or Intel have implemented to date. Hand 2048-bit code to an ARM core with a 128-bit SVE implementation, and the CPU will find a way to execute that code (albeit at a severe performance penalty). Similarly, hand 128-bit code to a CPU with 2048-bit vector processing, and it’ll try to execute the workload in a way that takes advantage of the full SVE width.
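One way to picture this kind of width-independence is a loop that asks the hardware how many lanes it has and masks off the leftovers with a per-lane predicate. The sketch below simulates that idea in plain Python. It is not ARM's actual mechanism and uses no real SVE instructions or intrinsics; the function name, the 32-bit element size, the multiple-of-128 check, and the 100-element test arrays are all assumptions made for illustration.

```python
# Simulation only: models a width-agnostic vector loop, not real SVE code.

def width_agnostic_add(a, b, vector_bits, element_bits=32):
    """Add two lists using whatever vector width the 'hardware' provides."""
    # The article gives 128 to 2048 bits as the supported range; requiring a
    # multiple of 128 is an assumption of this sketch.
    if not 128 <= vector_bits <= 2048 or vector_bits % 128 != 0:
        raise ValueError("vector_bits must be a multiple of 128 in [128, 2048]")

    lanes = vector_bits // element_bits
    n = len(a)
    out = [0] * n
    steps = 0
    i = 0
    while i < n:
        # Per-lane predicate: a lane is active only if its index is in bounds.
        predicate = [i + lane < n for lane in range(lanes)]
        for lane, active in enumerate(predicate):
            if active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes  # advance by the hardware's lane count, not a fixed constant
        steps += 1
    return out, steps


a = list(range(100))
b = [1] * 100
narrow_result, narrow_steps = width_agnostic_add(a, b, vector_bits=128)
wide_result, wide_steps = width_agnostic_add(a, b, vector_bits=2048)
print(narrow_steps, wide_steps)
```

The same function gives identical results at both widths; only the number of loop iterations changes (25 passes at 128 bits versus 2 at 2048 bits for these inputs). That is the appeal of the approach: one program, with performance scaling according to whatever the silicon provides.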

Image courtesy of InsideHPC

This is the kind of feature that looks amazing on paper, but could be difficult to implement in reality. Agner Fog’s CPU manuals note the variety of fine details that can limit SIMD performance in CPUs — factors like instruction mix and size can matter a great deal, even when they shouldn’t make a difference on paper. That’s not to cast aspersions on ARM’s technology, which could be of considerable benefit — but we’ll have to wait and see how well the SVE can scale when confronted with corner cases.

Right now, ARM has just one announced customer, Fujitsu, which intends to use the SVE instructions in a new lineup of supercomputer processors. Earlier this year, Fujitsu announced it would transition to 64-bit ARM processors for future designs. Up until now, Fujitsu had relied on Sparc64 VIIIfx processors (pictured, top) to power the K supercomputer in Japan, which is currently the fifth fastest in the world. The new Post-K computer is expected to come online in 2020. The Register has more details on SVE’s capabilities and how it differs from previous vector instruction sets.

An ARM game-changer?

As tempting as it might be to grab this information and sprint with it, there’s reason to be cautious. While this is obviously a huge effort for ARM and a major component of any push into the HPC space, it’s not yet clear that SVE will be the beachhead of a major new offensive against Intel. Five years ago, analysts confidently predicted that ARM’s lower costs and higher efficiency would result in the company rapidly taking market share away from Intel. Rory Read, then AMD’s CEO, once forecast that the server market would be at least 15% ARM by 2018. According to IDC, Intel currently holds 99.2% of the server market.

Winning Fujitsu’s business is a huge step forward for ARM, but the HPC market both is and isn’t a great place to see the future of computing. On the one hand, it’s true that technologies and features often debut in high-end markets before waterfalling into lower cost segments. On the other, the high cost and custom nature of HPC buildouts mean that these systems support some esoteric architectures that aren’t found in other markets. Intel’s Itanium once held a significant share of the TOP500, as shown above in purple, despite finding very limited success in most other markets.

Winning a TOP500 system design is a huge step forward for ARM. It’s absolutely indicative of the way ARM wants to challenge Intel in more segments, and SVE is an important step towards challenging Xeon Phi. But a single HPC win, in and of itself, won’t catapult ARM to server dominance or signal Intel’s inability to compete in the markets it has owned for decades.
