It feels a little odd to simply agree with a headline slide in a product demo, but we can’t find the lie here.
A dramatic shot of an Epyc Rome processor mounted in a system, sans heatsink.
This half-delidded graphic shows off Rome’s “chiplet” system-on-chip design.
When AMD debuted the 7nm Ryzen 3000 series desktop CPUs, they swept the field. For the first time in decades, AMD was able to meet or beat its rival, Intel, across the product line in all major CPU criteria: single-threaded performance, multithreaded performance, power/heat efficiency, and price. Once third-party results confirmed AMD’s excellent benchmarks and retail supply held up, the big remaining question was: could the company extend its 7nm success story to mobile and server CPUs?
Yesterday, AMD officially launched its new line of Epyc 7002 “Rome” series CPUs, and it appears to have answered the server half of that question quite thoroughly. Having learned from the widespread FUD cast at its own internally generated benchmarks at the Ryzen 3000 launch, this time AMD made sure to seed some review sites with evaluation hardware well before the launch.
The short version of the story is that Epyc “Rome” is to the server what Ryzen 3000 was to the desktop: it brings significantly improved IPC, more cores, and better thermal efficiency than either its current-generation Intel equivalents or its first-generation Epyc predecessors.
Rome offers far more CPU threads per socket than Intel’s Xeon Scalable CPUs do. It also supports a higher DDR4 clock rate and offers 128 PCIe 4.0 lanes, each of which has twice the bandwidth of a PCIe 3.0 lane. This becomes increasingly important in large datacenter environments, which can frequently bottleneck on data ingest as much as or more than on raw CPU firepower. Rome also significantly improves upon Epyc’s original NUMA design, increasing efficiency and removing potential bottlenecks in multi-socket configurations.
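To put the interconnect and memory claims above into numbers, here is a back-of-the-envelope Python sketch of peak theoretical bandwidth. The PCIe figures use the standard signaling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0, both with 128b/130b encoding). The memory configurations are assumptions taken from published platform specs rather than from this article: eight channels of DDR4-3200 per Rome socket versus six channels of DDR4-2933 for a Xeon Platinum 8280. The helper function names are hypothetical, and real-world throughput will always land below these peaks.

```python
# Back-of-the-envelope peak (theoretical) bandwidth; real throughput is lower.

def pcie_lane_gbps(gt_per_s: float) -> float:
    """Per-lane, per-direction PCIe bandwidth in GB/s.
    PCIe 3.0 and 4.0 both use 128b/130b line encoding."""
    return gt_per_s * (128 / 130) / 8  # GT/s -> payload Gbit/s -> GB/s


def ddr4_socket_gbps(channels: int, mt_per_s: int) -> float:
    """Peak DRAM bandwidth in GB/s; each DDR4 channel is 64 bits (8 bytes) wide."""
    return channels * mt_per_s * 8 / 1000


pcie3 = pcie_lane_gbps(8.0)   # PCIe 3.0: 8 GT/s per lane
pcie4 = pcie_lane_gbps(16.0)  # PCIe 4.0: 16 GT/s per lane
rome_pcie = 128 * pcie4       # the 128 PCIe 4.0 lanes cited for Rome

# Assumed memory configurations (from vendor spec sheets, not from this article).
rome_mem = ddr4_socket_gbps(8, 3200)  # Rome: 8 channels of DDR4-3200
xeon_mem = ddr4_socket_gbps(6, 2933)  # Xeon Platinum 8280: 6 channels of DDR4-2933

print(f"PCIe 3.0 lane: {pcie3:.3f} GB/s, PCIe 4.0 lane: {pcie4:.3f} GB/s")
print(f"128 PCIe 4.0 lanes: {rome_pcie:.1f} GB/s per direction")
print(f"Peak DRAM per socket: Rome {rome_mem:.1f} GB/s vs. Xeon {xeon_mem:.1f} GB/s")
```

Under these assumptions, a PCIe 4.0 lane carries about 1.97 GB/s in each direction versus about 0.98 GB/s for PCIe 3.0, so 128 lanes add up to roughly 252 GB/s per direction. Peak memory bandwidth works out to 204.8 GB/s per Rome socket against about 140.8 GB/s for the Xeon.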
While Rome still can’t beat the highest-end Xeon parts for raw clock rate or single-threaded performance, it comes far closer than the first Epyc generation did. That is largely thanks to a broad array of architectural improvements, shown below in AMD’s launch-day slides, which cumulatively add up to roughly a 15% improvement in instructions executed per clock cycle (IPC).
The overall story with Rome’s improved internal architecture comes down to more instructions executed with each CPU clock cycle.
Rome offers both more DDR4 channels and higher DDR4 clock rates than its Xeon competitors.
Rome improves on first-generation Epyc’s branch prediction, fetch, and decode with a new L2 branch prediction algorithm, more buffers, and improved associativity.
Rome can schedule more integer operations, farther ahead, than its first-generation predecessor could.
Vector and floating-point execution scheduling is improved in Zen 2 thanks to wider data paths and reduced latency.
Rome offers more cache throughput and larger structures than first-generation Epyc did.
Epyc’s NUMA design improved considerably from the first generation to Rome, increasing efficiency and removing potential bottlenecks in multi-socket systems.
Ars didn’t receive review units for this product launch, so the following performance analysis relies on Rome benchmark data graciously provided by Michael Larabel of Phoronix, the well-known Linux-focused testing, review, and news site. We’ll mostly be focusing on dual-socket builds using Rome’s 64-core/128-thread Epyc 7742 and 32C/64T Epyc 7502, versus dual-socket builds of Intel’s 28C/56T Xeon Platinum 8280 and 20C/40T Xeon Gold 6138.
PyBench is a single-threaded benchmark, and the higher clock rate of the Xeon CPUs shows to good advantage here. (Data courtesy of Phoronix)
Despite MKL-DNN being an Intel software package heavily optimized for Xeon CPUs, the Rome CPUs run neck and neck here. (Data courtesy of Phoronix)
Intel’s home-field software optimization advantage for its MKL-DNN library shows heavily in this deconvolution batch test. (Data courtesy of Phoronix)
On single-threaded benchmarks such as PHPBench and PyBench, it’s easy to see both AMD’s promised 15% increase in IPC realized and the narrowed gap between its single-threaded performance and Intel’s. Although Epyc Rome still loses to Xeon Scalable here, the performance delta has shrunk from roughly 50% to 20%. Xeon Scalable also generally comes out on top in the MKL-DNN tests, which shouldn’t be a surprise: MKL-DNN, Intel’s Math Kernel Library for Deep Neural Networks, is a software package written by Intel developers.
While it’s easy to complain that Intel CPUs have an unfair advantage in MKL-DNN benchmarks, that advantage is representative of the kind of entrenched position Intel enjoys, and it is a real one. Someone with a heavily MKL-DNN-focused workload is unlikely to care about what is or isn’t fair.
On vendor-neutral, multithreading-friendly workloads such as x265 video encoding and OpenSSL, the Rome CPUs significantly outperformed the Xeons across the board. Datacenters are notoriously conservative in design, and more resistant to vendor-shopping than small businesses or end users, but AMD’s increasingly large multithreaded performance wins are harder to ignore now that Intel’s single-threaded performance lead has been more than halved.
Listing image by AMD