As of 2014, the most powerful supercomputer in the world (Tianhe-2) clocks in at 33 petaflops (1 petaflop = 10^15 floating-point operations per second), or just over 3% of the exascale requirement, while using 17 megawatts of electricity, coming out to just under 2 GFLOPS per watt.
The most power-efficient petascale supercomputer (Piz Daint) is a 7-petaflop system using 2.3 MW of electricity, ranking it at 3 GFLOPS per watt. Even the most efficient single processors on their own reach only 4 to 5 GFLOPS per watt. The US Department of Energy's annual power budget for an exascale system is targeted at no more than 20 MW. Scaling a modern system to exascale would require over 250 MW and would cost over $300 million annually to operate.
Current estimates for building an exascale system using today's components run well over $1 billion. When electricity and operating costs over the 5-to-6-year lifetime of a system are included, more than twice the cost of the initial system would be spent just on operating it.
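The operating-cost claim above can be sanity-checked with back-of-the-envelope arithmetic; the electricity rate of $0.14/kWh used below is an assumption for illustration, not a figure from the source.

```python
# Back-of-the-envelope check of the ~$300M/year operating cost for a
# hypothetical 250 MW exascale system. The $0.14/kWh rate is assumed.
POWER_MW = 250
RATE_USD_PER_KWH = 0.14          # assumed industrial electricity rate
HOURS_PER_YEAR = 24 * 365        # 8760

energy_kwh = POWER_MW * 1_000 * HOURS_PER_YEAR   # ~2.19 billion kWh
annual_cost = energy_kwh * RATE_USD_PER_KWH

print(f"Annual energy: {energy_kwh:.3e} kWh")
print(f"Annual electricity cost: ${annual_cost / 1e6:.0f} million")  # ~$307M
```

At that assumed rate, electricity alone lands in the same ballpark as the $300 million annual figure, before counting cooling, staffing, and facilities.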
The Neo compute core is optimized for high-efficiency floating-point operations. Each core is a full superscalar RISC core capable of executing two instructions per cycle. The FPU is a pipelined fused multiply-add (FMA) unit capable of either one IEEE 754-2008 double-precision (binary64) operation per cycle or two single-precision (binary32) operations per cycle. Single-core performance: 1 double-precision GFLOPS, 2 single-precision GFLOPS.
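These per-core figures are consistent with the usual convention of counting one FMA as two floating-point operations; the 500 MHz clock below is inferred from the stated numbers under that convention, not a figure given in the source.

```python
# Peak per-core throughput, counting one FMA as two FLOPs (the usual
# convention). The 500 MHz clock is an assumed/inferred figure chosen
# to reproduce the stated 1 DP / 2 SP GFLOPS per core.
CLOCK_HZ = 500e6          # assumed core clock
FLOPS_PER_FMA = 2         # one fused multiply-add = multiply + add

dp_gflops = CLOCK_HZ * 1 * FLOPS_PER_FMA / 1e9   # 1 DP FMA per cycle
sp_gflops = CLOCK_HZ * 2 * FLOPS_PER_FMA / 1e9   # 2 SP FMAs per cycle

print(dp_gflops, sp_gflops)  # 1.0 2.0
```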
The Neo processor incorporates 256 cores arranged in a 2D mesh network, communicating via a low-latency network-on-chip (NoC) subsystem. The RISC-inspired load/store architecture lets memory operations target a flat global address space directly: local scratchpad memories are physically addressed as part of that space.
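To illustrate how physically addressed scratchpads can form a flat global space, here is a hypothetical mapping; the window size and layout are invented for this sketch, not taken from the Neo specification.

```python
# Hypothetical flat address map: each core's scratchpad occupies a
# fixed-size window, so (core_id, offset) <-> global address is pure
# arithmetic -- no translation tables needed. Sizes are invented.
SCRATCHPAD_BYTES = 128 * 1024   # assumed 128 KiB per core

def to_global(core_id: int, offset: int) -> int:
    """Global physical address of `offset` within `core_id`'s scratchpad."""
    assert 0 <= offset < SCRATCHPAD_BYTES
    return core_id * SCRATCHPAD_BYTES + offset

def from_global(addr: int) -> tuple:
    """Inverse mapping: which core's scratchpad, and where inside it."""
    return divmod(addr, SCRATCHPAD_BYTES)

addr = to_global(core_id=42, offset=0x100)
print(hex(addr), from_global(addr))  # round-trips to (42, 256)
```

The point of such a scheme is that any core can issue an ordinary load or store against any other core's scratchpad with plain address arithmetic.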
Unlike GPUs and other SIMD accelerators, Neo's MIMD processor design leverages independent program counters and instruction registers in each core to allow different operations to be performed in parallel on separate pieces of data.
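The MIMD distinction can be illustrated with ordinary threads: unlike SIMD lanes, each worker below executes a different instruction stream on its own data. Standard-library threads stand in for independent Neo cores in this sketch.

```python
# MIMD sketch: two workers run *different* programs on separate data
# concurrently, which SIMD lanes (one shared instruction stream) cannot.
# Python threads stand in for independent Neo cores here.
import threading

def reduce_sum(data, out):          # "core A" runs a reduction
    out["sum"] = sum(data)

def scale_inplace(data, factor):    # "core B" runs a different program
    for i in range(len(data)):
        data[i] *= factor

result = {}
vec = [1.0, 2.0, 3.0]
t1 = threading.Thread(target=reduce_sum, args=([4, 5, 6], result))
t2 = threading.Thread(target=scale_inplace, args=(vec, 2.0))
t1.start(); t2.start()
t1.join(); t2.join()

print(result["sum"], vec)  # 15 [2.0, 4.0, 6.0]
```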
With 256 cores, estimated power consumption per compute chip is 3 Watts.
A Neo node includes 16 or more compute chips arranged in a two-dimensional grid. Every chip can independently communicate with any other chip in the grid via an extremely high-bandwidth parallel interconnect that provides direct connections between adjacent peers. Each compute chip on the external edge of the grid has at least one parallel connection to the Grid and Memory Management Unit.
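To make the grid topology concrete, the sketch below computes hop counts under simple dimension-ordered (XY) routing on a 4x4 chip grid; both the routing policy and the 4x4 layout are assumptions for illustration, not details from the source.

```python
# Hop count between chips in a node's grid under assumed XY
# (dimension-ordered) routing: travel along X first, then Y. Adjacent
# peers are one hop apart, matching the direct-connection description.
GRID_W = GRID_H = 4   # assumed: 16 chips laid out 4x4

def hops(src, dst):
    """Manhattan distance = hop count for XY routing on a mesh."""
    (sx, sy), (dx, dy) = src, dst
    return abs(dx - sx) + abs(dy - sy)

print(hops((0, 0), (0, 1)))  # 1 (adjacent peers: direct connection)
print(hops((0, 0), (3, 3)))  # 6 (worst case, corner to corner)
```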
The node's Grid and Memory Management Unit (GaMMU) is a distinct chip designed to intelligently manage node-level resources and, optionally, to direct data and control flow across the grid. The GaMMU provides a common interface for the compute grid to access node-level DRAM and handles external I/O requests. An included Linux host serves as a familiar interface for user administration and debugging; it also allows optional execution of supplemental application logic that can accelerate the grid's access to memory and I/O resources. This logic may be specified explicitly by application developers or generated by REX development tools.
The GaMMU's external I/O capabilities support low-latency connections between nodes, as well as with external storage devices and other network-attached peripherals.
By removing legacy components unnecessary to HPC, and by developing a fresh compute architecture focused on power optimization, Neo is anticipated to become the most power-efficient HPC platform available upon its release. A single Neo compute rack based on the (non-density optimized) reference design, incorporating 90 Neo nodes, is expected to deliver 360 DP TFLOPS at only 7.2kW, equivalent to 50 GFLOPS/W. This will make Neo at least 10x more power efficient than today’s most efficient HPC installation (GSIC Center at the Tokyo Institute of Technology).
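The rack-level figures above are internally consistent, as quick arithmetic shows (the 1 DP GFLOPS per core is carried over from the core description earlier in the document):

```python
# Rack-level sanity check from the stated figures: 90 nodes, 16 chips
# per node, 256 cores per chip, 1 DP GFLOPS per core, 7.2 kW per rack.
nodes, chips_per_node, cores_per_chip = 90, 16, 256
gflops_per_core = 1.0

rack_tflops = nodes * chips_per_node * cores_per_chip * gflops_per_core / 1_000
efficiency = 360e3 / 7.2e3   # stated 360 TFLOPS at 7.2 kW, in GFLOPS/W

print(f"{rack_tflops:.1f} TFLOPS peak, {efficiency:.0f} GFLOPS/W")
# ~368.6 TFLOPS raw peak, slightly above the quoted 360 TFLOPS;
# the quoted figures work out to exactly 50 GFLOPS/W
```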
Moreover, an installation of only ten Neo compute racks would rank among the Top 10 highest performance HPC installations, according to data from top500.org (June 2014).
The Neo porting environment is designed to simplify and accelerate the process of migrating applications from other HPC platforms to Neo. The porting environment includes a set of commonly used libraries, like BLAS, that are optimized and pre-compiled for the Neo environment.
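Porting against a pre-built BLAS means code written to the standard interface carries over unchanged. As a reminder of those semantics, here is a plain-Python reference for GEMM (C = alpha*A*B + beta*C), the contract a Neo port would hand off to the optimized library; the naive loop nest below is for illustration only.

```python
# Reference implementation of the standard BLAS GEMM contract,
# C <- alpha * A @ B + beta * C, written out naively. A Neo port would
# call the optimized, pre-compiled dgemm instead of this loop nest.
def ref_gemm(alpha, A, B, beta, C):
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(ref_gemm(1.0, A, B, 0.0, C))  # [[19.0, 22.0], [43.0, 50.0]]
```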
The full development environment includes a complete GCC and LLVM compiler toolchain, as well as standard C libraries optimized for Neo. It also provides a hardware abstraction layer API allowing developers to orchestrate and optimize data and control flow across the compute grid. Alternatively, developers can leverage supported programming models - such as the Actor model, CSP, and PGAS - which abstract and automate the deployment and execution of concurrent or parallel processes.
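To give a flavor of the channel-style (CSP) model mentioned above, the sketch below wires two processes together through a bounded queue standing in for a channel; how such processes would map onto Neo cores is hypothetical here.

```python
# CSP-flavored sketch: a producer and a consumer communicate only
# through a channel (a standard-library queue) with no shared state.
# The mapping of these processes onto Neo cores is hypothetical.
import queue
import threading

def producer(chan):
    for x in range(5):
        chan.put(x * x)
    chan.put(None)              # sentinel: end of stream

def consumer(chan, out):
    while True:
        item = chan.get()
        if item is None:
            break
        out.append(item)

chan = queue.Queue(maxsize=2)   # bounded channel gives backpressure
results = []
threads = [threading.Thread(target=producer, args=(chan,)),
           threading.Thread(target=consumer, args=(chan, results))]
for t in threads: t.start()
for t in threads: t.join()

print(results)  # [0, 1, 4, 9, 16]
```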
Neo's 'bare metal' runtime environment does not require a grid operating system, and it will optionally support OpenMP, OpenCL, and OpenACC.
Bringing fresh ideas into a large industry
CEO: Thomas Sohmers is the founder and CEO of REX Computing. He spent three years at the MIT Institute for Soldier Nanotechnologies, first as an end user of HPC systems and later designing and building them at the lab. This experience led him to start REX Computing in 2013 as a recipient of the Peter Thiel '20 Under 20' Fellowship, where he leads the architectural design and business operations.
CTO: Paul is the CTO of REX Computing and its chief software architect. Originally studying computer science at Georgia Tech, he crossed fields into biotechnology, where he worked in computational biology, MEMS, and nanofabrication. Paul left school to join the 2012 class of Thiel Fellows, where he founded a synthetic biology startup. He brings to REX extensive compiler and software development skills along with end-user scientific HPC knowledge.