Numascale AS is headquartered in Oslo, Norway, and develops cc-NUMA technology for HyperTransport called NumaConnect. NumaConnect provides fully cache-coherent shared memory and virtualized I/O for Opteron systems. It uses a directory-based cache coherence model that scales to 4,096 nodes, each with multiple processors, and supports the full 48-bit physical address space of the Opteron. NumaConnect targets high-performance computing applications that require large memory, as well as server virtualization. It supports SMP-enabled operating systems such as Linux, Solaris, Unix and Windows Server.
HyperTransport-enabled products from Numascale AS

Product Type | Product Detail
Cluster Interconnect | NumaConnect SMP Adapter Card
Scalable Cache-Coherent Shared Memory on Your Cluster Budget

Numascale's SMP Adapter is an HTX card designed for commodity servers with AMD processors that provide an HTX connector to the HyperTransport interconnect.

Highlights
- Scalable, directory-based cache coherence protocol
- Write-back cache for remote data: 2-4-8-(16) GB options, standard SDIMMs
- ECC protected with background scrubbing of soft errors
- 16 coherent + 16 non-coherent outstanding memory transactions
- Support for single-image or multi-image OS partitions
- 3-way on-chip distributed switching for 1D, 2D or 3D Torus topologies
- 30 GB/s switching capacity per node
- Less than 20 W power dissipation
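The write-back behavior of the remote cache can be illustrated with a generic model: remote lines are cached locally, a write only marks the local copy dirty, and data travels back to its home node only when a dirty line is evicted. The sketch below is a minimal Python model that assumes LRU replacement and uses a dict to stand in for home-node memory; the class and method names are hypothetical and do not describe NumaConnect's actual replacement policy or hardware.

```python
from collections import OrderedDict


class WriteBackRemoteCache:
    """Toy write-back cache for remote data (hypothetical model; LRU replacement assumed)."""

    def __init__(self, capacity_lines, home_memory):
        self.capacity = capacity_lines
        self.home = home_memory              # dict address -> value, stands in for the home node
        self.lines = OrderedDict()           # address -> [value, dirty], oldest first
        self.writebacks = 0                  # lines written back to the home node so far

    def _lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)     # hit: mark most recently used
            return self.lines[addr]
        if len(self.lines) >= self.capacity:
            victim, (value, dirty) = self.lines.popitem(last=False)  # evict the LRU line
            if dirty:
                self.home[victim] = value    # only modified lines are written back
                self.writebacks += 1
        line = [self.home.get(addr, 0), False]  # miss: fetch a clean copy from the home node
        self.lines[addr] = line
        return line

    def read(self, addr):
        return self._lookup(addr)[0]

    def write(self, addr, value):
        line = self._lookup(addr)            # write-allocate
        line[0] = value
        line[1] = True                       # dirty: the home copy is now stale


if __name__ == "__main__":
    home = {0: 10, 1: 11, 2: 12}
    cache = WriteBackRemoteCache(2, home)
    cache.write(0, 99)                       # home[0] is still 10 here
    cache.read(1)
    cache.read(2)                            # evicts line 0, which is written back
    print(home[0], cache.writebacks)         # 99 1
```

Repeated writes to the same remote line cost no interconnect traffic until eviction, which is the point of a write-back (rather than write-through) remote cache.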
NumaConnect OS Support
- Linux, Windows Server, Solaris, Unix

NumaConnect Essentials
- ccNUMA and NUMA low-latency shared memory interconnect
- Virtualizes Everything, Including Memory and I/O
- More than 10x price/performance benefit over proprietary solutions
- Seamless Scaling of Application Size and Performance, with No Porting Effort
- Scalable, Cache-Coherent, Shared Memory System Interconnect
- AMD processor nodes with Coherent HyperTransport
- Based on a field-proven design
- Enables commodity cost level for high-end servers
NumaConnect Features
- Converts between snoop-based (broadcast) and directory-based coherency protocols
- Write-back to Remote Cache
- Non-coherent transactions (for optimized MPI)
- Pipelined memory access (16 outstanding transactions + 16 non-coherent)
- Remote Cache size up to 16 GB (remote data)
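A directory-based protocol replaces broadcast snooping with a per-line record, kept at the home node, of which nodes hold a copy, so invalidations go only to actual sharers. The following is a minimal, generic MSI-style directory in Python, offered as an illustration of the technique under simplifying assumptions (no races, no transient states, messages recorded rather than sent); it is not NumaConnect's micro-coded protocol, and the state names and message kinds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DirectoryEntry:
    state: str = "I"                          # "I" uncached, "S" shared, "M" modified
    sharers: Set[int] = field(default_factory=set)
    owner: Optional[int] = None               # node holding the line in state "M"


class Directory:
    """Home-node directory for a simplified MSI protocol (illustrative, no transient states)."""

    def __init__(self):
        self.entries = {}
        self.messages = []                    # point-to-point messages: (kind, node, line)

    def _entry(self, line):
        return self.entries.setdefault(line, DirectoryEntry())

    def read(self, node, line):
        e = self._entry(line)
        if e.state == "M":
            if e.owner == node:
                return                        # requester already holds the only copy
            self.messages.append(("fetch", e.owner, line))  # owner supplies data, keeps a shared copy
            e.owner = None
        e.sharers.add(node)
        e.state = "S"

    def write(self, node, line):
        e = self._entry(line)
        if e.state == "M":
            if e.owner == node:
                return
            self.messages.append(("fetch_invalidate", e.owner, line))
        else:
            for sharer in sorted(e.sharers - {node}):
                self.messages.append(("invalidate", sharer, line))  # only actual sharers are probed
        e.state, e.owner, e.sharers = "M", node, {node}


if __name__ == "__main__":
    d = Directory()
    d.read(1, 0x40)
    d.read(2, 0x40)
    d.write(3, 0x40)
    print(d.messages)   # [('invalidate', 1, 64), ('invalidate', 2, 64)]
```

With broadcast snooping, the write by node 3 would have to probe every node in the system; the directory sends invalidations only to nodes 1 and 2, which is what lets the protocol scale to thousands of nodes.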
NumaConnect RAS Features
- ECC for single-bit correction and double-bit detection
- Automatic scrubbing after single bit error detection
- Automatic background scrubbing to minimize probability of soft error accumulation
- Flexible micro-coded coherence processing engine
- Watch-bus for internal activity observation in real-time
- Built-in Performance Counters
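Single-error-correct, double-error-detect (SEC-DED) ECC is commonly built from an extended Hamming code, and scrubbing means periodically reading each word, correcting it, and writing the corrected value back before a second error can strike the same word. The sketch below uses an extended Hamming(8,4) code on 4-bit words purely for readability (memory ECC typically protects 64-bit words with 8 check bits); the function names and the scrub loop are illustrative assumptions, not a description of NumaConnect's ECC logic.

```python
from typing import List, Tuple

# Extended Hamming(8,4): index 0 = overall parity, 1/2/4 = Hamming parity, 3/5/6/7 = data.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble: int) -> List[int]:
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # each parity bit covers the positions whose index has bit p set
        code[p] = sum(code[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    code[0] = sum(code[1:]) % 2               # overall parity makes the total even
    return code


def decode(code: List[int]) -> Tuple[int, str, List[int]]:
    """Return (data, status, corrected codeword); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                   # XOR of set-bit positions locates a single error
    parity_error = sum(code) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        code[syndrome] ^= 1                   # single-bit error (syndrome 0 => the parity bit itself)
        status = "corrected"
    else:
        status = "uncorrectable"              # even parity but non-zero syndrome: two bits flipped
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status, code


def scrub(memory: List[List[int]]) -> List[int]:
    """One background scrub pass: write back corrected words, report uncorrectable ones."""
    bad = []
    for i, word in enumerate(memory):
        _, status, fixed = decode(word)
        if status == "corrected":
            memory[i] = fixed                 # rewrite before a second error can accumulate
        elif status == "uncorrectable":
            bad.append(i)
    return bad
```

Scrubbing matters because two independent single-bit errors in one word turn a correctable case into an uncorrectable one; rewriting the word after the first error resets it to a clean codeword.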
NumaConnect Specifications

Bandwidth to the node-local CPU
- 1 cHT link (16+16) @ 800 MHz DDR = 6.4 GB/s over HTX
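The 6.4 GB/s figure follows from the link parameters: a 16-bit link moves 2 bytes per transfer, DDR signaling at 800 MHz gives 1.6 billion transfers per second, and the "16+16" notation counts one 16-bit link in each direction. A short check in Python (the function name is hypothetical):

```python
def ht_link_bandwidth_gbs(width_bits: int, clock_mhz: float, directions: int = 2) -> float:
    """Aggregate link bandwidth in GB/s, assuming DDR signaling (two transfers per clock)."""
    transfers_per_s = clock_mhz * 1e6 * 2     # double data rate
    bytes_per_transfer = width_bits / 8
    return transfers_per_s * bytes_per_transfer * directions / 1e9


# One 16-bit link in each direction ("16+16") at 800 MHz DDR
print(ht_link_bandwidth_gbs(16, 800))                 # 6.4
print(ht_link_bandwidth_gbs(16, 800, directions=1))   # 3.2
```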
Latency for remote accesses
- Short time in node bypass FIFOs
- Few "hops" for average access patterns
- Only one or two dimension-switch delays worst-case for 2-D or 3-D torus topologies
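The one-or-two bound on dimension switches is what dimension-order routing in a torus gives: a packet travels along one dimension, then turns into the next, so it changes dimension at most once in a 2-D torus and at most twice in a 3-D torus. The sketch below computes hop count and dimension switches between two nodes, assuming dimension-order routing in which each wrap-around ring is traversed the shorter way; coordinates and function names are illustrative, and NumaConnect's actual routing may differ.

```python
from itertools import product
from typing import Sequence, Tuple


def torus_route(src: Sequence[int], dst: Sequence[int], dims: Sequence[int]) -> Tuple[int, int]:
    """Return (hops, dimension_switches) under dimension-order routing on a torus."""
    hops = 0
    dims_traversed = 0
    for a, b, k in zip(src, dst, dims):
        d = abs(a - b)
        step = min(d, k - d)                  # wrap-around link allows the shorter way round
        hops += step
        if step:
            dims_traversed += 1
    return hops, max(dims_traversed - 1, 0)   # one turn between each pair of dimensions used


def worst_case_switches(dims: Sequence[int]) -> int:
    """A torus is node-symmetric, so routes from the origin cover every case."""
    origin = (0,) * len(dims)
    return max(torus_route(origin, dst, dims)[1] for dst in product(*(range(k) for k in dims)))


if __name__ == "__main__":
    print(worst_case_switches((4, 4)), worst_case_switches((4, 4, 4)))   # 1 2
```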
Link speed and capacity
- 4-lane SerDes at 4 Gb/s per lane (16 Gb/s per link); 6 links = 96 Gb/s = 9.6 GB/s, x2 for both directions = 19.2 GB/s
- Average throughput on a ring is about 1.6 times the unidirectional link speed with random access patterns; total for 6 links = 30.7 GB/s (multiple senders can be active simultaneously)
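The link figures can be reproduced step by step. One assumption is needed to get from 96 Gb/s to 9.6 GB/s: the SerDes is taken to use 8b/10b line coding, so each data byte costs 10 line bits. The text does not state this, but it is the reading that matches its numbers. The 1.6 ring factor is taken from the text as given, and the variable names are illustrative.

```python
LANES_PER_LINK = 4
GBIT_PER_LANE = 4.0          # Gb/s per lane, per direction
LINKS = 6
LINE_BITS_PER_BYTE = 10      # assumption: 8b/10b line coding (not stated in the spec)
RING_FACTOR = 1.6            # average ring throughput factor, taken from the spec

raw_gbit = LANES_PER_LINK * GBIT_PER_LANE * LINKS        # 96 Gb/s on the wire, one direction
one_way_gbyte = raw_gbit / LINE_BITS_PER_BYTE            # 9.6 GB/s of data
both_ways_gbyte = one_way_gbyte * 2                      # 19.2 GB/s
ring_total_gbyte = both_ways_gbyte * RING_FACTOR         # the spec's ~30.7 GB/s total

print(f"{raw_gbit:.0f} Gb/s raw, {one_way_gbyte:.1f} GB/s one way, "
      f"{both_ways_gbyte:.1f} GB/s both ways, {ring_total_gbyte:.1f} GB/s ring total")
```

The result agrees with the roughly 30 GB/s per-node switching capacity listed in the highlights.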
Remote Cache (RMC)
- 2 or 4 GB per node, configurable
- System performance is expected to depend more on a large cache size than on fast access time, hence the use of DRAM
- RMC access time is expected to be close to that of node-local memory attached to a neighboring CPU
Address range
- 12-bit node ID = 4K nodes max. (multiple sockets per node possible)
- 48-bit address (256 terabytes)
- Local node address range:
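The address-range numbers are powers of two: a 12-bit node ID gives 2^12 = 4,096 nodes, and a 48-bit physical address spans 2^48 bytes = 256 TiB. How NumaConnect assigns address ranges to nodes is not given here, so the sketch below assumes, purely for illustration, that each node owns an equal contiguous slice of the 48-bit space selected by the top 12 address bits; the constants' names, the function name, and this layout are hypothetical.

```python
from typing import Tuple

NODE_ID_BITS = 12
PHYS_ADDR_BITS = 48
OFFSET_BITS = PHYS_ADDR_BITS - NODE_ID_BITS      # 36 bits -> 64 GiB per node in this layout

MAX_NODES = 1 << NODE_ID_BITS                    # 4096
ADDR_SPACE_BYTES = 1 << PHYS_ADDR_BITS           # 256 TiB


def split_address(addr: int) -> Tuple[int, int]:
    """Split a physical address into (node_id, local_offset) under the assumed equal-slice layout."""
    if not 0 <= addr < ADDR_SPACE_BYTES:
        raise ValueError("address outside the 48-bit physical space")
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


print(MAX_NODES, ADDR_SPACE_BYTES >> 40)         # 4096 256
print(split_address((5 << OFFSET_BITS) | 0x1234))
```

Under this assumed layout each of the 4,096 nodes would get a 64 GiB window.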