Industry Article

Hybrid Memory Cubes: What They Are and How They Work

January 22, 2019 by Ivan Kuten, Promwad

In this article, the engineering team at Promwad examines hybrid memory cubes (HMCs), which can provide a 15-fold increase in performance with up to a 70% energy savings per bit compared to DDR3 DRAM.


While DDR4 and DDR5 represent an evolution of the DDR standard, HMC is a different memory technology that could affect both specialized high-performance computing and consumer electronics such as tablets and graphics cards, where form factor matters as much as energy efficiency and throughput.
 

HMC Architecture and Devices

HMCs consist of several stacked layers connected by through-silicon vias (TSVs). The upper layers are DRAM dies; the bottom layer is a logic die with the controller that manages data transfer.

The figure below shows the internal structure of the HMC chip:

 


The internal structure of an HMC.

 

HMC is used where high speed is needed and the required amount of memory must fit in a small number of chips. HMC chips can be daisy-chained, with up to eight devices in a chain. Chips are available in capacities of 2 GB and 4 GB. Data is transmitted via serial interfaces at 15 Gbit/s per lane, and a chip has 32 to 64 lanes in total. With 64 lanes and both directions counted, the theoretical link bandwidth therefore reaches 240 GB/s, but throughput is limited by the DRAM bandwidth of 160 GB/s.
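The bandwidth figures can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes the headline number sums both directions of every lane, ignores protocol overhead, and treats the 160 DRAM-side limit as GB/s. The function names are illustrative and not part of any HMC tool.

```python
# Back-of-the-envelope HMC bandwidth estimate (illustrative assumptions only).
LANE_RATE_GBIT_S = 15      # per-lane rate quoted in the article
DRAM_LIMIT_GBYTE_S = 160   # DRAM-side limit, assumed here to be in GB/s


def aggregate_link_bandwidth_gbyte_s(lanes: int,
                                     lane_rate_gbit_s: float = LANE_RATE_GBIT_S) -> float:
    """Raw link bandwidth in GB/s, summing both directions, no protocol overhead."""
    per_direction_gbit_s = lanes * lane_rate_gbit_s
    return per_direction_gbit_s * 2 / 8  # two directions, 8 bits per byte


def usable_bandwidth_gbyte_s(lanes: int) -> float:
    """Link bandwidth capped by the DRAM-side limit."""
    return min(aggregate_link_bandwidth_gbyte_s(lanes), DRAM_LIMIT_GBYTE_S)


if __name__ == "__main__":
    for lanes in (32, 64):
        print(lanes, "lanes:",
              aggregate_link_bandwidth_gbyte_s(lanes), "GB/s raw,",
              usable_bandwidth_gbyte_s(lanes), "GB/s after the DRAM limit")
```

Under these assumptions, 64 lanes give 1,920 Gbit/s of raw link capacity, which is 240 GB/s, so the DRAM side rather than the links is the bottleneck.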

The table below shows the energy consumption per data bit:
 

Table 1. HMC Comparison Chart, DDR4 (First Generation, 4 + 1 Memory Configuration)


Similar Memory Technologies

In addition to HMC, several similar technologies are available from other vendors.

 

Bandwidth Engine (BE) from MoSys

Bandwidth engine (BE) from MoSys is a chip designed to replace QDR memory; it behaves like SRAM. It uses serial transceivers at speeds up to 16 Gbit/s. This type of memory is intended as a low-latency buffer for packet headers or look-up tables rather than for storing whole packets.

 

Ternary Content Addressable Memory

Ternary content addressable memory (TCAM) is a special high-speed memory used in routers and network switches. Its high performance comes at the cost of a higher price and high power consumption. Data transfer is carried out in parallel.

 

High Bandwidth Memory

High bandwidth memory (HBM) is a JEDEC-standardized stacked memory produced by vendors such as Samsung and SK Hynix. It is not available as standalone chips: a hardware engineer who wants to use this memory in a design must work with the manufacturer to integrate it with the user's chip on a silicon interposer. Like DDR, this memory uses a wide parallel interface and does not use serial transceivers for data transmission.

HMC Connection Examples

Physically, data is transmitted to the HMC serially over a SerDes interface at 15 Gbit/s per lane; chips running at 30 Gbit/s are expected soon. Sixteen lanes are combined into one logical channel. A channel can operate in full-channel mode (16 lanes) or half-channel mode (8 lanes). HMCs are usually available with 2 or 4 channels. Each channel can act as either a master or an intermediate channel; intermediate mode is used when several chips must be combined in a chain. The processor must configure each HMC chip.

Below is an example of combining HMC chips in a chain.

 

HMC chips in a chain

 

Another connection type combines HMC chips in a star topology, with the option of multi-host mode. Below is an example:

 

HMC chips combined in a star topology

Transmission of Data Over a Logical Channel

Below is an example of a channel transfer structure:

 

Channel transfer structure

Image courtesy of Micron (PDF).

 

Commands and data are transmitted in both directions using a packet protocol. Packets are made up of 128-bit groups called FLITs. The FLITs are sent serially over the physical lanes and then reassembled on the receiving side.
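As a rough illustration of the FLIT concept, the sketch below cuts a packet into 16-byte (128-bit) units and joins them again. It treats the packet as an opaque byte string and does not model real HMC header or tail fields; `split_into_flits` and `reassemble` are hypothetical helper names.

```python
FLIT_BYTES = 16  # one FLIT is 128 bits


def split_into_flits(packet: bytes) -> list[bytes]:
    """Cut a packet into 128-bit FLITs; the length must be a whole number of FLITs."""
    if len(packet) == 0 or len(packet) % FLIT_BYTES != 0:
        raise ValueError("packet length must be a non-zero multiple of 16 bytes")
    return [packet[i:i + FLIT_BYTES] for i in range(0, len(packet), FLIT_BYTES)]


def reassemble(flits: list[bytes]) -> bytes:
    """Receiving side: concatenate the FLITs back into the original packet."""
    return b"".join(flits)


if __name__ == "__main__":
    packet = bytes(range(48))          # a 48-byte placeholder packet
    flits = split_into_flits(packet)
    print(len(flits), "FLITs")
    print(reassemble(flits) == packet)
```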

 

Levels of Packet Service

There are three levels of packet service:

  1. The physical layer handles reception, transmission, serialization, and deserialization of data.
  2. The link layer provides low-level packet tracking.
  3. The transaction layer defines the packets and their header fields and checks the integrity of the packets and the communication channel.


Organization of 128-bit FLIT transmission over the physical lanes in the two modes:

Distribution of a FLIT across the lanes in full configuration (16 lanes)

Table courtesy of Micron.

 

Distribution of a FLIT across the lanes in half configuration (8 lanes)

Table courtesy of Micron.
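The effect of the two modes on transfer time follows from simple division: a 128-bit FLIT needs 8 unit intervals on 16 lanes and 16 unit intervals on 8 lanes. The sketch below also builds a lane schedule, but note that the round-robin bit-to-lane order used here is an assumption made for illustration; the authoritative mapping is the one in the Micron tables.

```python
FLIT_BITS = 128


def unit_intervals_per_flit(lanes: int) -> int:
    """Number of unit intervals (bit times) needed to send one FLIT."""
    if lanes not in (8, 16):
        raise ValueError("only half (8) and full (16) lane configurations are modeled")
    return FLIT_BITS // lanes


def stripe_flit(lanes: int) -> list[list[int]]:
    """schedule[ui][lane] = FLIT bit index, ASSUMING simple round-robin striping."""
    uis = unit_intervals_per_flit(lanes)
    return [[ui * lanes + lane for lane in range(lanes)] for ui in range(uis)]


if __name__ == "__main__":
    for lanes in (16, 8):
        schedule = stripe_flit(lanes)
        print(lanes, "lanes:", len(schedule), "unit intervals per FLIT")
        print("  lane 0 carries bits", [row[0] for row in schedule])
```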

Memory Addressing

The packet header contains 34 address bits, which include the bank and DRAM address. The current configuration can address a maximum of 4 GB per chip; the upper 2 bits are ignored and reserved for future use. Data is read and written with 16-byte granularity. The block size can be set to 16, 32, 64, or 128 bytes.

Addressing in the HMC:

 

Table courtesy of Micron (PDF).
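The address arithmetic can be sketched as follows: dropping the upper 2 of 34 bits leaves 32 usable bits, which is exactly 4 GB. This is a minimal sketch under that reading of the text; it does not reproduce the bank and DRAM field layout from the Micron table, and the helper names are hypothetical.

```python
ADDRESS_FIELD_BITS = 34          # address bits in the packet header
IGNORED_UPPER_BITS = 2           # reserved for future use
USABLE_BITS = ADDRESS_FIELD_BITS - IGNORED_UPPER_BITS
GRANULARITY_BYTES = 16
BLOCK_SIZES = (16, 32, 64, 128)


def max_capacity_bytes() -> int:
    """Largest addressable capacity with the usable address bits."""
    return 1 << USABLE_BITS


def usable_address(addr_field: int) -> int:
    """Mask off the two ignored upper bits of the 34-bit address field."""
    if not 0 <= addr_field < (1 << ADDRESS_FIELD_BITS):
        raise ValueError("address field must fit in 34 bits")
    return addr_field & ((1 << USABLE_BITS) - 1)


def is_aligned(addr: int) -> bool:
    """True if the address sits on a 16-byte boundary."""
    return addr % GRANULARITY_BYTES == 0


def split_block(addr: int, block_size: int) -> tuple[int, int]:
    """Return (block index, byte offset inside the block) for a given block size."""
    if block_size not in BLOCK_SIZES:
        raise ValueError("block size must be 16, 32, 64, or 128 bytes")
    return addr // block_size, addr % block_size


if __name__ == "__main__":
    print(max_capacity_bytes() // 1024**3, "GB addressable")
    print(hex(usable_address((0b11 << 32) | 0x1234)))
    print(split_block(0x1234, 64))
```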
 

More details on HMC commands and addressing can be found in the datasheet (PDF) provided by Micron.
 

A Typical HMC Connection to a Xilinx Virtex UltraScale FPGA and Power Requirements

The memory is connected to the FPGA through its gigabit transceivers (GTH or GTY on Virtex UltraScale devices). Each channel can use between 8 and 16 transceivers, and there can be up to four such channels. To connect the memory to the FPGA transceivers properly, you must follow a few rules:

  • Transceivers within a channel must be contiguous; skipping transceivers is not allowed.
  • For SSI (Stacked Silicon Interconnect) devices, the transceivers must be in the same SLR.
  • FPGA banks must be contiguous; skipping banks is not allowed.
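These placement rules lend themselves to a quick sanity check before pin assignment. The sketch below is a hypothetical helper, not a Xilinx tool: it assumes each transceiver is described by a made-up `(position, slr)` pair, where `position` is its index along the transceiver column, and it checks only the transceiver count, contiguity, and same-SLR rules. Bank contiguity is not modeled.

```python
def validate_channel(transceivers: list[tuple[int, int]]) -> list[str]:
    """Return a list of rule violations for one channel (empty list means OK).

    Each entry is a hypothetical (position, slr) pair describing one transceiver.
    """
    problems = []
    positions = sorted(pos for pos, _ in transceivers)

    # The article allows between 8 and 16 transceivers per channel.
    if not 8 <= len(positions) <= 16:
        problems.append("channel must use between 8 and 16 transceivers")

    # Transceivers must be contiguous: no duplicates and no skipped positions.
    if len(set(positions)) != len(positions):
        problems.append("duplicate transceiver position")
    elif positions and positions[-1] - positions[0] + 1 != len(positions):
        problems.append("transceivers are not contiguous")

    # On SSI devices, all transceivers of a channel must share one SLR.
    if len({slr for _, slr in transceivers}) > 1:
        problems.append("transceivers span more than one SLR")

    return problems


if __name__ == "__main__":
    good = [(i, 0) for i in range(16)]
    gapped = [(i, 0) for i in (0, 1, 2, 4, 5, 6, 7, 8)]
    print(validate_channel(good))
    print(validate_channel(gapped))
```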


Below is a typical connection to an FPGA with two channels in full mode:

 

Typical connection to FPGA, two channels in full mode

 

More Information About HMC Technology

For a more in-depth study of this topic, visit the website of the HMC technology consortium, hybridmemorycube.org, where the latest HMC specification, version 2.1, is published.
