Note

This article was written back in 2011 and published on Juku.it. It describes the state of SSDs at the time; some things have changed since then, but I believe it is still relevant and provides many insights into how SSDs work internally.

How SSDs work

First things first: there are a number of different devices called SSDs today, but I will focus only on NAND-based SSDs, the drop-in replacements for traditional hard drives that are commonly found in storage systems and even consumer devices.

What is NAND?

NAND is a nonvolatile solid-state memory. Nonvolatile memory holds data even when the power is turned off; NAND stores that data in a large array of transistors. SLC (single-level cell) transistors store one bit of data, while MLC (multi-level cell) transistors store two or more bits in each cell. Compared to traditional NOR Flash memory, NAND Flash memory can pack a greater number of storage cells into a given area of silicon. This gives NAND Flash density and cost advantages over other nonvolatile memory. NAND achieves these advantages by sharing some of the common areas of the storage transistor, which creates strings of serially connected transistors (in NOR devices, each transistor stands alone). This serial cell architecture explains the device name: NAND (not AND) is the Boolean logic reference to how information is read out of these cells.

SLC vs. MLC

Traditional single-level cell (SLC) NAND Flash memory stores one bit of information per memory cell. This basic technology enables faster transfer speeds, lower power consumption, and increased endurance. Multi-level cell (MLC) NAND, by comparison, stores two bits of information per memory cell, effectively doubling the amount of data that can be stored in a similar-size NAND Flash device, but that comes at a cost. As the table below shows, MLC cells have a shorter lifespan and worse performance than SLC cells:

Feature | MLC | SLC
Density | 32/64 Mbit | 16 Mbit
Bits per Cell | 2 | 1
Voltage | 3.3 V | 3.3 V, 1.8 V
Page Size | 2 / 4 KB | 2 KB
Erase / Program Cycles | < 10,000 | < 100,000
Read Performance | 50 µs | 25 µs
Write Performance | 600 / 900 µs | 200 / 300 µs
Erase Performance | 3 ms | 1.5 / 2 ms
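
One way to read the "Bits per Cell" row: storing n bits in a cell means the cell must hold one of 2^n distinguishable charge levels. The sketch below only illustrates that arithmetic; the function name is made up for this example and does not come from any vendor documentation.

```python
def levels_per_cell(bits_per_cell: int) -> int:
    """Return how many distinct charge levels a cell needs for this many bits."""
    if bits_per_cell < 1:
        raise ValueError("a cell stores at least one bit")
    return 2 ** bits_per_cell


# Cell types and bit counts taken from the table above.
for name, bits in [("SLC", 1), ("MLC", 2)]:
    print(f"{name}: {bits} bit(s) per cell -> {levels_per_cell(bits)} levels")
```

More levels packed into the same cell leave less margin between them, which fits the slower and shorter-lived MLC figures in the table.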

Until last year, SLC drives (notably the STEC ZeusIOPS) were the only enterprise-grade SSDs available on the market. Thanks to technology advancements, MLC drives have recently made their appearance in the enterprise market with a new standard called E-MLC (enterprise-grade multi-level cell).

How SSDs operate internally

SSDs inherit many quirks and nuances from their NAND cells. For instance, NAND must be erased an entire block at a time (an operation that takes nearly 2,000 µs), and a write (or program) must go to an erased block.

This leads to a phenomenon called write amplification: because Flash memory must be erased before it can be rewritten, performing these operations means moving (or rewriting) user data and metadata more than once.

This multiplying effect increases the number of writes required over the life of the SSD, which shortens the time it can operate reliably. The increased writes also consume bandwidth to the Flash memory, which mainly reduces the random write performance of the SSD. Many factors affect the write amplification of an SSD; some can be controlled by the user, and some are a direct result of the data written to the SSD and of how it is used.
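
Write amplification is usually expressed as a ratio: the bytes actually written to Flash divided by the bytes the host asked to write. The snippet below is a minimal sketch of that ratio; the function name and the example figures are assumptions chosen for illustration, not measurements from a real drive.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Ratio of bytes written to Flash to bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


# Hypothetical case: the host writes 4 KiB, but the controller also has to
# relocate 12 KiB of still-valid data before it can erase a block.
host = 4 * 1024
flash = host + 12 * 1024
print(write_amplification(host, flash))  # 4.0
```

A ratio of 1.0 would mean no extra writes at all; anything above that is overhead paid in both endurance and bandwidth.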

When data is first written to an SSD, the cells all start in an erased state, so data can be written directly, one page at a time (pages are often 4-8 KB in size). The SSD controller, which manages the Flash memory and interfaces with the host system, maps the logical block addresses (LBAs) used by the host to physical locations in Flash. This logical-to-physical mapping is part of the Flash Abstraction Layer (more on that later).

When new data comes in to replace older data already written, the SSD controller writes the new data to a new location and updates the logical mapping to point to the new physical location. The old location no longer holds valid data, and it will eventually need to be erased before it can be written again.
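
The out-of-place update described above can be sketched as a toy mapping table. This is an illustration only, not how any particular controller is implemented: the class name, the page-granular map and the first-free allocation policy are all assumptions made for the example.

```python
class TinyMapper:
    """Toy logical-to-physical map (hypothetical, for illustration only)."""

    def __init__(self, num_pages: int):
        self.l2p = {}                        # logical address -> physical page
        self.invalid = set()                 # physical pages holding stale data
        self.free = list(range(num_pages))   # erased pages, ready to program

    def write(self, lba: int) -> int:
        """Program a fresh page for this LBA and invalidate the old copy."""
        new_page = self.free.pop(0)          # raises IndexError when no page is free
        old_page = self.l2p.get(lba)
        if old_page is not None:
            self.invalid.add(old_page)       # stale until its block is erased
        self.l2p[lba] = new_page
        return new_page


m = TinyMapper(num_pages=8)
first = m.write(lba=5)    # first write of LBA 5
second = m.write(lba=5)   # overwrite lands on a different physical page
print(first, second, m.invalid)
```

The overwrite never touches the original page; it only flips the mapping and leaves a stale page behind for later cleanup.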

FAL to the rescue!

The Flash Abstraction Layer (FAL) provides a high-level abstraction of the physical organization of NAND Flash memory devices. It emulates the rewriting of sectors in hard disks by remapping new data to another location in the memory array and marking the previous sector invalid.

To better wrap your head around this concept, think of it as something that resembles a database log or the NetApp WAFL (Write Anywhere File Layout).

Typically the SSD is overprovisioned by the manufacturer, meaning that there is a difference between the physical capacity of the Flash memory and the logical capacity available to the user. This additional space is used by the FAL modules, which are normally included in every SSD controller:

Translation Module

The Translation module, which is the primary interface in the FAL, provides the translation from virtual to physical addresses and converts the logical operations into physical operations on the Flash memory device. It also handles the exporting of all operations available on storage media (for example: write sector, read sector and format partition).

Wear Leveling Module

The Wear Leveling module ensures that the memory array is used uniformly by monitoring and evenly distributing the number of erase cycles per block. Each time a block is requested by the Translation module, the Wear Leveling module allocates the least used block. The “Program/Erase cycles” figure is the number of write/erase operations a block can sustain; for a single-level cell (SLC) NAND device it is on the order of 100,000 cycles.
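
The "allocate the least used block" rule can be sketched in a few lines. This is a simplified illustration under assumed data structures (a dictionary of erase counters and a set of free block numbers); real wear-leveling algorithms are more elaborate, and the names here are hypothetical.

```python
def pick_block(erase_counts: dict[int, int], free_blocks: set[int]) -> int:
    """Return the free block with the fewest erase cycles so far."""
    if not free_blocks:
        raise RuntimeError("no free blocks available")
    # Ties are broken by block number so the choice is deterministic.
    return min(free_blocks, key=lambda b: (erase_counts.get(b, 0), b))


# Hypothetical erase counters: block number -> erase cycles so far.
erase_counts = {0: 120, 1: 15, 2: 15, 3: 900}
free_blocks = {0, 2, 3}   # block 1 is in use, so it is not a candidate
print(pick_block(erase_counts, free_blocks))  # 2
```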

Garbage Collection Module

As the FAL emulates rewriting sectors in hard disks by remapping new data to another location of the memory array and marking the previous sector invalid, eventually it may be necessary to free some of the invalid memory space to allow further data to be written. To do this, the FAL implements the Garbage Collection module, where the valid sectors are copied into a new free area and the old area is erased.
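
The copy-then-erase step can be sketched as follows. This is a toy model, not the behavior of a specific firmware: blocks are plain Python lists, pages are small dictionaries with a "valid" flag, and the function name is made up for the example.

```python
def garbage_collect(victim: list, spare: list) -> int:
    """Copy valid pages from victim into spare, then erase victim.

    Returns the number of pages that had to be copied.
    """
    copied = 0
    for page in victim:
        if page["valid"]:
            spare.append(page)   # relocate data that is still live
            copied += 1
    victim.clear()               # models erasing the whole block at once
    return copied


victim = [
    {"data": "A", "valid": True},
    {"data": "B", "valid": False},   # stale copy left by an earlier overwrite
    {"data": "C", "valid": True},
]
spare = []
moved = garbage_collect(victim, spare)
print(moved, [p["data"] for p in spare], victim)
```

Every page copied here is a write the host never asked for, which is where write amplification comes from.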

Bad Block Management Module

The Bad Block Management module determines how a block is marked as bad. Bad blocks are blocks that contain one or more invalid bits and whose reliability is not guaranteed. Bad blocks may be present when the device is shipped, or may develop during the lifetime of the device. The Bad Block Management module hides the bad blocks from the rest of the FAL, preventing it from accessing them.
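
A bad block table can be sketched as a set of block numbers that is filtered out before any allocation. This is an assumed, simplified structure for illustration; the class and method names are hypothetical, and real devices keep this table in Flash itself.

```python
class BadBlockTable:
    """Toy bad block table (hypothetical names, for illustration only)."""

    def __init__(self, factory_bad=()):
        # Blocks already marked bad when the device was shipped.
        self.bad = set(factory_bad)

    def mark_bad(self, block: int) -> None:
        """Record a block that went bad during the device's lifetime."""
        self.bad.add(block)

    def usable(self, blocks) -> list:
        """Return only the blocks the other modules are allowed to see."""
        return [b for b in blocks if b not in self.bad]


table = BadBlockTable(factory_bad=[3])
table.mark_bad(6)   # e.g. a block that failed to erase
print(table.usable(range(8)))  # [0, 1, 2, 4, 5, 7]
```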

Update from 2014…

After 3 years, not much has changed. The consumer market has been flooded with TLC SSDs (triple-level cell, storing 3 bits in every cell), and improvements have been made to all the components of the FAL stack. SLC SSDs are still around, but regular MLC drives are starting to gain traction in the “enterprise” storage segment, especially with storage startups that are building their data layouts with flash in mind. This eases the work of the garbage collector in the SSD; some startups even replace the SSD firmware with a more straightforward one and move the FAL directly into the storage array.

Fabio Rapposelli
