Re: [PATCH] edac: Improve the comments to better describe the memoryconcepts

From: Borislav Petkov
Date: Wed Mar 07 2012 - 09:24:34 EST


On Wed, Mar 07, 2012 at 08:37:04AM -0300, Mauro Carvalho Chehab wrote:
> The memory terms changed along the time,

I think you mean something along the lines of: "Computer memory terminology has
changed with time... "

> since when EDAC were originally written:

either "since EDAC drivers were originally written" or "since EDAC was
originally written"

> new concepts were introduced, and some things have different
> meanings, depending on the memory architecture.
>
> Better define those terms.

Better: "Improve the definition of all related terms."

>
> As some concepts are different for memory controllers used by certain
> memory stick types, add a per-memory type comment, and add a more verbose
> description for each memory type.

Simplify: "Also, describe each memory type in a more detailed fashion."
But, see my comment below.

> No functional changes. Just comments were touched.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx>
> ---
> include/linux/edac.h | 182 +++++++++++++++++++++++++++++++++++++------------
> 1 files changed, 137 insertions(+), 45 deletions(-)
>
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index 055b248..96f996f 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -66,25 +66,91 @@ enum dev_type {
> #define DEV_FLAG_X32 BIT(DEV_X32)
> #define DEV_FLAG_X64 BIT(DEV_X64)
>
> -/* memory types */
> +/**
> + * enum mem_type - memory types
> + *
> + * @MEM_EMPTY Empty csrow
> + * @MEM_RESERVED: Reserved csrow type
> + * @MEM_UNKNOWN: Unknown csrow type
> + * @MEM_FPM: Fast page mode
> + * An old asyncronous memory technology, encapsulated as
> + * SIMM, popularly used between 1987-1995.
> + * @MEM_EDO: Extended data out
> + * Asynchronous memory technology used on early Pentium's
> + * (1995-1998).
> + * @MEM_BEDO: Burst Extended data out
> + * EDO memories with performance improvement gains.
> + * @MEM_SDR: Single data rate SDRAM
> + * First types of syncronous ram specified by JEDEC
> + * popular between 1998-2002. The JEDEC standards defined
> + * 3 types of sticks: PC66, PC100 and PC133. There were
> + * also some non-official overclocked sticks like PC150
> + * and PC166.
> + * There are 3 pins for chip select: Pins 0 and 2 are
> + * for rank 0; pins 1 and 3 are for rank 1, if the memory
> + * is dual-rank.
> + * @MEM_RDR: Registered single data rate SDRAM
> + * @MEM_DDR: Double data rate SDRAM, used between 2002 and 2005.
> + * The JEDEC standards defined sticks PC1600, PC2100,
> + * PC2700 and PC3200. Non-official overclocked sticks
> + * also exists.
> + * On DDR memories, there are one or two channels. A
> + * single-channel mode means that one x72 ECC dimm is
> + * accessed, in order to provide 64 bits of data. On
> + * dual-channel mode, two dimm's are used simultaneously,
> + * in order to provide a 128 bits of data. The "cschannel"
> + * concept used on EDAC refers to such channel type.
> + * @MEM_RDDR: Registered Double data rate SDRAM
> + * This is a variant of the DDR memories.
> + * A registered memory has a buffer inside it, hiding
> + * part of the memory details to the memory controller.
> + * @MEM_RMBS: Rambus DRAM
> + * Rambus uses a high-speed multi-drop serial bus to
> + * communicate with each RDRAM chip, used on some
> + * machines between 2000-2002 (Pentium III and IV).
> + * @MEM_DDR2: DDR2 RAM, as described at JEDEC JESD79-2F.
> + * Those memories are labed as "PC2-" instead of "PC" to
> + * differenciate from DDR.
> + * @MEM_FB_DDR2: Fully-Buffered DDR2, as described at JEDEC Std No. 205
> + * and JESD206.
> + * A FB-DIMM channel consists of 14 unidirectional signal
> + * pairs (Northbound path) from the memories to the MC
> + * and 10 unidirectional signal pairs (Southbond path)
> + * from the MC to the DIMM's.
> + * those memories are x72 ECC DIMMs. Up to 8 DIMMs can
> + * be connected per channel. When used with 128 bits,
> + * two channels are needed. The grouping of those two
> + * channels is called "branch".
> + * @MEM_RDDR2: Registered DDR2 RAM
> + * This is a variant of the DDR2 memories.
> + * @MEM_XDR: Rambus XDR
> + * It is an evolution of the original RAMBUS memories,
> + * created to compete with DDR2. Weren't used on any
> + * x86 arch, but cell_edac PPC memory controller uses it.
> + * @MEM_DDR3: DDR3 RAM
> + * Those memories are labed as "PC3-" to differenciate
> + * from DDR and DDR2.
> + * @MEM_RDDR3: Registered DDR3 RAM
> + * This is a variant of the DDR3 memories.
> + */

I don't think the kernel source should contain such verbose comments.
Wikipedia contains much more info: http://en.wikipedia.org/wiki/DRAM
which we simply cannot carry in comments even if we wanted to so best it
would be if this enum only maps the actual enum names to the names on
the wikipedia page so that interested entities can look them up. Then,
the comment should contain the wikipedia URL, of course.

> enum mem_type {
> - MEM_EMPTY = 0, /* Empty csrow */
> - MEM_RESERVED, /* Reserved csrow type */
> - MEM_UNKNOWN, /* Unknown csrow type */
> - MEM_FPM, /* Fast page mode */
> - MEM_EDO, /* Extended data out */
> - MEM_BEDO, /* Burst Extended data out */
> - MEM_SDR, /* Single data rate SDRAM */
> - MEM_RDR, /* Registered single data rate SDRAM */
> - MEM_DDR, /* Double data rate SDRAM */
> - MEM_RDDR, /* Registered Double data rate SDRAM */
> - MEM_RMBS, /* Rambus DRAM */
> - MEM_DDR2, /* DDR2 RAM */
> - MEM_FB_DDR2, /* fully buffered DDR2 */
> - MEM_RDDR2, /* Registered DDR2 RAM */
> - MEM_XDR, /* Rambus XDR */
> - MEM_DDR3, /* DDR3 RAM */
> - MEM_RDDR3, /* Registered DDR3 RAM */
> + MEM_EMPTY = 0,
> + MEM_RESERVED,
> + MEM_UNKNOWN,
> + MEM_FPM,
> + MEM_EDO,
> + MEM_BEDO,
> + MEM_SDR,
> + MEM_RDR,
> + MEM_DDR,
> + MEM_RDDR,
> + MEM_RMBS,
> + MEM_DDR2,
> + MEM_FB_DDR2,
> + MEM_RDDR2,
> + MEM_XDR,
> + MEM_DDR3,
> + MEM_RDDR3,
> };
>
> #define MEM_FLAG_EMPTY BIT(MEM_EMPTY)
> @@ -162,8 +228,9 @@ enum scrub_type {
> #define OP_OFFLINE 0x300
>
> /*
> - * There are several things to be aware of that aren't at all obvious:
> + * Concepts used at the EDAC subsystem
> *
> + * There are several things to be aware of that aren't at all obvious:
> *
> * SOCKETS, SOCKET SETS, BANKS, ROWS, CHIP-SELECT ROWS, CHANNELS, etc..
> *
> @@ -172,36 +239,61 @@ enum scrub_type {
> * creating a common ground for discussion, terms and their definitions
> * will be established.
> *
> - * Memory devices: The individual chip on a memory stick. These devices
> - * commonly output 4 and 8 bits each. Grouping several
> - * of these in parallel provides 64 bits which is common
> - * for a memory stick.
> + * Memory devices: The individual DRAM chips on a memory stick. These
> + * devices commonly output 4 and 8 bits each (x4, x8).
> + * Grouping several of these in parallel provides the
> + * number of bits that the memory controller expects:
> + * typically 72 bits, in order to provide 64 bits of ECC
> + * corrected data.

...in order to provide 64 bits + 8 bits
of ECC data.


ECC checking and correction is done in the memory/DRAM controller.

> *
> * Memory Stick: A printed circuit board that aggregates multiple
> - * memory devices in parallel. This is the atomic
> - * memory component that is purchaseable by Joe consumer
> - * and loaded into a memory socket.
> + * memory devices in parallel. In general, this is the
> + * Field Replaceable Unit (FRU) that the final consumer

Linux is hopefully still not officially "sponsored" by a single company which
knows only "customers" so let's change that to:

... (FRU) which gets replaced, in
the case of excessive errors.

> + * cares to replace. It is typically encapsulated as DIMMs

... Most often it is also called DIMM (Dual Inline
Memory Module).

> *
> * Socket: A physical connector on the motherboard that accepts
> - * a single memory stick.
> + * a single memory stick. Also called as "slot" on several
> + * datasheets.

I would change that name to "Memory Socket" because "socket" is most
often used for the physical processor package connecting to the
mainboard.

> *
> - * Channel: Set of memory devices on a memory stick that must be
> - * grouped in parallel with one or more additional
> - * channels from other memory sticks. This parallel
> - * grouping of the output from multiple channels are
> - * necessary for the smallest granularity of memory access.
> - * Some memory controllers are capable of single channel -
> - * which means that memory sticks can be loaded
> - * individually. Other memory controllers are only
> - * capable of dual channel - which means that memory
> - * sticks must be loaded as pairs (see "socket set").
> + * Channel: A memory controller channel, responsible to communicate
> + * with a group of DIMM's. Each channel has its own

DIMMs.

> + * independent control (command) and data bus, and can
> + * be used independently or grouped.

with other channels.
> *
> - * Chip-select row: All of the memory devices that are selected together.
> - * for a single, minimum grain of memory access.
> - * This selects all of the parallel memory devices across
> - * all of the parallel channels. Common chip-select rows
> - * for single channel are 64 bits, for dual channel 128
> - * bits.
> + * Branch: Not all memory controllers have it.

^^^ No need for that sentence.

> It is the highest
> + * hierarchy on a Fully-Buffered DIMM memory controller.
> + * Typically, it contains two channels.
> + * Two channels at the same branch can be used in single
> + * mode or in lockstep mode.
> + * When lockstep is enabled, the cache line is higher,

What does "the cache line is higher" mean? You wanna rather say "the
cacheline is doubled"?

> + * but it generally brings some performance penalty.
> + * Also, it is generally not possible to point to just one
> + * memory stick when an error occurs, as the error
> + * correction code is calculated using two dimms instead

DIMMs

> + * of one. Due to that, it is capable of correcting more
> + * errors than on single mode.

.. in ..
> + *
> + * Single-channel: The data accessed by the memory controller is contained
> + * into one dimm only. E. g. if the data is 64 bits-wide,

DIMM bits wide

> + * the data flows to the CPU using one 64 bits parallel
> + * access.
> + * Typically used with SDR, DDR, DDR2 and DDR3 memories.
> + * FB-DIMM and RAMBUS use a different concept for channel,
> + * so this concept doesn't apply there.
> + *
> + * Double-channel: The data size accessed by the memory controller is
> + * interlaced into two dimms, accessed at the same time.

DIMMs

> + * E. g. if the DIMM is 64 bits-wide (72 bits with ECC),
> + * the data flows to the CPU using a 128 bits parallel
> + * access.
> + *
> + * Chip-select row: This is the name of the DRAM signal used to select the
> + * DRAM chips to be used. It may not be visible by the
> + * memory controller, as some DIMM types have a memory
> + * buffer that can hide direct access to it from the
> + * Memory Controller.

No need for the FBDIMM reference - simply keep the original comment
of single channel/dual channel width and that's it. You can add a
sentence at the end of this definition stating that there are memory
configurations/technologies which do not use chip selects or hide chip
selects behind buffering logic.

> On devices where it is visible, it
> + * selects the DIMM rank that it is accessed by the
> + * memory controller.
> *
> * Single-Ranked stick: A Single-ranked stick has 1 chip-select row of memory.
> * Motherboards commonly drive two chip-select pins to
> @@ -214,8 +306,8 @@ enum scrub_type {
> *
> * Double-sided stick: DEPRECATED TERM, see Double-Ranked stick.
> * A double-sided stick has two chip-select rows which
> - * access different sets of memory devices. The two
> - * rows cannot be accessed concurrently. "Double-sided"
> + * access different sets of memory devices. The two
> + * rows cannot be accessed concurrently. "Double-sided"
> * is irrespective of the memory devices being mounted
> * on both sides of the memory stick.
> *
> --
> 1.7.8
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/