[PATCH v2 0/7] rslib: RS decoder is severely broken

From: Ferdinand Blomqvist
Date: Thu Jun 20 2019 - 10:16:08 EST


This is the second revision of this series that fixes the bugs/flaws in
the RS decoder. This addresses all of Thomas' comments/suggestions.

Changes in v2:
- Replace code that requires the FPU (one small thing in test_rslib.c)
- More verbose changelog (patch 2/7)
- Clarifying comment (patch 7/7)
- Fixed some small whitespace issues (patches 1 and 5)

------

The Reed_Solomon library used in the kernel is based on Phil Karn's
fec library. When playing with this library I found a couple of bugs. It
turn out that all of these bugs, and some additional flaws, are present
in rslib.

The Reed-Solomon decoder has several bugs/flaws:

- Decoding of shortened codes is broken. The decoder only works as
expected if there are no erasures.

- The decoder sometimes fails silently, i.e. it announces success but
returns a word that is not a codeword.

- The return value of the decoder is incoherent with respect to how
fixed erasures are counted. If the word to be decoded is a codeword,
then the decoder always returns zero even if some erasures are given.
On the other hand, if the word to be decoded contains errors, then the
number of erasures is always included in the count of corrected
symbols. So the decoder handles erasures without symbol corruption
inconsistently. This inconsistency probably doesn't affect anyone
using the decoder, but it is inconsistent with the documentation.

- The error positions returned in eras_pos include all erasures, but the
corrections are only set in the correction buffer if there actually is
a symbol error. So if there are erasures without symbol corruption,
then the correction buffer will contain errors (unless initialized to
zero before calling the decoder) or some values will be unset (if the
correction buffer is uninitialized).

- Assumes that the syndromes provided by callers are in index form.
This is simply undocumented.

- When correcting data in-place the decoder does not correct errors in
the parity. On the other hand, when returning the errors in correction
buffers, errors in the parity are included.

This series provides a module with tests for rslib and fixes all the
bugs/flaws. I am not sure that the provided self tests are written in
the 'right way'. I just looked at other self tests in lib and
implemented something similar.

The fixes are tested with the self tests. They should probably also be
tested with drivers etc. that use rslib, but it is unclear to me how to
do this.

I looked a bit on two of the drivers that use rslib:

drivers/mtd/nand/raw/cafe_nand.c
drivers/mtd/nand/raw/diskonchip.c

Both of them seem to do some additional error checking after calling
decode_rs16. Maybe this is needed because of the bugs in the decoder?

Ferdinand Blomqvist (7):
rslib: Add tests for the encoder and decoder
rslib: Fix decoding of shortened codes
rslib: decode_rs: Fix length parameter check
rslib: decode_rs: code cleanup
rslib: Fix handling of of caller provided syndrome
rslib: Update documentation
rslib: Fix remaining decoder flaws

lib/Kconfig.debug | 12 +
lib/reed_solomon/Makefile | 2 +-
lib/reed_solomon/decode_rs.c | 115 +++++--
lib/reed_solomon/reed_solomon.c | 12 +-
lib/reed_solomon/test_rslib.c | 516 ++++++++++++++++++++++++++++++++
5 files changed, 622 insertions(+), 35 deletions(-)
create mode 100644 lib/reed_solomon/test_rslib.c

--
2.17.2