[BUG] Possible silent data corruption in filesystems/page cache

From: Barczak, Mariusz
Date: Wed Jun 01 2016 - 05:52:05 EST


Hi

We ran a data validation test for a buffered workload on the ext3, ext4,
and XFS filesystems.
While the page cache was being flushed, the block device driver returned
an I/O error.
After the page cache was dropped, our validation tool reported data corruption.
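In a buffered workload, write(2) normally returns once the data has been copied into the page cache. The device is written later, during writeback, so an I/O error at that point is not returned by the write call itself; fsync(2) is the usual place where a writer can observe it. A minimal sketch, assuming the kernel propagates the failed writeback to fsync(), is to let GNU dd call fsync before exiting (conv=fsync) and treat a non-zero exit status as a failed write. The helper name write_checked is hypothetical and not part of the attached script.

```shell
#!/bin/bash
# write_checked FILE SIZE_MB  (hypothetical helper)
# Writes SIZE_MB MiB of random data to FILE through the page cache.
# conv=fsync makes GNU dd call fsync() on FILE before exiting, so a
# writeback error reported by fsync() turns into a non-zero dd exit
# status instead of going unnoticed. iflag=fullblock makes dd retry
# short reads so exactly SIZE_MB MiB are written.
write_checked() {
	local file=$1 size_mb=$2
	if ! dd if=/dev/urandom of="$file" bs=1M count="$size_mb" \
		iflag=fullblock conv=fsync status=none; then
		echo "write_checked: write or fsync failed for $file" >&2
		return 1
	fi
	# Only reached when dd (including its fsync) succeeded.
	md5sum "$file"
}
```

Note that a successful fsync only says the writeback of this file did not report an error; it does not replace reading the data back, which is what the validation test does.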

We have prepared a simple patch that injects I/O errors in device mapper.
We ran a test that verifies the md5sum of files written while I/O errors
are being injected.
The test shows a checksum mismatch.
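The check follows a record-then-verify pattern: each file's checksum is recorded right after it is written, while the data is still served from the page cache, and is verified again with md5sum -c later. A minimal sketch of the verify side that counts mismatches instead of only printing them is below; count_mismatches is a hypothetical name. It relies on GNU md5sum -c printing one "FILE: FAILED" line on stdout per mismatched file (unreadable files show up as "FILE: FAILED open or read" and are counted too).

```shell
#!/bin/bash
# count_mismatches LOG  (hypothetical helper)
# Re-reads every file listed in LOG, in the format produced by
# "md5sum FILE >> LOG", and prints how many no longer match their
# recorded checksum. --quiet suppresses the "FILE: OK" lines and the
# summary warning on stderr is discarded.
count_mismatches() {
	local log=$1
	# grep -c prints 0 (and exits 1) when nothing failed; "|| true"
	# keeps the function's exit status at 0 in that case.
	md5sum -c --quiet "$log" 2>/dev/null | grep -c ': FAILED' || true
}
```

For example, `count_mismatches md5sum.log` run after the test prints 0 when every file read back intact.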

Attachments:
0001-drivers-md-dm-add-error-injection.patch - device mapper patch
dm-test.txt - validation test script

Regards,
Mariusz Barczak
Intel Technology Poland

Attachment: 0001-drivers-md-dm-add-error-injection.patch
Description: 0001-drivers-md-dm-add-error-injection.patch

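#!/bin/bash
#
# dm-test.txt: validation test for buffered writes during injected I/O errors.
# Must run as root. Requires the device mapper error-injection patch, which
# provides /sys/kernel/debug/dm_debug/{error,error_counter,error_max}.

DISK=/dev/vda		# WARNING: repartitioned and overwritten

FILE_SIZE=50		# MiB written to each test file

PART_NUM=10		# number of partitions / dm-linear devices
PART_GB=2		# size of each partition in GiB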

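prepare_test() {
	# New GPT label with PART_NUM partitions of PART_GB GiB each.
	parted -s "$DISK" mktable gpt
	for i in $(seq 1 "$PART_NUM"); do
		parted -s -a optimal "$DISK" mkpart primary \
			"$(( PART_GB * (i - 1) ))GiB" "$(( PART_GB * i ))GiB"
	done

	# Wait for udev to create the partition device nodes.
	udevadm settle

	# Stack a dm-linear device on each partition so the error-injection
	# patch in device mapper sits below the filesystem.
	for i in $(seq 1 "$PART_NUM"); do
		SIZE=$(blockdev --getsz "${DISK}${i}")	# size in 512-byte sectors
		dmsetup create "linear${i}" --table \
			"0 ${SIZE} linear ${DISK}${i} 0"
	done

	udevadm settle

	# Use the stable /dev/mapper names: dm minor numbers (/dev/dm-N) are
	# allocated dynamically and need not start at 0 or follow creation order.
	for i in $(seq 1 "$PART_NUM"); do
		mkfs.xfs -f "/dev/mapper/linear${i}"
	done

	for i in $(seq 1 "$PART_NUM"); do
		mkdir -p "/mnt/p${i}"
		mount "/dev/mapper/linear${i}" "/mnt/p${i}"
	done
}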

cleanup_test() {
	for i in $(seq 1 "$PART_NUM"); do
		umount "/mnt/p${i}"
		# rmdir instead of rm -rf: if umount failed, rm -rf would delete
		# the files on the still-mounted filesystem.
		rmdir "/mnt/p${i}"
		dmsetup remove "linear${i}"
	done
}

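# Runs in the background. Each cycle: disable injection, reset error_counter
# and set error_max, drop clean page-cache pages (drop_caches=1 does not free
# dirty pages), then enable injection for 3 seconds.
inject_error() {
	while true; do
		echo "Error = 0"
		echo 0 > /sys/kernel/debug/dm_debug/error
		echo 0 > /sys/kernel/debug/dm_debug/error_counter
		echo 100 > /sys/kernel/debug/dm_debug/error_max
		sleep 1
		echo "Drop caches"
		echo 1 > /proc/sys/vm/drop_caches
		sleep 1
		echo "Error = 1"
		echo 1 > /sys/kernel/debug/dm_debug/error
		sleep 3
	done
}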

inject_stop() {
echo 0 > /sys/kernel/debug/dm_debug/error
}

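run_test() {
	truncate -s 0 md5sum.log
	inject_error &
	PID=$!
	# Write each file through the page cache while errors are being
	# injected, and record its checksum as seen by the writer.
	for i in $(seq 1 "$PART_NUM"); do
		dd if=/dev/urandom bs=1M count="$FILE_SIZE" iflag=fullblock \
			of="/mnt/p${i}/file"
		md5sum "/mnt/p${i}/file" >> md5sum.log
	done
	kill "$PID"
	wait "$PID" 2>/dev/null
	inject_stop

	# Flush what is still dirty and drop the page cache so that md5sum -c
	# reads the data back from the device, not from cached pages.
	sync
	echo 3 > /proc/sys/vm/drop_caches
	md5sum -c md5sum.log
}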

prepare_test
run_test
cleanup_test
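The script above formats every device with XFS (mkfs.xfs -f), while the report also covers ext3 and ext4. A minimal sketch of a helper that selects the mkfs program per filesystem type is below; mkfs_args and the FS_TYPE variable in the usage line are hypothetical, not part of the attached script. mkfs.xfs uses -f to overwrite an existing filesystem, and the ext3/ext4 tools (mke2fs) use -F for the same purpose.

```shell
#!/bin/bash
# mkfs_args FSTYPE  (hypothetical helper)
# Prints the mkfs program and its "force" flag for one of the
# filesystems named in the report; returns 1 for anything else.
mkfs_args() {
	case "$1" in
	xfs)
		echo "mkfs.xfs -f" ;;
	ext3|ext4)
		echo "mkfs.$1 -F" ;;
	*)
		echo "mkfs_args: unsupported filesystem '$1'" >&2
		return 1 ;;
	esac
}
```

In prepare_test the mkfs line could then become `$(mkfs_args "$FS_TYPE") "/dev/mapper/linear${i}"`, left unquoted on purpose so the program name and flag split into separate words.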