Re: [PATCH 2/2] loop: Test to detect a race condition between loop detach and open

From: Shinichiro Kawasaki
Date: Thu May 23 2024 - 07:15:30 EST


On May 21, 2024 / 22:42, Gulam Mohamed wrote:
> When one process opens a loop device partition and another process detaches
> it, there will be a race condition due to which stale loop partitions are
> created causing IO errors. This test will detect the race

Hello Gulam, thanks for the patch. I ran the new test case on my system and
observed it failed. It looks working as expected.

This is the patch not for Linux kernel but for blktests, then it's the better to
separate it out. I suggest to add the subject title prefix "[PATCH blktests]"
to this patch so that reviewers can tell the target repository easily.

I suggest to take a look in the "new" script. It has some guides for blktests,
such as "Indent with tabs" or the global variable "TMPDIR". Also, please run
"make check" which runs shellcheck.

$ make check
shellcheck -x -e SC2119 -f gcc check common/* \
tests/*/rc tests/*/[0-9]*[0-9] src/*.sh
tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
tests/loop/010:84:7: note: Double quote to prevent globbing and word splitting. [SC2086]
make: *** [Makefile:21: check] Error 1

Please address the shellcheck warnings above. It will help us to keep script
quality at some level.

>
> Signed-off-by: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>
> ---
> tests/loop/010 | 90 ++++++++++++++++++++++++++++++++++++++++++++++
> tests/loop/010.out | 2 ++
> 2 files changed, 92 insertions(+)
> create mode 100755 tests/loop/010
> create mode 100644 tests/loop/010.out
>
> diff --git a/tests/loop/010 b/tests/loop/010
> new file mode 100755
> index 000000000000..ea93a120d51a
> --- /dev/null
> +++ b/tests/loop/010
> @@ -0,0 +1,90 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2024, Oracle and/or its affiliates.
> +#
> +# Test to detect a race between loop detach and loop open which creates
> +# stale loop partitions when one process opens the loop partition and
> +# another process detaches the loop device
> +#
> +. tests/loop/rc
> +DESCRIPTION="check stale loop partition"
> +TIMED=1
> +
> +requires() {
> + _have_program parted
> + _have_program mkfs.xfs
> +}
> +
> +image_file="/tmp/loopImg"

$TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test run.

> +line1="partition scan of loop0 failed (rc=-16)"
> +
> +function create_loop()

Nit: the "function" keyword is not used in blktests (there is one exception
though).

> +{
> + while [ 1 ]
> + do
> + loop_device="$(losetup -P -f --show "${image_file}")"
> + blkid /dev/loop0p1 >> /dev/null 2>&1
> + done
> +}
> +
> +function detach_loop()
> +{
> + while [ 1 ]
> + do
> + if [ -e /dev/loop0 ]; then
> + losetup -d /dev/loop0 > /dev/null 2>&1
> + fi
> + done
> +}
> +
> +test() {
> + echo "Running ${TEST_NAME}"
> + local failure="/tmp/failure"

Nit: some local variables are declared with "local" keyword, and some other are
not.

> + touch $failure
> + echo 0 > $failure

Why the file is required to keep the fail/pass status? Just a bash variable
looks enough.

> + dmesg -c > /dev/null 2>&1

This kernel ring buffer clear will likely affect test harnesses which run
blktests. To check the kernel message generated during each test case run,
please use _dmesg_since_test_start(): ref nbd/004, nvme/003.

> +
> + truncate -s 2G "${image_file}"

Other test cases in loop group have 1G file size. Is 2G required for this test?
If not, 1G size would be the better to reduce disk shortage risk.

> + parted -a none -s "${image_file}" mklabel gpt
> + loop_device="$(losetup -P -f --show "${image_file}")"
> + parted -a none -s "${loop_device}" mkpart primary 64s 109051s
> +
> + udevadm settle
> + if [ ! -e "${loop_device}" ]; then
> + return 1
> + fi
> +
> + mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1
> +
> + losetup -d "${loop_device}" > /dev/null 2>&1
> +
> + create_loop &
> + create_pid=$!
> + detach_loop &
> + detach_pid=$!
> +
> + sleep "${TIMEOUT:-180}"

The default 180 seconds sounds a bit long. How about 30 seconds?

> + {
> + kill -9 $create_pid
> + kill -9 $detach_pid
> + wait
> + sleep 1
> + } 2>/dev/null
> +
> + losetup -D > /dev/null 2>&1
> + dmesg | while IFS= read -r line2
> + do
> + match=$(echo "$line2" | grep -o "$line1")
> + if [ "$line1" == "$match" ]; then
> + echo 1 > "/tmp/failure"
> + break
> + fi
> + done
> + failed=$(cat $failure)
> + if [ $failed -eq 0 ]; then
> + echo "Test complete"
> + else
> + echo "Test failed"
> + fi
> + rm -f $failure
> +}
> diff --git a/tests/loop/010.out b/tests/loop/010.out
> new file mode 100644
> index 000000000000..64a6aee00b8a
> --- /dev/null
> +++ b/tests/loop/010.out
> @@ -0,0 +1,2 @@
> +Running loop/010
> +Test complete
> --
> 2.39.3
>