RE: [PATCH blktests] loop: Test to detect a race condition between loop detach and open
From: Gulam Mohamed
Date: Mon May 27 2024 - 05:56:49 EST
Hi,
> -----Original Message-----
> From: Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> Sent: Thursday, May 23, 2024 4:45 PM
> To: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>
> Cc: linux-block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> axboe@xxxxxxxxx; chaitanyak@xxxxxxxxxx; hch@xxxxxx
> Subject: Re: [PATCH 2/2] loop: Test to detect a race condition between loop
> detach and open
>
> On May 21, 2024 / 22:42, Gulam Mohamed wrote:
> > When one process opens a loop device partition and another process
> > detaches it, there will be a race condition due to which stale loop
> > partitions are created causing IO errors. This test will detect the
> > race
>
> Hello Gulam, thanks for the patch. I ran the new test case on my system and
> observed it failed. It looks working as expected.
>
> This is the patch not for Linux kernel but for blktests, then it's the better to
> separate it out. I suggest to add the subject title prefix "[PATCH blktests]"
> to this patch so that reviewers can tell the target repository easily.
>
Done. Changed the subject line prefix to "[PATCH blktests]"
> I suggest to take a look in the "new" script. It has some guides for blktests,
> such as "Indent with tabs" or the global variable "TMPDIR". Also, please run
> "make check" which runs shellcheck.
>
> $ make check
> shellcheck -x -e SC2119 -f gcc check common/* \
> tests/*/rc tests/*/[0-9]*[0-9] src/*.sh
> tests/loop/010:23:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
> tests/loop/010:32:17: note: Instead of '[ 1 ]', use 'true'. [SC2161]
> tests/loop/010:84:7: note: Double quote to prevent globbing and word
> splitting. [SC2086]
> make: *** [Makefile:21: check] Error 1
>
> Please address the shellcheck warnings above. It will help us to keep script
> quality at some level.
Done. Resolved the warnings
>
> >
> > Signed-off-by: Gulam Mohamed <gulam.mohamed@xxxxxxxxxx>
> > ---
> > tests/loop/010 | 90
> ++++++++++++++++++++++++++++++++++++++++++++++
> > tests/loop/010.out | 2 ++
> > 2 files changed, 92 insertions(+)
> > create mode 100755 tests/loop/010
> > create mode 100644 tests/loop/010.out
> >
> > diff --git a/tests/loop/010 b/tests/loop/010 new file mode 100755
> > index 000000000000..ea93a120d51a
> > --- /dev/null
> > +++ b/tests/loop/010
> > @@ -0,0 +1,90 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-3.0+
> > +# Copyright (C) 2024, Oracle and/or its affiliates.
> > +#
> > +# Test to detect a race between loop detach and loop open which
> > +creates # stale loop partitions when one process opens the loop
> > +partition and # another process detaches the loop device # .
> > +tests/loop/rc DESCRIPTION="check stale loop partition"
> > +TIMED=1
> > +
> > +requires() {
> > + _have_program parted
> > + _have_program mkfs.xfs
> > +}
> > +
> > +image_file="/tmp/loopImg"
>
> $TMPDIR is recommended instead of /tmp. $TMPDIR is cleaned up after test
> run.
Done. Using $TMPDIR now.
>
> > +line1="partition scan of loop0 failed (rc=-16)"
> > +
> > +function create_loop()
>
> Nit: the "function" keyword is not used in blktests (there is one exception
> though).
>
Done. Removed the keyword "function" now
> > +{
> > + while [ 1 ]
> > + do
> > + loop_device="$(losetup -P -f --show "${image_file}")"
> > + blkid /dev/loop0p1 >> /dev/null 2>&1
> > + done
> > +}
> > +
> > +function detach_loop()
> > +{
> > + while [ 1 ]
> > + do
> > + if [ -e /dev/loop0 ]; then
> > + losetup -d /dev/loop0 > /dev/null 2>&1
> > + fi
> > + done
> > +}
> > +
> > +test() {
> > + echo "Running ${TEST_NAME}"
> > + local failure="/tmp/failure"
>
> Nit: some local variables are declared with "local" keyword, and some other
> are not.
Done. Using the "local" keyword for all local variables now
>
> > + touch $failure
> > + echo 0 > $failure
>
> Why the file is required to keep the fail/pass status? Just a bash variable looks
> enough.
In the beginning I tried with a bash variable but it was not working as the variable need to be accessed by other two child threads also.
>
> > + dmesg -c > /dev/null 2>&1
>
> This kernel ring buffer clear will likely affect test harnesses which run blktests.
> To check the kernel message generated during each test case run, please use
> _dmesg_since_test_start(): ref nbd/004, nvme/003.
Done. Used the above function
>
> > +
> > + truncate -s 2G "${image_file}"
>
> Other test cases in loop group have 1G file size. Is 2G required for this test?
> If not, 1G size would be the better to reduce disk shortage risk.
>
Using 1G is not an issue. Done
> > + parted -a none -s "${image_file}" mklabel gpt
> > + loop_device="$(losetup -P -f --show "${image_file}")"
> > + parted -a none -s "${loop_device}" mkpart primary 64s 109051s
> > +
> > + udevadm settle
> > + if [ ! -e "${loop_device}" ]; then
> > + return 1
> > + fi
> > +
> > + mkfs.xfs -f "${loop_device}p1" > /dev/null 2>&1
> > +
> > + losetup -d "${loop_device}" > /dev/null 2>&1
> > +
> > + create_loop &
> > + create_pid=$!
> > + detach_loop &
> > + detach_pid=$!
> > +
> > + sleep "${TIMEOUT:-180}"
>
> The default 180 seconds sounds a bit long. How about 30 seconds?
Using 180 seconds is just to give more time to reproduce the issue as 30 seconds may not be sufficient for the race condition to occur
Will send the version 2 as soon as possible, with all these changes
>
> > + {
> > + kill -9 $create_pid
> > + kill -9 $detach_pid
> > + wait
> > + sleep 1
> > + } 2>/dev/null
> > +
> > + losetup -D > /dev/null 2>&1
> > + dmesg | while IFS= read -r line2
> > + do
> > + match=$(echo "$line2" | grep -o "$line1")
> > + if [ "$line1" == "$match" ]; then
> > + echo 1 > "/tmp/failure"
> > + break
> > + fi
> > + done
> > + failed=$(cat $failure)
> > + if [ $failed -eq 0 ]; then
> > + echo "Test complete"
> > + else
> > + echo "Test failed"
> > + fi
> > + rm -f $failure
> > +}
> > diff --git a/tests/loop/010.out b/tests/loop/010.out new file mode
> > 100644 index 000000000000..64a6aee00b8a
> > --- /dev/null
> > +++ b/tests/loop/010.out
> > @@ -0,0 +1,2 @@
> > +Running loop/010
> > +Test complete
> > --
> > 2.39.3
> >