Re: [PATCH v2 1/3] kunit: Always run cleanup from a test kthread
From: Benjamin Berg
Date: Thu Apr 20 2023 - 03:35:11 EST
Hi David,
On Wed, 2023-04-19 at 16:54 +0800, David Gow wrote:
> KUnit tests run in a kthread, with the current->kunit_test pointer set
> to the test's context. This allows the kunit_get_current_test() and
> kunit_fail_current_test() macros to work. Normally, this pointer is
> still valid during test shutdown (i.e., the suite->exit function, and
> any resource cleanup). However, if the test has exited early (e.g., due
> to a failed assertion), the cleanup is done in the parent KUnit thread,
> which does not have an active context.
>
> Instead, in the event test terminates early, run the test exit and
> cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
> and better isolates the rest of KUnit from issues which arise in test
> cleanup.
>
> If a test cleanup function itself aborts (e.g., due to an assertion
> failing), there will be no further attempts to clean up: an error will
> be logged and the test failed.
>
> This should also make it easier to get access to the KUnit context,
> particularly from within resource cleanup functions, which may, for
> example, need access to data in test->priv.
>
> Signed-off-by: David Gow <davidgow@xxxxxxxxxx>
> ---
> This is an updated version of / replacement of "kunit: Set the current
> KUnit context when cleaning up", which instead creates a new kthread
> for cleanup tasks if the original test kthread is aborted. This protects
> us from failed assertions during cleanup, if the test exited early.
>
> Changes since v1:
> https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@xxxxxxxxxx/
> - Move cleanup execution to another kthread
> - (Thanks, Benjamin, for pointing out the assertion issues)
Nice, I think this is looking promising. After thinking about it a bit,
maybe one thing to improve is to always start the new cleanup kthread
from kunit_run_case_catch_errors.
That way there is only one codepath. But, more importantly, it means if
the cleanup fails the first time, we do not risk running it a second
time and we get slightly nicer error reporting. Not that this happening
would be a big issue.
Benjamin
> ---
> lib/kunit/test.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index e2910b261112..caeae0dfd82b 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -423,8 +423,51 @@ static void kunit_try_run_case(void *data)
> kunit_run_case_cleanup(test, suite);
> }
>
> +static void kunit_try_run_case_cleanup(void *data)
> +{
> + struct kunit_try_catch_context *ctx = data;
> + struct kunit *test = ctx->test;
> + struct kunit_suite *suite = ctx->suite;
> +
> + current->kunit_test = test;
> +
> + kunit_run_case_cleanup(test, suite);
> +}
> +
> +static void kunit_catch_run_case_cleanup(void *data)
> +{
> + struct kunit_try_catch_context *ctx = data;
> + struct kunit *test = ctx->test;
> + int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
> +
> + /* It is always a failure if cleanup aborts. */
> + kunit_set_failure(test);
> +
> + if (try_exit_code) {
> + /*
> + * Test case could not finish, we have no idea what state it is
> + * in, so don't do clean up.
> + */
> + if (try_exit_code == -ETIMEDOUT) {
> + kunit_err(test, "test case cleanup timed out\n");
> + /*
> + * Unknown internal error occurred preventing test case from
> + * running, so there is nothing to clean up.
> + */
> + } else {
> + kunit_err(test, "internal error occurred during test case cleanup: %d\n",
> + try_exit_code);
> + }
> + return;
> + }
> +
> + kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
> +}
> +
> +
> static void kunit_catch_run_case(void *data)
> {
> + struct kunit_try_catch cleanup;
> struct kunit_try_catch_context *ctx = data;
> struct kunit *test = ctx->test;
> struct kunit_suite *suite = ctx->suite;
> @@ -451,9 +494,16 @@ static void kunit_catch_run_case(void *data)
>
> /*
> * Test case was run, but aborted. It is the test case's business as to
> - * whether it failed or not, we just need to clean up.
> + * whether it failed or not, we just need to clean up. Do this in a new
> + * try / catch context, in case it asserts, too.
> */
> - kunit_run_case_cleanup(test, suite);
> + kunit_try_catch_init(&cleanup,
> + test,
> + kunit_try_run_case_cleanup,
> + kunit_catch_run_case_cleanup);
> + ctx->test = test;
> + ctx->suite = suite;
> + kunit_try_catch_run(&cleanup, ctx);
> }
>
> /*