Re: [PATCH printk v1] printk: ringbuffer: Add KUnit test

From: David Gow
Date: Fri Nov 22 2024 - 02:06:08 EST


On Thu, 21 Nov 2024 at 22:50, John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
>
> From: Thomas Weißschuh <thomas.weissschuh@xxxxxxxxxxxxx>
>
> The KUnit test validates the correct operation of the ringbuffer.
> A separate dedicated ringbuffer is used so that the global printk
> ringbuffer is not touched.
>
> Co-developed-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@xxxxxxxxxxxxx>
> ---
> For those not familiar with KUnit, you can easily run this test
> doing something like this:
>
> $ ./tools/testing/kunit/kunit.py run \
> --arch=x86_64 \
> --qemu_args "-enable-kvm -smp 8" \
> --kunitconfig kernel/printk
>

Awesome -- I'm glad to see this make it as a KUnit test.

It's a little unusual for a KUnit test -- particularly since it is
time-based and uses lots of threads. This isn't a problem, but it's
definitely a good thing that it's marked as slow. Additionally, KUnit
doesn't track any extra threads spawned, so it requires a bit more
care.

There are a couple of issues (e.g., it crashes on non-SMP systems, a
potential race, etc) and some minor suggestions below. In short, it'd
be a good idea to move some of the initialisation and checks into the
main test function, rather than the helper threads. Equally, it looks
like there are a bunch of variables shared between kthreads — do these
need to be checked with READ_ONCE()/WRITE_ONCE(), or made volatile, or
something?

In fact, I'm not sure why there's a separate start_test() and
test_readerwriter() function -- or indeed, a separate kthread? Am I
missing something, or could everything start_test() does be done from
the main test function/kthread?

Regardless, with a few of the tweaks below, this runs fine for me
here, and it's great to see it.

> init/Kconfig | 12 +
> kernel/printk/.kunitconfig | 3 +
> kernel/printk/Makefile | 2 +
> kernel/printk/printk_ringbuffer.c | 4 +
> kernel/printk/printk_ringbuffer_test.c | 350 +++++++++++++++++++++++++
> 5 files changed, 371 insertions(+)
> create mode 100644 kernel/printk/.kunitconfig
> create mode 100644 kernel/printk/printk_ringbuffer_test.c
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 3b6ca7cce03b..46d144908191 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1576,6 +1576,18 @@ config PRINTK
> very difficult to diagnose system problems, saying N here is
> strongly discouraged.
>
> +config PRINTK_RINGBUFFER_TEST
> + tristate "Test for the printk ringbuffer" if !KUNIT_ALL_TESTS
> + depends on PRINTK && KUNIT
> + default KUNIT_ALL_TESTS
> + help
> + This builds the printk ringbuffer KUnit test suite.
> +
> + For more information on KUnit and unit tests in general, please refer
> + to the KUnit documentation.
> +
> + If unsure, say N.
> +
> config BUG
> bool "BUG() support" if EXPERT
> default y
> diff --git a/kernel/printk/.kunitconfig b/kernel/printk/.kunitconfig
> new file mode 100644
> index 000000000000..8d31a5c19053
> --- /dev/null
> +++ b/kernel/printk/.kunitconfig
> @@ -0,0 +1,3 @@
> +CONFIG_KUNIT=y
> +CONFIG_SMP=y
> +CONFIG_PRINTK_RINGBUFFER_TEST=y
> diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
> index 39a2b61c7232..edb5a4cacf67 100644
> --- a/kernel/printk/Makefile
> +++ b/kernel/printk/Makefile
> @@ -7,3 +7,5 @@ obj-$(CONFIG_PRINTK_INDEX) += index.o
> obj-$(CONFIG_PRINTK) += printk_support.o
> printk_support-y := printk_ringbuffer.o
> printk_support-$(CONFIG_SYSCTL) += sysctl.o
> +
> +obj-$(CONFIG_PRINTK_RINGBUFFER_TEST) += printk_ringbuffer_test.o
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 88e8f3a61922..57b80d262cb7 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
>
> +#include <kunit/visibility.h>
> #include <linux/kernel.h>
> #include <linux/irqflags.h>
> #include <linux/string.h>
> @@ -1685,6 +1686,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
> memset(r, 0, sizeof(*r));
> return false;
> }
> +EXPORT_SYMBOL_IF_KUNIT(prb_reserve);
>
> /* Commit the data (possibly finalizing it) and restore interrupts. */
> static void _prb_commit(struct prb_reserved_entry *e, unsigned long state_val)
> @@ -1759,6 +1761,7 @@ void prb_commit(struct prb_reserved_entry *e)
> if (head_id != e->id)
> desc_make_final(e->rb, e->id);
> }
> +EXPORT_SYMBOL_IF_KUNIT(prb_commit);
>
> /**
> * prb_final_commit() - Commit and finalize (previously reserved) data to
> @@ -2181,6 +2184,7 @@ bool prb_read_valid(struct printk_ringbuffer *rb, u64 seq,
> {
> return _prb_read_valid(rb, &seq, r, NULL);
> }
> +EXPORT_SYMBOL_IF_KUNIT(prb_read_valid);
>
> /**
> * prb_read_valid_info() - Non-blocking read of meta data for a requested
> diff --git a/kernel/printk/printk_ringbuffer_test.c b/kernel/printk/printk_ringbuffer_test.c
> new file mode 100644
> index 000000000000..79331ea2b739
> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer_test.c
> @@ -0,0 +1,350 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/delay.h>
> +#include <linux/init.h>
> +#include <linux/kthread.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/random.h>
> +#include <linux/sched/clock.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +
> +#include <kunit/test.h>
> +
> +#include "printk_ringbuffer.h"
> +
> +/*
> + * This KUnit tests the data integrity of the lockless printk_ringbuffer.
> + * From multiple CPUs it writes messages of varying length and content while
> + * a reader validates the correctness of the messages.
> + *
> + * IMPORTANT: The more CPUs you can use for this KUnit, the better!
> + *
> + * The test works by starting "num_online_cpus() - 1" writer threads, each
> + * pinned to their own CPU. Each writer thread loops, writing data of varying
> + * length into a printk_ringbuffer as fast as possible. The data content is
> + * an embedded data struct followed by string content repeating the byte:
> + *
> + * 'A' + CPUID
> + *
> + * A reader thread is started on the remaining online CPU and ensures that
> + * embedded struct content is consistent with the string and that the string
> + * is terminated and is composed of the same repeating byte as its first byte.
> + *
> + * Because the threads are running in such tight loops, they will call
> + * schedule() from time to time so the system stays functional.
> + *
> + * If the reader encounters an error, the test is aborted and some
> + * information about the error is provided via printk. The runtime of
> + * the test can be configured with the runtime_ms module parameter.
> + *
> + * Note that the test is performed on a separate printk_ringbuffer instance
> + * and not the instance used by printk().
> + */
> +
> +static unsigned long runtime_ms = 10000;
> +module_param(runtime_ms, ulong, 0400);
> +
> +/* used by writers to signal reader of new records */
> +static DECLARE_WAIT_QUEUE_HEAD(test_wait);
> +
> +/* test data structure */
> +struct rbdata {
> + unsigned int len;
> + char text[] __counted_by(len);
> +};
> +
> +#define MAX_RBDATA_LEN (0x7f + 1)
> +#define MAX_RECORD_SIZE (sizeof(struct rbdata) + MAX_RBDATA_LEN + 1)
> +
> +static struct test_running {
> + int runstate;
> + unsigned long num;
> + struct kunit *test;
> +} *test_running;
> +static int halt_test;
> +
> +static void fail_record(struct kunit *test, struct rbdata *dat, u64 seq)
> +{
> + char buf[MAX_RBDATA_LEN + 1];
> +
> + snprintf(buf, sizeof(buf), "%s", dat->text);
> + buf[sizeof(buf) - 1] = 0;
> +
> + KUNIT_FAIL(test, "BAD RECORD: seq=%llu len=%u text=%s\n",
> + seq, dat->len, dat->len < sizeof(buf) ? buf : "<invalid>");
> +}
> +
> +static bool check_data(struct rbdata *dat)
> +{
> + unsigned int len;
> +
> + len = strnlen(dat->text, MAX_RBDATA_LEN + 1);
> +
> + /* Sane length? */
> + if (len != dat->len || !len || len > MAX_RBDATA_LEN)
> + return false;
> +
> + /* String repeats with the same character? */
> + while (len) {
> + len--;
> + if (dat->text[len] != dat->text[0])
> + return false;
> + }
> +
> + return true;
> +}
> +
> +/* Equivalent to CONFIG_LOG_BUF_SHIFT=13 */
> +DEFINE_PRINTKRB(test_rb, 8, 5);
> +
> +static int prbtest_writer(void *data)
> +{
> + struct test_running *tr = data;
> + char text_id = 'A' + tr->num;
> + struct prb_reserved_entry e;
> + unsigned long count = 0;
> + struct printk_record r;
> + u64 min_ns = (u64)-1;
> + struct rbdata *dat;
> + u64 total_ns = 0;
> + u64 max_ns = 0;
> + u64 post_ns;
> + u64 pre_ns;
> + int len;
> +
> + set_cpus_allowed_ptr(current, cpumask_of(tr->num));
> +
> + kunit_info(tr->test, "start thread %03lu (writer)\n", tr->num);
> +
> + tr->runstate = 1;
> +
> + for (;;) {
> + /* +2 to ensure at least 1 character + terminator. */
> + len = sizeof(struct rbdata) + (get_random_u32() & 0x7f) + 2;
> +
> + /* specify the text sizes for reservation */
> + prb_rec_init_wr(&r, len);
> +
> + pre_ns = local_clock();
> +
> + if (prb_reserve(&e, &test_rb, &r)) {
> + r.info->text_len = len;
> +
> + len -= sizeof(struct rbdata) + 1;
> +
> + dat = (struct rbdata *)&r.text_buf[0];
> + dat->len = len;
> + memset(&dat->text[0], text_id, len);
> + dat->text[len] = 0;
> +
> + prb_commit(&e);
> +
> + post_ns = local_clock();
> +
> + wake_up_interruptible(&test_wait);
> +
> + post_ns -= pre_ns;
> + if (post_ns < min_ns)
> + min_ns = post_ns;
> + if (post_ns > max_ns)
> + max_ns = post_ns;
> + total_ns += post_ns;
> + }
> +
> + if ((count++ & 0x3fff) == 0)
> + schedule();
> +
> + if (READ_ONCE(halt_test) == 1)
> + break;
> + }
> +
> + kunit_info(tr->test, "end thread %03lu: wrote=%lu min_ns=%llu avg_ns=%llu max_ns=%llu\n",
> + tr->num, count, min_ns, total_ns / (u64)count, max_ns);
> +
> + tr->runstate = 2;
> +
> + return 0;
> +}
> +
> +static int prbtest_reader(void *data)
> +{
> + struct test_running *tr = data;
> + char text_buf[MAX_RECORD_SIZE];
> + unsigned long total_lost = 0;
> + unsigned long max_lost = 0;
> + unsigned long count = 0;
> + struct printk_info info;
> + struct printk_record r;
> + int did_sched = 1;
> + u64 seq = 0;
> +
> + set_cpus_allowed_ptr(current, cpumask_of(tr->num));
> +
> + prb_rec_init_rd(&r, &info, &text_buf[0], sizeof(text_buf));
> +
> + kunit_info(tr->test, "start thread %03lu (reader)\n", tr->num);
> +
> + tr->runstate = 1;
> +
> + while (!wait_event_interruptible(test_wait,
> + kthread_should_stop() ||
> + prb_read_valid(&test_rb, seq, &r))) {
> + bool error = false;
> +
> + if (kthread_should_stop())
> + break;
> + /* check/track the sequence */
> + if (info.seq < seq) {
> + KUNIT_FAIL(tr->test, "BAD SEQ READ: request=%llu read=%llu\n",
> + seq, info.seq);
> + error = true;
> + } else if (info.seq != seq && !did_sched) {
> + total_lost += info.seq - seq;
> + if (max_lost < info.seq - seq)
> + max_lost = info.seq - seq;
> + }
> +
> + if (!check_data((struct rbdata *)&r.text_buf[0])) {
> + fail_record(tr->test, (struct rbdata *)&r.text_buf[0], info.seq);
> + error = true;
> + }
> +
> + if (error)
> + WRITE_ONCE(halt_test, 1);
> +
> + did_sched = 0;
> + if ((count++ & 0x3fff) == 0) {
> + did_sched = 1;
> + schedule();
> + }
> +
> + if (READ_ONCE(halt_test) == 1)
> + break;
> +
> + seq = info.seq + 1;
> + }
> +
> + kunit_info(tr->test,
> + "end thread %03lu: read=%lu seq=%llu total_lost=%lu max_lost=%lu\n",
> + tr->num, count, info.seq, total_lost, max_lost);
> +
> + while (!kthread_should_stop())
> + msleep(1000);
> + tr->runstate = 2;
> +
> + return 0;
> +}
> +
> +static int module_test_running;
> +static struct task_struct *reader_thread;
> +
> +static int start_test(void *arg)
> +{
> + struct kunit *test = arg;
> + struct task_struct *thread;
> + unsigned long i;
> + int num_cpus;
> +
> + num_cpus = num_online_cpus();
> + if (num_cpus == 1)
> + kunit_skip(test, "need >1 CPUs for at least one reader and writer");

This check needs to be in the test_readerwriter() function to work
(or, at least, in the same kthread). Otherwise:
(a) KUnit doesn't know which thread to abort, and
(b) the other threads will continue anyway.

> +
> + test_running = kcalloc(num_cpus, sizeof(*test_running), GFP_KERNEL);
> + KUNIT_ASSERT_NOT_NULL(test, test_running);

Can we allocate this in the main test_readerwriter() function. I think
there can be a race otherwise. Equally, if we use something like
kunit_kcalloc() instead, KUnit will clean up the allocation for us
when the test exits (even if it fails).

> +
> + module_test_running = 1;

Should this be WRITE_ONCE()?

> +
> + kunit_info(test, "starting test\n");
> +
> + for (i = 0; i < num_cpus; i++) {
> + test_running[i].test = test;
> + test_running[i].num = i;
> + if (i < num_cpus - 1) {
> + thread = kthread_run(prbtest_writer, &test_running[i],
> + "prbtest writer");
> + } else {
> + thread = kthread_run(prbtest_reader, &test_running[i],
> + "prbtest reader");
> + reader_thread = thread;
> + }
> + if (IS_ERR(thread)) {
> + kunit_err(test, "unable to create thread %lu\n", i);
> + test_running[i].runstate = 2;

Again, WRITE_ONCE()?

> + }
> + }
> +
> + /* wait until all threads finish */
> + for (;;) {
> + msleep(1000);
> +
> + for (i = 0; i < num_cpus; i++) {
> + if (test_running[i].runstate < 2)

Again, READ_ONCE()?

> + break;
> + }
> + if (i == num_cpus)
> + break;
> + }
> +
> + kunit_info(test, "completed test\n");
> +
> + module_test_running = 0;

This, too, _might_ need the WRITE_ONCE() treatment.

> +
> + return 0;
> +}
> +
> +static void test_readerwriter(struct kunit *test)
> +{
> + static bool already_run;
> + int num_cpus;
> + int i;
> +
> + if (already_run)
> + KUNIT_FAIL_AND_ABORT(test, "test can only be run once");
> + already_run = true;

I'm not sure I fully understand why this couldn't be run multiple
times. I definitely see why you wouldn't want to have it running
concurrently with itself, but is there a way to have it 'reset' any
state to allow it to be re-run without a reboot.

Not a problem if it isn't, just curious. :-)

> +
> + kunit_info(test, "running for %lu ms\n", runtime_ms);
> +
> + kthread_run(start_test, test, "prbtest");
> +
> + /* wait until all threads active */
> + num_cpus = num_online_cpus();
> + for (;;) {
> + msleep(1000);
> +
> + for (i = 0; i < num_cpus; i++) {
> + if (test_running[i].runstate == 0)

I don't think there's any guarantee that test_running has been
allocated by this point.

Of course, given the msleep() above, it's very likely that it would
be, but it might make more sense to allocate it in this function
before spinning off any extra threads.

(As a bonus, you could then use kunit_kcalloc() to have it
automatically cleaned up when the test exits, regardless of whether or
not it succeeds.)

Also, you might want to check test_running[i].runstate using
READ_ONCE() or similar -- could a compiler perhaps "optimize out"
further reads to it as-is: it doesn't seem to be 'volatile' or
similar?

> + break;
> + }
> + if (i == num_cpus)
> + break;
> + }
> +
> + msleep(runtime_ms);
> +
> + if (reader_thread && !IS_ERR(reader_thread))
> + kthread_stop(reader_thread);
> +
> + WRITE_ONCE(halt_test, 1);
> +
> + while (module_test_running)
> + msleep(1000);

Again, should this be put under READ_ONCE() or an explicit atomic or
something? Or check that the thread has terminated?

> + kfree(test_running);

If the kunit_kzalloc() function is used above, this could be removed.

> +}
> +
> +static struct kunit_case prb_test_cases[] = {
> + KUNIT_CASE_SLOW(test_readerwriter),
> + {}
> +};
> +
> +static struct kunit_suite prb_test_suite = {
> + .name = "printk-ringbuffer",
> + .test_cases = prb_test_cases,
> +};
> +kunit_test_suite(prb_test_suite);
> +
> +MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING);
> +MODULE_AUTHOR("John Ogness <john.ogness@xxxxxxxxxxxxx>");
> +MODULE_DESCRIPTION("printk_ringbuffer test");
> +MODULE_LICENSE("GPL");
>
> base-commit: 4022ef25504db2fb79a2acac0afe9bac934f8dd6
> --
> 2.39.5
>

Cheers,
-- David

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature