Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter

From: Namhyung Kim
Date: Mon Sep 30 2024 - 00:33:17 EST


On Mon, Sep 30, 2024 at 12:24:52PM +0900, Hyeonggon Yoo wrote:
> On Mon, Sep 30, 2024 at 11:18 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > Hello Hyeonggon,
> >
> > On Sun, Sep 29, 2024 at 11:27:25PM +0900, Hyeonggon Yoo wrote:
> > > On Sun, Sep 29, 2024 at 3:13 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > > +SEC("raw_tp/bpf_test_finish")
> > > > > +int BPF_PROG(check_task_struct)
> > > > > +{
> > > > > +	__u64 curr = bpf_get_current_task();
> > > > > +	struct kmem_cache *s;
> > > > > +	char *name;
> > > > > +
> > > > > +	s = bpf_get_kmem_cache(curr);
> > > > > +	if (s == NULL) {
> > > > > +		found = -1;
> > > > > +		return 0;
> > > >
> > > > ... it cannot find a kmem_cache for the current task. This program is
> > > > run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU. So I think
> > > > > the curr should point to a task_struct in a slab cache.
> > > >
> > > > Am I missing something?
> > >
> > > Hi Namhyung,
> > >
> > > Out of curiosity I've been investigating this issue on my machine and
> > > running some experiments.
> >
> > Thanks a lot for looking at this!
> >
> > >
> > > > When the test fails, calling dump_page() on the page that the
> > > > task_struct belongs to shows that the page does not have the
> > > > PGTY_slab flag set, which is why virt_to_slab(current) returns NULL.
> > >
> > > > Does the test always fail in your environment? On my machine, the
> > > > test sometimes passed and sometimes failed.
> >
> > > I'm using vmtest.sh, and it mostly succeeds. I thought I couldn't
> > > reproduce it locally, but I also see the failure sometimes. I'll take a
> > > deeper look.
> >
> > >
> > > > Maybe the value returned by the 'current' macro sometimes belongs to
> > > > a slab and sometimes does not.
> > > > But that doesn't really make sense to me, as IIUC task_struct
> > > > descriptors are allocated from slab.
> >
> > AFAIK the notable exception is init_task, which lives in the kernel
> > data. I'm not sure if the test is being run by PID 1.
>
> I confirmed that the test runs under PID 0 (swapper) when it fails and
> under a non-zero PID when it succeeds. This makes sense, as the
> task_struct for PID 0 should be in the kernel image area, not in a slab.
>
> Phew, fortunately, it's not a bug! :)

Thanks for testing. I see the same behavior now.

>
> Any plans on how to adjust the test program?

I thought the test runs in a separate task. I'll think about how to
test this more reliably.
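
Just to spell out the distinction found above, here is a rough userspace
toy model in plain C. It is not the selftest and not a proposed fix. All
toy_* names and slab_arena are made up for illustration, and the address
range check only stands in for the real slab lookup. The one idea it
shows is that a missing cache for PID 0 could be treated as a skip
rather than a failure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are kernel or libbpf APIs. */
struct toy_task {
	int pid;
};

#define ARENA_TASKS 4

/* Models task_structs that were allocated from a slab cache. */
static struct toy_task slab_arena[ARENA_TASKS];

/* Models init_task (PID 0), which lives in the kernel image instead. */
static struct toy_task toy_init_task = { .pid = 0 };

/* Models the lookup: only addresses inside the arena resolve to a cache. */
static const char *toy_get_cache_name(const struct toy_task *t)
{
	uintptr_t p = (uintptr_t)t;
	uintptr_t start = (uintptr_t)&slab_arena[0];
	uintptr_t end = (uintptr_t)&slab_arena[ARENA_TASKS];

	return (p >= start && p < end) ? "task_struct" : NULL;
}

enum toy_result { TOY_PASS, TOY_SKIP, TOY_FAIL };

/* A missing cache is expected for PID 0 and a real failure otherwise. */
static enum toy_result check_task(const struct toy_task *curr)
{
	if (toy_get_cache_name(curr) == NULL)
		return curr->pid == 0 ? TOY_SKIP : TOY_FAIL;

	return TOY_PASS;
}
```

In this model, check_task() on an entry of slab_arena passes, while
check_task(&toy_init_task) reports a skip instead of a failure. Whether
the real test should skip PID 0 or avoid running under it at all is
still open.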

Thanks,
Namhyung