Re: [PATCH bpf-next v5 1/2] bpf: support BPF_PROG_QUERY for progs attached to sockmap

From: Jakub Sitnicki
Date: Sat Jan 15 2022 - 14:10:09 EST


On Sat, Jan 15, 2022 at 03:53 AM CET, Andrii Nakryiko wrote:
> On Fri, Jan 14, 2022 at 6:38 PM zhudi (E) <zhudi2@xxxxxxxxxx> wrote:
>>
>> > On Thu, Jan 13, 2022 at 8:15 AM Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> wrote:
>> > >
>> > > On Thu, Jan 13, 2022 at 10:00 AM CET, Di Zhu wrote:

[...]

>> > > > +int sock_map_bpf_prog_query(const union bpf_attr *attr,
>> > > > +			    union bpf_attr __user *uattr)
>> > > > +{
>> > > > +	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
>> > > > +	u32 prog_cnt = 0, flags = 0, ufd = attr->target_fd;
>> > > > +	struct bpf_prog **pprog;
>> > > > +	struct bpf_prog *prog;
>> > > > +	struct bpf_map *map;
>> > > > +	struct fd f;
>> > > > +	u32 id = 0;
>> > > > +	int ret;
>> > > > +
>> > > > +	if (attr->query.query_flags)
>> > > > +		return -EINVAL;
>> > > > +
>> > > > +	f = fdget(ufd);
>> > > > +	map = __bpf_map_get(f);
>> > > > +	if (IS_ERR(map))
>> > > > +		return PTR_ERR(map);
>> > > > +
>> > > > +	rcu_read_lock();
>> > > > +
>> > > > +	ret = sock_map_prog_lookup(map, &pprog, attr->query.attach_type);
>> > > > +	if (ret)
>> > > > +		goto end;
>> > > > +
>> > > > +	prog = *pprog;
>> > > > +	prog_cnt = !prog ? 0 : 1;
>> > > > +
>> > > > +	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
>> > > > +		goto end;
>> > > > +
>> > > > +	id = prog->aux->id;
>> > >
>> > > ^ This looks like a concurrent read/write.
>> >
>> > You mean that bpf_prog_load() might be setting it in a different
>> > thread? I think ID is allocated and fixed before prog FD is available
>> > to the user-space, so prog->aux->id is set in stone and immutable in
>> > that regard.
>>
>> What we're talking about here is that bpf_prog_free_id() will write the id
>> identifier synchronously.
>
> Hm.. let's say bpf_prog_free_id() happens right after we read id 123.
> It's impossible to distinguish that from reading a valid ID (that's not
> yet freed), returning it to user-space, and then having the program and
> its ID freed before user-space can do anything about it. User-space
> either way will get an ID that's not valid anymore. I don't see any
> use of READ_ONCE/WRITE_ONCE with prog->aux->id, which is why I was
> asking what changed.
>

You're right, READ_ONCE/WRITE_ONCE is not improving anything here.

I've suggested it not to make the query op more reliable, but rather to
mark the shared access.

But in this case annotating it with data_race() [1] would be a better
fit, I think, because we don't care if we get the old or the new value.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt#n58
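
As a minimal sketch of what such an annotation could look like, here is a
userspace-compilable version of the read. Everything in it is simplified for
illustration: the data_race() macro is a stand-in that only evaluates its
argument (the kernel's version also tells KCSAN to ignore the access), the
structs are cut down to the one field under discussion, and query_prog_id()
is a hypothetical helper, not a function from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's data_race() so the sketch builds in userspace.
 * Assumption: here it only evaluates the expression; in the kernel it also
 * marks the access as an intentional race for KCSAN.
 */
#define data_race(expr) (expr)

/* Simplified, hypothetical versions of the kernel structures. */
struct bpf_prog_aux {
	uint32_t id;
};

struct bpf_prog {
	struct bpf_prog_aux *aux;
};

/*
 * Hypothetical helper: return the ID of the attached program, or 0 if
 * there is none. The ID may be written concurrently when the program is
 * freed; the query does not care whether it sees the old or the new
 * value, so the read is only annotated, not ordered.
 */
static uint32_t query_prog_id(const struct bpf_prog *prog)
{
	if (!prog)
		return 0;

	return data_race(prog->aux->id);
}
```

The annotation documents that the racy read is intentional without implying
any ordering guarantee, which is the difference from a READ_ONCE() and
WRITE_ONCE() pair.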

>>
>> >
>> > >
>> > > Would wrap with READ_ONCE() and corresponding WRITE_ONCE() in
>> > > bpf_prog_free_id(). See [1] for rationale.
>> > >
>> > > [1] https://github.com/google/kernel-sanitizers/blob/master/other/READ_WRITE_ONCE.md
>> > >

[...]