Re: [PATCH 6/6] sysctl: pass kernel pointers to ->proc_handler

From: Matthew Wilcox
Date: Fri Apr 17 2020 - 15:50:28 EST


On Fri, Apr 17, 2020 at 12:39:10PM -0700, Andrey Ignatov wrote:
> Though it breaks tools/testing/selftests/bpf/test_sysctl.c. I spent some
> time debugging and found a couple of problems -- see below. But there is
> something else .. Still I figured it's a good idea to give an early
> heads-up.

"see below"? Really? You're going to say that and then make people
scroll through thousands of lines of quoted material to find your new
contributions? Please, learn to trim appropriately.

Here's about what you should have sent:

> > @@ -1156,52 +1153,41 @@ const struct bpf_verifier_ops cg_dev_verifier_ops = {
> > */
> > int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
> > struct ctl_table *table, int write,
> > - void __user *buf, size_t *pcount,
> > - loff_t *ppos, void **new_buf,
> > - enum bpf_attach_type type)
> > + void **buf, size_t *pcount,
> > + loff_t *ppos, enum bpf_attach_type type)
> > {
> > struct bpf_sysctl_kern ctx = {
> > .head = head,
> > .table = table,
> > .write = write,
> > .ppos = ppos,
> > - .cur_val = NULL,
> > + .cur_val = *buf,
>
>
> cur_val is allocated separately below to read current value of sysctl
> and not interfere with user-passed buffer.
>
> > .cur_len = PAGE_SIZE,
> > .new_val = NULL,
> > .new_len = 0,
> > .new_updated = 0,
> > };
> > struct cgroup *cgrp;
> > + loff_t pos = 0;
> > int ret;
> >
> > - ctx.cur_val = kmalloc_track_caller(ctx.cur_len, GFP_KERNEL);
> > - if (ctx.cur_val) {
> > - mm_segment_t old_fs;
> > - loff_t pos = 0;
> > -
> > - old_fs = get_fs();
> > - set_fs(KERNEL_DS);
> > - if (table->proc_handler(table, 0, (void __user *)ctx.cur_val,
> > - &ctx.cur_len, &pos)) {
> > - /* Let BPF program decide how to proceed. */
> > - ctx.cur_len = 0;
> > - }
> > - set_fs(old_fs);
> > - } else {
> > + if (table->proc_handler(table, 0, ctx.cur_val, &ctx.cur_len, &pos)) {
>
> This call reads current value of sysclt into cur_val buffer.
>
> Since you made cur_val point to kernel copy of user-passed buffer, this
> call will always override whatever is there in that kernel copy.
>
> For example, if user is writing to sysclt, then *buf is a pointer to new
> value, but this call will override this new value and, corresondingly
> new value will be lost.
>
> I think cur_val should still be allocated separately.
>
>
> > /* Let BPF program decide how to proceed. */
> > ctx.cur_len = 0;
> > }
> >
> > - if (write && buf && *pcount) {
> > + if (write && *pcount) {
> > /* BPF program should be able to override new value with a
> > * buffer bigger than provided by user.
> > */
> > ctx.new_val = kmalloc_track_caller(PAGE_SIZE, GFP_KERNEL);
> > - ctx.new_len = min_t(size_t, PAGE_SIZE, *pcount);
> > - if (!ctx.new_val ||
> > - copy_from_user(ctx.new_val, buf, ctx.new_len))
> > + if (ctx.new_val) {
> > + ctx.new_len = min_t(size_t, PAGE_SIZE, *pcount);
> > + memcpy(ctx.new_val, buf, ctx.new_len);
>
> This should be *buf, not buf. A typo I guess?
>
>
> I applied the whole patchset to bpf-next tree and run selftests. This
> patch breaks 4 of them:
>
> % cd tools/testing/selftests/bpf/
> % ./test_sysctl
> ...
> Test case: sysctl_get_new_value sysctl:write ok .. [FAIL]
> Test case: sysctl_get_new_value sysctl:write ok long .. [FAIL]
> Test case: sysctl_get_new_value sysctl:write E2BIG .. [FAIL]
> Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS]
> Test case: sysctl_set_new_value sysctl:write ok .. [FAIL]
> ...
> Summary: 36 PASSED, 4 FAILED
>
> I applied both changes I suggested above and it reduces number of broken
> selftests to one:
>
> Test case: sysctl_set_new_value sysctl:write ok .. [FAIL]
>
> I haven't debugged this last one though yet ..
>
> All these tests are available in
> tools/testing/selftests/bpf/test_sysctl.c.
>
> I think it's a good idea to run these tests locally before sending the
> next version of the patch set.
>