Re: [RFC][PATCH 2/7] kref: Add kref_read()

From: Daniel Borkmann
Date: Wed Nov 16 2016 - 05:12:26 EST


On 11/16/2016 09:21 AM, Greg KH wrote:
On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote:
On Tue, Nov 15, 2016 at 12:03 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote:
On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote:

--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req
/* Completion does it's own kref_put. If we are going to
* kref_sub below, we need req to be still around then. */
int at_least = k_put + !!c_put;
- int refcount = atomic_read(&req->kref.refcount);
+ int refcount = kref_read(&req->kref);
if (refcount < at_least)
drbd_err(device,
"mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n",

As proof of "things you should never do", here is one such example.

ugh.

--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio
/* Stop all the virtqueues. */
vdev->config->reset(vdev);

- refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount);
+ refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
put_disk(vblk->disk);
vdev->config->del_vqs(vdev);
kfree(vblk->vqs);

And this too, ugh, that's a huge abuse and is probably totally wrong...

thanks again for digging through this crap. I wonder if we need to name
the kref reference variable "do_not_touch_this_ever" or some such thing
to catch all of the people who try to be "too smart".

There's unimaginable bong hits involved in this stuff, in the end I
resorted to brute force and scripts to convert all this.

What should we do about things like this (bpf_prog_put() and callbacks
from kernel/bpf/syscall.c):

Just reading up on this series. Your question refers to converting bpf
prog and map ref counts to Peter's refcount_t eventually, right?

static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
{
struct user_struct *user = prog->aux->user;

atomic_long_sub(prog->pages, &user->locked_vm);

Oh that's scary. Let's just make one reference count rely on another
one and not check things...

Sorry, could you elaborate what you mean by 'check things', you mean for
wrap around? IIUC, back then accounting was roughly similar modeled after
perf event's one, and in this case accounts for pages used by progs and
maps during their life-time. Are you suggesting that this approach is
inherently broken?

free_uid(user);
}

static void __bpf_prog_put_rcu(struct rcu_head *rcu)
{
struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);

free_used_maps(aux);
bpf_prog_uncharge_memlock(aux->prog);
bpf_prog_free(aux->prog);
}

void bpf_prog_put(struct bpf_prog *prog)
{
if (atomic_dec_and_test(&prog->aux->refcnt))
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}


Not only do we want to protect prog->aux->refcnt, but I think we want
to protect user->locked_vm too ... I don't think it's sane for
user->locked_vm to be a stats_t ?

I don't think this is sane code...

greg k-h