Re: [PATCH v2 2/2] pid: Remove pidhash

From: Rik van Riel
Date: Mon Oct 02 2017 - 09:45:40 EST


On Mon, 2017-10-02 at 09:35 -0400, Rik van Riel wrote:
> On Wed, 2017-09-27 at 17:45 +0200, Oleg Nesterov wrote:
> > On 09/27, Gargi Sharma wrote:
> > >
> > > -#define find_next_offset(map, off)
> > >
> > > \
> > > - find_next_zero_bit((map)->page, BITS_PER_PAGE,
> > > off)
> > > -
> >
> > this should go into the previous patch, but this is minor...
> >
> > > @@ -208,12 +200,10 @@ struct pid *alloc_pid(struct pid_namespace
> > > *ns)
> > > Â
> > > Â upid = pid->numbers + ns->level;
> > > Â spin_lock_irq(&pidmap_lock);
> > > - if (!(ns->nr_hashed & PIDNS_HASH_ADDING))
> > > + if (!(ns->pid_allocated & PIDNS_ADDING))
> > > Â goto out_unlock;
> > > Â for ( ; upid >= pid->numbers; --upid) {
> > > - hlist_add_head_rcu(&upid->pid_chain,
> > > - &pid_hash[pid_hashfn(upid->nr,
> > > upid->ns)]);
> > > - upid->ns->nr_hashed++;
> > > + upid->ns->pid_allocated++;
> >
> > No, this is wrong.
> >
> > It is too late to check PIDNS_HASH_ADDING/PIDNS_ADDING and
> > increment
> > pid_allocated,
> > once we call idr_alloc_cyclic() this pid is already "hashed" in
> > that
> > it can be found
> > by find_pid_ns() with this patch applied.
> >
> > And of course, it is too late to do atomic_set(&pid->count, 1) and
> > initialize
> > pid->tasks[type] lists by the same reason.
>
> Hi Oleg,
>
> Gargi and I are looking at that code, and trying to figure out
> exactly what needs to be done to make all of this correct.
>
> We are thinking something along these lines:

.... and we looked at the code some more. Some of the way the
code is laid out currently looks like an artifact of history.

Lets try this again:

1) pid = kmem_cache_alloc(...)

2) get_pid_ns(ns);
atomic_set(&pid->count, 1);
for (type = 0; ...)
INIT_HLIST(...)


> 3) First, check if this is a new namespace (PIDNS_ADDING), and
> ÂÂÂdo the call to pid_ns_prepare_proc, before we even call idr_alloc.
> ÂÂÂMaybe something like:
>
> ÂÂÂif (unlikely(ns->nr_allocated == PIDNS_ADDING)) {
> ÂÂÂÂÂÂif (pid_ns_prepare_proc(ns)) {
> ÂÂÂÂÂÂÂÂÂdisable_pid_allocations(ns);
> ÂÂÂÂÂÂÂÂÂgoto out_free_ns;
> ÂÂÂ}
>
> 4) With pid_ns_prepare_proc out of the way, we can put all the code
> ÂÂÂfrom below where the call to pid_ns_prepare_proc is now (except
> ÂÂÂerror handing) into the main loop of pid allocation, so we can
> ÂÂÂdo all that stuff under the pidmap_lock:
>
> ÂÂÂfor (i = ns->level; i >= 0; i--) {
> ÂÂÂÂÂÂÂ...
> ÂÂÂÂÂÂÂidr_alloc_cyclic(...)

> ÂÂÂÂÂÂÂns->nr_allocated++;
> ÂÂÂÂÂÂÂ...
> Â }
>
> Would that resolve your objection, or are we barking up the wrong
> tree?

Now the per-pid struct stuff is done once, before the loop, and
the loop deals only with allocated numbers inside each parent
namespace.

The error path would unwind stuff done earlier.

--
All Rights Reversed.

Attachment: signature.asc
Description: This is a digitally signed message part