Re: hit a KASan bug related to Perf during stress test

From: Peter Zijlstra
Date: Mon Oct 24 2016 - 08:38:21 EST


On Mon, Oct 24, 2016 at 02:29:42PM +0200, Oleg Nesterov wrote:
> On 10/24, Peter Zijlstra wrote:
> >
> > On Mon, Oct 24, 2016 at 02:10:31PM +0200, Oleg Nesterov wrote:
> > > --- x/kernel/pid.c
> > > +++ x/kernel/pid.c
> > > @@ -526,8 +526,11 @@ pid_t __task_pid_nr_ns(struct task_struc
> > >  	if (!ns)
> > >  		ns = task_active_pid_ns(current);
> > >  	if (likely(pid_alive(task))) {
> > > -		if (type != PIDTYPE_PID)
> > > +		if (type != PIDTYPE_PID) {
> > > +			if (type == PIDTYPE_TGID)
> > > +				type = PIDTYPE_PID;
> > >  			task = task->group_leader;
> > > +		}
> >
> > Aah, that makes much more sense ;-)
> >
> > >  		nr = pid_nr_ns(rcu_dereference(task->pids[type].pid), ns);
> > >  	}
> > >  	rcu_read_unlock();
> >
> >
> > Still, I wonder if returning 0 is the right thing. 0 is a 'valid' PID
> > for the init/idle task.
>
> Yes, now I think that -1 would make more sense. Unfortunately we can't
> just change __task_pid_nr_ns(), it already has the users which assume
> it returns zero... attach_to_pi_state() for example.

Indeed. And I have a patch that assumes task_pid_vnr(&init_task) == 0.
Is that true because of this !alive case, or true in general?

No worries though, we can revert to your earlier explicit test and
return -1 while adding a comment to explain details? I'll go write one
up in a bit, but I need to run an errand first.

> > And we still have the re-use issue for the TID, because when we get here
> > TID is already unhashed too afaict,
>
> Yes, so perf_event_tid() will report zero.

Ah, ok. So should we change that to match pid and return (explicit) -1
there too?
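
To make the "explicit test and return -1" idea concrete, here is a rough
userspace sketch, not a kernel patch. Every name in it (model_task,
model_pid_nr(), model_event_pid(), model_event_tid()) is made up for
illustration, and the 'alive' flag only stands in for pid_alive(). It
assumes the lookup keeps returning 0 for an unhashed task, as in the
quoted diff, and that only the perf-side wrappers switch to -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum model_pid_type {
	MODEL_PID,
	MODEL_TGID,	/* pseudo type: resolved via the group leader's PID */
	MODEL_PGID,
	MODEL_SID,
	MODEL_TYPE_MAX
};

struct model_task {
	bool alive;			/* stands in for pid_alive(task) */
	struct model_task *group_leader;
	int nr[MODEL_TYPE_MAX];		/* stands in for task->pids[type] */
};

/*
 * Same shape as the lookup in the quoted diff: returns 0 when the task
 * is no longer alive, so existing callers that expect 0 keep working.
 */
static int model_pid_nr(const struct model_task *task, enum model_pid_type type)
{
	int nr = 0;

	assert((unsigned int)type < MODEL_TYPE_MAX);
	if (task->alive) {
		if (type != MODEL_PID) {
			if (type == MODEL_TGID)
				type = MODEL_PID;
			task = task->group_leader;
		}
		nr = task->nr[type];
	}
	return nr;
}

/*
 * Explicit liveness test in the caller: -1 cannot be confused with the
 * 0 that the init/idle task legitimately reports.
 */
static int model_event_pid(const struct model_task *task)
{
	if (!task->alive)
		return -1;
	return model_pid_nr(task, MODEL_TGID);
}

/* Same treatment for the TID, so pid and tid stay consistent. */
static int model_event_tid(const struct model_task *task)
{
	if (!task->alive)
		return -1;
	return model_pid_nr(task, MODEL_PID);
}
```

The point of keeping the test in the wrappers is that the shared lookup
does not change behaviour for its other users, while the two perf-style
callers can tell "task already gone" apart from "PID 0".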