Re: [PATCH 1/3] sched: add sched_task_call()

From: Josh Poimboeuf
Date: Thu Feb 19 2015 - 16:43:05 EST


On Thu, Feb 19, 2015 at 09:40:36PM +0100, Vojtech Pavlik wrote:
> On Thu, Feb 19, 2015 at 11:32:55AM -0600, Josh Poimboeuf wrote:
> > On Thu, Feb 19, 2015 at 06:19:29PM +0100, Vojtech Pavlik wrote:
> > > On Thu, Feb 19, 2015 at 11:03:53AM -0600, Josh Poimboeuf wrote:
> > > > On Thu, Feb 19, 2015 at 05:33:59PM +0100, Vojtech Pavlik wrote:
> > > > > On Thu, Feb 19, 2015 at 10:24:29AM -0600, Josh Poimboeuf wrote:
> > > > >
> > > > > > > No, these tasks will _never_ make syscalls. So you need to guarantee
> > > > > > > they don't accidentally enter the kernel while you flip them. Something
> > > > > > > like so should do.
> > > > > > >
> > > > > > > You set TIF_ENTER_WAIT on them, check they're still in userspace, flip
> > > > > > > them then clear TIF_ENTER_WAIT.
> > > > > >
> > > > > > Ah, that's a good idea. But how do we check if they're in user space?
> > > > >
> > > > > I don't see the benefit in holding them in a loop - you can just as well
> > > > > flip them from the syscall code as kGraft does.
> > > >
> > > > But we were talking specifically about HPC tasks which never make
> > > > syscalls.
> > >
> > > Yes. I'm saying that rather than guaranteeing they don't enter the
> > > kernel (by having them spin) you can flip them in case they try to do
> > > that instead. That solves the race condition just as well.
> >
> > Ok, gotcha.
> >
> > We'd still need a safe way to check if they're in user space though.
>
> Having a safe way would be very nice and actually quite useful in other
> cases, too.
>
> For this specific purpose, however, we don't need a very safe way. We
> don't require atomicity in any way, and we don't even mind false
> negatives; only false positives would be bad.
>
> kGraft looks at the stack trace of CPU hogs, and if it finds no kernel
> addresses there, it assumes the task is in userspace. Not very nice, but
> it does the job.
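
To make sure I understand the two approaches, here's how I'd sketch
the TIF_ENTER_WAIT idea quoted above. This is completely hypothetical:
TIF_ENTER_WAIT doesn't exist, it would need a matching kernel-entry
hook that spins while the flag is set, and task_in_userspace() and
klp_update_task() are made-up placeholders for the missing pieces.

#include <linux/sched.h>

/*
 * Hypothetical sketch of the TIF_ENTER_WAIT approach: park the task
 * at kernel entry, verify it's still in userspace, flip it, then
 * release it.  TIF_ENTER_WAIT, task_in_userspace() and
 * klp_update_task() are placeholders, not real kernel symbols.
 */
static int klp_try_flip_user_task(struct task_struct *task)
{
	int ret = -EBUSY;

	/* From here on, the task spins if it tries to enter the kernel. */
	set_tsk_thread_flag(task, TIF_ENTER_WAIT);

	/*
	 * Re-check that the task is still in userspace.  If it is, it
	 * can't reach any kernel code until we clear the flag, so
	 * flipping it is race-free.
	 */
	if (task_in_userspace(task)) {
		klp_update_task(task);
		ret = 0;
	}

	clear_tsk_thread_flag(task, TIF_ENTER_WAIT);
	return ret;
}

The alternative you're describing, IIUC, drops the spinning and
instead does the klp_update_task() flip from the syscall entry path,
the way kGraft does.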
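
And here's the kind of check I assume the stack trace heuristic boils
down to -- again just my sketch of your description, not the actual
kgr_needs_lazy_migration() code, and task_seems_in_userspace() is a
made-up name:

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>

/*
 * Sketch of a stack-trace-based "is this task in userspace?" test,
 * based on the description above -- not kGraft's actual code.
 */
static bool task_seems_in_userspace(struct task_struct *task)
{
	unsigned long entries[16];
	struct stack_trace trace = {
		.max_entries	= ARRAY_SIZE(entries),
		.entries	= entries,
	};
	int i;

	save_stack_trace_tsk(task, &trace);

	/* Any kernel text address found means "not in userspace". */
	for (i = 0; i < trace.nr_entries; i++) {
		if (kernel_text_address(entries[i]))
			return false;
	}

	return true;
}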

So I've looked at kgr_needs_lazy_migration(), but I still don't see
how it can work reliably.

First of all, I think reading the stack while it's being written to
could give you garbage values, and a completely wrong nr_entries value
from save_stack_trace_tsk().

But also, how would you walk a stack without knowing its stack
pointer? That function relies on the saved stack pointer in
task_struct.thread.sp, which, AFAICT, was last saved at the task's
most recent call to schedule(). Since then, and before the task exited
the kernel, the stack could have been completely rewritten with
different-sized stack frames.
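
For reference, my reading of the relevant bit of the x86 stack walker,
paraphrased from memory rather than quoted verbatim (see
arch/x86/kernel/dumpstack_64.c for the real thing):

/* dump_trace()'s stack selection, paraphrased -- not a verbatim quote: */
if (!stack) {
	if (regs)
		stack = (unsigned long *)regs->sp;
	else if (task != current)
		/* saved by switch_to() at the task's last schedule() */
		stack = (unsigned long *)task->thread.sp;
	else
		stack = &dummy;
}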

Am I missing something?

--
Josh