Re: RFC: THE OFFLINE SCHEDULER

From: Gregory Haskins
Date: Thu Aug 27 2009 - 18:22:37 EST


Thomas Gleixner wrote:
> On Thu, 27 Aug 2009, Christoph Lameter wrote:
>> On Thu, 27 Aug 2009, Chris Friesen wrote:
>>
>>> I just went and read the docs. One of the things I noticed is that it
>>> says that the offlined cpu cannot run userspace tasks. For our
>>> situation that's a showstopper, unfortunately.
>> It needs to be implemented the right way. Essentially this is a variation
>> on the isolcpu kernel boot option. We probably need some syscall to move
>> a user space process to a bare metal cpu since the cpu cannot be
>> considered online in the regular sense.
>
> It can. It needs to be flagged as reserved for special tasks and you
> need a separate mechanism to move and pin a task to such a CPU.
>
>> An isolated cpu can then only execute one process at a time. A process
>> would do all initialization and lock itsresources in memory before going
>> to the isolated processor. Any attempt to use OS facilities need to cause
>> the process to be moved back to a cpu with OS services.
>
> You are creating a "one special case" operation mode which is not
> justified in my opinion. Let's look at the problem you want to solve:
>
> Run exactly one thread on a dedicated CPU w/o any disturbance by the
> scheduler tick.
>
> You can move away anything else than the scheduler tick from a CPU
> today already w/o a single line of code change.
>
> But you want to impose restrictions like resource locking and moving
> back to another CPU in case of a syscall. What's the purpose of this ?
> It does not buy anything except additional complexity.
>
> That's just the wrong approach. All you need is a way to tell the
> kernel that CPUx can switch off the scheduler tick when only one
> thread is running and that very thread is running in user space. Once
> another thread arrives on that CPU or the single thread enters the
> kernel for a blocking syscall the scheduler tick has to be
> restarted.
>
> It's not rocket science to fix the well known issues of stopping and
> eventually restarting the scheduler tick, the CPU time accounting and
> some other small details. Such a modification would be of general use
> contrary to your proposed solution which is just a hack to solve your
> particular special case of operation.

I wonder if it makes sense to do something along the lines of the
sched-class...

IOW: What if we adopted one of the following models:

1) Create a new class that is higher prio than FIFO/RR and, when
selected, disables the tick.

2) Modify FIFO so that it disables tick by default...update accounting
info at next reschedule event.

3) Variation of 2..leave FIFO+tick as is by default, but have some kind
of parameter to optionally disable tick if desired.

In a way, we should probably consider (2) independent of this particular
thread. FIFO doesn't need a tick anyway afaict...only a RESCHED+IPI
truly ever matter here....or am I missing something obvious (probably
w.r.t accounting)?

You could then couple this solution with cpusets (possibly with a little
work to get rid of any pesky per-cpy kthreads) to achieve the desired
effect of interference-free operation. You wouldn't even have to have
funky rules eluded to above w.r.t. making sure only one userspace thread
is running on the core.

Thoughts?
-Greg

Attachment: signature.asc
Description: OpenPGP digital signature