Re: regression introduced by - timers: fix itimer/many thread hang

From: Peter Zijlstra
Date: Mon Nov 24 2008 - 04:33:52 EST


On Mon, 2008-11-24 at 09:46 +0100, Petr Tesarik wrote:
> Peter Zijlstra wrote on Sun 23. 11. 2008 at 15:24 +0100:
> > On Fri, 2008-11-21 at 19:42 +0100, Petr Tesarik wrote:
> >
> > > > > In any event, while this particular implementation may not be optimal,
> > > > > at least it's _right_. Whatever happened to "make it right, then make
> > > > > it fast?"
> > > >
> > > > Well, I'm not thinking you did it right ;-)
> > > >
> > > > While I agree that the linear loop is sub-optimal, it only really
> > > > becomes a problem when you have hundreds or thousands of threads in your
> > > > application, which I'll argue to be insane anyway.
> > >
> > > This is just not true. I've seen a very real example of a lockup with a very
> > > sane number of threads (one per CPU), but on a very large machine (1024 CPUs
> > > IIRC). The application set per-process CPU profiling with an interval of 1
> > > tick, which translates to 1024 timers firing off with each tick...
> > >
> > > Well, yes, that was broken, too, but that's the way one quite popular FORTRAN
> > > compiler works...
> >
> > I'm not sure what side you're arguing...
>
> In this particular case I'm arguing against both, it seems. The old
> behaviour is broken and the new one is not better. :(

OK, then we agree ;-)

> > The current (per-cpu) code is utterly broken on large machines too, I've
> > asked SGI to run some tests on real numa machines (something multi-brick
> > altix) and even moderately small machines with 256 cpus in them grind to
> > a halt (or make progress at a snail's pace) when the itimer stuff is
> > enabled.
> >
> > Furthermore, I really dislike the per-process-per-cpu memory cost, it
> > bloats applications and makes the new per-cpu alloc work rather more
> > difficult than it already is.
> >
> > I basically think the whole process wide itimer stuff is broken by
> > design, there is no way to make it work on reasonably large machines,
> > the whole problem space just doesn't scale. You simply cannot maintain a
> > global count without bouncing cachelines like mad, so you might as well
> > accept it and do the process wide counter and bounce only a single line,
> > instead of bouncing a line per-cpu.
>
> Very true. Unfortunately per-process itimers are prescribed by the
> Single Unix Specification, so we have to cope with them in some way,
> without letting a non-privileged process mount a DoS attack. This is
> going to be hard, and we'll probably have to twist the specification a
> bit to still conform to its wording. :((

Feel like reading the actual spec and trying to come up with a creative
interpretation? :-)

> I really don't think it's a good idea to set a per-process ITIMER_PROF
> to one timer tick on a large machine, but the kernel does allow any
> process to do it, and then it can even cause a hard freeze on some
> hardware. This is _not_ acceptable.
>
> What is worse, we can't just limit the granularity of itimers, because
> threads can come into being _after_ the itimer was set.

Currently it has jiffy granularity, right? And jiffies are different
depending on some compile time constant (HZ), so can't we, for the sake
of per-process itimers, pretend to have a 1-minute jiffy?

That should be as compliant as we are now, and utterly useless for
everybody, thereby discouraging its use, hmm? :-)
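
To make the tongue-in-cheek idea above a bit more concrete, here is a
minimal userspace sketch of rounding a requested process-wide itimer
interval up to a coarse granularity. The names
PROCESS_ITIMER_GRANULARITY_USEC and round_up_interval_usec() are
hypothetical and are not kernel symbols; the one-minute value is just
the figure joked about above, not a proposal for an actual constant.

```c
#include <stdint.h>

/* Hypothetical granularity for process-wide itimers: one minute,
 * expressed in microseconds. Purely illustrative. */
#define PROCESS_ITIMER_GRANULARITY_USEC (60ULL * 1000000ULL)

/*
 * Round a requested interval (in microseconds) up to the next multiple
 * of the given granularity. An interval of zero means "timer disabled"
 * and is returned unchanged, as is any interval when the granularity
 * is zero. Overflow near UINT64_MAX is not handled in this sketch.
 */
static uint64_t round_up_interval_usec(uint64_t requested_usec,
                                       uint64_t granularity_usec)
{
    uint64_t rem;

    if (requested_usec == 0 || granularity_usec == 0)
        return requested_usec;

    rem = requested_usec % granularity_usec;
    if (rem == 0)
        return requested_usec;

    return requested_usec + (granularity_usec - rem);
}
```

Under these assumptions, a request for a 1 ms interval (one tick at
HZ=1000) would come back as a full minute, which is the "compliant but
useless" behaviour described above.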