Re: Ugh in 2.1.60

Andi Kleen (ak@muc.de)
31 Oct 1997 17:59:26 +0100


Linus Torvalds <torvalds@transmeta.com> writes:

> On Thu, 30 Oct 1997, Bill Hawes wrote:
>
> > Linus Torvalds wrote:
> >
> > > Getting a "ugh" in do_page_fault indicates that somebody does a copy
> > > to/from user space with interrupts disabled. That is a _bug_.
> > >
> > > I suspect it is some specific subsystem that does this, because I've
> > > never seen it myself. It would be good if people who see this try to
> > > notice what they tend to be doing when it happens, so that we might be
> > > able to find out exactly _what_ it is that tries to take a page fault
> > > with interrupts disabled..
> >
> > Oh, well in that case ...
> >
> > There are quite a few places in various drivers (serial, IDSN, etc.)
> > that do a copy_xx_user with interrupts turned off.
> >
> > Looks like it would be good for the driver maintainers to check the
> > files for such occurrences and review what changes are needed.
>
> Right. One thing I've also considered is to make a debugging version of
> "copy_to_user()" and friends, that _verify_ that the local interrupts are
> not blocked. That's a lot more likely to catch things than the page
> faulting behaviour (because page faults are reasonably rare, after all,
> especailly if you have enough memory like I do..)

It should be also considered to move most copy_*_user()/get/put_user() calls
out-of-line to save code bloat: there is evidence(1) that most of the 2.1
binary image bloat (my 2.1 kernel is over 100k bigger than the equivalent 2.0
one) comes from these functions. Actually the kernel would not only become
smaller but faster too: what is a single subroutine call compared to a
L1/L2 cache miss? For the real critical paths a fast_copy_*_user() could be
defined, but I think it would be sufficient to leave the __{put,get}_user()
functions inline and move the versions without "__" out-of-line - the fast
paths use them mostly anyways and the others are mostly ioctls and other
stuff that isn't performance critical anyways.

The problem of returning two values in get_user() can be simply solved
by using asm wrappers and the same trick that the syscalls use - just
set the carry bit on error and fetch the error from another register
then.

Please rethink this issue. I think it's really worth it.

-Andi

(1) Get ftp://ftp.firstfloor.org/pub/ak/bloat-o-meter and run it with
bloat-o-meter vmlinux-2.0 vmlinux-2.1 .You'll notice that most of the growage
comes from the "giant ioctl switch" routines in many subsystems.