Re: %u-order allocation failed

From: Mikulas Patocka (mikulas@artax.karlin.mff.cuni.cz)
Date: Sat Oct 06 2001 - 14:07:31 EST


> > Of course vmalloc space can overflow - but it overflows only when the
> > machine is overloaded with too many processes, too many processes with
> > many filedescriptors etc. On the other hand, the buddy allocator fails
> > *RANDOMLY*. Totally randomly, depending on cache access patterns and
> > page allocation times.
>
> vmalloc space is also much worse for tlb usage when the main kernel mapping
> uses large hardware ptes. Ingo and davem pointed this out to me recently
> when I wanted to allocate the pagecache hash using vmalloc (at the
> moment it maxes out at order 10, which is much too small for machines
> with large memory).

OK, but my patch uses vmalloc only as a fallback when the buddy allocator
fails. The probability that the buddy allocator fails is small, so the
slower vmalloc path is taken only very rarely.
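A userspace sketch of this fallback pattern, assuming hypothetical stand-ins: alloc_contiguous() plays the role of the buddy allocator (it can be told to fail via contiguous_fails, to simulate fragmentation), alloc_virtual() plays the role of vmalloc, and both are plain malloc() here. It is not the code of the patch itself. The detail worth noticing is that the caller records which allocator succeeded, because in the kernel the two kinds of memory are released by different routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: when true, the "contiguous" allocator fails,
 * simulating a buddy allocator that cannot find a free run of pages. */
static bool contiguous_fails = false;

/* Stand-in for the buddy allocator (physically contiguous memory). */
static void *alloc_contiguous(size_t size)
{
    return contiguous_fails ? NULL : malloc(size);
}

/* Stand-in for vmalloc (virtually contiguous, slower TLB behaviour). */
static void *alloc_virtual(size_t size)
{
    return malloc(size);
}

/* Remembers which allocator satisfied the request. */
struct fallback_buf {
    void *ptr;
    bool from_virtual;
};

/* Try the fast path first; use the slow path only if it fails. */
static bool fallback_alloc(struct fallback_buf *buf, size_t size)
{
    assert(size > 0);
    buf->ptr = alloc_contiguous(size);
    buf->from_virtual = false;
    if (buf->ptr == NULL) {
        buf->ptr = alloc_virtual(size);
        buf->from_virtual = true;
    }
    return buf->ptr != NULL;
}

/* Release through the allocator that provided the memory. */
static void fallback_free(struct fallback_buf *buf)
{
    if (buf->from_virtual)
        free(buf->ptr);   /* would be vfree() in the kernel */
    else
        free(buf->ptr);   /* would be free_pages() in the kernel */
    buf->ptr = NULL;
}
```

The caller sees a single allocation that only fails when both paths fail; the cost of the rare fallback is slower access, not an error.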

It is perfectly OK to have slightly slower access to task_struct with
probability 1/1000000.

But it is ***BAD*BUG*** if allocation of task_struct fails with
probability 1/1000000.

> If you could get away with a single page stack, then you could allocate
> the task struct separately and avoid any order 1 allocation. But you
> would probably need interrupt stacks to get away with a single page
> stack.

Yes, but there are still other dangerous uses of kmalloc and
__get_free_pages (the worst offender is in select.c).

It is sad that the core VM developers did not write any documentation
explaining that high-order allocations can fail at any time and that the
caller must not abort its operation when that happens. Instead, they are
trying to make high-order allocations fail less often :-/ How is a random
Joe driver developer supposed to know that kmalloc(4096) is safe and
kmalloc(4097) is not?
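To make the 4096 vs 4097 boundary concrete, here is a small sketch of how a request size maps to a buddy-allocator order, assuming 4 KiB pages (as on i386) and that the request is rounded up to a power-of-two block of whole pages. size_to_order() and SKETCH_PAGE_SIZE are hypothetical names; the computation is roughly what the kernel's get_order() helper does.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Smallest order such that (PAGE_SIZE << order) >= size.
 * Order 0 needs one free page; order n needs 2^n physically
 * contiguous free pages, which fragmentation can make unavailable
 * even when plenty of memory is free in total. */
static unsigned int size_to_order(size_t size)
{
    /* Round up to whole pages without risking overflow. */
    size_t pages = size / SKETCH_PAGE_SIZE + (size % SKETCH_PAGE_SIZE != 0);
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions a 4096-byte request fits in a single page (order 0), while a 4097-byte request needs two contiguous pages (order 1), which is exactly the kind of request that can fail at random.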

Parts of the kernel written by people who know about the buddy allocator
(page/buffer/dentry/inode hash allocations, file descriptor array
allocation) are already written correctly, on the assumption that a
high-order allocation may fail.
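The defensive style used for the big hash tables can be sketched as: ask for the preferred order and, on failure, retry with a smaller order instead of giving up. This is a userspace model; sim_get_free_pages() and max_available_order are hypothetical stand-ins for __get_free_pages() under fragmentation, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical: the largest order the simulated allocator can satisfy,
 * standing in for whatever fragmentation currently allows. */
static unsigned int max_available_order = 3;

/* Simulated page allocator: fails for any order above the limit. */
static void *sim_get_free_pages(unsigned int order)
{
    if (order > max_available_order)
        return NULL;
    return malloc(SKETCH_PAGE_SIZE << order);
}

/* Try the preferred order, halving the request after each failure.
 * On success, *got_order reports the order actually obtained.
 * Returns NULL only if even a single page cannot be allocated. */
static void *alloc_hash_table(unsigned int want_order, unsigned int *got_order)
{
    unsigned int order = want_order;
    void *table;

    for (;;) {
        table = sim_get_free_pages(order);
        if (table != NULL || order == 0)
            break;
        order--;
    }
    if (table != NULL)
        *got_order = order;
    return table;
}
```

A smaller table only means longer hash chains, so the caller degrades gracefully instead of failing.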

Other parts of the kernel, written by people who do not know about the
buddy allocator (task_struct allocation, select, and probably a lot of
drivers), assume that a high-order allocation always succeeds. task_struct
and select can be fixed easily, but cleaning up the shit in drivers will be
a real pain and will probably never be finished :-(

Mikulas
