> > Of course vmalloc space can overflow - but it overflows only when the
> > machine is overloaded with too many processes, too many processes with
> > many filedescriptors etc. On the other hand, the buddy allocator fails
> > *RANDOMLY*. Totally randomly, depending on cache access patterns and
> > page allocation times.
>
> vmalloc space is also much worse for tlb usage when the main kernel mapping
> uses large hardware ptes. Ingo and davem pointed this out to me recently
> when I wanted to allocate the pagecache hash using vmalloc (at the
> moment it maxes out at order 10 which is much too small for machines
> with large memory).
OK, but my patch uses vmalloc only as a fallback when the buddy allocator
fails. The probability that buddy fails is small, so the slower vmalloc
path is taken only with very small probability.
It is perfectly OK to have slightly slower access to task_struct with
probability 1/1000000.
But it is a ***BAD*BUG*** if allocation of task_struct fails with
probability 1/1000000.
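To make the fallback idea concrete, here is a rough userspace sketch. It is not the patch itself: contiguous_alloc(), mapped_alloc() and alloc_with_fallback() are made-up names that stand in for the buddy allocator and vmalloc, and malloc() only models them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Which (modelled) allocator satisfied the request. */
enum alloc_source { SRC_NONE, SRC_CONTIGUOUS, SRC_MAPPED };

struct alloc_result {
    void *mem;
    enum alloc_source source;
};

/* Hypothetical stand-in for a physically contiguous (buddy) allocation.
 * simulate_failure lets the caller force the rare failure case. */
static void *contiguous_alloc(size_t size, int simulate_failure)
{
    return simulate_failure ? NULL : malloc(size);
}

/* Hypothetical stand-in for a virtually contiguous (vmalloc-like) allocation. */
static void *mapped_alloc(size_t size)
{
    return malloc(size);
}

/* Try the fast contiguous path first; use the slower mapped path only
 * when the first attempt fails. */
static struct alloc_result alloc_with_fallback(size_t size, int simulate_failure)
{
    struct alloc_result r = { NULL, SRC_NONE };

    r.mem = contiguous_alloc(size, simulate_failure);
    if (r.mem) {
        r.source = SRC_CONTIGUOUS;
        return r;
    }

    r.mem = mapped_alloc(size);
    if (r.mem)
        r.source = SRC_MAPPED;
    return r;
}

/* A real kernel version would have to pick the matching free routine
 * based on r.source; in this model both paths use free(). */
static void free_result(struct alloc_result r)
{
    free(r.mem);
}
```

The caller sees a failure only if both paths fail; the slow path costs something only in the rare case where the first attempt returns NULL.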
> If you could get away with a single page stack, then you could allocate
> the task struct separately and avoid any order 1 allocation. But you
> would probably need interrupt stacks to get away with a single page
> stack.
Yes, but there are still other dangerous uses of kmalloc and
__get_free_pages. (The worst offender is in select.c.)
It is sad that core VM developers did not write any documentation that
explains that high-order allocations can fail at any time and that the
caller must not abort his operation when it happens. Instead, they are
trying to make high-order allocations fail less often :-/ How should random
Joe-driver-developer know that kmalloc(4096) is safe and kmalloc(4097) is
not?
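The 4096/4097 boundary is just the page-order arithmetic. A small userspace sketch, assuming 4096-byte pages (SKETCH_PAGE_SIZE and pages_order() are names invented here, not kernel symbols):

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Smallest order such that (1 << order) pages cover `size` bytes.
 * Order 0 is a single page; anything above that needs physically
 * contiguous pages from the buddy allocator. */
static unsigned int pages_order(size_t size)
{
    size_t pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under that assumption a 4096-byte request fits in one page (order 0), while a 4097-byte request already needs two contiguous pages (order 1).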
Right now, the parts of the kernel written by people who know about the
buddy allocator (page/buffer/dentry/inode hash allocations, filedescriptor
array allocation) are written correctly, with the assumption that a
high-order allocation may fail.
Other parts of the kernel, written by people who do not know about the
buddy allocator (task_struct allocation, select and probably a lot of
drivers), assume that a high-order allocation always succeeds. task_struct
and select can be fixed easily, but cleaning up the shit in drivers will be
a real pain and it will probably never be finished :-(
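One way a caller can cope with a failed high-order allocation is to retry with a smaller order instead of giving up. The following is only a userspace sketch of that idea; alloc_order() and alloc_table() are hypothetical helpers, and max_ok_order simulates how fragmented memory is.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical allocator model: any request above max_ok_order fails,
 * as if no free block that large were left. */
static void *alloc_order(unsigned int order, unsigned int max_ok_order)
{
    if (order > max_ok_order)
        return NULL;
    return malloc((size_t)SKETCH_PAGE_SIZE << order);
}

/* Ask for wanted_order, and on failure step down one order at a time.
 * Returns NULL only if even an order-0 request fails; otherwise stores
 * the order actually obtained in *got_order. */
static void *alloc_table(unsigned int wanted_order, unsigned int max_ok_order,
                         unsigned int *got_order)
{
    unsigned int order = wanted_order;

    for (;;) {
        void *table = alloc_order(order, max_ok_order);

        if (table) {
            *got_order = order;
            return table;
        }
        if (order == 0)
            return NULL;
        order--;
    }
}
```

This only helps callers that can work with a smaller buffer (a hash table, for example). A caller that needs a fixed size has to use a different fallback.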
Mikulas