Re: Problem with >1 page GFP_KERNEL kmalloc()

Marek Michalkiewicz (marekm@i17linuxb.ists.pwr.wroc.pl)
Sun, 13 Apr 1997 09:03:19 +0200 (MET DST)


Hi,

(sorry for the delay - my net access is somewhat limited at the moment)

Gerard Roudier:
> No kernel software should ever try to allocate a physically
> contiguous memory area larger than 1 PAGE long after the system has
> booted. There is no guarantee that this is always possible, and even
> when it is, it may cause lots of recently cached objects to be
> trashed or swapped out.

OK, but this is not a disaster. The consequences of allocation failure
may be much worse than that (like no nightly backup with ftape if the
module fails to load).

> I prefer such silly software to cause problems for users, so that it
> gets fixed more quickly.

OK, but this "silly software" has been in the standard kernel for a
few years, and so far nobody has fixed it, so the fix must be
non-trivial (and is unlikely to happen in 2.0.xx). Instead, various
ugly hacks are used to partially work around the limitation (see
svgatextmode for an example - it allocates and frees lots of user
memory in an attempt to create more free physical pages, and resizes
via a 1x1 screen).

And if the purpose of these kmalloc failures is to encourage fixing
bad kernel code more quickly, then an occasional "insmod ftape" or
svgatextmode failure is not good enough - ALL >1 page kmallocs should
always fail and be logged via printk (in 2.1.xx only, of course!).
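
If that is really the purpose, a check along these lines (a quick
sketch only - the wrapper name is made up, and I haven't thought
about where such a check would best live in the allocator) would make
every offender show up in the log right away:

  #include <linux/kernel.h>   /* printk, KERN_WARNING */
  #include <linux/malloc.h>   /* kmalloc - header name as in 2.0.xx */
  #include <asm/page.h>       /* PAGE_SIZE */

  /*
   * Sketch only: refuse and log every request larger than one page,
   * so broken callers can be found and fixed, instead of working
   * only "most of the time".
   */
  static void *kmalloc_strict(unsigned int size, int priority)
  {
      if (size > PAGE_SIZE) {
          printk(KERN_WARNING
                 "kmalloc_strict: %u bytes is more than one page, failing on purpose\n",
                 size);
          return NULL;
      }
      return kmalloc(size, priority);
  }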

> On the other hand, try_to_free_pages() and friends do not care much
> about requests for contiguous physical memory larger than 1 PAGE.
>
> So, in my opinion, your patch is a bomb and I would prefer it to be
> discarded, at least for the moment.

Well, I use it on my system (now running 2.0.30) and it hasn't
exploded yet ;-). As far as I can tell, the patch has no effect on
correctly written kernel code: it changes nothing for allocations
smaller than 1 page, and it only slows down kmalloc in the rare cases
where it would previously have failed. Such big kmallocs are not done
very often, so it won't degrade performance much. Bad kernel code
should still be fixed, but the workaround keeps things running in the
meantime.
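
For reference, the idea behind the patch is roughly the following (a
much simplified sketch, not the actual diff - the real change sits
inside the page allocator, and the try_to_free_pages() interface
shown here is an assumption):

  #include <linux/malloc.h>   /* kmalloc - header name as in 2.0.xx */
  #include <asm/page.h>       /* PAGE_SIZE */

  extern int try_to_free_pages(int priority);  /* assumed declaration */

  /*
   * Simplified sketch of the idea, not the real patch: for a request
   * bigger than one page, don't give up on the first failure - ask
   * the swapper to free some pages and retry a few times first.
   */
  static void *kmalloc_retry(unsigned int size, int priority)
  {
      void *p;
      int tries;

      for (tries = 0; tries < 3; tries++) {
          p = kmalloc(size, priority);
          if (p != NULL || size <= PAGE_SIZE)
              return p;               /* success, or nothing more to try */
          try_to_free_pages(priority);    /* assumed interface */
      }
      return NULL;
  }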

> It is always possible to use physical memory areas no larger than
> 1 PAGE on a running system, for the following reasons:
>
> - If such memory is to be used for DMA, it is possible to use
> scatter/gather techniques, or to fall back to multiple I/Os with
> device drivers that do not support scatter/gather.

OK, but not in 2.0.xx - too many changes for a "stable" kernel.
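
Just to illustrate what that fallback looks like, a per-page
allocation sketch (illustrative only - the names are invented and
this is not taken from any driver):

  #include <linux/mm.h>       /* __get_free_page, free_page, GFP_KERNEL */
  #include <linux/errno.h>

  #define TAPE_PAGES 24       /* e.g. 24 * 4K = 96K of buffer space */

  static unsigned long tape_pages[TAPE_PAGES];

  /*
   * Sketch: allocate independent single-page buffers and do one
   * transfer (or one scatter/gather entry) per page, instead of
   * asking for 96K of physically contiguous memory in one go.
   */
  static int tape_alloc_pages(void)
  {
      int i;

      for (i = 0; i < TAPE_PAGES; i++) {
          tape_pages[i] = __get_free_page(GFP_KERNEL);
          if (!tape_pages[i]) {
              while (--i >= 0)
                  free_page(tape_pages[i]);
              return -ENOMEM;
          }
      }
      return 0;
  }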

> - If such memory is only accessed by the CPU, it is possible to
> remap several PAGEs virtually so that they can be addressed
> contiguously.

OK, this is already possible with vmalloc(). But again, the console
code may need too many changes for 2.0.xx.
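
For a CPU-only buffer that is simple enough - something like this
sketch (where exactly vmalloc() is declared moved around between 2.0
and 2.1):

  #include <linux/mm.h>       /* vmalloc, vfree - location varies by version */
  #include <linux/errno.h>

  static char *big_buf;

  /*
   * Sketch: vmalloc() builds the buffer from individually allocated
   * pages and maps them into a contiguous *virtual* range, so it is
   * immune to physical fragmentation - but useless for DMA, because
   * the pages are not physically contiguous.
   */
  static int setup_big_buf(void)
  {
      big_buf = vmalloc(96 * 1024);
      if (big_buf == NULL)
          return -ENOMEM;
      return 0;
  }

  static void release_big_buf(void)
  {
      vfree(big_buf);
      big_buf = NULL;
  }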

> - For the very rare situations in which large physically contiguous
> memory chunks are needed or wanted, it is possible to allocate them
> at boot time, or very early after the system has booted.

That wastes memory. It is better to allocate memory on device open
and free it on close - but that requires that the allocation doesn't
fail unless we are _really_ low on memory (and not just because it is
fragmented). The ftape driver keeps 96K for its buffers even when the
driver is not in use at the time. The memory can be freed when ftape
is used as a module, but then you never know whether the next
"insmod" will succeed.
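
Roughly what I have in mind, as a sketch (the names are invented, and
the real ftape buffers also have DMA constraints that this ignores):

  #include <linux/fs.h>       /* struct inode, struct file */
  #include <linux/mm.h>       /* vmalloc, vfree */
  #include <linux/errno.h>

  static char *tape_buf;

  /*
   * Sketch of allocate-on-open / free-on-close: the 96K is only tied
   * up while the device is actually open - but this is only useful
   * if the open-time allocation doesn't fail merely because memory
   * is fragmented.
   */
  static int tape_open(struct inode *inode, struct file *file)
  {
      tape_buf = vmalloc(96 * 1024);
      if (tape_buf == NULL)
          return -ENOMEM;
      return 0;
  }

  static void tape_release(struct inode *inode, struct file *file)
  {
      vfree(tape_buf);
      tape_buf = NULL;
  }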

So, my proposal: by all means, let's have the proper fix in 2.1.xx.
But until then, let's have a workaround (especially for 2.0.xx -
"stable" kernels shouldn't have problems like this IMHO). Please...

The patch (just in case anyone missed it) is available from
ftp://ftp.ists.pwr.wroc.pl/pub/linux/patches/linux-2.0.30-alloc-patch
(should work with older 2.0.xx kernels too).

(BTW, in the same directory there is also a patch for 2.1.20 - it may
need to be updated for current kernels - which fixes setresuid() to
match the HP-UX man page, resets fsuid and the dumpable flag when the
euid is changed [important!], and adds setresgid().)

Regards,

Marek