Re: [TEST] page tables filling non-highmem

From: Andrea Arcangeli (andrea@suse.de)
Date: Mon Feb 18 2002 - 08:15:45 EST


On Mon, Feb 18, 2002 at 01:59:42PM +0100, Daniel Phillips wrote:
> On February 18, 2002 01:27 pm, Andrea Arcangeli wrote:
> > Agreed, this is why I fighted with Linus and Marcelo trying to convince
> > them not to reintroduce the loop crap into the allocator that leads to
> > all sort of oom deadlocks because we lack the knowledge on the amount of
> > freeable pages (I even re-read the emails about such stuff in the thread
> > "VM tweaks" to be sure I was remembering right). OTOH, I really cannot
> > complain, they included so much stuff from my tree that even if we
> > disagreed on something at the end I don't mind :). And this is probably
> > also why I don't like very much to restart those threads about oom
> > deadlocks, I know my way is the only right way (i.e. non deadlock prone)
> > possible, and I live with it just fine.
> >
> > The only way we can learn if a page or a mapping is freeable or not, is
> > by trying to free it and by checking if we failed or not. We cannot know
> > in another manner, only checking the size of the caches or the amount of
> > the swap still unused is totally meaningless and broken. That's
> > unfortunate but that's how all linux kernels I know of works, and what I
> > did in my tree at the moment is the only possible way to avoid deadlocks
> > without having to do a major rework on the accounting side.
>
> Could you describe your page table deadlock-avoidance algorithm in more
> detail please?

There is nothing specific with the pagetables. If the lowmem was eat by
skb instead of ptes you'd deadlock the very same way. The kernel will
just see lots of cache in highmem and of swap available (not to tell the
kernel never knows how much of such cache is really freeable or how much
of the mappings are swappable and that's the very next problem that will
leads to the same deadlock) and it will think there's "freeable" memory
available and it will keep looping. That's simply plain broken. The
only way if there's something freeable is to try to free it and if we
fail we say "oom". You cannot say if there's something freeable by
checking the cache size or the number of free swap pages, no-way.

If in 2.5 we want perfect accounting of freeable resources instead, fine
with me (that would math guarantee to never fail allocations if there's
at least one page freeable, while right now you only can calculate a
probabilistic measure), but it has to be _perfect_, and with 2.4 there
isn't such perfect accounting, so we definitely cannot rely on cache
size and swap available to know if to trigger oom or not. That's totally
broken and it will deadlock. I care about those minor theorical things
too, I want everything calculated and under control, I hate
approximations that can leads to deadlocks, and it pays off eventually.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Feb 23 2002 - 21:00:12 EST