Re: objrmap-core-1 (rmap removal for file mappings to avoid 4:4 in <=16G machines)

From: Andrea Arcangeli
Date: Mon Mar 08 2004 - 19:36:53 EST


On Mon, Mar 08, 2004 at 04:10:46PM -0800, Andrew Morton wrote:
> Andrea Arcangeli <andrea@xxxxxxx> wrote:
> >
> > > btw, mincore() has always been broken with nonlinear vma's. If you could
> > > fix that up some time using that pagetable walker it would be nice. It's
> > > not very important though.
> >
> > Ok! I'm still late at this though, I wish I would be working on the
> > nonlinear stuff by now ;), I'm still stuck at the anon_vma_chain...
>
> As I say, broken mincore() on nonlinear mappings isn't a showstopper ;)
>
> > If I understand well, vmtruncate will also need the pagetable walker to
> > nuke all mappings of the last pages of the files before we free them
> > from the pagecache. So it should be a library call that mincore can use
> > too then, I don't see problems.
>
> If we want to bother with the traditional truncate-causes-SIGBUS semantics
> on nonlinear mappings, yes. I guess it would be best to do that if
> possible.

yes, this was my object.


btw, this reminds me another trouble, that is what to do in the case
where in 2.4 we convert file-mapped-pages into anonymous pages while
they're still mapped (I don't remeber exactly what could do that but it
could happen, do you remember the details? I think this is the case that
Hugh calls the Morton pages, he also had troubles in his anobjrmap
attempt but I think it was more a fixme comment). In 2.4 the swap_out
had to deal with that somehow, but with my anobjrmap the vm will now
lose track of those pages, so they will become unswappable. Not sure if
they were unswappable in 2.4 too and/or if 2.6-rmap could leave them
visible to the vm or not.

Also these pages should be swapped to the swap device, if something,
they lost reference of the inode.

Input on the Morton pages is appreciated ;)

> > btw (for completeness), about the cpu consumption concerns about objrmap
> > w.r.t. security (that was Ingo's only argument against objrmap),
> > whatever malicious waste of cpu that could happen during paging, can be
> > already triggered in any kernel out there by using truncate on the same
> > mappings instead of swapping them out.
>
> Yes, malicious apps can DoS the machine in many ways. I'm more concerned
> about non-malicious ones getting hurt by the new search activity. Say, a
> single-threaded app which uses a huge number of vma's to map discontiguous
> parts of a file. The 2.4-style virtual scan would handle that OK, and the
> 2.6-style pte_chain walk would handle it OK too. People do weird things.

That's the db normal scenario and it's running fine on 16G boxes today
in 2.4 with objrmap. Note that we're talking about swapping here, and
compared to swap_out that cpu load in the vma chains is little compared
to throwing away hundred gigs of address space before swapping the first
4k (infact objrmap was the 1st showstopper fix to make huge-shm-swap work
properly). So I'm not very concerned. Also for some db those pages
should be mlocked, so to be optimal we should remove them from the lru
while they're mlocked as Martin suggested. and normal pure db (not
applications) don't swap, so they will only run faster.

Longer term on 64bit those weird setups will be all but common as far as
I can tell.

Overall it sounds the best trade-off.

> (objrmap could perhaps terminate the vma walk after it sees the page->count
> fall to a value which means there are no more pte's mapping the page - that
> would halve the search cost on average).

It really should! Agreed. Feel free to go ahead and fix it. I checked
the page_count before starting the loop in my 2.4 implementation, but I
forgot to do that during the core of the loop. I was only breaking the
loop at the first pte_young I would find (my objrmap isn't capable of
clearing the pte_young bit, I leave that task to the pagetable walk, you
know my 2.4 objrmap is an hybrid between objrmap and the swap_out loop,
2.6 handles all this differently).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/