Re: [PATCH] anobjrmap 1/6 objrmap

From: Andrea Arcangeli
Date: Sat Mar 20 2004 - 07:49:53 EST


On Fri, Mar 19, 2004 at 07:08:26AM +0000, Hugh Dickins wrote:
> On Fri, 19 Mar 2004, Andrea Arcangeli wrote:
> > On Thu, Mar 18, 2004 at 11:21:07PM +0000, Hugh Dickins wrote:
> > > + if (!spin_trylock(&mm->page_table_lock))
> > > + return 1;
> > > +
> > [..]
> > > + if (down_trylock(&mapping->i_shared_sem))
> > > + return 1;
> > > +
> >
> > those two will hang your kernel in the workload I posted to the list a
> > few days ago.
>
> I missed the actual workload, will search the archives later.
> Fear I won't reproduce it exactly, and more anxious to plug
> the mremap-move and non-linear holes.
>
> > With previous kernels the above didn't matter, but starting with
> > 2.6.5-rc1 it does matter, if we cannot know if it's referenced or not,
> > we must assume it's not and return 0 or it lives locks hard with all
> > tasks stuck and one must click reboot.
>
> I don't much care whether we return 1 or 0 in that case, be happy to
> make the change if we understand _why_ it's suddenly become necessary.
> I don't remember seeing an explanation from you (and fair enough, you
> didn't want to get stuck on that detail) or anyone else.

it's the changes in the 2.6.5-rc1 page_referenced usage that requires us
to return 0, Andrew may want to elaborate those details.

if you don't fix it your set of patches will hang the box hard if you
hit swap with shared memory swap load.

> > I recommend you to share my objrmap patch, the objrmap should be exactly
> > the same for both of us.
>
> I can't take its mm/mmap.c (and if Martin keeps that page_table_lock
> avoidance in his tree, then I think he shouldn't have followed your
> advice to skip Dave's mmap_sem in unuse_process). But of course,
> I could have started from exactly yours and then a patch to change
> those back. Just so long as we're aware they're not identical.
>
> Hmm, where's page_test_and_clear_dirty gone in your final objrmap.c?

There's no such thing in Dave's objrmap patch.

>
> There's a lot that could be shared between the two approaches.
> Nice if we kept to the same struct page layout: I put int mapcount
> after atomic_t count because almost all arches have atomic_t as an
> int, so won't that placing save us 4 bytes on the 64-bit arches?

my mapcount is an unsigned long, so it doesn't matter, but I think I can
make it an unsigned int, that sounds a good idea since I doubt anybody
will ever fork >4G processes with 2.6. Only after making it an unsigned
it it will matter to position it near the atomic_t on 64bit.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/