Re: Atomic operation for physically moving a page (formemory defragmentation)

From: Dave Hansen
Date: Thu Jun 24 2004 - 06:33:36 EST


On Thu, 2004-06-24 at 00:19, IWAMOTO Toshihiro wrote:
> At Wed, 23 Jun 2004 13:56:30 -0700,
> Dave Hansen wrote:
> >
> > On Wed, 2004-06-23 at 04:59, Hirokazu Takahashi wrote:
> > > We should know that many part of kernel code will access the page
> > > without holding a lock_page(). The lock_page() can't block them.
> >
> > No, but it will block them from establishing a new PTE to the page. You
> > need to:
> >
> > 1. make sure no new PTEs can be established to the page
> > 2. make sure there are no valid PTEs to the page.
> > 3. do the move
> >
> > My suggestion relates to 1, only.
>
> I wonder if you are talking exclusively about swap (anonymous) pages,
> where lock_page() might work.

I was talking about access to the pages through the user page tables,
only. You can't really fully prevent other access to them, because some
other kernel user could always do something like kmap() and write to the
page. There's probably some handy-dandy way to trap these kinds of
accesses in hardware, but Linux itself certainly can't provide that
guarantee without some restructuring to check for these areas any time
that a set_pte() is done.

Remember, we don't do things like rmap for the *kernel* users of pages.

> (I wonder why lock_page() is needed in do_swap_page(), btw.)
>
> For page caches, usually lock_page() cannot prevent accesses to them,
> and there are several kernel functions which don't need PTE mappings
> for access. One of such functions is do_generic_mapping_read().

You'll also have a generic problem with anything that does DMA, or that
uses the kernel page tables of any kind (kmap, vmalloc, etc...).

The DMA problem is a lot easier when there's an IOMMU, and even easier
on a partitioned ppc64 system where we have a virtualization layer to
take care of any areas under DMA that might undergo remapping.

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/