Re: mapping user space buffer to kernel address space

From: Andrea Arcangeli (andrea@suse.de)
Date: Mon Oct 16 2000 - 16:45:33 EST


On Mon, Oct 16, 2000 at 11:29:27AM -0700, Linus Torvalds wrote:
> The page count is (or should be) sufficient, and if it weren't sufficient
> that would be a bug in the swap-out handling of anonymous or shm memory. I

If the page isn't locked swap_out will unmap it from the pte and anybody will
be able to start any kind of regular VM I/O on the page. This isn't what the
programmer expects and that's not what I consider pinning. It just looks much
saner to completly pin it and to keep it mapped as if it would be kernel memory
marked as reserved and then mapped to userspace via remap_page_range. I don't
see any particular benefit in allowing swap_out to mess with pinned pages that
would make worthwhile not to lock them.

Said that (and the above arguments are more a design issue) there's also a
pratical problem that I can see in not taking the page locked. What will
happen without the lock on the page is that swapout will unmap the pte while
adding the page to the swap cache and while writing it to disk. Then the page
will be clean and in the swap cache with reference count 2 because we take it
pinned and because it's part of the swap cache. Then we do the DMA from disk
to the page but the page is still considered clean from the VM
because it's an unlocked swap cache page. But instead the contents of the page
aren't in sync anymore with the contents of the on-disk counterpart of the swap
cache after DMA completed. So that will again generate corruption because it
breaks VM assumptions on the swap cache (even if it's more subtle to trigger
than the other problems). Maybe even more subtle issue could arise by
not taking the lock on the page while it's pinned.

> refuse to see page locking or similar for this kind of pinning (counts are
> recursive and re-entrant, a "lock bit" is not).

Infact swapin_readahead was deadlocking exactly because lock bit isn't
reentrant. But the fix for that deadlock looks worthwhile regardless of the
pinning-user-memory-with-lock-bit issue.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Oct 23 2000 - 21:00:10 EST