Re: [PATCH] mm: Revert pinned_vm braindamage

From: Christoph Lameter
Date: Fri Jun 21 2013 - 10:44:44 EST

On Thu, 20 Jun 2013, Roland Dreier wrote:

> Christoph, your argument would be a lot more convincing if you stopped
> repeating this nonsense. Sure, in a strict sense, it might be true

Well this is regarding tracking of pages that need to stay resident and
since the kernel does the pinning through the IB subsystem it is trackable
right there. No nonsense and no need for a separate pinning system call.

> that the IB subsystem in the kernel is the code that actually pins
> memory, but given that unprivileged userspace can tell the kernel to
> pin arbitrary parts of its memory for any amount of time, is that
> relevant? And in fact taking your "initiate" word choice above, I
> don't even think your statement is true -- userspace initiates the
> pinning by, for example, doing an IB memory registration (libibverbs
> ibv_reg_mr() call), which turns into a system call, which leads to the
> kernel trying to pin pages. The pages aren't unpinned until userspace
> unregisters the memory (or causes a cleanup by closing the context
> fd).

In some sense userspace initiates everything, since the kernel's purpose
is to run applications. So you could say that everything is user initiated
if you wanted.

However, the user visible mechanism here is a registration of memory with
the IB subsystem for RDMA. The primary intent is not to pin the pages but
to make memory available for remote I/O. The pages are pinned *because*
otherwise remote RDMA operations could corrupt memory due to the kernel
moving/evicting memory.

> Here's an argument by analogy. Would it make any sense for me to say
> userspace can't mlock memory, because only the kernel can set
> VM_LOCKED on a vma? Of course not. Userspace has the mlock() system
> call, and although the actual work happens in the kernel, we clearly
> want to be able to limit the amount of memory locked by the kernel ON

I would think that mlock is a memory management function and therefore the
app/user directly says that the memory is not to be evicted from memory.
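
To make that contrast concrete, here is a minimal sketch of such a direct
request from userspace. try_lock_one_page() is a hypothetical helper used
only for illustration; it assumes nothing beyond the standard mmap() and
mlock() calls, and that the lock is charged against RLIMIT_MEMLOCK.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): map one anonymous page, ask
 * the kernel to keep it resident with mlock(), then release it again.
 * Returns 0 on success, or the errno value of the call that failed.
 */
int try_lock_one_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf;
	int err = 0;

	if (page <= 0)
		return EINVAL;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return errno;

	/* The explicit memory management request: do not evict this range. */
	if (mlock(buf, (size_t)page) != 0)
		err = errno;	/* e.g. ENOMEM or EPERM when over the limit */
	else
		munlock(buf, (size_t)page);

	munmap(buf, (size_t)page);
	return err;
}
```

Here the application names the address range and asks for residency
itself, so failure is reported directly as a memory locking error.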

This is different for the IB subsystem which is dealing with I/O and only
indirectly with memory. If we had a different mechanism to prevent
reclaim etc., we would not need to pin the pages.

Actually there is such a mechanism that could be used here. If you had a
reserved memory region that is not mapped by the kernel (boot time alloc,
device memory) then you could use VM_PFNMAP to refer to that region and the
kernel would not be able to do reclaim on that memory. No pinning would be
necessary if the IB subsystem registered that type of memory.
