Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall

From: Arnd Bergmann
Date: Mon Sep 11 2023 - 18:47:25 EST


On Sun, Sep 10, 2023, at 14:52, Gregory Price wrote:
> On Sat, Sep 09, 2023 at 05:18:13PM +0200, Arnd Bergmann wrote:
>>
>> I think a pointer to '__u64' is the most appropriate here,
>> that is compatible between 32-bit and 64-bit architectures
>> and covers all addresses until we get architectures with
>> 128-bit addressing.
>>
>> Thinking about it more, I noticed an existing bug in
>> both sys_move_pages and your current version of
>> sys_move_phys_pages: the 'pages' array is in fact not
>> suitable for compat tasks and needs an is_compat_task
>> check to load a 32-bit address from compat userspace on
>> the "if (get_user(p, pages + i))" line.
>>
>
> I'll clean up the current implementation for what I have on a v2 of an
> RFC, and then look at adding some pull-ahead patches to fix both
> move_pages and move_phys_pages for compat processes. Might take me a
> bit, I've only done compat work once before and I remember it being
> annoying to get right.

I think what you want is roughly this (untested):

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2159,6 +2159,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
const int __user *nodes,
int __user *status, int flags)
{
+ struct compat_uptr_t __user *compat_pages = (void __user *)pages;
int current_node = NUMA_NO_NODE;
LIST_HEAD(pagelist);
int start, i;
@@ -2171,8 +2172,17 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
int node;

err = -EFAULT;
- if (get_user(p, pages + i))
- goto out_flush;
+ if (in_compat_syscall() {
+ compat_uptr_t cp;
+
+ if (get_user(cp, compat_pages + i))
+ goto out_flush;
+
+ p = compat_ptr(cp);
+ } else {
+ if (get_user(p, pages + i))
+ goto out_flush;
+ }
if (get_user(node, nodes + i))
goto out_flush;

alternatively you could use the get_compat_pages_array()
helper that is already used in the do_pages_stat()
function.

> I did see other work on migrate.c hanging around on the list, I'll
> double check this hasn't already been discovered/handled.

It looks like I broke it, and it was working before my own
5b1b561ba73c8 ("mm: simplify compat_sys_move_pages"), which
only handled the nodes=NULL path.

I suppose nobody noticed the regression because there are very
few 32-bit NUMA systems, and even fewer cases in which one
would run compat userspace to manage a 64-bit NUMA machine.

>
> This only requires plumbing new 2 flags through do_pages_move, and no
> new user-exposed types or information.
>
> Is there an ick-factor with the idea of adding the following?
>
> MPOL_MF_PHYS_ADDR : Treat page migration addresses as physical
> MPOL_MF_PFN : Treat page migration addresses as PFNs

I would strongly prefer supporting only one of the two, and
a 64-bit physical address seems like the logical choice here.

>> These do not seem to be problematic from the ASLR perspective, so
>> I guess it may still be useful without CAP_SYS_ADMIN.
>
> After reviewing the capabilities documentation it seems like
> CAP_SYS_NICE is the appropriate capability. My last meassage I said
> CAP_SYS_ADMIN was probably correct, but I think using CAP_SYS_NICE
> is more appropriate unless there are strong feelings due to the use of
> PFN and Physcall Address.
>
> I'm not sure rowhammer is of great concern in this interface because you
> can't choose the destination address, only the destination node. Though
> I suppose someone could go nuts and try to "massage" a node in some way
> to get a statistical likelihood of placement (similar heap grooming).

I agree that this doesn't introduce any additional risk for rowhammer
attacks, but it seems slightly more logical to me to use CAP_SYS_ADMIN
if that is what the other interfaces use that handle physical addresses
and may leak address information.

Arnd