Re: [RFC PATCH 0/2] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe
Date: Tue Sep 01 2020 - 14:14:19 EST
On Tue, Sep 01, 2020 at 07:40:20PM +0200, Gerald Schaefer wrote:
> +/*
> + * With dynamic page table levels on s390, the static pXd_addr_end() functions
> + * will not return corresponding dynamic boundaries. This is no problem as long
> + * as only pXd pointers are passed down during page table walk, because
> + * pXd_offset() will simply return the given pointer for folded levels, and the
> + * pointer iteration over a range simply happens at the correct page table
> + * level.
> + * It is however a problem with gup_fast, or other places walking the page
> + * tables w/o locks using READ_ONCE(), and passing down the pXd values instead
> + * of pointers. In this case, the pointer given to pXd_offset() is a pointer to
> + * a stack variable, which cannot be used for pointer iteration at the correct
> + * level. Instead, the iteration then has to happen by going up to pgd level
> + * again. To allow this, provide pXd_addr_end_folded() functions with an
> + * additional pXd value parameter, which can be used on s390 to determine the
> + * folding level and return the corresponding boundary.
> + */
> +#ifndef pgd_addr_end_folded
> +#define pgd_addr_end_folded(pgd, addr, end) pgd_addr_end(addr, end)
> +#endif
> +
> +#ifndef p4d_addr_end_folded
> +#define p4d_addr_end_folded(p4d, addr, end) p4d_addr_end(addr, end)
> +#endif
> +
> +#ifndef pud_addr_end_folded
> +#define pud_addr_end_folded(pud, addr, end) pud_addr_end(addr, end)
> +#endif
> +
> +#ifndef pmd_addr_end_folded
> +#define pmd_addr_end_folded(pmd, addr, end) pmd_addr_end(addr, end)
> +#endif
Feels like it would be cleaner to globally change pmd_addr_end(), etc.
to require the extra pmd input rather than introduce this special rule
about when *_folded needs to be used? It would be a NOP on all arches
except s390.
There are not so many call sites that it seems too scary, and I
wouldn't be surprised if there are going to be more cases beyond GUP
that *should* be using the READ_ONCE trick.
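
To make the suggestion concrete, here is a rough userspace sketch, not
kernel code. It assumes a generic pmd_addr_end() that takes the pmd value
as a first argument and ignores it, which is what every arch except s390
would end up with. The pmd_t layout, the PMD_SHIFT value, and the
count_pmd_steps() walker are made up for illustration; the s390 override
that would actually inspect the value is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions (illustration only). */
typedef struct { unsigned long val; } pmd_t;

#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))
#define PTRS_PER_PMD	512

/*
 * Generic version with the extra pmd argument. The value is unused here,
 * so the result is the same as the existing two-argument helper. An arch
 * with dynamic folding could use the value to pick the right boundary.
 */
static inline unsigned long pmd_addr_end(pmd_t pmd, unsigned long addr,
					 unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

	(void)pmd;
	/* The "- 1" keeps the comparison correct if the range wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Hypothetical lockless-style walker: read each entry once into a local
 * copy (standing in for READ_ONCE()) and pass that copy down together
 * with the address range. Returns the number of pmd-sized steps taken.
 */
static size_t count_pmd_steps(const pmd_t *table, unsigned long addr,
			      unsigned long end)
{
	size_t steps = 0;
	unsigned long next;

	do {
		size_t idx = (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
		pmd_t pmd = *(const volatile pmd_t *)&table[idx];

		next = pmd_addr_end(pmd, addr, end);
		assert(next != addr);	/* the walk must always make progress */
		steps++;
	} while (addr = next, addr != end);

	return steps;
}
```

With a single signature, callers that pass pointers and callers that pass
values both use the same helper, so nobody has to remember which variant
applies where.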
Jason