Re: [PATCH] x86/mm/ptdump: Fix soft lockup in page table walker.

From: Andrey Ryabinin
Date: Fri Feb 10 2017 - 09:38:25 EST




On 02/10/2017 05:29 PM, Mark Rutland wrote:
> Hi,
>
> On Fri, Feb 10, 2017 at 04:56:19PM +0300, Andrey Ryabinin wrote:
>> On 02/10/2017 04:02 PM, Dmitry Vyukov wrote:
>>> On Fri, Feb 10, 2017 at 1:15 PM, Andrey Ryabinin
>>> <aryabinin@xxxxxxxxxxxxx> wrote:
>>>> On 02/10/2017 02:18 PM, Thomas Gleixner wrote:
>>>>> On Fri, 10 Feb 2017, Dmitry Vyukov wrote:
>
>> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
>> index 8aa6bea..1599a5c 100644
>> --- a/arch/x86/mm/dump_pagetables.c
>> +++ b/arch/x86/mm/dump_pagetables.c
>> @@ -373,6 +373,11 @@ static inline bool is_hypervisor_range(int idx)
>> #endif
>> }
>>
>> +static bool pgd_already_checked(pgd_t *prev_pgd, pgd_t *pgd, bool checkwx)
>> +{
>> + return checkwx && prev_pgd && (pgd_val(*prev_pgd) == pgd_val(*pgd));
>> +}
>> +
>> static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
>> bool checkwx)
>> {
>> @@ -381,6 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
>> #else
>> pgd_t *start = swapper_pg_dir;
>> #endif
>> + pgd_t *prev_pgd = NULL;
>> pgprotval_t prot;
>> int i;
>> struct pg_state st = {};
>> @@ -396,7 +402,8 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
>>
>> for (i = 0; i < PTRS_PER_PGD; i++) {
>> st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
>> - if (!pgd_none(*start) && !is_hypervisor_range(i)) {
>> + if (!pgd_none(*start) && !is_hypervisor_range(i) &&
>> + !pgd_already_checked(prev_pgd, start, checkwx)) {
>
> This means we'll fall into the else case...
>
>> if (pgd_large(*start) || !pgd_present(*start)) {
>> prot = pgd_flags(*start);
>> note_page(m, &st, __pgprot(prot), 1);
>> @@ -408,6 +415,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
>> note_page(m, &st, __pgprot(0), 1);
>
> ... i.e. the note_page() here, where we'll claim that there's nothing
> present due to the empty prot.
>
> That'll give erroneous output for the userspace pagetable dumps, so I do
> not think this is quite right, even though it gives a boot-time speedup.
>

For userspace pagetable dumps checkwx is false, so pgd_already_checked() will return false and we will not
fall into the else branch. Userspace pagetable dumps work as before.
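
To illustrate the short-circuit, here is a minimal standalone sketch that builds in userspace. It is not kernel code: pgd_t and pgd_val() below are simplified stand-ins I made up for the example, and demo_checks() is a hypothetical helper. Only the body of pgd_already_checked() is taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption, not the real definitions). */
typedef struct { unsigned long pgd; } pgd_t;
#define pgd_val(x) ((x).pgd)

/* Same logic as in the patch: can only return true when checkwx is set. */
static bool pgd_already_checked(pgd_t *prev_pgd, pgd_t *pgd, bool checkwx)
{
    return checkwx && prev_pgd && (pgd_val(*prev_pgd) == pgd_val(*pgd));
}

/* Hypothetical helper exercising the three interesting cases. */
static void demo_checks(void)
{
    pgd_t prev = { 0x1000 };
    pgd_t same = { 0x1000 };
    pgd_t other = { 0x2000 };

    /* Userspace dump path: checkwx == false, identical entries are never skipped. */
    assert(!pgd_already_checked(&prev, &same, false));

    /* W+X check path: an entry identical to the previous one is skipped. */
    assert(pgd_already_checked(&prev, &same, true));

    /* W+X check path: a different entry, or no previous entry, is still walked. */
    assert(!pgd_already_checked(&prev, &other, true));
    assert(!pgd_already_checked(NULL, &same, true));
}
```

Calling demo_checks() from a small main() should pass all assertions. With checkwx == false the && chain stops at the first operand, so the condition in the walker reduces to the original one.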


> Thanks,
> Mark.
>