Re: a racy access flag clearing warning when calling mmap system call
From: chenjiankang
Date: Thu Dec 07 2017 - 22:20:42 EST
在 2017/12/7 21:23, Will Deacon 写道:
> On Thu, Dec 07, 2017 at 09:46:59AM +0800, Yisheng Xie wrote:
>> On 2017/12/1 21:18, Will Deacon wrote:
>>> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>>>> ------------[ cut here ]------------
>>>> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
>>>
>>> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
>>> failure with something more recent, please? We've fixed many bugs since
>>> then, some of them involving huge pages.
>>
>> Yeah, this is and old kernel, but I find a scene that will cause this warn_on:
>> When fork and dup_mmap, it will call copy_huge_pmd() and clear the Access Flag.
>> dup_mmap
>> -> copy_page_range
>> -> copy_pud_range
>> -> copy_pmd_range
>> -> copy_huge_pmd
>> -> pmd_mkold
>>
>> If we do not have any access after dup_mmap, and start to split this thp,
>> it will cause this call trace in the old kernel, right?
>>
>> It seems this is normal scene but will cause call trace for this old kernel,
>> therefore, for this old kernel, we should just remove this WARN_ON_ONCE, right?
>
> Whilst racy clearing of the access flag should be safe in practice, I like
> having the warning around because it does indicate that we're setting
> something to old which could immediately be made young again by the CPU.
>
> In this case, it looks like the mm isn't even live, so a better approach
> would probably be to predicate that conditional on mm == current->active_mm
> or something like that. That also avoids us getting false positive for
> the dirty bit case, which would be harmful if the table was installed.
>
> diff below. It's still racy with concurrent fork, but I don't want this
> check to become a generic "does my caller hold all the locks to protect
> against a concurrent walk" predicate and it just means we won't catch all
> possible races.
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 149d05fb9421..8fe103b1e101 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -42,6 +42,8 @@
> #include <asm/cmpxchg.h>
> #include <asm/fixmap.h>
> #include <linux/mmdebug.h>
> +#include <linux/mm_types.h>
> +#include <linux/sched.h>
>
> extern void __pte_error(const char *file, int line, unsigned long val);
> extern void __pmd_error(const char *file, int line, unsigned long val);
> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
> }
> }
>
> -struct mm_struct;
> -struct vm_area_struct;
> -
> extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>
> /*
> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> * hardware updates of the pte (ptep_set_access_flags safely changes
> * valid ptes without going through an invalid entry).
> */
> - if (pte_valid(*ptep) && pte_valid(pte)) {
> + if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
> + (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
> VM_WARN_ONCE(!pte_young(pte),
> "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
> __func__, pte_val(*ptep), pte_val(pte));
>
>
> .
>
Hi will
From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
And the the PTE access bit is all zero.
diff below. Whether the access bit of the new ptep should be judged to eliminate the
false positive?
Thanks
Jiankang
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2987d5a..3c1b0c6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
* valid ptes without going through an invalid entry).
*/
if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
- VM_WARN_ONCE(!pte_young(pte),
+ VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
"%s: racy access flag clearing: %016llx -> %016llx",
__func__, pte_val(*ptep), pte_val(pte));
VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(pte),
--
1.7.12.4