Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping

From: Ard Biesheuvel
Date: Wed Jan 26 2022 - 03:37:15 EST


On Wed, 26 Jan 2022 at 05:21, Justin He <Justin.He@xxxxxxx> wrote:
>
> Hi Catalin
>
> > -----Original Message-----
> > From: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Sent: Friday, January 7, 2022 6:43 PM
> > To: Jianyong Wu <Jianyong.Wu@xxxxxxx>
> > Cc: will@xxxxxxxxxx; Anshuman Khandual <Anshuman.Khandual@xxxxxxx>;
> > akpm@xxxxxxxxxxxxxxxxxxxx; david@xxxxxxxxxx; quic_qiancai@xxxxxxxxxxx;
> > ardb@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-
> > kernel@xxxxxxxxxxxxxxxxxxx; gshan@xxxxxxxxxx; Justin He
> > <Justin.He@xxxxxxx>; nd <nd@xxxxxxx>
> > Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create
> > pud mapping
> >
> > On Fri, Jan 07, 2022 at 09:10:57AM +0000, Jianyong Wu wrote:
> > > Hi Catalin,
> > >
> > > I have roughly found the root cause.
> > > alloc_init_pud() is called at the very beginning of kernel boot, from
> > > create_mapping_noalloc(), before any memory allocator is initialized. But
> > > the lockdep check may need to allocate memory, so the kernel takes an
> > > exception when acquiring the lock. (I have not found the exact code that
> > > causes this.) That is to say, we may not be able to use a lock so early.
> > >
> > > I came up with two methods to address it.
> > > 1) Skip the deadlock check at the very beginning of kernel boot in the
> > > lockdep code.
> > > 2) Provide two versions of __create_pgd_mapping(), one that takes the
> > > lock and one that does not. A race on memory mapping is unlikely at the
> > > very beginning of kernel boot, so the lockless version of
> > > __create_pgd_mapping() should be safe to use there.
> > > In my test, this issue is gone if no lock is held in
> > > create_mapping_noalloc(). I think create_mapping_noalloc() is called early
> > > enough to avoid the race conditions on memory mapping; however, I have
> > > not proved it.
> >
> > I think method 2 would work better but rather than implementing new
> > nolock functions I'd add a NO_LOCK flag and check it in
> > alloc_init_pud() before mutex_lock/unlock. Also add a comment when
> > passing the NO_LOCK flag on why it's needed and why there wouldn't be
> > any races at that stage (early boot etc.)
> >
> The problematic code path is:
> __primary_switched
> early_fdt_map->fixmap_remap_fdt
> create_mapping_noalloc->alloc_init_pud
> mutex_lock (with Jianyong's patch)
>
> The problem seems to be that we clear the BSS segment twice if KASLR
> is enabled. Hence, some of the static variables used during lockdep
> initialization get corrupted. That is to say, with KASLR enabled we might
> initialize lockdep twice if we add mutex_lock/unlock in alloc_init_pud().
>

Thanks for tracking that down.

Note that clearing the BSS twice is not the root problem here. The
root problem is that we set global state while the kernel runs at the
default link time address, and then refer to it again after the entire
kernel has been shifted in the kernel VA space. Such global state
could consist of mutable pointers to statically allocated data (which
would be reset to their default values after the relocation code runs
again), or global pointer variables in BSS. In either case, relying on
such a global variable after the second relocation performed by KASLR
would be risky, and so we should avoid manipulating global state at
all if it might involve pointers to statically allocated data
structures.
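
To illustrate the hazard described above, here is a deliberately
simplified userspace model. It is only a sketch: struct image,
early_init(), relocate() and pointer_is_valid() are hypothetical names,
and a plain memcpy() stands in for the kernel being shifted in the VA
space. It does not model the real relocation code or the BSS clearing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a kernel image: some data plus a pointer to it. */
struct image {
	int lock_class;		/* statically allocated data */
	int *registered;	/* global pointer that is set at run time */
};

static struct image at_link_addr;	/* image at the default link address */
static struct image at_kaslr_addr;	/* image after the KASLR shift */

/* Early code records a pointer while running at the link-time address. */
static void early_init(struct image *img)
{
	img->registered = &img->lock_class;
}

/* Moving the image copies the bytes but does not fix up the pointer. */
static void relocate(struct image *dst, const struct image *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Returns 1 if the recorded pointer refers to data inside this image. */
static int pointer_is_valid(const struct image *img)
{
	return img->registered == &img->lock_class;
}
```

Calling early_init() on at_link_addr and then relocate() into
at_kaslr_addr leaves at_kaslr_addr.registered pointing at the old
image, which is the kind of stale global state that code running after
the shift must not rely on.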

> Put another way, invoking mutex_lock/unlock at such an early boot stage
> might be unsafe, because lockdep inserts lock_acquire/release as complex
> hooks.
>
> In summary, would it be better if Jianyong split the early boot and late boot
> cases? E.g., introduce a nolock version for create_mapping_noalloc().
>
> What do you think of it?
>

The pre-KASLR case definitely doesn't need a lock. But given that
create_mapping_noalloc() is only used to map the FDT, which happens
very early one way or the other, wouldn't it be better to move the
lock/unlock into other callers of __create_pgd_mapping()? (and make
sure no other users of the fixmap slots exist)
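
Roughly, the shape I have in mind is sketched below as a userspace
model, not as a patch. A pthread mutex stands in for the kernel mutex,
and all of the *_sketch names, fixmap_lock, lock_depth and
mappings_done are made up for the illustration. The inner function
takes no lock; the early FDT caller calls it directly, and any later
caller wraps the call in lock/unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the mutex guarding the shared fixmap slots. */
static pthread_mutex_t fixmap_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_depth;		/* 1 while fixmap_lock is held (demo only) */
static int mappings_done;	/* counts calls into the inner function */

/* Stand-in for __create_pgd_mapping(): it takes no lock itself. */
static void create_pgd_mapping_sketch(bool expect_locked)
{
	/* Check that the caller followed the locking rule for its path. */
	assert((lock_depth == 1) == expect_locked);
	mappings_done++;
}

/* Early FDT path: runs before anything can race, so no lock is taken. */
static void create_mapping_noalloc_sketch(void)
{
	create_pgd_mapping_sketch(false);
}

/* A later caller (hypothetical) takes the lock around the inner call. */
static void create_mapping_late_sketch(void)
{
	pthread_mutex_lock(&fixmap_lock);
	lock_depth++;
	create_pgd_mapping_sketch(true);
	lock_depth--;
	pthread_mutex_unlock(&fixmap_lock);
}
```

The point of this arrangement is that the early path never touches the
lock, and therefore never touches lockdep state, while every later
caller serializes on the same mutex.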