Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
From: Michel Lespinasse
Date: Wed Jun 17 2020 - 02:30:19 EST
On Tue, Jun 16, 2020 at 11:07 PM Stafford Horne <shorne@xxxxxxxxx> wrote:
> On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote:
> > On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> > > This makes me wonder actually - maybe there is a latent bug that got
> > > exposed after my change added the rwsem_is_locked assertion to the
> > > lockdep_assert_held one. If that is the case, it may be helpful to
> > > bisect when that issue first appeared, by testing before my patchset
> > > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> > > walk_page_range() / walk_page_range_novma() / walk_page_vma() ...
> >
> > Hello,
> >
> > I tried to bisect it, but I think this issue goes much further back.
> >
> > With just the patch below, booting fails all the way back to v5.7.
> >
> > What does this mean, by the way? Why would mmap_assert_locked() want to assert
> > that rwsem_is_locked() is not true?
It's the opposite: VM_BUG_ON(cond) triggers if cond is true, so in
other words it asserts that cond is false. Yeah, I agree it is kinda
confusing. But in our case the condition is !rwsem_is_locked(...), so
it asserts that the rwsem is locked, which is what we want.
> The OpenRISC code that was walking the page ranges was not taking the mmap
> lock on init_mm. I have added the below patch to v5.8-rc1 and it seems to work
> fine. I will send a better patch in a bit.
>
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index c152a68811dd..bd5f05dd9174 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
>  	 * We need to iterate through the pages, clearing the dcache for
>  	 * them and setting the cache-inhibit bit.
>  	 */
> +	mmap_read_lock(&init_mm);
>  	error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
>  			NULL);
> +	mmap_read_unlock(&init_mm);
>  	if (error)
>  		return ERR_PTR(error);
>  	return cpu_addr;
> @@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  {
>  	unsigned long va = (unsigned long)cpu_addr;
>  
> +	mmap_read_lock(&init_mm);
>  	/* walk_page_range shouldn't be able to fail here */
>  	WARN_ON(walk_page_range(&init_mm, va, va + size,
>  			&clear_nocache_walk_ops, NULL));
> +	mmap_read_unlock(&init_mm);
>  }
Thanks a lot for getting to the bottom of this. I think this is the proper fix.