Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug

From: Uladzislau Rezki
Date: Thu Sep 07 2023 - 14:24:28 EST


On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
> On Wed, Sep 06, 2023 at 10:46:08PM +0000, Joel Fernandes wrote:
> > On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> > > On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > > > some contexts. Add a safer trylock version of the same function to do a
> > > > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > > > >
> > > > > It'd be nice to have more details as to precisely which contexts and what this
> > > > > resolves.
> > > >
> > > > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > > > context interrupting a lock-held region) but you're right.
> > > >
> > > > > > [applied test robot feedback on unused function fix.]
> > > > > > [applied Uladzislau feedback on locking.]
> > > > > >
> > > > > > Reported-by: Zhen Lei <thunder.leizhen@xxxxxxxxxxxxxxx>
> > > > > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > > > > Cc: rcu@xxxxxxxxxxxxxxx
> > > > > > Cc: Zqiang <qiang.zhang1211@xxxxxxxxx>
> > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > > mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > > > > 1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > > > > #ifdef CONFIG_PRINTK
> > > > > > bool vmalloc_dump_obj(void *object)
> > > > > > {
> > > > > > - struct vm_struct *vm;
> > > > > > void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > > > + const void *caller;
> > > > > > + struct vm_struct *vm;
> > > > > > + struct vmap_area *va;
> > > > > > + unsigned long addr;
> > > > > > + unsigned int nr_pages;
> > > > > >
> > > > > > - vm = find_vm_area(objp);
> > > > > > - if (!vm)
> > > > > > + if (!spin_trylock(&vmap_area_lock))
> > > > > > + return false;
> > > > >
> > > > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > > > also concerned that in the past this function would return false only if the
> > > > > address was not a vmalloc one, but now it might just return false due to lock
> > > > > contention and the user has no idea which it is?
> > > > >
> > > > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > > > vs. the below cannot find case.
> > > >
> > > > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > > > if the vmalloc look up fails.
> > >
> > > No, you output exactly what was output before, only changing what it
> > > means and in no way differentiating between couldn't find vmalloc
> > > area/couldn't get lock.
> >
> > 2/2 does this:
> > - if (virt_addr_valid(object))
> > + if (is_vmalloc_addr(object))
> > + type = "vmalloc memory";
> > + else if (virt_addr_valid(object))
> > type = "non-slab/vmalloc memory";
> >
> > This code is executed only if vmalloc_dump_obj() returns false. The
> > is_vmalloc_addr() was added by 2/2 which is newly added right?
> >
> > You are right we are not differentiating between trylock failure and failure to
> > find the vmalloc area. I was just saying, even though we don't differentiate,
> > we do print "vmalloc memory" right? That wasn't being printed before.
> >
> > > > Also the reporter's usecase is not a common one. We only attempt to dump
> > > > information if there was a debug objects failure (example if somebody did a
> > > > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > > > still print something about the address.
> > >
> > > Right, but the function still purports to do X but does Y.
> > >
> > > >
> > > > > Under heavy lock contention aren't you potentially breaking the ability to
> > > > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > > > contexts under which acquiring this spinlock is not appropriate?
> > > >
> > > > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > > > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > > > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > > > that also using trylock.
> > >
> > > Right so somebody now has to find this email to realise that. I hate
> > > implicit knowledge like this, it needs a comment. It also furthers the
> > > point that it'd be useful to differentiate between the two.
> >
> > This is a valid point, and I acknowledged it in last email. A code comment could
> > indeed be useful.
>
> Thanks, yeah this may seem trivial, but I am quite sensitive about things
> being added to the code base that are neither described in commit msg nor
> in a comment or elsewhere and become 'implicit' in a sense.
>
> So just a simple comment here would be helpful, and I'm glad we're in
> agreement on that, will leave to you to do a follow up patch.
>
> >
> > So I guess from an agreement standpoint, I agree:
> >
> > 1/2 could use an additional comment explaining why we need trylock (sighting
> > the RT sleeping lock issue).
> >
> > 2/2 could update the existing code to convert "non-slab/vmalloc" to
> > "non-slab/non-vmalloc". Note: that's an *existing* issue.
>
> Yeah sorry this whole thing was rather confusing, it did indeed (unclearly)
> specify non-/non- in the past (on assumption dumping function would work),
> addition of vmalloc check now makes that correct again, the phrasing is the
> issue.
>
> You can leave this as-is as yeah, you're right, this was a pre-existing issue.
>
> virt_addr_valid() returns true for a slab addr, but kmem_valid_obj() is
> checked above so already been ruled out, now you ruled out vmalloc.
>
> Just a bit tricksy.
>
> >
> > The issue in 2/2 is not a new one so that can certainly be a separate patch.
> > And while at it, we could update the comment in that patch as well.
> >
> > But the whole differentiating between trylock vs vmalloc area lookup failure
> > is not that useful -- just my opinion fwiw! I honestly feel differentiating
> > between trylock vs vmalloc area lookup failure complicates the code because
> > it will require passing this information down from vmalloc_dump_obj() to the
> > caller AFAICS and I am not sure if the person reading the debug will really
> > care much. But I am OK with whatever the -mm community wants and I am happy
> > to send out a new patch on top with the above that I agree on since Andrew
> > took these 2 (but for the stuff I don't agree, I would appreciate if you
> > could send a patch for review and I am happy to review it!).
>
> Ah right, I think maybe I wasn't clear, all I meant to suggest is to output
> log output rather than feed anything back to caller, something like:-
>
> if (!spin_trylock(&vmap_area_lock)) {
> pr_cont(" [couldn't acquire vmap lock]\n");
> ...
> }
>
> My concern is that in the past this function would only return false if it
> couldn't find the address in a VA, now it returns false also if you happen
> to call it when the spinlock is locked, which might be confusing for
> somebody debugging this.
>
> HOWEVER, since you now indicate that the address is vmalloc anyway, and you
> _absolutely cannot_ give any further details safely, perhaps this
> additional information is indeed not that usful.
>
> My concern was just feeling antsy that we suddenly don't do something
> because a lock happens to be applied but as you say that cannot be helped
> in certain contexts.
>
> So actually, leave this.
>
> >
> > As you mentioned, this series is a stability fix and we can put touch-ups on
> > top of it if needed, and there is also plenty of time till the next merge
> > window. Allow me a few days and I'll do the new patch on top (I'd say dont
> > bother to spend your time on it, I'll do it).
>
> Ack, I was just a little frustrated we didn't reach a resolution on review
> (either deciding things could be deferred or having changes) before
> merge. Obviously fine to prioritise, but would be good to have that
> explicitly stated.
>
> >
> > thanks,
> >
> > - Joel
> >
> >
> > >
> > >
>
> Anyway, so TL;DR:-
>
> 1. As we both agree, add a comment to explain why you need the spin trylock.
> (there are no further steps :P)
>
> And I don't believe this actually needs any further changes after this
> discussion*, so if you fancy doing a follow up to that effect that will
> suffice for me thanks!
>
For PREEMPT_RT kernels we are not allowed to use "vmap parts" in non
slepable context, this is just reality, because we use a sleep type of
spinlock.

I am not sure how urgent we need this fix. But to me it looks like
debuging and corner case. Probably i am wrong and miss something.
But if it is correct, i would just bailout for RT kernel and rework
later in a more proper way. For example implement a safe way of RCU
scan but this is also another story.

--
Uladzislau Rezki