Re: [PATCH] Revert "mm/vunmap: add cond_resched() in vunmap_pmd_range"
From: Minchan Kim
Date:  Mon Nov 16 2020 - 12:53:29 EST
On Fri, Nov 13, 2020 at 08:25:29AM -0800, Minchan Kim wrote:
> On Thu, Nov 12, 2020 at 02:49:19PM -0800, Andrew Morton wrote:
> > On Thu, 12 Nov 2020 12:01:01 -0800 Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > 
> > > 
> > > On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote:
> > > > Hi Andrew,
> > > > 
> > > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote:
> > > > > On Thu,  5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > > > 
> > > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e.
> > > > > > 
> > > > > > While I was doing zram testing, I found sometimes decompression failed
> > > > > > since the compression buffer was corrupted. With investigation,
> > > > > > I found below commit calls cond_resched unconditionally so it could
> > > > > > make a problem in atomic context if the task is reschedule.
> > > > > > 
> > > > > > Revert the original commit for now.
> > >
> > > How should we proceed this problem?
> > >
> > 
> > (top-posting repaired - please don't).
> > 
> > Well, we don't want to reintroduce the softlockup reports which
> > e47110e90584a2 fixed, and we certainly don't want to do that on behalf
> > of code which is using the unmap_kernel_range() interface incorrectly.
> > 
> > So I suggest either
> > 
> > a) make zs_unmap_object() stop doing the unmapping from atomic context or
> 
> It's not easy since the path already hold several spin_locks as well as
> per-cpu context. I could pursuit the direction but it takes several
> steps to change entire locking scheme in the zsmalloc, which will
> take time(we couldn't leave zsmalloc broken until then) and hard to
> land on stable tree.
> 
> > 
> > b) figure out whether the vmalloc unmap code is *truly* unsafe from
> >    atomic context - perhaps it is only unsafe from interrupt context,
> >    in which case we can rework the vmalloc.c checks to reflect this, or
> 
> I don't get the point. I assume your suggestion would be "let's make the
> vunmap code atomic context safe" but how could it solve this problem?
>      
> The point from e47110e90584a2 was softlockup could be triggered if
> vunamp deal with large mapping so need *explict reschedule* point
> for CONFIG_PREEMPT_VOLUNTARY. However, CONFIG_PREEMPT_VOLUNTARY
> doesn't consider peempt count so even though we could make vunamp
> atomic safe to make a call under spin_lock:
> 
> spin_lock(&A);
> vunmap
>   vunmap_pmd_range
>     cond_resched <- bang
>  
> Below options would have same problem, too.
> Let me know if I misunderstand something.
> 
> > 
> > c) make the vfree code callable from all contexts.  Which is by far
> >    the preferred solution, but may be tough.
> > 
> > 
> > Or maybe not so tough - if the only problem in the vmalloc code is the
> > use of mutex_trylock() from irqs then it may be as simple as switching
> > to old-style semaphores and using down_trylock(), which I think is
> > irq-safe.
> > 
> > However old-style semaphores are deprecated.  A hackyish approach might
> > be to use an rwsem always in down_write mode and use
> > down_write_trylock(), which I think is also callable from interrrupt
> > context.
> > 
> > But I have a feeling that there are other reasons why vfree shouldn't
> > be called from atomic context, apart from the mutex_trylock-in-irq
> > issue.
How about this?