Re: [PATCH] fs/mbcache: make count_objects more robust.
From: Jan Kara
Date: Mon Jan 08 2018 - 04:21:23 EST
On Fri 05-01-18 08:54:56, jiang.biao2@xxxxxxxxxx wrote:
> > On Mon 27-11-17 11:30:19, Jiang Biao wrote:
> > > When running ltp stress test for 7*24 hours, the vmscan occasionally
> > > complains the following warning continuously,
> > >
> > > mb_cache_scan+0x0/0x3f0 negative objects to delete
> > > nr=-9232265467809300450
> > > ...
> > >
> > > The tracing result shows the freeable(mb_cache_count returns)
> > > is -1, which causes the continuous accumulation and overflow of
> > > total_scan.
> > >
> > > This patch make sure the mb_cache_count not return negative value,
> > > which make the mbcache shrinker more robust.
> > >
> > > Signed-off-by: Jiang Biao <jiang.biao2@xxxxxxxxxx>
> >
> > Going through some old email...
> > a) c_entry_count is unsigned so your patch is a nop as Coverity properly
> > noticed.
> Indeed, would the following casting be good?
> + if (unlikely((int)(cache->c_entry_count) < 0))
> + return 0;
That check would at least have a chance of triggering, but it would still
just be hiding the real problem.
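To illustrate the difference, here is a minimal userspace sketch, not
kernel code. It assumes the counter is an unsigned long; the names
entry_count_t, cast_check_hits() and count_out_of_range() are hypothetical
and only stand in for the check discussed above and for the 0..2*max range
mentioned in (b). A plain "< 0" test on the unsigned value can never be
true, while the cast version fires only when the low bits happen to look
negative as an int:

```c
#include <assert.h>
#include <limits.h>

/* Assumption: an unsigned long stands in for cache->c_entry_count. */
typedef unsigned long entry_count_t;

/*
 * The proposed check: convert the counter to int and test the sign.
 * The conversion is implementation-defined for out-of-range values;
 * on common compilers it keeps the low bits, so ULONG_MAX becomes -1.
 */
int cast_check_hits(entry_count_t count)
{
	return (int)count < 0;
}

/*
 * Alternative sanity check based on the expected range of the counter:
 * anything above 2 * max_entries is treated as corrupted.
 */
int count_out_of_range(entry_count_t count, entry_count_t max_entries)
{
	return count > 2 * max_entries;
}

/* Small self-check of the two helpers above. */
void demo_self_check(void)
{
	assert(cast_check_hits(ULONG_MAX));		/* wrapped counter is caught */
	assert(!cast_check_hits(10));			/* sane counter passes */
	assert(count_out_of_range(ULONG_MAX, 1024));	/* also caught by range */
	assert(!count_out_of_range(2048, 1024));	/* upper bound is allowed */
}
```

Either check only detects that the counter is already wrong; neither
explains how it got there.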
> > b) c_entry_count being outside 0..2*cache->c_max_entries is a plain bug. I
> > went through the logic and cannot find out how that could happen though.
> Is there any possibility that decreasing c_entry_count from 0 to -1
> in mb_cache_entry_delete?
If we think we have -1 entries in a list, we have a larger problem than
just the wrong behavior of the shrinker. This is just a plain counter of
entries protected by a spinlock, so there is no room for accounting errors
or anything like that. If you can reproduce the problem on some reasonably
recent kernel, I'd be interested in debugging this.
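For reference, a small userspace sketch of what the "0 to -1" hypothesis
would look like on an unsigned counter. This is only an illustration under
the assumption that the counter is an unsigned long; checked_dec() and
unchecked_dec_as_signed() are hypothetical helpers, not mbcache functions,
and nothing here confirms that this is what actually happens:

```c
#include <limits.h>

/*
 * Hypothetical decrement that refuses to go below zero.
 * Returns 0 on success, -1 if the counter was already 0, which would
 * mean an entry is being removed that was never accounted for.
 */
int checked_dec(unsigned long *count)
{
	if (*count == 0)
		return -1;	/* accounting bug: report it, do not wrap */
	(*count)--;
	return 0;
}

/*
 * What an unchecked decrement yields when the result is read as signed.
 * Starting from 0 the unsigned value wraps to ULONG_MAX; converting that
 * to long is implementation-defined and gives -1 on common ABIs.
 */
long unchecked_dec_as_signed(unsigned long count)
{
	count--;
	return (long)count;
}
```

A check like checked_dec() would point at the caller doing the extra
decrement, which is more useful for debugging than clamping the value
returned to the shrinker.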
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR