Re: dm bufio: Reduce dm_bufio_lock contention
From: Michal Hocko
Date: Fri Jun 15 2018 - 03:32:20 EST
On Thu 14-06-18 14:34:06, Mikulas Patocka wrote:
> On Thu, 14 Jun 2018, Michal Hocko wrote:
> > On Thu 14-06-18 15:18:58, jing xia wrote:
> > [...]
> > > PID: 22920 TASK: ffffffc0120f1a00 CPU: 1 COMMAND: "kworker/u8:2"
> > > #0 [ffffffc0282af3d0] __switch_to at ffffff8008085e48
> > > #1 [ffffffc0282af3f0] __schedule at ffffff8008850cc8
> > > #2 [ffffffc0282af450] schedule at ffffff8008850f4c
> > > #3 [ffffffc0282af470] schedule_timeout at ffffff8008853a0c
> > > #4 [ffffffc0282af520] schedule_timeout_uninterruptible at ffffff8008853aa8
> > > #5 [ffffffc0282af530] wait_iff_congested at ffffff8008181b40
> > This trace doesn't provide the full picture unfortunately. Waiting in
> > the direct reclaim means that the underlying bdi is congested. The real
> > question is why it doesn't flush IO in time.
> I pointed this out two years ago and you just refused to fix it:
Let me be evil again and let me quote the old discussion:
: > I agree that mempool_alloc should _primarily_ sleep on their own
: > throttling mechanism. I am not questioning that. I am just saying that
: > the page allocator has its own throttling which it relies on and that
: > cannot be just ignored because that might have other undesirable side
: > effects. So if the right approach is really to never throttle certain
: > requests then we have to bail out from a congested nodes/zones as soon
: > as the congestion is detected.
: > Now, I would like to see that something like that is _really_ necessary.
: Currently, it is not a problem - device mapper reports the device as
: congested only if the underlying physical disks are congested.
: But once we change it so that device mapper reports congested state on its
: own (when it has too many bios in progress), this starts being a problem.
So has this changed since then? If yes then we can think of a proper
solution but that would require to actually describe why we see the
congestion, why it does help to wait on the caller rather than the
Throwing statements like ...
> I'm sure you'll come up with another creative excuse why GFP_NORETRY
> allocations need incur deliberate 100ms delays in block device drivers.
... is not really productive. I've tried to explain why I am not _sure_ what
possible side effects such a change might have and your hand waving
didn't really convince me. MD is not the only user of the page
E.g. why has 41c73a49df31 ("dm bufio: drop the lock when doing GFP_NOIO
allocation") even added GFP_NOIO request in the first place when you
keep retrying and sleep yourself? The changelog only describes what but
doesn't explain why. Or did I misread the code and this is not the
allocation which is stalling due to congestion?