Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during sync()
From: Josef Bacik
Date: Mon Feb 11 2019 - 16:33:58 EST
On Mon, Feb 11, 2019 at 09:40:29PM +0100, Andrea Righi wrote:
> On Mon, Feb 11, 2019 at 10:39:34AM -0500, Josef Bacik wrote:
> > On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote:
> > > This is an attempt to mitigate the priority inversion problem of a
> > > high-priority blkcg issuing a sync() and being forced to wait the
> > > completion of all the writeback I/O generated by any other low-priority
> > > blkcg, causing massive latencies to processes that shouldn't be
> > > I/O-throttled at all.
> > >
> > > The idea is to save a list of blkcg's that are waiting for writeback:
> > > every time a sync() is executed the current blkcg is added to the list.
> > >
> > > Then, when I/O is throttled, if there's a blkcg waiting for writeback
> > > different than the current blkcg, no throttling is applied (we can
> > > probably refine this logic later, i.e., a better policy could be to
> > > adjust the throttling I/O rate using the blkcg with the highest speed
> > > from the list of waiters - priority inheritance, kinda).
> > >
> > > This topic has been discussed here:
> > > https://lwn.net/ml/cgroups/20190118103127.325-1-righi.andrea@xxxxxxxxx/
> > >
> > > But we didn't come up with any definitive solution.
> > >
> > > This patch is not a definitive solution either, but it's an attempt to
> > > continue addressing this issue and handling the priority inversion
> > > problem with sync() in a better way.
> > >
> > > Signed-off-by: Andrea Righi <righi.andrea@xxxxxxxxx>
> > Talked with Tejun about this some and we agreed the following is probably the
> > best way forward
> First of all thanks for the update!
> > 1) Track the submitter of the wb work to the writeback code.
> Are we going to track the cgroup that originated the dirty pages (or
> maybe dirty inodes) or do you have any idea in particular?
The guy doing the sync(), so that way we can accomplish #3. But really this is
an implementation detail, however you want to accomplish it is fine by me.
> > 2) Sync() defaults to the root cg, and and it writes all the things as the root
> > cg.
> > 3) Add a flag to the cgroups that would make sync()'ers in that group only be
> > allowed to write out things that belong to its group.
> So, IIUC, when this flag is enabled a cgroup that is doing sync() would
> trigger the writeback of the pages that belong to that cgroup only and
> it waits only for these pages to be sync-ed, right? In this case
> writeback can still go at cgroup's speed.
> Instead when the flag is disabled, sync() would trigger writeback I/O
> globally, as usual, and it goes at full speed (root cgroup's speed).
> Am I understanding correctly?
Yup that's exactly it. Thanks,