Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup

From: Jan Kara
Date: Wed Apr 29 2020 - 06:25:47 EST

Next message: Rafael J. Wysocki: "Re: [net-next PATCH v2 0/3] Introduce new APIs to support phylink and phy layers"
Previous message: Arend Van Spriel: "Re: [PATCH] brcmfmac: no need to check return value of debugfs_create functions"
In reply to: Johannes Weiner: "Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup"
Next in thread: Tejun Heo: "Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup"

On Wed 29-04-20 07:47:34, Dave Chinner wrote:
> On Tue, Apr 28, 2020 at 12:13:46PM -0400, Dan Schatzberg wrote:
> > The loop device runs all i/o to the backing file on a separate kworker
> > thread which results in all i/o being charged to the root cgroup. This
> > allows a loop device to be used to trivially bypass resource limits
> > and other policy. This patch series fixes this gap in accounting.
>
> How is this specific to the loop device? Isn't every block device
> that offloads work to a kthread or single worker thread susceptible
> to the same "exploit"?
>
> Or is the problem simply that the loop worker thread is simply not
> taking the IO's associated cgroup and submitting the IO with that
> cgroup associated with it? That seems kinda simple to fix....
>
> > Naively charging cgroups could result in priority inversions through
> > the single kworker thread in the case where multiple cgroups are
> > reading/writing to the same loop device.
>
> And that's where all the complexity and serialisation comes from,
> right?
>
> So, again: how is this unique to the loop device? Other block
> devices also offload IO to kthreads to do blocking work and IO
> submission to lower layers. Hence this seems to me like a generic
> "block device does IO submission from different task" issue that
> should be handled by generic infrastructure and not need to be
> reimplemented multiple times in every block device driver that
> offloads work to other threads...

Yeah, I was thinking the same when reading the patch series
description. We already have some cgroup workarounds for btrfs kthreads if
I remember correctly, we have cgroup handling for flush workers, now we are
adding cgroup handling for loopback device workers, and soon I'd expect
someone to come with a need for DM/MD worker processes. IMHO it's getting
out of hand: the complexity spreads through the kernel, with every
subsystem coming up with a slightly different solution to the problem, and
the number of kthreads gets multiplied by the number of cgroups. So I
agree that some generic solution for IO throttling of kthreads /
workers would be desirable.
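
To make the accounting gap concrete, here is a toy user-space model of it.
This is only a sketch and not kernel code: the names (simulate, run_worker,
inherit_issuer, cg_a, cg_b) are made up for illustration, and it ignores
throttling and the priority inversion problem entirely. It only shows that
a single worker charges everything to its own (root) cgroup unless each
request carries the issuer's cgroup along with it.

```python
import queue
import threading
from collections import Counter

ROOT = "root"  # cgroup the worker thread itself runs in (assumption of this model)


def run_worker(work, charges, inherit_issuer):
    """Drain the queue on one worker thread, as an offloading driver would."""
    while True:
        item = work.get()
        if item is None:  # sentinel: no more requests
            break
        issuer, nbytes = item
        # Without inheritance the I/O is billed to the worker's own cgroup.
        charged_to = issuer if inherit_issuer else ROOT
        charges[charged_to] += nbytes


def simulate(requests, inherit_issuer):
    """Submit (issuer_cgroup, nbytes) requests and return bytes charged per cgroup."""
    work = queue.Queue()
    charges = Counter()
    worker = threading.Thread(target=run_worker,
                              args=(work, charges, inherit_issuer))
    worker.start()
    for req in requests:
        work.put(req)
    work.put(None)
    worker.join()
    return dict(charges)


requests = [("cg_a", 4096), ("cg_b", 8192), ("cg_a", 4096)]
print(simulate(requests, inherit_issuer=False))  # {'root': 16384}
print(simulate(requests, inherit_issuer=True))   # {'cg_a': 8192, 'cg_b': 8192}
```

In the first run both cgroups escape their limits because all 16384 bytes
land on root; in the second run each issuer is charged for what it queued.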

OTOH I don't have a great idea what the generic infrastructure should look
like...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
