Re: [PATCH 02/15] blkcg: delay blkg destruction until after writeback has finished
From: Josef Bacik
Date: Fri Aug 31 2018 - 11:27:12 EST
On Thu, Aug 30, 2018 at 09:53:43PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <dennisszhou@xxxxxxxxx>
>
> Currently, blkcg destruction relies on a sequence of events:
> 1. Destruction starts. blkcg_css_offline() is called and blkgs
> release their reference to the blkcg. This immediately destroys
> the cgwbs (writeback).
> 2. With blkgs giving up their reference, the blkcg ref count should
> become zero and eventually call blkcg_css_free() which finally
> frees the blkcg.
>
> Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
> and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
> on the completion of all writeback associated with the blkcg. A count of
> the number of cgwbs is maintained and once that goes to zero, blkg
> destruction can follow. This should prevent premature blkg destruction.
>
> The new process for blkcg cleanup is as follows:
> 1. Destruction starts. blkcg_css_offline() is called which offlines
> writeback. Blkg destruction is delayed on the nr_cgwbs count to
> avoid punting potentially large amounts of outstanding writeback
> to root while maintaining any ongoing policies.
> 2. When the nr_cgwbs becomes zero, blkcg_destroy_blkgs() is called and
> handles destruction of blkgs. This is where the css reference held
> by each blkg is released.
> 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
> This finally frees the blkcg.
>
> It seems in the past blk-throttle didn't do the most understandable
> things with taking data from a blkg while associating with current. So,
> the simplification and unification of what blk-throttle is doing caused
> this.
>
So the general approach is correct, but it's confusing because you are using
nr_cgwbs as a reference counter: it's set to 1 at blkg creation time regardless
of whether or not there's an associated wb cg. So instead, why not just have a
refcount_t ref, set it to 1 on creation, make the wb cg take a ref when it's
attached, and then do the get/put like normal and clean up as you have below?
What you have is a reference counter masquerading as a count of the wb cg's.
Just add full ref counting to the blkcg and call it a day; it'll be much less
confusing. Thanks,
Josef
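
For illustration, the refcounting suggested above could look roughly like the
userspace sketch below. It is not kernel code and not the actual patch:
struct blkcg_sketch and the sketch_* helpers are hypothetical names, and a C11
atomic_int stands in for refcount_t. The assumption is that the creation
reference is dropped at offline time and each attached wb cg holds one
reference until its writeback completes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a blkcg; not the kernel struct. */
struct blkcg_sketch {
    atomic_int ref;        /* stands in for a kernel refcount_t */
    bool blkgs_destroyed;  /* set once the cleanup step has run */
};

/* Placeholder for the cleanup done when the last reference goes away. */
static void sketch_destroy_blkgs(struct blkcg_sketch *b)
{
    b->blkgs_destroyed = true;
}

/* Creation: the count starts at 1 (the creation reference). */
static void sketch_init(struct blkcg_sketch *b)
{
    atomic_init(&b->ref, 1);
    b->blkgs_destroyed = false;
}

/* Taken by a wb cg when it is attached. */
static void sketch_get(struct blkcg_sketch *b)
{
    atomic_fetch_add(&b->ref, 1);
}

/* Dropped at offline time and by each wb cg when it goes away. */
static void sketch_put(struct blkcg_sketch *b)
{
    /* atomic_fetch_sub returns the previous value; 1 means we hit zero. */
    if (atomic_fetch_sub(&b->ref, 1) == 1)
        sketch_destroy_blkgs(b);
}

/* Usage: one wb cg is attached across the offline step. */
static bool sketch_demo(void)
{
    struct blkcg_sketch b;
    bool destroyed_early;

    sketch_init(&b);
    sketch_get(&b);                       /* a wb cg attaches */
    sketch_put(&b);                       /* offline drops the creation ref */
    destroyed_early = b.blkgs_destroyed;  /* wb cg still holds a ref */
    sketch_put(&b);                       /* writeback done, wb cg drops its ref */

    return !destroyed_early && b.blkgs_destroyed;
}
```

With this shape the counter is explicitly a reference count, so the initial
value of 1 needs no special explanation, and cleanup runs only after the last
holder has dropped its reference.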