Re: [PATCH v3 0/6] hugetlb_cgroup: Add hugetlb_cgroup reservation limits
From: Mina Almasry
Date: Thu Sep 05 2019 - 16:07:45 EST
On Tue, Sep 3, 2019 at 4:46 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 9/3/19 10:57 AM, Mike Kravetz wrote:
> > On 8/29/19 12:18 AM, Michal Hocko wrote:
> >> [Cc cgroups maintainers]
> >>
> >> On Wed 28-08-19 10:58:00, Mina Almasry wrote:
> >>> On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Mon 26-08-19 16:32:34, Mina Almasry wrote:
> >>>>> mm/hugetlb.c | 493 ++++++++++++------
> >>>>> mm/hugetlb_cgroup.c | 187 +++++--
> >>>>
> >>>> This is a lot of changes to an already subtle code which hugetlb
> >>>> reservations undoubly are.
> >>>
> >>> For what it's worth, I think this patch series is a net decrease in
> >>> the complexity of the reservation code, especially the region_*
> >>> functions, which is where a lot of the complexity lies. I removed the
> >>> race between region_del and region_{add|chg}, refactored the main
> >>> logic into smaller code, moved common code to helpers and deleted the
> >>> duplicates, and finally added lots of comments to the hard to
> >>> understand pieces. I hope that when folks review the changes they will
> >>> see that! :)
> >>
> >> Post those improvements as standalone patches and sell them as
> >> improvements. We can talk about the net additional complexity of the
> >> controller much easier then.
> >
> > All such changes appear to be in patch 4 of this series. The commit message
> > says "region_add() and region_chg() are heavily refactored to in this commit
> > to make the code easier to understand and remove duplication.". However, the
> > modifications were also added to accommodate the new cgroup reservation
> > accounting. I think it would be helpful to explain why the existing code does
> > not work with the new accounting. For example, one change is because
> > "existing code coalesces resv_map entries for shared mappings. new cgroup
> > accounting requires that resv_map entries be kept separate for proper
> > uncharging."
> >
> > I am starting to review the changes, but it would help if there was a high
> > level description. I also like Michal's idea of calling out the region_*
> > changes separately. If not a standalone patch, at least the first patch of
> > the series. This new code will be exercised even if cgroup reservation
> > accounting not enabled, so it is very important than no subtle regressions
> > be introduced.
>
> While looking at the region_* changes, I started thinking about this no
> coalesce change for shared mappings which I think is necessary. Am I
> mistaken, or is this a requirement?
>
No coalesce is a requirement, yes. The idea is that task A can reseve
range [0-1], and task B can reserve range [1-2]. We want the code to
put in 2 regions:
1. [0-1], with cgroup information that points to task A's cgroup.
2. [1-2], with cgroup information that points to task B's cgroup.
If coalescing is happening, then you end up with one region [0-2] with
cgroup information for one of those cgroups, and someone gets
uncharged wrong when the reservation is freed.
Technically we can still coalesce if the cgroup information is the
same and I can do that, but the region_* code becomes more
complicated, and you mentioned on an earlier patchset that you were
concerned with how complicated the region_* functions are as is.
> If it is a requirement, then think about some of the possible scenarios
> such as:
> - There is a hugetlbfs file of size 10 huge pages.
> - Task A has reservations for pages at offset 1 3 5 7 and 9
> - Task B then mmaps the entire file which should result in reservations
> at 0 2 4 6 and 8.
> - region_chg will return 5, but will also need to allocate 5 resv_map
> entries for the subsequent region_add which can not fail. Correct?
> The code does not appear to handle this.
>
I thought the code did handle this. region_chg calls
allocate_enough_cache_for_range_and_lock(), which in this scenario
will put 5 entries in resv_map->region_cache. region_add will use
these 5 region_cache entries to do its business.
I'll add a test in my suite to test this case to make sure.
> BTW, this series will BUG when running libhugetlbfs test suite. It will
> hit this in resv_map_release().
>
> VM_BUG_ON(resv_map->adds_in_progress);
>
Sorry about that, I've been having trouble running the libhugetlbfs
tests, but I'm still on it. I'll get to the bottom of this by next
patch series.
> --
> Mike Kravetz