Re: [PATCH v9 0/8] stg mail -e --version=v9 \

From: Alexander Duyck
Date: Thu Sep 12 2019 - 11:42:13 EST

Next message: Christophe Leroy: "Re: [PATCH v1 3/4] powerpc: Add support for GENERIC_EARLY_IOREMAP"
Previous message: Nathan Lynch: "Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline"
In reply to: Michal Hocko: "Re: [PATCH v9 0/8] stg mail -e --version=v9 \"
Next in thread: Mel Gorman: "Re: [PATCH v9 0/8] stg mail -e --version=v9 \"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 2019-09-12 at 11:19 +0200, Michal Hocko wrote:
> On Wed 11-09-19 08:12:03, Alexander Duyck wrote:
> > On Wed, Sep 11, 2019 at 4:36 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > On Tue 10-09-19 14:23:40, Alexander Duyck wrote:
> > > [...]
> > > > We don't put any limitations on the allocator other then that it needs to
> > > > clean up the metadata on allocation, and that it cannot allocate a page
> > > > that is in the process of being reported since we pulled it from the
> > > > free_list. If the page is a "Reported" page then it decrements the
> > > > reported_pages count for the free_area and makes sure the page doesn't
> > > > exist in the "Boundary" array pointer value, if it does it moves the
> > > > "Boundary" since it is pulling the page.
> > >
> > > This is still a non-trivial limitation on the page allocation from an
> > > external code IMHO. I cannot give any explicit reason why an ordering on
> > > the free list might matter (well except for page shuffling which uses it
> > > to make physical memory pattern allocation more random) but the
> > > architecture seems hacky and dubious to be honest. It shoulds like the
> > > whole interface has been developed around a very particular and single
> > > purpose optimization.
> >
> > How is this any different then the code that moves a page that will
> > likely be merged to the tail though?
>
> I guess you are referring to the page shuffling. If that is the case
> then this is an integral part of the allocator for a reason and it is
> very well obvious in the code including the consequences. I do not
> really like an idea of hiding similar constrains behind a generic
> looking feature which is completely detached from the allocator and so
> any future change of the allocator might subtly break it.
>
> > In our case the "Reported" page is likely going to be much more
> > expensive to allocate and use then a standard page because it will be
> > faulted back in. In such a case wouldn't it make sense for us to want
> > to keep the pages that don't require faults ahead of those pages in
> > the free_list so that they are more likely to be allocated?
>
> OK, I was suspecting this would pop out. And this is exactly why I
> didn't like an idea of an external code imposing a non obvious constrains
> to the allocator. You simply cannot count with any ordering with the
> page allocator. We used to distinguish cache hot/cold pages in the past
> and pushed pages to the specific end of the free list but that has been
> removed. There are other potential changes like that possible. Shuffling
> is a good recent example.
>
> Anyway I am not a maintainer of this code. I would really like to hear
> opinions from Mel and Vlastimil here (now CCed - the thread starts
> http://lkml.kernel.org/r/20190907172225.10910.34302.stgit@xxxxxxxxxxxxxxxxxxxxxx

One alternative I could do if we are wanting to make things more obvious
would be to add yet another add_to_free_list_XXX function that would be
used specifically for reported pages. The only real requirement I have is
that we have to insert reported pages such that we generate a continuous
block without interleaving non-reported pages in between. So as long as
reported pages are always inserted at the boundary/iterator when we are
actively reporting on a section then I can guarantee the list won't have
gaps formed.

Also as far as the concerns about this being an external user, one thing I
can do is break up the headers a bit and define an internal header in mm/
that defines all the items used by the page allocator, and another in
include/linux/ that defines what is used by devices when receiving the
notifications. It would then help to reduce the likelihood of an outside
entity messing with the page allocator too much.

Next message: Christophe Leroy: "Re: [PATCH v1 3/4] powerpc: Add support for GENERIC_EARLY_IOREMAP"
Previous message: Nathan Lynch: "Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline"
In reply to: Michal Hocko: "Re: [PATCH v9 0/8] stg mail -e --version=v9 \"
Next in thread: Mel Gorman: "Re: [PATCH v9 0/8] stg mail -e --version=v9 \"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]