Re: [External] Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

From: Michal Hocko
Date: Thu May 24 2018 - 07:26:21 EST


On Wed 23-05-18 16:07:16, Huaisheng HS1 Ye wrote:
> From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
> Sent: Wednesday, May 23, 2018 2:37 AM
> >
> > On Mon 21-05-18 23:20:21, Huaisheng Ye wrote:
> > > From: Huaisheng Ye <yehs1@xxxxxxxxxx>
> > >
> > > Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.
> > >
> > > Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks,
> > > the bottom three bits of GFP mask is reserved for storing encoded
> > > zone number.
> > >
> > > The encoding method is XOR. Get zone number from enum zone_type,
> > > then encode the number with ZONE_NORMAL by XOR operation.
> > > The goal is to make sure ZONE_NORMAL can be encoded to zero. So,
> > > the compatibility can be guaranteed, such as GFP_KERNEL and GFP_ATOMIC
> > > can be used as before.
> > >
> > > Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
> > > a flag. Same as before, __GFP_MOVABLE respresents movable migrate type
> > > for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is enabled with
> > > __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
> > > __GFP_ZONE_MOVABLE is created to realize it.
> > >
> > > With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
> > > enough to get ZONE_MOVABLE from gfp_zone. All callers should use
> > > GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
> > >
> > > Decode zone number directly from bottom three bits of flags in gfp_zone.
> > > The theory of encoding and decoding is,
> > > A ^ B ^ B = A
> >
> > So why is this any better than the current code. Sure I am not a great
> > fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> > doesn't look too much better, yet we are losing a check for incompatible
> > gfp flags. The diffstat looks really sound but then you just look and
> > see that the large part is the comment that at least explained the gfp
> > zone modifiers somehow and the debugging code. So what is the selling
> > point?
>
> Dear Michal,
>
> Let me try to reply your questions.
> Exactly, GFP_ZONE_TABLE is too complicated. I think there are two advantages
> from the series of patches.
>
> 1. XOR operation is simple and efficient, GFP_ZONE_TABLE/BAD need to do twice
> shift operations, the first is for getting a zone_type and the second is for
> checking the to be returned type is a correct or not. But with these patch XOR
> operation just needs to use once. Because the bottom 3 bits of GFP bitmask have
> been used to represent the encoded zone number, we can say there is no bad zone
> number if all callers could use it without buggy way. Of course, the returned
> zone type in gfp_zone needs to be no more than ZONE_MOVABLE.

But you are losing the ability to check for wrong usage. And it seems
that the sad reality is that the existing code do screw up.

> 2. GFP_ZONE_TABLE has limit with the amount of zone types. Current GFP_ZONE_TABLE
> is 32 bits, in general, there are 4 zone types for most ofX86_64 platform, they
> are ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE. If we want to expand the
> amount of zone types to larger than 4, the zone shift should be 3.

But we do not want to expand the number of zones IMHO. The existing zoo
is quite a maint. pain.

That being said. I am not saying that I am in love with GFP_ZONE_TABLE.
It always makes my head explode when I look there but it seems to work
with the current code and it is optimized for it. If you want to change
this then you should make sure you describe reasons _why_ this is an
improvement. And I would argue that "we can have more zones" is a
relevant one.
--
Michal Hocko
SUSE Labs