Re: [PATCH v7 0/7] Introduce ZONE_CMA

From: Vlastimil Babka
Date: Thu May 04 2017 - 08:33:48 EST


On 05/02/2017 03:03 PM, Michal Hocko wrote:
> On Tue 02-05-17 10:06:01, Vlastimil Babka wrote:
>> On 04/27/2017 05:06 PM, Michal Hocko wrote:
>>> On Tue 25-04-17 12:42:57, Joonsoo Kim wrote:
>>>> On Mon, Apr 24, 2017 at 03:09:36PM +0200, Michal Hocko wrote:
>>>>> On Mon 17-04-17 11:02:12, Joonsoo Kim wrote:
>>>>>> On Thu, Apr 13, 2017 at 01:56:15PM +0200, Michal Hocko wrote:
>>>>>>> On Wed 12-04-17 10:35:06, Joonsoo Kim wrote:
>>> [...]
>>>>> not for free. For most common configurations where we have ZONE_DMA,
>>>>> ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE all the 3 bits are already
>>>>> consumed so a new zone will need a new one AFAICS.
>>>>
>>>> Yes, it requires one more bit for a new zone and it's handled by the patch.
>>>
>>> I am pretty sure that you are aware that consuming new page flag bits
>>> is usually a no-go and something we try to avoid as much as possible
>>> because we are in a great shortage there. So there really have to be a
>>> _strong_ reason if we go that way. My current understanding is that the
>>> whole zone concept is more about a more convenient implementation rather
>>> than a fundamental change which will solve unsolvable problems with the
>>> current approach. More on that below.
>>
>> I don't see it as such a big issue. It's behind a CONFIG option (so we
>> also don't need the jump labels you suggest later) and enabling it
>> reduces the number of possible NUMA nodes (not page flags). So either
>> you are building a kernel for an Android phone that needs CMA but will have
>> a single NUMA node, or for a large server with many nodes that won't
>> have CMA. As long as there won't be large servers that need CMA, we
>> should be fine (yes, I know some HW vendors can be very creative, but
>> then it's their problem?).
>
> Is this really about Android/UMA systems only? My quick grep seems to disagree
> $ git grep CONFIG_CMA=y
> arch/arm/configs/exynos_defconfig:CONFIG_CMA=y
> arch/arm/configs/imx_v6_v7_defconfig:CONFIG_CMA=y
> arch/arm/configs/keystone_defconfig:CONFIG_CMA=y
> arch/arm/configs/multi_v7_defconfig:CONFIG_CMA=y
> arch/arm/configs/omap2plus_defconfig:CONFIG_CMA=y
> arch/arm/configs/tegra_defconfig:CONFIG_CMA=y
> arch/arm/configs/vexpress_defconfig:CONFIG_CMA=y
> arch/arm64/configs/defconfig:CONFIG_CMA=y
> arch/mips/configs/ci20_defconfig:CONFIG_CMA=y
> arch/mips/configs/db1xxx_defconfig:CONFIG_CMA=y
> arch/s390/configs/default_defconfig:CONFIG_CMA=y
> arch/s390/configs/gcov_defconfig:CONFIG_CMA=y
> arch/s390/configs/performance_defconfig:CONFIG_CMA=y
> arch/s390/defconfig:CONFIG_CMA=y
>
> I am pretty sure s390 and ppc support NUMA and aim at supporting really
> large systems.

I don't see ppc there, and the s390 commit that made CMA the default gives
no rationale. Heiko/Martin, could you share what s390 uses CMA for? Thanks.

> I can imagine that we could make ZONE_CMA configurable in a way that
> only very well defined use cases would be supported so that we can save
> page flags space. But this alone sounds like a maintainability nightmare
> to me. Especially when I consider ZONE_DMA situation. There is simply
> not an easy way to find out whether my HW really needs DMA zone or
> not. Most probably not but it still is configured and hidden behind
> config ZONE_DMA
> 	bool "DMA memory allocation support" if EXPERT
> 	default y
> 	help
> 	  DMA memory allocation support allows devices with less than 32-bit
> 	  addressing to allocate within the first 16MB of address space.
> 	  Disable if no such devices will be used.
>
> 	  If unsure, say Y.
>
> Are we really ready to add another thing like that? How are distribution
> kernels going to handle that?

I still hope that generic enterprise/desktop distributions can disable
it, and it's only used for small devices with custom kernels.

The config burden is already there in any case. With the current approach
it translates to an extra migratetype and fastpath hooks; with ZONE_CMA it
would translate to an extra zone and potentially fewer nodes.
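
To make the zone-vs-node trade-off above a bit more concrete, here is a
rough sketch in plain userspace C. It is not the kernel's actual layout
code: the helper names zone_bits() and max_nodes() and the idea of a single
fixed "budget" shared by the zone and node fields are simplifying
assumptions for illustration only, and the numbers fed into it are made up.

```c
/*
 * Simplified model (assumption, not kernel code): the zone index and the
 * node ID share a fixed number of bits in page->flags, so every bit the
 * zone field grows by is a bit the node field loses.
 */

/* Bits needed to encode zone indices 0..nr_zones-1. */
static unsigned int zone_bits(unsigned int nr_zones)
{
	unsigned int bits = 0;

	while ((1u << bits) < nr_zones)
		bits++;
	return bits;
}

/*
 * Number of node IDs that still fit once the zone field is carved out of
 * a hypothetical shared budget. Returns 0 if the zones alone do not fit.
 */
static unsigned long max_nodes(unsigned int budget_bits, unsigned int nr_zones)
{
	unsigned int zb = zone_bits(nr_zones);

	if (zb > budget_bits)
		return 0;
	return 1ul << (budget_bits - zb);
}
```

In this model, with a made-up budget of 12 bits, max_nodes(12, 4) is 1024
and max_nodes(12, 5) is 512: the zone that pushes the zone field to the
next bit halves the number of representable nodes, which is harmless on a
single-node device and only matters on a large NUMA machine.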