Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

From: Christophe LEROY
Date: Mon Jan 22 2018 - 02:53:17 EST




Le 20/01/2018 Ã 18:56, Segher Boessenkool a ÃcritÂ:
Hi!

On Sat, Jan 20, 2018 at 09:22:50AM +0100, christophe leroy wrote:
On PPC32, the address space is limited to 4Gbytes, hence only the
low
slices will be used. As of today, the code uses
SLICE_LOW_TOP (0x100000000ul) and compares it with addr to determine
if addr refers to low or high space.
On PPC32, such a (addr < SLICE_LOW_TOP) test is always false because
0x100000000ul degrades to 0. Therefore, the patch modifies
SLICE_LOW_TOP to (0xfffffffful) and modifies the tests to
(addr <= SLICE_LOW_TOP) which will then always be true on PPC32
as addr has type 'unsigned long' while not modifying the PPC64
behaviour.

It should work to define SLICE_LOW_TOP as 0x100000000ull and keep
everything else the same, no?

great, yes it works indeed.


I don't think so. When I had the missing prototype, the compilation goes
ok, including the final link. Which means at the end the code is not
included since radix_enabled() evaluates to 0.

Many many parts of the kernel are based on this assumption.

Segher, what is your opinion on the above ? Can we consider that a ' if
(nbits)' will always be compiled out when nbits is a #define constant,
or should we duplicate the macros as suggested in order to avoid
unneccessary 'if' test on platforms where 'nbits' is always not null by
definition ?

Doing things like

if (nbits)
some_undeclared_function();

will likely work in practice if the condition evaluates to false at
compile time, but a) it will warn; b) it is just yuck; and c) it will
not always work (for example, you get the wrong prototype in this case,
not lethal here with most ABIs, but ugh).

Just make sure to declare all functions, or define it to some empty
thing, or #ifdeffery if you have to. There are many options, it is
not hard, and if it means you have to pull code further apart that is
not so bad: you get cleaner, clearer code.

Ok, if I understand well, your comment applies to the following indeed, so you confirm the #ifdef is necessary.

--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -553,9 +553,11 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
struct hstate *hstate = hstate_file(file);
int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));

+#ifdef CONFIG_PPC_RADIX_MMU
if (radix_enabled())
return radix__hugetlb_get_unmapped_area(file, addr, len,
pgoff, flags);
+#endif
return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
}
#endif


However, my question was related to another part of the current patchset, where the functions are always refined:


On PPC32 we set:

+#define SLICE_LOW_SHIFT 28
+#define SLICE_HIGH_SHIFT 0

On PPC64 we set:

#define SLICE_LOW_SHIFT 28
#define SLICE_HIGH_SHIFT 40

We define:

+#define slice_bitmap_zero(dst, nbits) \
+ do { if (nbits) bitmap_zero(dst, nbits); } while (0)


We have a function with:
{
slice_bitmap_zero(ret->low_slices, SLICE_NUM_LOW);
slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
}

So the question is to find the better approach. Is the above approach correct, including performance wise ?
Or should we define two sets of the macro slice_bitmap_zero(), one for CONFIG_PPC32 with the 'if (nbits)' test and one for CONFIG_PPC64 without the unnecessary test ?

Or should we avoid this macro entirely and instead do something like:

{
bitmap_zero(ret->low_slices, SLICE_NUM_LOW);
#if SLICE_NUM_HIGH != 0
bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
#endif
}

And if we say the 'macro' approach is OK, should it be better the use a static inline function instead ?

Thanks,
Christophe