Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures

From: Arunpravin Paneer Selvam
Date: Fri Mar 10 2023 - 08:15:46 EST

Next message: Horatiu Vultur: "Re: [PATCH v3 3/5] net: Let the active time stamping layer be selectable."
Previous message: Konrad Dybcio: "Re: [PATCH 09/15] arm64: dts: qcom: sm6375: Add CPUCP L3 node"
In reply to: Paneer Selvam, Arunpravin: "RE: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures"
Next in thread: Luís Mendes: "Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 3/9/2023 3:42 PM, Luís Mendes wrote:

Hi,

Ping? This is actually a regression.
If there is no one available to work this, maybe I can have a look in
my spare time, in accordance with your suggestion.

Regards,
Luís

On Tue, Jan 3, 2023 at 8:44 AM Christian König <christian.koenig@xxxxxxx> wrote:

Am 25.12.22 um 20:39 schrieb Luís Mendes:

Re-sending with the correct linux-kernel mailing list email address.
Sorry for the inconvenience.

The proposed patch fixes the issue and allows amdgpu to work again on
armhf with a AMD RX 550 card, however it may not be the best solution
for the issue, as detailed below.

include/log2.h defined macros rounddown_pow_of_two(...) and
roundup_pow_of_two(...) do not handle 64-bit values on 32-bit
architectures (tested on armv9 armhf machine) causing
drm_buddy_init(...) to fail on BUG_ON with an underflow on the order
value, thus impeding amdgpu to load properly (no GUI).

One option is to modify rounddown_pow_of_two(...) to detect if the
variable takes 32 bits or less and call __rounddown_pow_of_two_u32(u32
n) or if the variable takes more space than 32 bits, then call
__rounddown_pow_of_two_u64(u64 n). This would imply renaming
__rounddown_pow_of_two(unsigne
d long n) to
__rounddown_pow_of_two_u32(u32 n) and add a new function
__rounddown_pow_of_two_u64(u64 n). This would be the most transparent
solution, however there a few complications, and they are:
- that the mm subsystem will fail to link on armhf with an undefined
reference on __aeabi_uldivmod
- there a few drivers that directly call __rounddown_pow_of_two(...)
- that other drivers and subsystems generate warnings

So this alternate solution was devised which avoids touching existing
code paths, and just updates drm_buddy which seems to be the only
driver that is failing, however I am not sure if this is the proper
way to go. So I would like to get a second opinion on this, by those
who know.

/include/linux/log2.h
/drivers/gpu/drm/drm_buddy.c

Signed-off-by: Luís Mendes <luis.p.mendes@xxxxxxxxx>

8------------------------------------------------------8<

diff -uprN linux-next/drivers/gpu/drm/drm_buddy.c
linux-nextLM/drivers/gpu/drm/drm_buddy.c
--- linux-next/drivers/gpu/drm/drm_buddy.c 2022-12-25
16:29:26.000000000 +0000
+++ linux-nextLM/drivers/gpu/drm/drm_buddy.c 2022-12-25
17:04:32.136007116 +0000
@@ -128,7 +128,7 @@ int drm_buddy_init(struct drm_buddy *mm,
unsigned int order;
u64 root_size;

- root_size = rounddown_pow_of_two(size);
+ root_size = rounddown_pow_of_two_u64(size);
order = ilog2(root_size) - ilog2(chunk_size);

I think this can be handled much easier if keep around the root_order
instead of the root_size in the first place.

Cause ilog2() does the right thing even for non power of two values and
so we just need the order for the offset subtraction below.

Could you try with ilog2() and see if you are getting the right value for size as suggested
by Christian.

Thanks,
Arun

Arun can you take a closer look at this?

Regards,
Christian.

root = drm_block_alloc(mm, NULL, order, offset);
diff -uprN linux-next/include/linux/log2.h linux-nextLM/include/linux/log2.h
--- linux-next/include/linux/log2.h 2022-12-25 16:29:29.000000000 +0000
+++ linux-nextLM/include/linux/log2.h 2022-12-25 17:00:34.319901492 +0000
@@ -58,6 +58,18 @@ unsigned long __roundup_pow_of_two(unsig
}

/**
+ * __roundup_pow_of_two_u64() - round up to nearest power of two
+ * (unsgined 64-bits precision version)
+ * @n: value to round up
+ */
+static inline __attribute__((const))
+u64 __roundup_pow_of_two_u64(u64 n)
+{
+ return 1ULL << fls64(n - 1);
+}
+
+
+/**
* __rounddown_pow_of_two() - round down to nearest power of two
* @n: value to round down
*/
@@ -68,6 +80,17 @@ unsigned long __rounddown_pow_of_two(uns
}

/**
+ * __rounddown_pow_of_two_u64() - round down to nearest power of two
+ * (unsgined 64-bits precision version)
+ * @n: value to round down
+ */
+static inline __attribute__((const))
+u64 __rounddown_pow_of_two_u64(u64 n)
+{
+ return 1ULL << (fls64(n) - 1);
+}
+
+/**
* const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
* @n: parameter
*
@@ -163,6 +186,7 @@ unsigned long __rounddown_pow_of_two(uns
__ilog2_u64(n) \
)

+
/**
* roundup_pow_of_two - round the given value up to nearest power of two
* @n: parameter
@@ -181,6 +205,25 @@ unsigned long __rounddown_pow_of_two(uns
)

/**
+ * roundup_pow_of_two_u64 - round the given value up to nearest power of two
+ * (unsgined 64-bits precision version)
+ * @n: parameter
+ *
+ * round the given value up to the nearest power of two
+ * - the result is undefined when n == 0
+ * - this can be used to initialise global variables from constant data
+ */
+#define roundup_pow_of_two_u64(n) \
+( \
+ __builtin_constant_p(n) ? ( \
+ ((n) == 1) ? 1 : \
+ (1ULL << (ilog2((n) - 1) + 1)) \
+ ) : \
+ __roundup_pow_of_two_u64(n) \
+ )
+
+
+/**
* rounddown_pow_of_two - round the given value down to nearest power of two
* @n: parameter
*
@@ -195,6 +238,22 @@ unsigned long __rounddown_pow_of_two(uns
__rounddown_pow_of_two(n) \
)

+/**
+ * rounddown_pow_of_two_u64 - round the given value down to nearest
power of two
+ * (unsgined 64-bits precision version)
+ * @n: parameter
+ *
+ * round the given value down to the nearest power of two
+ * - the result is undefined when n == 0
+ * - this can be used to initialise global variables from constant data
+ */
+#define rounddown_pow_of_two_u64(n) \
+( \
+ __builtin_constant_p(n) ? ( \
+ (1ULL << ilog2(n))) : \
+ __rounddown_pow_of_two_u64(n) \
+ )
+
static inline __attribute_const__
int __order_base_2(unsigned long n)
{

Next message: Horatiu Vultur: "Re: [PATCH v3 3/5] net: Let the active time stamping layer be selectable."
Previous message: Konrad Dybcio: "Re: [PATCH 09/15] arm64: dts: qcom: sm6375: Add CPUCP L3 node"
In reply to: Paneer Selvam, Arunpravin: "RE: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures"
Next in thread: Luís Mendes: "Re: [PATCH] [RFC] drm/drm_buddy fails to initialize on 32-bit architectures"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]