Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

From: Vlastimil Babka
Date: Thu Aug 08 2024 - 03:49:09 EST


On 8/8/24 03:07, Guenter Roeck wrote:
> On 8/6/24 16:24, Thomas Gleixner wrote:
>> Cc+: Helge, parisc ML
>>
>> We're chasing a weird failure which has been tracked down to the
>> placement of the division library functions (I assume they are imported
>> from libgcc).
>>
>> See the thread starting at:
>>
>> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@xxxxxxxxxxxx
>>
>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>>
>>>> So this change adds 16 bytes to __softirq() which moves the division
>>>> functions up by 16 bytes. That's all it takes to make the stupid go
>>>> away....
>>>
> >>> Heh I was actually wondering if the division is somehow messed up because
>>> maxobj = order_objects() and order_objects() does a division. Now I suspect
>>> it even more.
>>
>> check_slab() calls into that muck, but I checked the disassembly of a
>> working and a broken kernel and the only difference there is the
>> displacement offset when the code calculates the call address, but
>> that's as expected a difference of 16 bytes.
>>
>> Now it becomes interesting.
>>
> >> I added an unused function after __do_softirq() into the softirq text
>> section and filled it with ASM nonsense so that it occupies exactly one
>> page. That moves $$divoI, which is what check_slab() calls, exactly one
>> page forward:
>>
>
> With the above added to my tree, I can also play around with the code.
> Here is the next weird one:
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 4927edec6a8c..b8a33966d858 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>  	}
>
>  	maxobj = order_objects(slab_order(slab), s->size);
> +
> +	pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
> +
>  	if (slab->objects > maxobj) {
>  		slab_err(s, slab, "objects %u > max %u",
>  			slab->objects, maxobj);
>
> results in:
>
> ##### slab->objects=21 maxobj=21
> =============================================================================
> BUG kmem_cache_node (Not tainted): objects 21 > max 16

But were both lines printed by the same check_slab() call? The pr_info_once()
might have fired on an earlier call and then stayed silent (as it's _once),
while the error case happened only later. Nothing would be printed in between,
as the kmalloc caches are created in a loop.
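
To illustrate the concern, here is a minimal user-space sketch (not kernel
code). The names struct sample and replay() are made up for this example, and
the (objects, maxobj) values are hypothetical, chosen only to reproduce the
shape of the log above. The local "printed" flag stands in for the static flag
that pr_info_once() keeps internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical (objects, maxobj) pair as seen by one check_slab() call. */
struct sample {
	unsigned int objects;
	unsigned int maxobj;
};

/*
 * Walk the samples like successive check_slab() calls. The debug line is
 * emitted only once, mimicking pr_info_once(); the error is reported for
 * the first sample that fails the check.
 *
 * Returns the index of the failing sample, or -1 if none fails. The index
 * that produced the debug line is stored in *logged_idx (-1 if none).
 */
static int replay(const struct sample *s, int n, int *logged_idx)
{
	bool printed = false;	/* stand-in for the _once flag */
	int i;

	assert(s && logged_idx);
	*logged_idx = -1;

	for (i = 0; i < n; i++) {
		if (!printed) {
			printed = true;
			*logged_idx = i;
			printf("##### slab->objects=%u maxobj=%u\n",
			       s[i].objects, s[i].maxobj);
		}
		if (s[i].objects > s[i].maxobj) {
			printf("BUG: objects %u > max %u\n",
			       s[i].objects, s[i].maxobj);
			return i;
		}
	}
	return -1;
}
```

Fed with a good call followed by a bad one, e.g. {21, 21} then {21, 16}, the
debug line comes from the first call and the error from the second, so the two
lines would not describe the same division result. Using plain pr_info()
instead, or printing from inside the failing branch, should show whether that
is what happens here.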

> As Thomas noticed, this only happens if the divide assembler code is within a certain
> address range.
>
> Ok, now I am really lost.
>
> Guenter
>