Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

From: Christian Theune
Date: Thu Sep 19 2024 - 02:35:21 EST

Next message: Ben Dooks: "[PATCH] riscv: make riscv_isa_vendor_ext_andes[] static"
Previous message: Armin Wolf: "[PATCH] platform/x86: dell-laptop: Fix crash when unregistering battery hook"
In reply to: Chris Mason: "Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)"
Next in thread: Linus Torvalds: "Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)"

> On 19. Sep 2024, at 05:12, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
>
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
>
> Christian?

Happy to! I see there’s still some back and forth on the specific patches. Let me know which kernel version and which patches I should start trying out. I’m losing track while following the discussion.

In preparation: I’m wondering whether the known reproducer gives any insight into how I might force my load to trigger the bug more easily. Would it make sense to run the reproducer above alongside a PostgreSQL benchmark?

Otherwise we’d likely only be getting insight after weeks of not seeing crashes …

Christian

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
