On Tue, May 21, 2024 at 8:58 AM Zhaoyang Huang <huangzhaoyang@xxxxxxxxx> wrote:
URL: https://lore.kernel.org/linux-mm/20240412064353.133497-1-zhaoyang.huang@xxxxxxxxxx/
On Tue, May 21, 2024 at 3:42 AM Marcin Wanat <private@xxxxxxxxxxxxxx> wrote:
Could you please try this one which could be applied on 6.6 directly. Thank you!
On 15.04.2024 03:50, Zhaoyang Huang wrote:
I have around 50 hosts handling high I/O (each with 20Gbps+ uplinks
and multiple NVMe drives), running RockyLinux 8/9. The stock RHEL
kernel 8/9 is NOT affected, and the long-term kernel 5.15.X is NOT affected.
However, with long-term kernels 6.1.XX and 6.6.XX,
(tested at least 10 different versions), this lockup always appears
after 2-30 days, similar to the report in the original thread.
The more load (for example, copying a lot of local files while
serving 20Gbps traffic), the higher the chance that the bug will appear.
I haven't been able to reproduce this during synthetic tests,
but it always occurs in production on 6.1.X and 6.6.X within 2-30 days.
If anyone can provide a patch, I can test it on multiple machines
over the next few days.