On 11/21/24 4:30 AM, Phil Auld wrote:
Apologies to those receiving this twice. I am resending because my mail client did not send the first copy as plain text, which caused the lists to reject it.
Hi,
Interesting. Might explain some regressions I've seen too related to
On Wed, Nov 20, 2024 at 06:20:12PM -0700 Jens Axboe wrote:
On 11/20/24 5:00 PM, Chaitanya Kulkarni wrote:
I was just going to ask how confident we are in that bisect result.
On 11/20/24 13:35, Saeed Mirzamohammadi wrote:
There's no way that commit is involved, the test as quoted doesn't even
Hi,
Thanks a lot for the report, to narrow down this problem can you
I'm reporting a performance regression of up to 9-10% with the fio randwrite benchmark on ext4, comparing the 6.12.0-rc2 kernel against v5.15.161. The standard deviation also grows to as much as 5-6% after this change.
Bisect root cause commit
===================
- commit 63dfa1004322 ("nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discard")
Test details
=========
- readwrite=randwrite bs=4k size=1G ioengine=libaio iodepth=16 direct=1 time_based=1 ramp_time=180 runtime=1800 randrepeat=1 gtod_reduce=1
- Test is on ext4 filesystem
- System has 4 NVMe disks
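For anyone wanting to reproduce this, the option string above can be turned into a fio job file. The following is only a sketch: the option string is copied from the report, but the job name and the directory are placeholders that do not come from the thread, and the original setup (4 NVMe disks) may have used several jobs or files.

```python
# Option string copied verbatim from the report above.
PARAMS = (
    "readwrite=randwrite bs=4k size=1G ioengine=libaio iodepth=16 "
    "direct=1 time_based=1 ramp_time=180 runtime=1800 "
    "randrepeat=1 gtod_reduce=1"
)


def make_job_file(params: str, job_name: str, directory: str) -> str:
    """Render a one-job fio job file: a [section] header, then key=value lines."""
    lines = [f"[{job_name}]", f"directory={directory}"]
    lines += params.split()  # each whitespace-separated token is already key=value
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "randwrite-ext4" and "/mnt/ext4" are hypothetical; adjust to the real mount.
    print(make_job_file(PARAMS, "randwrite-ext4", "/mnt/ext4"))
```

Saving the output as a .fio file and passing it to fio should run the same workload, assuming the directory points at the ext4 mount under test.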
please:
1. Run the same test on the raw nvme device /dev/nvme0n1 that you
   used for this benchmark?
2. Run the same test on an XFS-formatted nvme device instead of ext4?
This way we will know whether the issue is specific to ext4, whether
other file systems suffer from it too, or whether it sits below the
file system layer, in the block layer or the nvme pci driver.
It will also help if you can repeat these runs with the io_uring fio
ioengine, to tell whether the issue is ioengine specific.
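The requested runs form a small matrix: three targets (raw device, ext4, XFS) times two ioengines (libaio, io_uring). A rough sketch that prints one fio command line per combination is below. The fio options are the ones quoted in the report and /dev/nvme0n1 is the device named above; the two mount points and the job names are assumptions made up for illustration.

```python
import shlex

# Options from the report, minus ioengine, which varies per run.
BASE_OPTS = {
    "readwrite": "randwrite", "bs": "4k", "size": "1G", "iodepth": "16",
    "direct": "1", "time_based": "1", "ramp_time": "180", "runtime": "1800",
    "randrepeat": "1", "gtod_reduce": "1",
}

TARGETS = {
    # WARNING: writing to the raw device destroys any data on it.
    "raw": "--filename=/dev/nvme0n1",
    "ext4": "--directory=/mnt/ext4",  # hypothetical mount point
    "xfs": "--directory=/mnt/xfs",    # hypothetical mount point
}

ENGINES = ["libaio", "io_uring"]


def fio_commands() -> dict:
    """Map (target, ioengine) to a shell-quoted fio command line."""
    cmds = {}
    for target, target_opt in TARGETS.items():
        for engine in ENGINES:
            args = ["fio", f"--name={target}-{engine}", target_opt,
                    f"--ioengine={engine}"]
            args += [f"--{key}={value}" for key, value in BASE_OPTS.items()]
            cmds[(target, engine)] = shlex.join(args)
    return cmds


if __name__ == "__main__":
    for cmd in fio_commands().values():
        print(cmd)
```

If the regression shows up on the raw device as well, the file system is unlikely to be the cause; if it shows up with both ioengines, it is probably not ioengine specific.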
Looking at the commit [1], it only sets the max value to write zeroes sectors
if NVME_QUIRK_DEALLOCATE_ZEROES is set, else uses the controller max
write zeroes value.
touch write zeroes. Hence if there really is a regression here, then
it's either not easily bisectable, some error was injected while
bisecting, or the test itself is bimodal.
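One way to gain some confidence in a bisect step is to repeat the run several times per kernel and check that the gap between the two means is large relative to the run-to-run noise. The sketch below uses Welch's t statistic for that. It is not something proposed in the thread, it takes whatever per-run numbers (e.g. IOPS) you collected, and it does not detect bimodality by itself; a bimodal test would mainly show up as a large standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def compare_runs(good, bad):
    """Return (relative change of the mean, Welch t statistic).

    `good` and `bad` are lists of per-run results for the two kernels.
    Each list needs at least two samples, and the samples must not all
    be identical (otherwise the standard error is zero).
    """
    mean_good, mean_bad = mean(good), mean(bad)
    # Standard error of the difference of the two means (unequal variances).
    std_err = sqrt(stdev(good) ** 2 / len(good) + stdev(bad) ** 2 / len(bad))
    change = (mean_bad - mean_good) / mean_good
    t_stat = (mean_bad - mean_good) / std_err
    return change, t_stat
```

As a rule of thumb, a |t| well below 2 means the two sets of runs are hard to tell apart, and marking that bisect step good or bad from such runs is a coin flip.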
I suspect this is the same issue I've been fighting here:
https://lore.kernel.org/lkml/20241101124715.GA689589@xxxxxxxxxxxxxxxxxx/
Saeed, can you try your randwrite test after
"echo NO_DELAY_DEQUEUE > /sys/kernel/debug/sched/features"
please?
We don't as yet have a general fix for it as it seems to be a bit of
a trade off.
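For scripted A/B runs, the echo above can be wrapped so the feature state is checked before and after each run. This is a sketch under a few assumptions: debugfs is mounted at /sys/kernel/debug, the script runs as root, and the features file lists enabled features by name and disabled ones with a NO_ prefix. The helper names are made up for illustration.

```python
from pathlib import Path

FEATURES = Path("/sys/kernel/debug/sched/features")


def feature_enabled(text: str, name: str) -> bool:
    """Parse the features file content: NAME means on, NO_NAME means off."""
    tokens = text.split()
    if name in tokens:
        return True
    if f"NO_{name}" in tokens:
        return False
    raise KeyError(f"{name} not listed; kernel may not have this feature")


def set_feature(name: str, enabled: bool, path: Path = FEATURES) -> None:
    """Same effect as: echo NAME (or NO_NAME) > /sys/kernel/debug/sched/features"""
    path.write_text(name if enabled else f"NO_{name}")


if __name__ == "__main__":
    print("before:", feature_enabled(FEATURES.read_text(), "DELAY_DEQUEUE"))
    set_feature("DELAY_DEQUEUE", False)
    print("after: ", feature_enabled(FEATURES.read_text(), "DELAY_DEQUEUE"))
    # ... run the randwrite test here, then restore the default:
    set_feature("DELAY_DEQUEUE", True)
```

The setting does not persist across reboots, so it has to be reapplied on each boot of the test kernel.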
performance.