Re: [PATCH 17/24] sched/fair: Implement delayed dequeue
From: Dietmar Eggemann
Date: Mon Nov 11 2024 - 06:30:56 EST
On 08/11/2024 19:16, Phil Auld wrote:
> On Fri, Nov 08, 2024 at 03:53:26PM +0100 Dietmar Eggemann wrote:
>> On 04/11/2024 13:50, Phil Auld wrote:
>>>
>>> Hi Dietmar,
>>>
>>> On Mon, Nov 04, 2024 at 10:28:37AM +0100 Dietmar Eggemann wrote:
>>>> Hi Phil,
>>>>
>>>> On 01/11/2024 13:47, Phil Auld wrote:
[...]
>> One reason I don't see the difference between DELAY_DEQUEUE and
>> NO_DELAY_DEQUEUE could be the affinity of the related
>> nvme interrupts:
>>
>> $ cat /proc/interrupts
>>
>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 ...
>> 132: 0 0 1523653 0 0 0 0 0 0 ... IR-PCI-MSIX-0000:01:00.0 1-edge nvme0q1
>> 133: 0 0 0 0 0 1338451 0 0 0 ... IR-PCI-MSIX-0000:01:00.0 2-edge nvme0q2
>> 134: 0 0 0 0 0 0 0 0 2252297 ... IR-PCI-MSIX-0000:01:00.0 3-edge nvme0q3
>>
>> $ cat /proc/irq/132/smp_affinity_list
>> 0-2
>> $ cat /proc/irq/133/smp_affinity_list
>> 3-5
>> $ cat /proc/irq/134/smp_affinity_list
>> 6-8
>>
>> So the 8 fio tasks from:
>>
>> # fio --cpus_allowed 1,2,3,4,5,6,7,8 --rw randwrite --bs 4k
>> --runtime 8s --iodepth 32 --direct 1 --ioengine libaio
>> --numjobs 8 --size 30g --name default --time_based
>> --group_reporting --cpus_allowed_policy shared
>> --directory /testfs
>>
>> don't have to fight with per-CPU kworkers on each CPU.
>>
>> e.g. 'nvme0q3 interrupt -> queue on workqueue dio/nvme0n1p2 ->
>> run iomap_dio_complete_work() in kworker/8:x'
>>
>> When I trace the 'task_on_rq_queued(p) && p->se.sched_delayed &&
>> rq->nr_running > 1' condition in ttwu_runnable(), I only see
>> the per-CPU kworker in there, so p->nr_cpus_allowed == 1.
>>
>> So the patch shouldn't make a difference for this scenario?
>>
>
> If the kworker is waking up an fio task, it could. I don't think
> the fio tasks are bound to a single CPU.
>
> But yes if your trace is only showing the kworker there then it would
> not help. Are you actually able to reproduce the difference?
No, with my setup I don't see any difference running your fio test. But
the traces also show me that there is no scenario here in which this
patch could make a difference to the scores.
>> But maybe your VDO or thinpool setup creates waker/wakee pairs with
>> wakee->nr_cpus_allowed > 1?
>>
>
> That's certainly possible but I don't know for sure. There are well more
> dio kworkers on the box than CPUs, though, if I recall. I don't know
> if they all have single-CPU affinities.
Yeah, there must be more tasks (incl. kworkers) w/ 'p->nr_cpus_allowed >
1' involved.
>> Does your machine have single-CPU smp_affinity masks for these nvme
>> interrupts?
>>
>
> I don't know. I had to give the machine back.
Ah, too late then ;-)