RE: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on the GICR_VPENDBASER.Dirty bit
From: lushenming
Date: Wed Sep 16 2020 - 03:04:46 EST
Hi,
Our team just discussed this issue again and consulted our GIC hardware
design team. They think the RD can afford busy waiting. So we still think
maybe 0 is better, at least for our hardware.
In addition, if not 0, as I said before, in our measurement, it takes only
hundreds of nanoseconds, or 1~2 microseconds, to finish parsing the VPT
in most cases. So maybe 1 microseconds, or smaller, is more appropriate.
Anyway, 10 microseconds is too much.
But it has to be said that it does depend on the hardware implementation.
Besides, I'm not sure where are the start and end point of the total scheduling
latency of a vcpu you said, which includes many events. Is the parse time of
the VPT not clear enough?
-----Original Message-----
From: Marc Zyngier [mailto:maz@xxxxxxxxxx]
Sent: 2020-09-15 22:48
To: lushenming <lushenming@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Jason Cooper <jason@xxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Wanghaibin (D) <wanghaibin.wang@xxxxxxxxxx>; yuzenghui <yuzenghui@xxxxxxxxxx>
Subject: Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on the GICR_VPENDBASER.Dirty bit
On 2020-09-15 15:04, lushenming wrote:
> Thanks for your quick response.
>
> Okay, I agree that busy-waiting may add more overhead at the RD level.
> But I think that the delay time can be adjusted. In our latest
> hardware implementation, we optimize the search of the VPT, now even
> the VPT full of interrupts (56k) can be parsed within 2 microseconds.
It's not so much when the VPT is full that it is bad. It is when the pending interrupts are not cached, and that you don't know *where* to look for them in the VPT.
> It is true that the parse speeds of various hardware are different,
> but does directly waiting for 10 microseconds make the optimization of
> those fast hardware be completely masked? Maybe we can set the delay
> time smaller, like 1 microseconds?
That certainly would be more acceptable. But I still question the impact of such a change compared to the cost of a vcpu entry. I suggest you come up with measurements that actually show that polling this register more often significantly reduces the entry latency. Only then can we make an educated decision.
Thanks,
M.
--
Jazz is not dead. It just smells funny...