Re: [PATCH v2 0/1] hrtimer: More fixes for handling of timer slack of rt tasks
From: Thomas Gleixner
Date: Fri Aug 09 2024 - 16:15:31 EST
On Fri, Aug 09 2024 at 08:47, Felix MOESSBAUER wrote:
> On Fri, 2024-08-09 at 02:34 +0100, Qais Yousef wrote:
>> On 08/05/24 16:09, Felix Moessbauer wrote:
>> > This series fixes the (hopefully) last location of an incorrectly
>> > handled timer slack on rt tasks in hrtimer_start_range_ns(), which
>> > was uncovered by a userland change in glibc 2.33.
>> >
>> > Changes since v1:
>> >
>> > - drop patch "hrtimer: Document, that PI boosted tasks have no
>> >   timer slack", as this behavior is incorrect and is already
>> >   addressed in 20240610192018.1567075-1-qyousef@xxxxxxxxxxx
>>
>> There was discussion about this hrtimer usage in an earlier version,
>> if it helps to come up with a potentially better patch
>
> Hi, Sebastian already pointed me to this thread.
>
> When debugging my issue, I did not know about it but was scratching my
> head if the behavior / usage of rt_task is actually correct.
> The whole naming was quite confusing. Many thanks for cleaning that up.
>
>>
>>
>> https://lore.kernel.org/lkml/20240521110035.KRIwllGe@xxxxxxxxxxxxx/
>>
>> My patches got picked up by the way, you'd probably want to rebase
>> and resend as now the function is named rt_or_dl_task_policy()
>
> As we use rt_or_dl_task() in nanosleep, I'm wondering if we should use
> the same in hrtimer_start_range_ns(). Is that because PI boosted tasks
> need to acquire a lock which can only be a mutex_t or equivalent
> sleeping lock on PREEMPT_RT?

No. Arming the timer has nothing to do with mutexes or such. It's an
optimization to grant RT/DL tasks zero slack automatically.

The correct thing is to use policy based delta adjustment.

The fact that a task got boosted temporarily does not make it eligible
for zero slack. It stays a SCHED_OTHER task no matter what.

rt_or_dl_task() in nanosleep() is fundamentally wrong and needs to be
replaced with rt_or_dl_task_policy() and not the other way round.

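To illustrate the difference between the two checks, here is a minimal
userspace sketch. It is not the kernel code: struct model_task, the
MODEL_* constants and all model_*() helpers are hypothetical stand-ins,
under the assumption that rt_or_dl_task() looks at the effective
(possibly PI boosted) priority while rt_or_dl_task_policy() looks only
at the configured scheduling policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Policy numbers match the Linux UAPI; names are prefixed to avoid <sched.h> clashes. */
enum {
	MODEL_SCHED_OTHER    = 0,
	MODEL_SCHED_FIFO     = 1,
	MODEL_SCHED_RR       = 2,
	MODEL_SCHED_DEADLINE = 6,
};

/* Assumption: an effective priority below this value means RT or DL. */
#define MODEL_MAX_RT_PRIO 100

/* Hypothetical, stripped-down stand-in for struct task_struct. */
struct model_task {
	int policy;              /* what the task asked for */
	int prio;                /* effective priority, lowered by a PI boost */
	uint64_t timer_slack_ns; /* per-task slack */
};

/* Priority based check: also true for a PI boosted SCHED_OTHER task. */
static bool model_rt_or_dl_task(const struct model_task *t)
{
	return t->prio < MODEL_MAX_RT_PRIO;
}

/* Policy based check: a temporary boost does not change the answer. */
static bool model_rt_or_dl_task_policy(const struct model_task *t)
{
	return t->policy == MODEL_SCHED_FIFO ||
	       t->policy == MODEL_SCHED_RR ||
	       t->policy == MODEL_SCHED_DEADLINE;
}

/* Policy based delta adjustment: zero slack only for RT/DL policies. */
static uint64_t model_timer_slack(const struct model_task *t)
{
	return model_rt_or_dl_task_policy(t) ? 0 : t->timer_slack_ns;
}

/* A SCHED_OTHER task boosted to an RT priority keeps its slack. */
static void model_demo(void)
{
	struct model_task boosted = { MODEL_SCHED_OTHER, 50, 50000 };

	assert(model_rt_or_dl_task(&boosted));
	assert(!model_rt_or_dl_task_policy(&boosted));
	assert(model_timer_slack(&boosted) == 50000);
}
```

In this model the boosted task passes the priority based check but not
the policy based one, so it keeps its 50000ns slack.
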
> Anyways, I'm thinking about getting rid of the policy based delta=0 and
> just set the task->timer_slack_ns to 0 when changing the scheduling
> policy (and changing it back to the default when reverting to
> SCHED_OTHER). By that, we can get rid of the special handling and users
> of the procfs would also see correct data in /timerslack_ns.
That makes sense.
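
A rough userspace sketch of that idea, purely as a model and not a
proposed patch: struct slack_task and slack_set_policy() are
hypothetical names, and the only assumption is that the task carries a
current and a default slack value, with the current one set to zero on
a switch to an RT/DL policy and restored from the default otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Policy numbers match the Linux UAPI; names are prefixed to avoid <sched.h> clashes. */
enum {
	SLACK_SCHED_OTHER    = 0,
	SLACK_SCHED_FIFO     = 1,
	SLACK_SCHED_RR       = 2,
	SLACK_SCHED_DEADLINE = 6,
};

/* Hypothetical stand-in for the relevant task fields. */
struct slack_task {
	int policy;
	uint64_t timer_slack_ns;         /* value the timer code would use */
	uint64_t default_timer_slack_ns; /* value to restore for non RT/DL */
};

static bool slack_policy_is_rt_or_dl(int policy)
{
	return policy == SLACK_SCHED_FIFO ||
	       policy == SLACK_SCHED_RR ||
	       policy == SLACK_SCHED_DEADLINE;
}

/*
 * Model of a policy change: the slack is updated once here, so the
 * timer arming path needs no special case and readers of the slack
 * value see what is really applied.
 */
static void slack_set_policy(struct slack_task *t, int policy)
{
	t->policy = policy;
	if (slack_policy_is_rt_or_dl(policy))
		t->timer_slack_ns = 0;
	else
		t->timer_slack_ns = t->default_timer_slack_ns;
}

/* Switch to SCHED_FIFO and back again. */
static void slack_demo(void)
{
	struct slack_task t = { SLACK_SCHED_OTHER, 50000, 50000 };

	slack_set_policy(&t, SLACK_SCHED_FIFO);
	assert(t.timer_slack_ns == 0);

	slack_set_policy(&t, SLACK_SCHED_OTHER);
	assert(t.timer_slack_ns == 50000);
}
```
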
Thanks
tglx