Re: [PATCH] clockevents: Retry programming min delta up to 10 times

From: James Hogan
Date: Mon Apr 25 2016 - 11:51:34 EST


On Mon, Apr 25, 2016 at 03:48:58PM +0200, Martin Schwidefsky wrote:
> On Fri, 22 Apr 2016 11:40:11 +0100
> James Hogan <james.hogan@xxxxxxxxxx> wrote:
>
> > Under virtualisation it is possible to get unexpected latency during a
> > clockevent device's set_next_event() callback which can make it return
> > -ETIME even for a delta based on min_delta_ns.
>
> Do you have an example for this behavior?

The place where I've observed it is arch/mips/kernel/cevt-r4k.c, which
returns -ETIME when the delay is too short for it to be able to set it
and read back the timer.

I've also recently (Friday afternoon) seen a report of it apparently
happening with the MIPS GIC clockevent driver too
(drivers/clocksource/mips-gic-timer.c) which has similar logic, probably
copied from cevt-r4k, and this patch appeared to help (I still need to
confirm that one). That wasn't with virtualisation, but was on a
multithreaded core being stress tested, a case when its also hard to
find a guaranteed min delta.

> I would call that a BUG in the implementation of the clockevent
> device, no?

Several drivers seem to do that. I'm open to alternatives. Do you think
the driver should retry itself when it detects this race may have been
hit?

>
> > The clockevents_program_min_delta() implementation for
> > CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n doesn't handle retries when this
> > happens, nor does clockevents_program_event() or its callers when force
> > is true (for example hrtimer_reprogram()). This can result in hangs
> > until the clock event device does a full period.
>
> Is that because some clockevent devices can not program the minimum delta
> in some corner cases?

yes.

I think it actually ended up causing an arithmetic overflow somewhere in
ktime_get() (I'd have to dig through my notes to find specifics)
which resulted in __iter_div_u64_rem() being given an excessively large
dividend, which effectively hung the CPU.

Thanks
James

>
> > It isn't appropriate to use MIN_ADJUST in this case as occasional
> > hypervisor induced high latency will cause min_delta_ns to quickly
> > increase to the maximum.
>
> I agree, the whole minimum delta adjustment is quite broken on a virtualized
> system. On s390 we have seen the rise of the min_delta_ns to the maximum
> value due to a busy hypervisor.
>
> > Instead, borrow the retry pattern from the MIN_ADJUST case, but without
> > making adjustments. We retry up to 10 times before giving up.
>
> That will add a few unnecessary instruction for architectures that have a
> sane set_next_event function, namely those that always returns 0. Should
> not be too bad though.
>
> --
> blue skies,
> Martin.
>
> "Reality continues to ruin my life." - Calvin.
>

Attachment: signature.asc
Description: Digital signature