Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

From: Matt Fleming
Date: Mon May 16 2016 - 05:46:45 EST


On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote:
> Hi Matt,
>
> Thanks for Ccing me.
>
> I am indeed interested, because I recently encountered an rq clock
> issue, which is that the clock jumps about 200ms when I was
> experimenting the "flat util hierarchy" patches, which really annoyed
> me, and I had to stop to figure out what is wrong (but haven't yet
> figured out ;))
>
> First, this patchset does not solve my problem, but never mind, by
> reviewing your patches, I have some comments:

Thanks for the review. One gap that this patch series doesn't address
is that some callers of update_rq_clock() do not pin rq->lock, which
makes the diagnostic checks useless in that case.

I plan on handling that next, but I wanted to get this series out as
soon as possible for review.

> On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
> >
> > - rq->clock_skip_update = 0;
> > + /* Clear ACT, preserve everything else */
> > + rq->clock_update_flags ^= RQCF_ACT_SKIP;
>
> The comment says "Clear ACT", but this is really xor, and I am not sure
> this is even what you want.

Urgh, you're right. I'm not sure what I was thinking when I wrote
that.

> In addition, would it be simpler to do this?
>
> update_rq_clock()
> if (flags & RQCF_ACT_SKIP)
> flags <<= 1; /* effective skip is an update */
> return;
>
> flags = RQCF_UPDATED;

No because if someone calls rq_clock() immediately after __schedule(),
or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
should trigger a warning since the clock has not actually been
updated.