Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling

From: Ankur Arora
Date: Fri Feb 16 2024 - 23:01:14 EST



Paul E. McKenney <paulmck@xxxxxxxxxx> writes:

> On Thu, Feb 15, 2024 at 06:59:25PM -0800, Paul E. McKenney wrote:
>> On Thu, Feb 15, 2024 at 04:45:17PM -0800, Ankur Arora wrote:
>> >
>> > Paul E. McKenney <paulmck@xxxxxxxxxx> writes:
>> >
>> > > On Thu, Feb 15, 2024 at 01:24:59PM -0800, Ankur Arora wrote:
>> > >>
>> > >> Paul E. McKenney <paulmck@xxxxxxxxxx> writes:
>> > >>
>> > >> > On Wed, Feb 14, 2024 at 07:45:18PM -0800, Paul E. McKenney wrote:
>> > >> >> On Wed, Feb 14, 2024 at 06:03:28PM -0800, Ankur Arora wrote:
>> > >> >> >
>> > >> >> > Paul E. McKenney <paulmck@xxxxxxxxxx> writes:
>> > >> >> >
>> > >> >> > > On Mon, Feb 12, 2024 at 09:55:24PM -0800, Ankur Arora wrote:
>> > >> >> > >> Hi,
>> > >> >> > >>
>> > >> >> > >> This series adds a new scheduling model, PREEMPT_AUTO, which, like
>> > >> >> > >> PREEMPT_DYNAMIC, allows dynamic switching between the none, voluntary,
>> > >> >> > >> and full preemption models. However, unlike PREEMPT_DYNAMIC, it doesn't
>> > >> >> > >> depend on explicit preemption points for the voluntary models.
>> > >> >> > >>
>> > >> >> > >> The series is based on Thomas' original proposal which he outlined
>> > >> >> > >> in [1], [2] and in his PoC [3].
>> > >> >> > >>
>> > >> >> > >> An earlier RFC version is at [4].
>> > >> >> > >
>> > >> >> > > This uncovered a couple of latent bugs in RCU due to its having been
>> > >> >> > > a good long time since anyone built a !SMP preemptible kernel with
>> > >> >> > > non-preemptible RCU. I have a couple of fixes queued on -rcu [1], most
>> > >> >> > > likely for the merge window after next, but let me know if you need
>> > >> >> > > them sooner.
>> > >> >> >
>> > >> >> > Thanks. As you can probably tell, I skipped out on !SMP in my testing.
>> > >> >> > But, the attached diff should tide me over until the fixes are in.
>> > >> >>
>> > >> >> That was indeed my guess. ;-)
>> > >> >>
>> > >> >> > > I am also seeing OOM conditions during rcutorture testing of callback
>> > >> >> > > flooding, but I am still looking into this.
>> > >> >> >
>> > >> >> > That's on the PREEMPT_AUTO && PREEMPT_VOLUNTARY configuration?
>> > >> >>
>> > >> >> On two of the PREEMPT_AUTO && PREEMPT_NONE configurations, but only on
>> > >> >> those two thus far. I am running a longer test to see if this might
>> > >> >> be just luck. If not, I will look to see what rcutorture scenarios TREE10
>> > >> >> and TRACE01 have in common.
>> > >> >
>> > >> > And still TRACE01 and TREE10 are hitting OOMs, still not seeing what
>> > >> > sets them apart. I also hit a grace-period hang in TREE04, which does
>> > >> > CONFIG_PREEMPT_VOLUNTARY=y along with CONFIG_PREEMPT_AUTO=y. Something
>> > >> > to dig into more.
>> > >>
>> > >> So, the only PREEMPT_VOLUNTARY=y configuration is TREE04. I wonder
>> > >> if you would continue to hit the TREE04 hang with CONFIG_PREEMPT_NONE=y
>> > >> as well?
>> > >> (Just in the interest of minimizing configurations.)
>> > >
>> > > I would be happy to, but in the spirit of full disclosure...
>> > >
>> > > First, I have seen that failure only once, which is not enough to
>> > > conclude that it has much to do with TREE04. It might simply be low
>> > > probability, so that TREE04 was just unlucky enough to hit it first.
>> > > In contrast, I have sufficient data to be reasonably confident that the
>> > > callback-flooding OOMs really do have something to do with the TRACE01 and
>> > > TREE10 scenarios, even though I am not yet seeing what these two scenarios
>> > > have in common that they don't also have in common with other scenarios.
>> > > But what is life without a bit of mystery? ;-)
>> >
>> > :).
>> >
>> > > Second, please see the attached tarball, which contains .csv files showing
>> > > Kconfig options and kernel boot parameters for the various torture tests.
>> > > The portions of the filenames preceding the "config.csv" correspond to
>> > > the directories in tools/testing/selftests/rcutorture/configs.
>> >
>> > So, at least some of the HZ_FULL=y tests don't run into problems.
>> >
>> > > Third, there are additional scenarios hand-crafted by the script at
>> > > tools/testing/selftests/rcutorture/bin/torture.sh. Thus far, none of
>> > > them have triggered, other than via the newly increased difficulty
>> > > of configuring a tracing-free kernel with which to test, but they
>> > > can still be useful in ruling out particular Kconfig options or kernel
>> > > boot parameters being related to a given issue.
>> > >
>> > > But please do take a look at the .csv files and let me know what
>> > > adjustments would be appropriate given the failure information.
>> >
>> > Nothing stands out just yet. Let me start a run here and see if
>> > that gives me some ideas.
>>
>> Sounds good, thank you!
>>
>> > I'm guessing the splats don't give any useful information or
>> > you would have attached them ;).
>>
>> My plan is to extract what can be extracted from the overnight run
>> that I just started, just in case the fixes have any effect on things,
>> unlikely though that might be given those fixes and the runs that failed.
>
> And I got no failures from either TREE10 or TRACE01 on last night's
> run.

Oh, that's great news. Same for my overnight runs for TREE04 and TRACE01.

Ongoing: a 24-hour run for those. Let's see how that goes.
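
(For reference, these are just stock rcutorture runs; assuming the usual
kvm.sh invocation, that's roughly:

    tools/testing/selftests/rcutorture/bin/kvm.sh \
        --configs "TREE04 TRACE01" --duration 24h

with everything else left at the defaults.)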

> I merged your series on top of v6.8-rc4 with the -rcu tree's
> dev branch, the latter to get the RCU fixes. But this means that last
> night's results are not really comparable to earlier results.
>
> I did get a few TREE09 failures, but I get those anyway. I took it
> apart below for you because I got confused and thought that it was a
> TREE10 failure. So just in case you were curious what one of these
> looks like and because I am too lazy to delete it. ;-)

Heh. Well, thanks for being lazy /after/ dissecting it nicely.

> So from the viewpoint of moderate rcutorture testing, this series
> looks good. Woo hoo!!!

Awesome!

> We did uncover a separate issue with Tasks RCU, which I will report on
> in more detail separately. However, this issue does not (repeat, *not*)
> affect lazy preemption as such, but instead any attempt to remove all
> of the cond_resched() invocations.

So, that sounds like it happens even with (CONFIG_PREEMPT_AUTO=n,
CONFIG_PREEMPT=y)?
Anyway, I will look out for it when you report the details.
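
(For anyone skimming the thread: the cond_resched() pattern at stake is
the usual explicit-preemption-point idiom, roughly the sketch below.
Illustrative only; struct item and do_work() are made-up placeholders,
not anything from the series.)

    /*
     * Illustrative sketch: a long-running loop that sprinkles
     * cond_resched() as an explicit preemption point.  PREEMPT_AUTO
     * aims to make calls like this removable by having the scheduler
     * set a lazy need-resched flag instead of relying on such call
     * sites.  struct item and do_work() are placeholders.
     */
    #include <linux/list.h>
    #include <linux/sched.h>

    struct item {
            struct list_head node;
    };

    static void do_work(struct item *it)
    {
            /* placeholder for real per-item work */
    }

    static void process_items(struct list_head *head)
    {
            struct item *it;

            list_for_each_entry(it, head, node) {
                    do_work(it);
                    cond_resched();     /* candidate for removal */
            }
    }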

> My next step is to try this on bare metal on a system configured as
> is the fleet. But good progress for a week!!!

Yeah, this is great. Fingers crossed for the wider set of tests.

Thanks

--
ankur