Re: psi_trigger_poll() is completely broken

From: Suren Baghdasaryan
Date: Mon Jan 10 2022 - 12:25:16 EST


On Mon, Jan 10, 2022 at 5:45 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Jan 05, 2022 at 11:13:30AM -0800, Linus Torvalds wrote:
> > On Wed, Jan 5, 2022 at 11:07 AM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Whoever came up with that stupid "replace existing trigger with a
> > > write()" model should feel bad. It's garbage, and it's actively buggy
> > > in multiple ways.
> >
> > What are the users? Can we make the rule for -EBUSY simply be that you
> > can _install_ a trigger, but you can't replace an existing one (except
> > with NULL, when you close it).
>
> Apologies for the delay, I'm traveling right now.
>
> The primary user of the poll interface is still Android userspace OOM
> killing. I'm CCing Suren who is the most familiar with this usecase.
>
> Suren, the way the refcounting is written right now assumes that
> poll_wait() is the actual blocking wait. That's not true, it just
> queues the waiter and saves &t->event_wait, and the *caller* of
> psi_trigger_poll() continues to interact with it afterwards.

Thanks for adding me, Johannes. I see where I made a mistake.
Terribly sorry for the trouble this caused. I do feel bad.
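
A minimal userspace model of the lifetime problem Johannes describes: poll_wait() only records &t->event_wait and returns, and the poll machinery dereferences that saved pointer later, after the ->poll() hook has returned. All types and functions below are hypothetical stand-ins, not the kernel's poll API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct wait_queue_head_model {
	int nr_waiters;		/* entries currently queued on this head */
};

struct trigger_model {
	struct wait_queue_head_model event_wait;
};

#define MAX_QUEUED 4

/* Stand-in for poll_table: remembers which queues the poller joined. */
struct poll_table_model {
	struct wait_queue_head_model *queued[MAX_QUEUED];
	int n;
};

/* Like poll_wait(): queue the waiter, save the head, and return at once. */
static void poll_wait_model(struct poll_table_model *pt,
			    struct wait_queue_head_model *wq)
{
	if (pt->n < MAX_QUEUED) {
		wq->nr_waiters++;
		pt->queued[pt->n++] = wq;
	}
}

/* The ->poll() hook: nothing blocks here, it just registers and returns. */
static void trigger_poll_model(struct trigger_model *t,
			       struct poll_table_model *pt)
{
	poll_wait_model(pt, &t->event_wait);
}

/*
 * Like the caller's later teardown (e.g. poll_freewait() for select/poll):
 * it runs after trigger_poll_model() has returned and still dereferences
 * the saved &t->event_wait, so the trigger must still be alive here.
 */
static void poll_teardown_model(struct poll_table_model *pt)
{
	for (int i = 0; i < pt->n; i++)
		pt->queued[i]->nr_waiters--;
	pt->n = 0;
}
```

Freeing or replacing t between trigger_poll_model() and poll_teardown_model() would leave pt.queued[0] dangling, so a reference taken and dropped inside the poll hook alone cannot keep the trigger alive for as long as the waiter needs it.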

>
> If at all possible, I would also prefer the simplicity of one trigger
> setup per fd; if you need a new trigger, close the fd and open again.
>
> Can you please take a look if that is workable from the Android side?

Yes, one trigger per fd would work fine for Android. That's how we
intended to use it.
I'm still catching up on this email thread. Once I digest it, I will try
to fix this with the one-trigger-per-fd approach.
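
On the userspace side, one trigger per fd means a monitor never rewrites a trigger; to change thresholds it opens a new fd, installs the new trigger there, and closes the old one. A minimal sketch, assuming the "<some|full> <stall amount in us> <time window in us>" format and the POLLPRI/POLLERR semantics described in the kernel's Documentation/accounting/psi.rst; the helper names are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<some|full> <stall us> <window us>". Returns its length or -1. */
static int format_psi_trigger(char *buf, size_t len, const char *type,
			      unsigned long stall_us, unsigned long window_us)
{
	int n = snprintf(buf, len, "%s %lu %lu", type, stall_us, window_us);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Open a fresh fd and install exactly one trigger on it. Returns fd or -1. */
static int open_psi_trigger(const char *path, const char *trig)
{
	int fd = open(path, O_RDWR | O_NONBLOCK);

	if (fd < 0)
		return -1;
	/* The psi documentation's example writes the terminating NUL too. */
	if (write(fd, trig, strlen(trig) + 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Change thresholds without rewriting: install on a new fd first, then
 * close the old one. On failure returns -1 and leaves old_fd open.
 */
static int replace_psi_trigger(int old_fd, const char *path, const char *trig)
{
	int fd = open_psi_trigger(path, trig);

	if (fd >= 0 && old_fd >= 0)
		close(old_fd);
	return fd;
}

/* 1 = trigger fired, 0 = timeout/no event, -1 = error or source gone. */
static int wait_psi_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret < 0 ? -1 : 0;
	if (pfd.revents & POLLERR)
		return -1;
	return (pfd.revents & POLLPRI) ? 1 : 0;
}
```

Opening the replacement before closing the old fd avoids a window with no trigger installed; if the new open fails, the caller keeps monitoring with old_fd. A typical path is /proc/pressure/memory, or a cgroup's memory.pressure file.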

About serializing concurrent writes in cgroup_pressure_write() the way
psi_write() does: doesn't of->mutex inside kernfs_fop_write_iter()
already serialize writes to the same file? See
https://elixir.bootlin.com/linux/latest/source/fs/kernfs/file.c#L287
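
The serialization in question can be modeled in userspace: kernfs_fop_write_iter() holds of->mutex, which belongs to a single open file, while it calls the ->write() handler, so a check-then-install inside the handler cannot race with another write through the same open file. A minimal sketch with hypothetical names, using a pthread mutex in place of of->mutex:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's kernfs or psi types. */
struct trigger_model {
	unsigned long threshold_us;
};

struct open_file_model {
	pthread_mutex_t mutex;		/* plays the role of of->mutex */
	struct trigger_model *trigger;	/* at most one per open file */
};

/* ->write() handler: check-then-install is safe only under of->mutex. */
static int pressure_write_model(struct open_file_model *of,
				unsigned long threshold_us)
{
	struct trigger_model *new;

	if (of->trigger)
		return -EBUSY;		/* one trigger per open file */
	new = malloc(sizeof(*new));
	if (!new)
		return -ENOMEM;
	new->threshold_us = threshold_us;
	of->trigger = new;
	return 0;
}

/* Like kernfs_fop_write_iter(): hold the per-open-file mutex around ->write(). */
static int fop_write_model(struct open_file_model *of,
			   unsigned long threshold_us)
{
	int ret;

	pthread_mutex_lock(&of->mutex);
	ret = pressure_write_model(of, threshold_us);
	pthread_mutex_unlock(&of->mutex);
	return ret;
}

/* Release on close: the trigger goes away together with the open file. */
static void release_model(struct open_file_model *of)
{
	free(of->trigger);
	of->trigger = NULL;
}
```

Without the lock, two concurrent writers could both see of->trigger == NULL and both install, leaking one trigger; the mutex turns the check and the store into a single step.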

>
> (I'm going to follow up on the static branch issue Linus pointed out,
> later this week when I'm back home. I also think we should add Suren
> as additional psi maintainer since the polling code is a good chunk of
> the codebase and he shouldn't miss threads like these.)

That would help me avoid missing these emails and respond promptly.
Thanks,
Suren.

>
> > That would fix the poll() lifetime issue, and would make the
> > psi_trigger_replace() races fairly easy to fix - just use
> >
> > if (cmpxchg(trigger_ptr, NULL, new) != NULL) {
> >         /* ... free 'new', return -EBUSY ... */
> > }
> >
> > to install the new one, instead of
> >
> > rcu_assign_pointer(*trigger_ptr, new);
> >
> > or something like that. No locking necessary.
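
Linus's install-once rule can be modeled in userspace with C11 atomics, where atomic_compare_exchange_strong() plays the role of cmpxchg(). A minimal sketch, assuming a hypothetical trigger_model type in place of struct psi_trigger and plain free() in place of the kernel's trigger teardown:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct psi_trigger. */
struct trigger_model {
	unsigned long threshold_us;
};

/* Install 'new' only if the slot is empty; never replace an existing one. */
static int trigger_install_model(_Atomic(struct trigger_model *) *slot,
				 struct trigger_model *new)
{
	struct trigger_model *expected = NULL;

	/* Models cmpxchg(trigger_ptr, NULL, new) != NULL: fail if occupied. */
	if (!atomic_compare_exchange_strong(slot, &expected, new)) {
		free(new);	/* caller's trigger was never published */
		return -EBUSY;
	}
	return 0;
}

/* On close: swap the slot back to NULL and free whatever was installed. */
static void trigger_release_model(_Atomic(struct trigger_model *) *slot)
{
	free(atomic_exchange(slot, NULL));
}
```

Because atomic_compare_exchange_strong() defaults to sequentially consistent ordering, the trigger's fields are initialized before another thread can observe the pointer through an atomic load, similar to the release ordering rcu_assign_pointer() provides. Close stays the only way back to NULL.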
> >
> > But I assume people actually end up re-writing triggers, because
> > people are perverse and have taken advantage of this completely broken
> > API.
> >
> > Linus