Re: [PATCH] mm: change memcg->oom_group access with atomic operations

From: Paul E. McKenney
Date: Tue Feb 21 2023 - 19:38:09 EST

Next message: Xiao Ni: "Re: [PATCH] md: fix null-ptr-deference in md_free_disk()"
Previous message: Sean Christopherson: "Re: [PATCH 2/2] Documentation/process: Add a maintainer handbook for KVM x86"
In reply to: Roman Gushchin: "Re: [PATCH] mm: change memcg->oom_group access with atomic operations"
Next in thread: Roman Gushchin: "Re: [PATCH] mm: change memcg->oom_group access with atomic operations"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Feb 21, 2023 at 03:57:58PM -0800, Roman Gushchin wrote:
> On Tue, Feb 21, 2023 at 03:38:24PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 21, 2023 at 03:13:36PM -0800, Shakeel Butt wrote:
> > > On Tue, Feb 21, 2023 at 2:38 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Feb 21, 2023 at 02:23:31PM -0800, Roman Gushchin wrote:
> > > > > On Tue, Feb 21, 2023 at 10:23:59AM -0800, Paul E. McKenney wrote:
> > > > > > On Tue, Feb 21, 2023 at 08:56:59AM -0800, Shakeel Butt wrote:
> > > > > > > +Paul & Marco
> > > > > > >
> > > > > > > On Tue, Feb 21, 2023 at 5:51 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 20, 2023 at 10:52:10PM -0800, Shakeel Butt wrote:
> > > > > > > > > On Mon, Feb 20, 2023 at 9:17 PM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
> > > > > > > > > > > On Feb 20, 2023, at 3:06 PM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 20, 2023 at 01:09:44PM -0800, Roman Gushchin wrote:
> > > > > > > > > > >>> On Mon, Feb 20, 2023 at 11:16:38PM +0800, Yue Zhao wrote:
> > > > > > > > > > >>> The knob for cgroup v2 memory controller: memory.oom.group
> > > > > > > > > > >>> will be read and written simultaneously by user space
> > > > > > > > > > >>> programs, thus we'd better change memcg->oom_group access
> > > > > > > > > > >>> with atomic operations to avoid concurrency problems.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Signed-off-by: Yue Zhao <findns94@xxxxxxxxx>
> > > > > > > > > > >>
> > > > > > > > > > >> Hi Yue!
> > > > > > > > > > >>
> > > > > > > > > > >> I'm curious, have any seen any real issues which your patch is solving?
> > > > > > > > > > >> Can you, please, provide a bit more details.
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > > IMHO such details are not needed. oom_group is being accessed
> > > > > > > > > > > concurrently and one of them can be a write access. At least
> > > > > > > > > > > READ_ONCE/WRITE_ONCE is needed here.
> > > > > > > > > >
> > > > > > > > > > Needed for what?
> > > > > > > > >
> > > > > > > > > For this particular case, documenting such an access. Though I don't
> > > > > > > > > think there are any architectures which may tear a one byte read/write
> > > > > > > > > and merging/refetching is not an issue for this.
> > > > > > > >
> > > > > > > > Wouldn't a compiler be within its rights to implement a one byte store as:
> > > > > > > >
> > > > > > > > load-word
> > > > > > > > modify-byte-in-word
> > > > > > > > store-word
> > > > > > > >
> > > > > > > > and if this is a lockless store to a word which has an adjacent byte also
> > > > > > > > being modified by another CPU, one of those CPUs can lose its store?
> > > > > > > > And WRITE_ONCE would prevent the compiler from implementing the store
> > > > > > > > in that way.
> > > > > > >
> > > > > > > Thanks Willy for pointing this out. If the compiler can really do this
> > > > > > > then [READ|WRITE]_ONCE are required here. I always have big bad
> > > > > > > compiler lwn article open in a tab. I couldn't map this transformation
> > > > > > > to ones mentioned in that article. Do we have name of this one?
> > > > > >
> > > > > > No, recent compilers are absolutely forbidden from doing this sort of
> > > > > > thing except under very special circumstances.
> > > > > >
> > > > > > Before C11, compilers could and in fact did do things like this. This is
> > > > > > after all a great way to keep the CPU's vector unit from getting bored.
> > > > > > Unfortunately for those who prize optimization above all else, doing
> > > > > > this can introduce data races, for example:
> > > > > >
> > > > > > char a;
> > > > > > char b;
> > > > > > spin_lock la;
> > > > > > spin_lock lb;
> > > > > >
> > > > > > void change_a(char new_a)
> > > > > > {
> > > > > > spin_lock(&la);
> > > > > > a = new_a;
> > > > > > spin_unlock(&la);
> > > > > > }
> > > > > >
> > > > > > void change_b(char new_b)
> > > > > > {
> > > > > > spin_lock(&lb);
> > > > > > b = new_b;
> > > > > > spin_unlock(&lb);
> > > > > > }
> > > > > >
> > > > > > If the compiler "optimized" that "a = new_a" so as to produce a non-atomic
> > > > > > read-modify-write sequence, it would be introducing a data race.
> > > > > > And since C11, the compiler is absolutely forbidden from introducing
> > > > > > data races. So, again, no, the compiler cannot invent writes to
> > > > > > variables.
> > > > > >
> > > > > > What are those very special circumstances?
> > > > > >
> > > > > > 1. The other variables were going to be written to anyway, and
> > > > > > none of the writes was non-volatile and there was no ordering
> > > > > > directive between any of those writes.
> > > > > >
> > > > > > 2. The other variables are dead, as in there are no subsequent
> > > > > > reads from them anywhere in the program. Of course in that case,
> > > > > > there is no need to read the prior values of those variables.
> > > > > >
> > > > > > 3. All accesses to all of the variables are visible to the compiler,
> > > > > > and the compiler can prove that there are no concurrent accesses
> > > > > > to any of them. For example, all of the variables are on-stack
> > > > > > variables whose addresses are never taken.
> > > > > >
> > > > > > Does that help, or am I misunderstanding the question?
> > > > >
> > > > > Thank you, Paul!
> > > > >
> > > > > So it seems like READ_ONCE()/WRITE_ONCE() are totally useless here.
> > > > > Or I still miss something?
> > > >
> > > > Yes, given that the compiler will already avoid inventing data-race-prone
> > > > C-language accesses to shared variables, so if that was the only reason
> > > > that you were using READ_ONCE() or WRITE_ONCE(), then READ_ONCE() and
> > > > WRITE_ONCE() won't be helping you.
> > > >
> > > > Or perhaps better to put it a different way... The fact that the compiler
> > > > is not permitted to invent data-racy reads and writes is exactly why
> > > > you do not normally need READ_ONCE() and WRITE_ONCE() for accesses in
> > > > lock-based critical sections. Instead, you only need READ_ONCE() and
> > > > WRITE_ONCE() when you have lockless accesses to the same shared variables.
> > >
> > > This is lockless access to memcg->oom_group potentially from multiple
> > > CPUs, so, READ_ONCE() and WRITE_ONCE() are needed, right?
> >
> > Agreed, lockless concurrent accesses should use READ_ONCE() and WRITE_ONCE().
> > And if either conflicting access is lockless, it is lockless. ;-)
>
> Now I'm confused, why we should use it here?
> Writing is happening from a separate syscall (a single write from a syscall),
> reading is happening from a oom context. The variable is boolean, it's either
> 0 or 1. What difference READ_ONCE()/WRITE_ONCE() will make here?
> Thanks!

In practice, not much difference other than documenting shared accesses.
Which can be valuable.

In theory, when you do a normal C-language store, the compiler is within
its rights to use the variable for temporary storage between the time
of the last read from that variable and the next write to that variable.
Back to practice, I have not heard of this happening for shared variables.
On the other hand, compilers really do this for on-stack variables whose
addresses are not taken, which is one of the reasons that gdb might say
that the variable is optimized out when you try to look at its value.

So the potential is there, and if it was my code, I would therefore use
READ_ONCE() and WRITE_ONCE().

Thanx, Paul

Next message: Xiao Ni: "Re: [PATCH] md: fix null-ptr-deference in md_free_disk()"
Previous message: Sean Christopherson: "Re: [PATCH 2/2] Documentation/process: Add a maintainer handbook for KVM x86"
In reply to: Roman Gushchin: "Re: [PATCH] mm: change memcg->oom_group access with atomic operations"
Next in thread: Roman Gushchin: "Re: [PATCH] mm: change memcg->oom_group access with atomic operations"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]