Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function

From: Boqun Feng
Date: Tue Oct 08 2024 - 08:50:15 EST

Next message: Mark Brown: "Re: [PATCH 1/2] ASoC: SOF: amd: Add error log for DSP firmware validation failure"
Previous message: Eric Dumazet: "Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address"
In reply to: Andrew Lunn: "Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function"
Next in thread: Miguel Ojeda: "Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Oct 08, 2024 at 02:12:51PM +0200, Andrew Lunn wrote:
> > Because of the might_resched() in might_sleep(), it will report the
> > quiescent state of the current CPU, and RCU will pass a grace period if
> > all CPUs have passed a quiescent state. So for example if someone writes
> > the following:
> >
> > <reader> <updater>
> > rcu_read_lock();
> > p = rcu_dereference(gp);
> > might_sleep():
> > might_resched():
> > todo = gp;
> > rcu_assign_pointer(gp, NULL);
> > synchronize_rcu();
> >
> > rcu_all_qs(); // report a quiescent state inside RCU read-side
> > // critical section, which may make a grace period
> > // pass even there is an active RCU reader
> >
> > kfree(todo);
> >
>
> You are obviously missing something here. The call that actually sleeps
>
> mutex_lock(&lock)
>
> > a = READ_ONCE(p->a); // UAF
> > rcu_read_unlock();
>
> A might_sleep() should be paired with something which does actually
> sleep, under some condition. At least, that is how it is used in C.

How do you guarantee the "should" part? How can a compiler detect a
might_sleep() that doesn't have a paired "something which does actually
sleep"? I feel like we are just talking through each other, what I was
trying to say is might_sleep() is unsafe because the rule of Rust safe
code (if we don't consider klint) and I'm using an example here to
explain why. And when we are talking about the safe/unsafe attribute of
a function, we cannot use the reasoning "this function should be always
used with another function".

> The iopoll being re-implemented here is an example of that.
>
> So take the might_sleep out above, just leaving the mutex_lock. If the
> mutex is uncontested, the code does not sleep and everything is O.K?
> If it needs to wait for the mutex, it triggers a UAF.
>
> The might_sleep() will also trigger a stack trace, if its is enabled,
> because you are not allowed to sleep inside rcu_read_lock(), it is an
> example of atomic context.

These functionalities you mentioned above are also provided by
__might_sleep(), no?

>
> As far as i see, might_sleep() will cause UAF where there is going to
> be a UAF anyway. If you are using it correctly, it does not cause UAF.
>

Again, I agree with your assumption that might_sleep() will always be
paired with a sleep function, but we cannot mark might_sleep() as safe
because of that. We can, however, mark might_sleep() as safe because
klint is supposed to cover the detection of atomic context violations.
But we have a better option: __might_sleep().

> > We probably call the reader side code a "wrong annotation", however,
> > it's still unsafe code because of the UAF. Also you seems to assume that
> > might_sleep() is always attached to a sleepable function, which is not
> > an invalid assumption, but we couldn't use it for reasoning the
> > safe/unsafe property of Rust functions unless we can encode this in the
> > type system.
>
> How are any of the sleeping call encoded in the type system? I assume

There's no easy way, something might work is introducing effect system
[1] into Rust, but that's very complicated and may take years. When
there's no easy way to encode something in the type system, it's usually
the time that unsafe comes to happen, an unsafe function can have a
requirement that cannot be easily detected by compilers, and via unsafe
block and safety comments, programmers provide the reasons why these
requirements are fulfilled.

> any use of a mutex lock, sleep, wait for completion, etc are not all
> marked as unsafe? There is some sort of wrapper around them? Why not

They are marked as safe because of the klint extension of safe Rust rule
I mentioned.

> just extend that wrapper to might_sleep().
>
> > For Rust code, without klint rule, might_sleep() needs to
> > be unsafe. So we have two options for might_sleep().
> >
> > * Since we rely on klint for atomic context detection, we can mark the
> > trivial wrapper (as what Alice presented in the other email) as safe,
> > but we need to begin to add klint annotation for that function, unless
> > Gary finds a smart way to auto-annotate functions.
>
> Are there klint annotations for all sleeping functions?
>

Not yet, klint is still WIP. But we generally agree that atomic context
violations should be detected by klint (instead of making sleep
functions unsafe or using type system to encode sleep functions).

> > * Instead of might_sleep(), we provide the wrapper of __might_sleep(),
> > since it doesn't have might_resched() in it, it should be safe. And
> > all we care about here is the debugging rather than voluntary context
> > switch. (Besides I think preempt=volunatry is eventually going to be
> > gone because of PREEMPT_AUTO [1], if that happens I think the
> > might_resched() might be dropped entirely).
>
> __might_sleep() might be safe, but your code is still broken and going
> to UAF at some point. Don't you want that UAF to happen more reliably
> and faster so you can find the issue? That would be the advantage of
> might_sleep() over __might_sleep().
>

Could you give me an example that might_sleep() can detect a bug while
__might_sleep() cannot? IIUC, __might_sleep() is the core of atomic
context detection in might_sleep(), so when CONFIG_DEBUG_ATOMIC_SLEEP=y,
__might_sleep() should detect all bugs that might_sleep() would detect.
Or you are talking about detecting even when
CONFIG_DEBUG_ATOMIC_SLEEP=n?

[1]: https://en.wikipedia.org/wiki/Effect_system

Regards,
Boqun

> Andrew

Next message: Mark Brown: "Re: [PATCH 1/2] ASoC: SOF: amd: Add error log for DSP firmware validation failure"
Previous message: Eric Dumazet: "Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address"
In reply to: Andrew Lunn: "Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function"
Next in thread: Miguel Ojeda: "Re: [PATCH net-next v2 5/6] rust: Add read_poll_timeout function"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]