Re: [PATCH v3 0/2] Support BPF traversal of wakeup sources

From: Greg Kroah-Hartman

Date: Fri Apr 03 2026 - 06:05:52 EST


On Thu, Apr 02, 2026 at 12:37:12PM -0700, Samuel Wu wrote:
> On Wed, Apr 1, 2026 at 9:06 PM Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, Apr 01, 2026 at 12:07:12PM -0700, Samuel Wu wrote:
> > > On Wed, Apr 1, 2026 at 2:15 AM Greg Kroah-Hartman
> > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Mar 31, 2026 at 08:34:09AM -0700, Samuel Wu wrote:
> > > > > This patchset adds requisite kfuncs for BPF programs to safely traverse
> > > > > wakeup_sources, and puts a config flag around the sysfs interface.
> > > > >
> > > > > > Currently, a traversal of wakeup sources requires going through
> > > > > > /sys/class/wakeup/* or /d/wakeup_sources/*. The repeated syscalls to query
> > > > > > sysfs are inefficient, as there can be hundreds of wakeup_sources, with each
> > > > > wakeup source also having multiple attributes. debugfs is unstable and
> > > > > insecure.
> > > >
> > > > Describe "inefficient" please?
> > >
> > > Ack; I’ll provide a more detailed breakdown in the v4 cover letter. To
> > > summarize: the "inefficiency" isn't just the number of sources (150),
> > > but the fact that each source has 10 attributes. We are looking at
> > > 1,500+ sysfs nodes to get a full snapshot of the system.
> >
> > Wait, no, something is wrong here. You should NEVER be wanting to
> > combine multiple sysfs files at the same time into a "snapshot" of the
> > system because by virtue of how this works, it's going to change while
> > you are actually traversing all of those files!
>
> Agree, the current approach with sysfs might have stale values. The
> BPF approach holds a lock while traversing the list. It's not a
> perfect snapshot, but it's internally consistent and arguably better
> than the current sysfs implementation.
>
> > Why are you trying to read 1500+ sysfs files at once, and what are you
> > doing with that information? And if you really need it "all at once",
> > why can't we provide it for you in a sane manner, instead of being
> > forced to either walk the whole sysfs tree, or rely on a bpf script?
>
> The data is fundamental for debugging and improving power at scale.
> The original discussion and patch [1] provide more context of the
> intent. To summarize the history, debugfs was unstable and insecure,
> leading to the current sysfs implementation. However, sysfs has the
> constraint of one attribute per node, requiring 10 sysfs accesses per
> wakeup source.

Ok, as the sysfs api doesn't work for your use case anymore, why do we
need to keep it around at all?

> That said, I completely agree that reading 1500+ sysfs files at once
> is unreasonable. Perhaps the sysfs approach was manageable at the time
> of [1], but moving forward we need a more scalable solution. This is
> the main motivator and makes BPF the sane approach, as it improves
> traversal in nearly every aspect (e.g. cycles, memory, simplicity,
> scalability).

I'm all for making this more scalable and making it work for your
systems now, but consider: if you could drop the sysfs api entirely,
would you want this to be a different type of api altogether, instead of
having to plug through these using ebpf?

thanks,

greg k-h