Re: [PATCH v2 4/7] sysfs: Add SYSFS_HUGE_BIN_FILE flag for binary attributes larger than PAGE_SIZE

From: M K, Muralidhara

Date: Thu May 14 2026 - 10:29:53 EST




On 5/13/2026 11:54 AM, Greg KH wrote:

On Wed, May 13, 2026 at 09:43:57AM +0530, K Prateek Nayak wrote:
Hello Greg,

On 5/12/2026 5:31 PM, Greg KH wrote:
On Mon, Apr 27, 2026 at 09:21:26PM +0530, Muralidhara M K wrote:
Historically, sysfs read buffers were allocated with get_zeroed_page(),
limiting reads to PAGE_SIZE. Commit 13c589d5b0ac ("sysfs: use seq_file
when reading regular files") transitioned regular (text) attribute reads
to seq_file, which can dynamically grow buffers beyond PAGE_SIZE.
However, the PAGE_SIZE limit was intentionally preserved for
compatibility. When binary attribute handling was later unified into
the same codebase, the non-seq_file read path (kernfs_file_read_iter)
retained this PAGE_SIZE cap for binary files as well.

Drivers that expose binary attributes larger than PAGE_SIZE — such as
the AMD HSMP metric table (~13 KB) — cannot deliver the full content
in a single read() call through the existing path.
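As a rough illustration of the cap described above, the following userspace C model (not kernel code) mimics a read path that copies at most one page per read() call. It assumes 4 KiB pages and uses 13 KiB (13312 bytes) as a stand-in for the "~13 KB" table. capped_read() and reads_needed() are hypothetical helpers. Under those assumptions, a reader that asks for the whole table at once still needs four read() calls (4096 + 4096 + 4096 + 1024 bytes).

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Userspace model, not kernel code: like the legacy binary-attribute
 * path, copy at most one page per read() call. */
static ssize_t capped_read(const char *attr, size_t attr_len,
			   char *dst, size_t count, size_t *pos)
{
	size_t len = count < MODEL_PAGE_SIZE ? count : MODEL_PAGE_SIZE;

	if (*pos >= attr_len)
		return 0; /* EOF */
	if (len > attr_len - *pos)
		len = attr_len - *pos;
	memcpy(dst, attr + *pos, len);
	*pos += len;
	return (ssize_t)len;
}

/* Count the read() calls needed by a reader that always asks for
 * everything that is still missing. */
static unsigned int reads_needed(const char *attr, size_t attr_len, char *dst)
{
	size_t pos = 0;
	unsigned int calls = 0;

	while (capped_read(attr, attr_len, dst + pos, attr_len - pos, &pos) > 0)
		calls++;
	return calls;
}
```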

That's fine, userspace must be able to handle a "short" read, and will
just continue on and read everything afterward, right? You can't rely
on userspace always asking for more data.
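A minimal sketch of the reader-side loop described here, assuming a plain file descriptor: keep calling read() until it returns 0 (EOF), and treat a short read as "more to come" rather than as the end of the data. read_all() is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until EOF, the buffer is full, or a real error.
 * A short read is not an error: it only means "come back for more". */
static ssize_t read_all(int fd, void *buf, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, (char *)buf + total, cap - total);

		if (n == 0)
			break; /* EOF */
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal; retry */
			return -1;
		}
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

With such a loop, a per-call PAGE_SIZE cap only changes how many read() calls are made, not what the reader ends up with, provided the underlying data does not change between calls.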

I think this is complicated by the HSMP driver bits that require the
read to issue an HSMP command to the hardware first to update the
table before copying from the MMIO region.

Then you have bigger problems here :(

If a concurrent reader arrives, they'll refresh the table for their
PAGE_SIZE chunk read and the prior user will see a torn value. For
the most part it shouldn't be a problem, but folks try to correlate the
Temperature and Power data from the first chunk with the Throttle
Indicators in the second chunk, and sometimes they don't match
expectations.
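The tearing described above can be reproduced with a small single-threaded model. It assumes, for illustration only, that a read starting at chunk 0 triggers a hardware refresh of the whole table and that reads of later chunks copy whatever is currently there; struct table_model and read_chunk() are hypothetical. Interleaving a second reader's chunk-0 read between the first reader's two chunk reads leaves the first reader holding chunks from two different refreshes.

```c
#define NCHUNKS 2u /* hypothetical: the table spans two page-sized chunks */

/* Hypothetical model of the device table: each chunk records the
 * generation of the hardware refresh that produced it. */
struct table_model {
	unsigned int gen;              /* refresh counter */
	unsigned int chunk[NCHUNKS];   /* generation stamped into each chunk */
};

/* Assumption: a read of chunk 0 asks the hardware to refresh the whole
 * table; reads of later chunks just copy what is already there. */
static unsigned int read_chunk(struct table_model *t, unsigned int idx)
{
	if (idx == 0) {
		t->gen++;
		for (unsigned int i = 0; i < NCHUNKS; i++)
			t->chunk[i] = t->gen;
	}
	return t->chunk[idx];
}
```

A lone reader gets two chunks from the same refresh; with an interleaved second reader, the first reader's chunk 0 comes from one refresh and its chunk 1 from the next, which is the Temperature/Power versus Throttle Indicators mismatch described above.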

Again, this is a problem, perhaps do not use sysfs for this? You can't
control userspace, and to expect it to always work properly is not going
to end well. This change isn't going to fix your problems listed above
at all.

The table should never have grown this big, but some folks decided it
was a good idea; we can't fix that for a while, and we have now hit the
PAGE_SIZE limit.

Just delete it and use a different interface to the kernel instead
please. If you need atomic read/writes, use an ioctl. Don't try to fix
sysfs into something that it was not designed for at all.
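One possible shape for such an ioctl, sketched from the userspace side: a single call asks the driver to refresh the table and copy all of it out, so the driver can serialize refresh and copy under one lock. Every name here (struct metric_snapshot_req, METRIC_IOC_GET_SNAPSHOT, metric_get_snapshot(), the 'M' magic and number 0x10) is hypothetical and not an existing UAPI. The request carries a user pointer and a length rather than embedding the table, because the size field of an ioctl number is small (14 bits with the generic encoding, fewer on some architectures), so a ~13 KB structure would not encode portably.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* HYPOTHETICAL UAPI: none of these names exist in the kernel. */
struct metric_snapshot_req {
	uint64_t buf;   /* user pointer to the destination buffer */
	uint32_t len;   /* size of that buffer in bytes */
	uint32_t flags; /* reserved, must be zero */
};

#define METRIC_IOC_MAGIC 'M' /* hypothetical magic number */
#define METRIC_IOC_GET_SNAPSHOT \
	_IOWR(METRIC_IOC_MAGIC, 0x10, struct metric_snapshot_req)

/* One call asks for one refresh plus one full copy; the point of a single
 * call is that the driver can do both under one lock. */
static int metric_get_snapshot(int fd, void *dst, uint32_t len)
{
	struct metric_snapshot_req req;

	memset(&req, 0, sizeof(req));
	req.buf = (uint64_t)(uintptr_t)dst;
	req.len = len;
	return ioctl(fd, METRIC_IOC_GET_SNAPSHOT, &req);
}
```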


Thank you for the suggestion, Greg. The ioctl interface approach works well. I'll implement it, test it, and send the next version for review.

If there is a better alternative, we are all ears, and more than happy
to try out an alternative suggestion for the described problem.

A misc device sounds like the proper solution.

Introduce a new opt-in flag SYSFS_HUGE_BIN_FILE (040000) that drivers
can OR into their bin_attribute mode. When set, sysfs selects a new
kernfs_ops (sysfs_bin_kfops_huge_file_ro) whose .seq_show callback
pipes the bin_attribute ->read() result through seq_file, allowing
reads of arbitrary size in one shot. Existing binary attributes
without the flag continue using the legacy capped path.
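The buffer-growing behaviour the patch relies on can be modelled in userspace roughly as follows. This is a simplified model of the seq_file strategy (start with one page and, if the show callback overflows, free the buffer, double it, and call show again), not the kernfs or seq_file code itself; render_whole(), show_blob() and struct blob are hypothetical. With 4 KiB pages and a 13312-byte blob, the attempts at 4096 and 8192 bytes overflow and the 16384-byte attempt succeeds.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Returns 0 if the output fit in buf, -1 on overflow
 * (playing the role of seq_has_overflowed()). */
typedef int (*show_fn)(char *buf, size_t size, size_t *used, void *priv);

/* Simplified model of seq_file's sizing: render the whole record into a
 * one-page buffer; on overflow, free it, double the size and retry. */
static char *render_whole(show_fn show, void *priv,
			  size_t *out_len, size_t *out_size)
{
	size_t size = MODEL_PAGE_SIZE;

	for (;;) {
		char *buf = malloc(size);
		size_t used = 0;

		if (!buf)
			return NULL;
		if (show(buf, size, &used, priv) == 0) {
			*out_len = used;
			*out_size = size;
			return buf; /* reads are then copied out of this buffer */
		}
		free(buf); /* overflowed: drop the partial output */
		size *= 2;
	}
}

/* Hypothetical stand-in for a bin_attribute ->read() routed through a
 * .seq_show callback: copies a fixed blob or reports overflow. */
struct blob {
	const char *data;
	size_t len;
};

static int show_blob(char *buf, size_t size, size_t *used, void *priv)
{
	const struct blob *b = priv;

	if (b->len > size)
		return -1;
	memcpy(buf, b->data, b->len);
	*used = b->len;
	return 0;
}
```

In this model every retry calls the show callback again from scratch, so for a driver whose read triggers a hardware refresh, each attempt would presumably trigger another refresh.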

If this is such a big issue, why not just do it always for binary files?
What is the benefit of keeping two different code paths just for this
"new" flag?

We can do that! For bin attributes that specify .size or a size
function, we can use a flexible buffer, and for the ones that don't, we
can enforce a PAGE_SIZE cap like today.

Would that be okay?
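A minimal model of the rule proposed above, under stated assumptions: struct model_bin_attr is a hypothetical stand-in with a static size (0 meaning "not specified") and an optional size callback, and the callback is assumed to take precedence when both are present. Attributes that report a size get a read buffer large enough for all of it; the rest keep the one-page cap.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a binary attribute: a static size
 * (0 = not specified) and an optional runtime size callback. */
struct model_bin_attr {
	size_t size;
	size_t (*size_fn)(void);
};

/* Proposed rule: if the attribute reports its size, allow a read buffer
 * that holds all of it; otherwise keep today's one-page cap.
 * Assumption: the callback, when present, wins over the static size. */
static size_t read_buffer_limit(const struct model_bin_attr *a)
{
	size_t sz = a->size_fn ? a->size_fn() : a->size;

	return sz ? sz : MODEL_PAGE_SIZE;
}
```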

Overall, yes, but again, I don't think this is going to fix your
problem.

thanks,

greg k-h