RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1

From: Shiju Jose
Date: Tue Dec 03 2024 - 10:50:15 EST


>-----Original Message-----
>From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Sent: 29 November 2024 13:23
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>; Steven Rostedt <rostedt@xxxxxxxxxxx>
>Cc: dave.jiang@xxxxxxxxx; dan.j.williams@xxxxxxxxx; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; alison.schofield@xxxxxxxxx;
>nifan.cxl@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>dave@xxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>Linuxarm <linuxarm@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
>Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>
>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>>-----Original Message-----
>>From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>Sent: 28 November 2024 10:02
>>To: Steven Rostedt <rostedt@xxxxxxxxxxx>
>>Cc: dave.jiang@xxxxxxxxx; dan.j.williams@xxxxxxxxx; Jonathan Cameron
>><jonathan.cameron@xxxxxxxxxx>; alison.schofield@xxxxxxxxx;
>>nifan.cxl@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>>dave@xxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx;
>>linux-kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>;
>>tanxiaofei <tanxiaofei@xxxxxxxxxx>; Zengtao (B)
>><prime.zeng@xxxxxxxxxxxxx>
>>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event
>>Record to CXL spec rev 3.1
>>
[...]
>
>Hi Steve,
>
>I debug this case and please find the info, 1. rasdaemon : read() from format
>file return around 1/3rd of the
> actual data in the file only when the total size of the format's file
> is above 4K bytes (page size), For example, in this case, the total size was 4512
>bytes,
> but read only 1674 bytes.
> This partial data resulted tep_parse_event() in libtraceevent failed internally
>in the parse_format()
> and since __parse_event() does not return error when parse_format() fail,
> thus initialization of the event does not fail.
>
> The following solution in rasdaemon solved the issue,
> (provided if no other fix for the above issue with read()),
> 1. read() and accumulate content of format file until EOF reached.
> 2. Increased the buffer size from 4K bytes to 8K bytes.
> 3. May be __parse_event() in libtraceevent and thus tep_parse_event()
>return error
> when parse_format() fail so that the initialization would fail if the input
>data is
> corrupted or partial?

Hi Steve,

Identified the root cause of this issue in the kernel:
The read() function for the event's format file calls seq_read() and seq_read_iter() in the kernel,
which allocates a buffer of PAGE_SIZE (4 KB) for sequential reads. However, it
detects an overflow in the following marked call (seq_has_overflowed()) and
returns with partial data read.

=====================================

File : https://elixir.bootlin.com/linux/v6.13-rc1/source/fs/seq_file.c

ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct seq_file *m = iocb->ki_filp->private_data;
[...]
/* grab buffer if we didn't have one */
if (!m->buf) {
---------> m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
if (!m->buf)
goto Enomem;
}
// something left in the buffer - copy it out first
[...]
// get a non-empty record in the buffer
m->from = 0;
p = m->op->start(m, &m->index);
while (1) {
err = PTR_ERR(p);
if (!p || IS_ERR(p)) // EOF or an error
break;
[...]
}
// EOF or an error
m->op->stop(m, p);
m->count = 0;
goto Done;
Fill:
// one non-empty record is in the buffer; if they want more,
// try to fit more in, but in any case we need to advance
// the iterator once for every record shown.
while (1) {
size_t offs = m->count;
loff_t pos = m->index;

[...]
err = m->op->show(m, p);
if (err > 0) { // ->show() says "skip it"
m->count = offs;
----------> } else if (err || seq_has_overflowed(m)) {
m->count = offs;
break;
}
}
m->op->stop(m, p);
n = copy_to_iter(m->buf, m->count, iter);
copied += n;
m->count -= n;
m->from = n;
Done:
[...]
goto Done;
}
EXPORT_SYMBOL(seq_read_iter);
====================================

The following fix in the kernel's seq_read_iter() allows userspace to read the
content of the format file etc in one go if the requested size exceeds PAGE_SIZE,
and resolves the parsing error caused by the event's format file exceeding PAGE_SIZE.

====================================
diff --git a/fs/seq_file.c b/fs/seq_file.c index e676c8b0cf5d..f1f1af180562 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -207,7 +207,11 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)

/* grab buffer if we didn't have one */
if (!m->buf) {
- m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
+ if (iter->count % PAGE_SIZE)
+ m->size = ((iter->count / PAGE_SIZE) + 1) * PAGE_SIZE;
+ else
+ m->size = (iter->count / PAGE_SIZE) * PAGE_SIZE;
+ m->buf = seq_buf_alloc(m->size);
if (!m->buf)
goto Enomem;
}
====================================

Can you suggest which fix is the right one, kernel based fix or rasdaemon based fix (which
mentioned in the previous email)?

Thanks,
Shiju
>>
>>>
>>>>
>>>> I've also attached another format file,
>>>> "format_cxl_general_media_v3.1_full",
>>>> which contains the complete TP_printk() intended.
>>>
[...]
>>>-- Steve
>>
>