RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
From: Shiju Jose
Date: Wed Dec 04 2024 - 06:36:58 EST
>-----Original Message-----
>From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Sent: 03 December 2024 15:22
>To: Steven Rostedt <rostedt@xxxxxxxxxxx>
>Cc: dave.jiang@xxxxxxxxx; dan.j.williams@xxxxxxxxx; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; alison.schofield@xxxxxxxxx;
>nifan.cxl@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>dave@xxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>Linuxarm <linuxarm@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
>Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>
>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>[...]
>>
>>Hi Steve,
>>
>>I debug this case and please find the info, 1. rasdaemon : read() from
>>format file return around 1/3rd of the
>> actual data in the file only when the total size of the format's file
>> is above 4K bytes (page size), For example, in this case, the total
>>size was 4512 bytes,
>> but read only 1674 bytes.
>> This partial data resulted tep_parse_event() in libtraceevent
>>failed internally in the parse_format()
>> and since __parse_event() does not return error when parse_format() fail,
>> thus initialization of the event does not fail.
>>
>> The following solution in rasdaemon solved the issue,
>> (provided if no other fix for the above issue with read()),
>> 1. read() and accumulate content of format file until EOF reached.
>> 2. Increased the buffer size from 4K bytes to 8K bytes.
>> 3. May be __parse_event() in libtraceevent and thus
>>tep_parse_event() return error
>> when parse_format() fail so that the initialization would fail
>>if the input data is
>> corrupted or partial?
>
>Hi Steve,
>
>Identified the root cause of this issue in the kernel:
>The read() function for the event's format file calls seq_read() and
>seq_read_iter() in the kernel, which allocates a buffer of PAGE_SIZE (4 KB) for
>sequential reads. However, it detects an overflow in the following marked call
>(seq_has_overflowed()) and returns with partial data read.
>
>=====================================
>
>File : https://elixir.bootlin.com/linux/v6.13-rc1/source/fs/seq_file.c
>
>ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter) {
> struct seq_file *m = iocb->ki_filp->private_data; [...]
> /* grab buffer if we didn't have one */
> if (!m->buf) {
>---------> m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
> if (!m->buf)
> goto Enomem;
> }
> // something left in the buffer - copy it out first [...]
> // get a non-empty record in the buffer
> m->from = 0;
> p = m->op->start(m, &m->index);
> while (1) {
> err = PTR_ERR(p);
> if (!p || IS_ERR(p)) // EOF or an error
> break;
>[...]
> }
> // EOF or an error
> m->op->stop(m, p);
> m->count = 0;
> goto Done;
>Fill:
> // one non-empty record is in the buffer; if they want more,
> // try to fit more in, but in any case we need to advance
> // the iterator once for every record shown.
> while (1) {
> size_t offs = m->count;
> loff_t pos = m->index;
>
>[...]
> err = m->op->show(m, p);
> if (err > 0) { // ->show() says "skip it"
> m->count = offs;
>----------> } else if (err || seq_has_overflowed(m)) {
> m->count = offs;
> break;
> }
> }
> m->op->stop(m, p);
> n = copy_to_iter(m->buf, m->count, iter);
> copied += n;
> m->count -= n;
> m->from = n;
>Done:
>[...]
> goto Done;
>}
>EXPORT_SYMBOL(seq_read_iter);
>====================================
>
>The following fix in the kernel's seq_read_iter() allows userspace to read the
>content of the format file etc in one go if the requested size exceeds PAGE_SIZE,
>and resolves the parsing error caused by the event's format file exceeding
>PAGE_SIZE.
>
>====================================
>diff --git a/fs/seq_file.c b/fs/seq_file.c index e676c8b0cf5d..f1f1af180562
>100644
>--- a/fs/seq_file.c
>+++ b/fs/seq_file.c
>@@ -207,7 +207,11 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter
>*iter)
>
> /* grab buffer if we didn't have one */
> if (!m->buf) {
>- m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
>+ if (iter->count % PAGE_SIZE)
>+ m->size = ((iter->count / PAGE_SIZE) + 1) * PAGE_SIZE;
>+ else
>+ m->size = (iter->count / PAGE_SIZE) * PAGE_SIZE;
>+ m->buf = seq_buf_alloc(m->size);
> if (!m->buf)
> goto Enomem;
> }
>====================================
>
>Can you suggest which fix is the right one, kernel based fix or rasdaemon based
>fix (which mentioned in the previous email)?
Following change for reading trace event's format file in the kernel worked fine to fix
the parsing error, which seems better than changing the common seq_read().
============================================
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7266ec2a4eea..9cb1c5a1f070 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1674,6 +1674,35 @@ static int trace_format_open(struct inode *inode, struct file *file)
return 0;
}
+/**
+ * trace_format_read - read() method for format file.
+ * @file: the file to read from
+ * @buf: the buffer to read to
+ * @size: the maximum number of bytes to read
+ * @ppos: the current position in the file
+ *
+ * * Return:
+ * * %0 - No bytes copied (EOF).
+ * * %>0 - Number of bytes copied.
+ * * %<0 - Error code.
+ */
+static ssize_t trace_format_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
+{
+ ssize_t copied = 0;
+ ssize_t ret;
+
+ do {
+ ret = seq_read(file, buf + copied, size - copied, ppos);
+ if (ret < 0)
+ return ret;
+ copied += ret;
+ if (copied >= size)
+ break;
+ } while (ret);
+
+ return copied;
+}
+
#ifdef CONFIG_PERF_EVENTS
static ssize_t
event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
@@ -2157,7 +2186,7 @@ static const struct file_operations ftrace_enable_fops = {
static const struct file_operations ftrace_event_format_fops = {
.open = trace_format_open,
- .read = seq_read,
+ .read = trace_format_read,
.llseek = seq_lseek,
.release = seq_release,
};
=============================================
>
>[...]
>>>>-- Steve
>>>
>>
>
Thanks,
Shiju