Re: [PATCH v2][RESEND] seq_file: don't set read position for invalid iterator

From: Tomasz Majchrzak
Date: Mon Oct 31 2016 - 05:32:29 EST


On Wed, Oct 26, 2016 at 11:17:13AM +0200, Miklos Szeredi wrote:
> On Wed, Oct 12, 2016 at 2:07 PM, Tomasz Majchrzak
> <tomasz.majchrzak@xxxxxxxxx> wrote:
> > If kernfs file is empty on a first read, successive read operations
> > using the same file descriptor will return no data, even when data is
> > available. Default kernfs 'seq_next' implementation advances iterator
> > position even when next object is not there. Kernfs 'seq_start' for
> > following requests will not return iterator as position is already on
> > the second object.
> >
> > This bug prevents monitoring of the badblocks sysfs files from MD raid.
> > They are initially empty, but if data appears at some stage, userspace is
> > not able to read it. It doesn't affect any released applications, but the
> > fix is necessary for the upcoming bad block support for external metadata
> > in MD raid.
>
> What is the expectation from the userspace application? Should
> seq_file support "tail -f" as more data is added to the file? AFAICS
> this patch doesn't address that generally, just the empty->nonempty
> transition (with the single record case).

Yes, it fixes only the empty-to-nonempty transition. I believe "tail -f"
already works fine if the file cursor is in the middle of the file; mdmon,
however, never performs such an operation. It "acknowledges" an entry read
from the start of one sysfs file (unacknowledged_bad_blocks) via another
sysfs file (bad_blocks) and then reads the first file again from the start
(at which point there is new data at position 0).

My patch just resolves a regression introduced by a fix for multiple-record
files: 4cdfe84b514 ("deal with the first call of ->show() generating no
output"). It makes this path consistent with the rest of the seq_file code
(seq_read, traverse): the new iterator position is ignored if a valid
iterator has not been returned.

> Why does userspace app not do open+read+close each time it polls the
> badblocks file?

This excerpt from mdadm's mdmon-design.txt explains it:

"
The 'monitor' has the primary role of monitoring the array for
important state changes and updating the metadata accordingly. As
writes to the array can be blocked until 'monitor' completes and
acknowledges the update, it must be very careful not to block itself.
In particular it must not block waiting for any write to complete else
it could deadlock. This means that it must not allocate memory as
doing this can require dirty memory to be written out and if the
system chooses to write to the array that mdmon is monitoring, the
memory allocation could deadlock.

So 'monitor' must never allocate memory and must limit the number of
other system calls it performs. It may:
- use select (or poll) to wait for activity on a file descriptor
- read from a sysfs file descriptor
- write to a sysfs file descriptor
- write the metadata out to the block devices using O_DIRECT
- send a signal (kill) to the manager thread

It must not e.g. open files or do anything similar that might allocate
resources.
"

Regards,

Tomek

> >
> > Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@xxxxxxxxx>
> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > ---
> > fs/seq_file.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> > diff --git a/fs/seq_file.c b/fs/seq_file.c
> > index 6dc4296..74197f4 100644
> > --- a/fs/seq_file.c
> > +++ b/fs/seq_file.c
> > @@ -235,7 +235,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
> >  	p = m->op->start(m, &pos);
> >  	while (1) {
> >  		err = PTR_ERR(p);
> > -		if (!p || IS_ERR(p))
> > +		if (IS_ERR_OR_NULL(p))
> >  			break;
> >  		err = m->op->show(m, p);
> >  		if (err < 0)
> > @@ -244,7 +244,8 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
> >  			m->count = 0;
> >  		if (unlikely(!m->count)) {
> >  			p = m->op->next(m, p, &pos);
> > -			m->index = pos;
> > +			if (!IS_ERR_OR_NULL(p))
> > +				m->index = pos;
> >  			continue;
> >  		}
> >  		if (m->count < m->size)
> > --
> > 1.8.3.1
> >