Re: [PATCH resend v2] tty: n_tty -- Add new TIOCPEEKRAW ioctl to peek unread data
From: Cyrill Gorcunov
Date: Mon Apr 11 2016 - 05:02:00 EST
On Sun, Apr 10, 2016 at 10:43:28PM -0700, Peter Hurley wrote:
> Hi Cyrill,
>
> On 04/07/2016 05:53 AM, Cyrill Gorcunov wrote:
> > Currently when we checkpoint PTY connections we ignore data queued inside
> > read buffer of ldisk, such data is barely lost after the restore. So we
> > would like to have an ability to dump and restore it.
Hi Peter! First of all, thanks a huge for comments!
> Do you want to be able to restore the exact state (like column and
> newline markers)?
>
> I'm wondering if a for-purpose checkpoint/restore ioctl would be more
> appropriate?
Strictly speaking saving complete state would be ideal but this require
exporting internal n_tty structure into user space as api (sure we
can name the members in any way we like, but this would force then
to keep backward compatibility when n_tty code get changed in future).
I think loosing some information like column and marker is acceptable
and fetching unread buffers is more general. I can try to implement
c/r'ing ioctl though which would not only fetch buffers but column
and such, and see how it goes, sounds ok?
> A generic peek() may find other uses which could make future improvements
> difficult.
>
> Plus this ioctl() is only reading the 4k line discipline read buffer, and
> not the tty buffers which could contain up to another 8k of unprocessed
> input.
Which is scheduled for flush into port and then landed into ld when
write() called. True, I managed to miss this moment, thanks! So the
proper general peek() handling should be implemented on tty code level
rather than ld, and fetch both -- start with tty buffer and proceed with
ld buffer then, right?
> > Here is a new ioctl code which simply copies data from read buffer into
> > the userspace without any additional processing (just like terminal is
> > sitting in a raw mode).
> >
> > The idea behind is when we checkpointing PTY pairs we peek this unread
> > data into images on disk and on restore phase we find the linked peer
> > and write it back thus reading peer will see it in read buffer.
>
> Have you frozen every process with either end open? I could see how this would
> be easy for an entire process group but what if the processes are unrelated?
> How do you ensure you have the correct linked peer with multiple devpts instances?
Currently we freeze all processes which are requested for dumping and then
we walk over all files. Once tty peer is met (actually we support pty
and vt only at the moment) we remember its index via TIOCGPTN (for master
peers) and expect to find appropriate slave peer. You know, having multiple
devpts mounted may be a problem actually in current criu code, I track indices
via one global array but rather should do that via per-sb basis.
Still we support that named external peers, where one of the ends do not
belong process tree we're dumping. In such scenario it's allowed to not
dump queued data because we don't control external peer anyhow.
> > I tried several approaches (including simply reuse of canon_copy_from_read_buf
> > and copy_from_read_buf) but it makes code more complicated, so I end up
> > in plain copying of buffer data.
> >
> > Strictly speaking, we could try handle it inside CRIU itself, as
> > Peter Hurley proposed (switch terminal to raw mode, read data,
> > and return original termios back) but in this case we may hit
> > the scenario when we read data and failed to restore it back
> > (due to any error) leaving the task we're dumping without
> > read buffer, thus peeking data looks like be the only
> > safe way.
>
> Code comments below.
>
>
> > v2:
> > - Use @char in ioctl definition.
> >
> > Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxxxxx>
> > CC: Jiri Slaby <jslaby@xxxxxxxx>
> > CC: Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>
> > CC: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > CC: Andrey Vagin <avagin@xxxxxxxxxxxxx>
> > CC: Pavel Emelianov <xemul@xxxxxxxxxxxxx>
> > CC: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
> > CC: Konstantin Khorenko <khorenko@xxxxxxxxxxxxx>
> > ---
> > drivers/tty/n_tty.c | 22 ++++++++++++++++++++++
> > include/uapi/asm-generic/ioctls.h | 1 +
> > 2 files changed, 23 insertions(+)
> >
> > Index: linux-ml.git/drivers/tty/n_tty.c
> > ===================================================================
> > --- linux-ml.git.orig/drivers/tty/n_tty.c
> > +++ linux-ml.git/drivers/tty/n_tty.c
> > @@ -2420,6 +2420,26 @@ static unsigned long inq_canon(struct n_
> > return nr;
> > }
> >
> > +static ssize_t n_tty_peek_raw(struct tty_struct *tty, unsigned char __user *buf)
>
> function name is redundant; n_tty_peek() is sufficient.
>
> Needs a buffer-length parameter; could return the size required to completely
> copy the ldisc read buffer, if smaller than required.
Seems better to provide some kind of structure with data address
and size would be kept, on read we can update @size and put there
the number of bytes unread if buffer is too small.
> > +{
> > + struct n_tty_data *ldata = tty->disc_data;
> > + ssize_t retval;
> > +
> > + if (!mutex_trylock(&ldata->atomic_read_lock))
> > + return -EAGAIN;
> > +
> > + down_read(&tty->termios_rwsem);
>
> Why take these two locks if parallel operations are not expected?
> Conversely, I would expect this to assert current state is TASK_TRACED.
If someone is already reading the data it will either exit soon with
data read or will sit here waiting for data to appear. So I thought
if we're fetching unread data we should not interfere with any other
readers? If someone is sitting in read procedure it implies that there
either no data on buffer and reader is waiting for it, or the read
procedure will complete soon, but indeed I rather should be checking
for read_cnt locklessly and return -EAGAIN if > 0 and can't lock
the atomic_read_lock. I implied that task can be in non-traced state.
If require traced indeed we won't need the lock. For criu lockless
+ tracing state is fine but i thought someone else might be needing
to peek data without tracing the task. Hm?
> Assuming the other pty end is halted, this ioctl would also need to
> wait for the input kworker to stop:
Yes, thanks!
>
> --- >% ---
> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
> index a946e49..744cb87 100644
> --- a/drivers/tty/tty_buffer.c
> +++ b/drivers/tty/tty_buffer.c
> @@ -614,3 +614,8 @@ bool tty_buffer_cancel_work(struct tty_port *port)
> {
> return cancel_work_sync(&port->buf.work);
> }
> +
> +void tty_buffer_flush_work(struct tty_port *port)
> +{
> + flush_work(&port->buf.work);
> +}
> diff --git a/include/linux/tty.h b/include/linux/tty.h
> index bf1bcdb..a541132 100644
> --- a/include/linux/tty.h
> +++ b/include/linux/tty.h
> @@ -480,6 +480,7 @@ extern void tty_buffer_init(struct tty_port *port);
> extern void tty_buffer_set_lock_subclass(struct tty_port *port);
> extern bool tty_buffer_restart_work(struct tty_port *port);
> extern bool tty_buffer_cancel_work(struct tty_port *port);
> +extern void tty_buffer_flush_work(struct tty_port *port);
> extern speed_t tty_termios_baud_rate(struct ktermios *termios);
> extern speed_t tty_termios_input_baud_rate(struct ktermios *termios);
> extern void tty_termios_encode_baud_rate(struct ktermios *termios,
>
> tty_buffer_flush_work(tty->port);
>
> > + retval = read_cnt(ldata);
> > + if (retval) {
> > + const unsigned char *from = read_buf_addr(ldata, ldata->read_tail);
> > + if (copy_to_user(buf, from, retval))
> > + retval = -EFAULT;
> > + }
> > + up_read(&tty->termios_rwsem);
> > + mutex_unlock(&ldata->atomic_read_lock);
> > + return retval;
> > +}
> > +
> > static int n_tty_ioctl(struct tty_struct *tty, struct file *file,
> > unsigned int cmd, unsigned long arg)
> > {
> > @@ -2437,6 +2457,8 @@ static int n_tty_ioctl(struct tty_struct
> > retval = read_cnt(ldata);
> > up_write(&tty->termios_rwsem);
> > return put_user(retval, (unsigned int __user *) arg);
> > + case TIOCPEEKRAW:
>
> As with the function name, the ioctl symbol is redundant; simply TIOCPEEK
> is ok.
OK