Re: Strange intermittent EIO error when writing to stdout since v3.8.0

From: Mikael Pettersson
Date: Thu Jun 06 2013 - 10:44:31 EST


Markus Trippelsdorf writes:
> Since v3.8.0 several people have reported intermittent I/O errors that happen
> during high system load while using "emerge" under Gentoo:
> ...
>   File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 260, in iteration
>     if not x.callback(f, event, *x.args):
>   File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 99, in _output_handler
>     stdout_buf[os.write(stdout_fd, stdout_buf):]
>   File "/usr/lib64/portage/pym/portage/__init__.py", line 246, in __call__
>     rval = self._func(*wrapped_args, **wrapped_kwargs)
> OSError: [Errno 5] Input/output error
>
> Basically 'emerge' just writes the build output to stdout in a loop:
> ...
> def _output_handler(self, fd, event):
>
>     background = self.background
>     stdout_fd = self.stdout_fd
>     log_file = self._log_file
>
>     while True:
>         buf = self._read_buf(fd, event)
>
>         if buf is None:
>             # not a POLLIN event, EAGAIN, etc...
>             break
>
>         if not buf:
>             # EOF
>             self._unregister()
>             self.wait()
>             break
>
>         else:
>             if not background and stdout_fd is not None:
>                 failures = 0
>                 stdout_buf = buf
>                 while stdout_buf:
>                     try:
>                         stdout_buf = \
>                             stdout_buf[os.write(stdout_fd, stdout_buf):]
>                     except OSError as e:
>                         if e.errno != errno.EAGAIN:
>                             raise
>                         ...
>
> see: https://bugs.gentoo.org/show_bug.cgi?id=459674
>
> (A similar issue has also occurred when building Firefox since v3.8.0, but
> because Firefox's build process doesn't raise an exception, it just dies
> at random points without giving a clue.)
>
> Now the question is: Could this be a kernel bug? Maybe in the TTY layer?
>
> Unfortunately the issue is not easily reproducible and a git-bisect is
> out of the question.
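
A minimal standalone sketch of the retry-on-EAGAIN write loop quoted above, in Python to match the quoted code. The function name write_all is hypothetical. Because the EAGAIN branch of the original is elided ("..."), this version assumes it is acceptable to block in select.select() until the descriptor becomes writable again; Portage's actual handling may differ. Any other error, such as the EIO (errno 5) in the traceback, is re-raised to the caller unchanged.

```python
import errno
import os
import select


def write_all(fd: int, data: bytes) -> None:
    """Write all of `data` to `fd` (hypothetical helper, not Portage code).

    EAGAIN/EWOULDBLOCK are retried; every other OSError (e.g. EIO)
    propagates to the caller, as in the quoted _output_handler loop.
    """
    view = memoryview(data)
    while view:
        try:
            # os.write() may perform a short write; drop the bytes that
            # were written and loop to send the remainder.
            written = os.write(fd, view)
            view = view[written:]
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise  # EIO, EPIPE, ... are not retried
            # Assumption: on a full non-blocking fd, wait until it is
            # writable again (the original's retry logic is elided).
            select.select([], [fd], [])
```

Since only EAGAIN/EWOULDBLOCK is treated as retryable, an EIO returned by the kernel's write path surfaces immediately as the OSError shown in the traceback.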

I'm seeing a similar regression. I do a lot of gcc bootstraps and regression
test suite runs, and for the bootstraps I do

make -jN bootstrap |& tee build-log

(tcsh syntax; adjust for your shell, e.g. "2>&1 | tee build-log" in bash) to get
a complete log for later inspection in case of error. N is typically the number
of cores or threads on the machine, e.g. -j8 on my Core i7 (Ivy Bridge).
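
As an illustration of what that pipeline does, here is a rough Python analogue (a sketch, not how tcsh or tee are actually implemented): run the command with stderr merged into stdout, then copy every chunk both to the terminal and to the log file. The helper name tee_command is hypothetical, and the sketch assumes out_fd is a blocking descriptor, so it only loops over short writes and does not handle EAGAIN.

```python
import os
import subprocess
import sys


def tee_command(cmd, log_path, out_fd=None):
    """Rough analogue of `cmd |& tee log_path` (hypothetical helper).

    Returns the command's exit status.
    """
    if out_fd is None:
        out_fd = sys.stdout.fileno()
    # stderr=STDOUT merges both streams into one pipe, like "|&".
    with open(log_path, "wb") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        src = proc.stdout.fileno()
        while True:
            chunk = os.read(src, 65536)
            if not chunk:  # EOF: the child closed its end of the pipe
                break
            log.write(chunk)
            # Assumes a blocking out_fd: loop only over short writes.
            # A write error here (such as EIO) propagates to the caller.
            view = memoryview(chunk)
            while view:
                view = view[os.write(out_fd, view):]
    # Popen.__exit__ has waited for the child, so returncode is set.
    return proc.returncode
```

For example, tee_command(["make", "-j8", "bootstrap"], "build-log") would roughly mirror the command line above.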

Up to the 3.7 kernel this never had any problems. Starting with the 3.8 kernel,
or possibly 3.9-rc1, this usually dies at some random point with an EIO.

I haven't had time to bisect it.

/Mikael