I could produce a short program that reproduces the symptom.
Ishikawa wrote:
> Hi,
>
> I am writing this message to a few e-mail aliases.
>
> This is because I could not figure out what is the cause of the problem.
>
> Does anyone have an idea what causes this problem?
>
> Observed Platform.
> Debian GNU/Linux, kernels 2.2.14, 2.2.15, 2.2.16pre2
>
> Observed problem.
>
> A particular program, called `prog' in the following, never returns
> to the shell prompt when it is invoked from the shell command line
> in the following manner:
>
> ./prog -q < inputfile > outputfile 2>&1
Hi,
I was able to produce a short version of the C program that shows
the symptom.
It looks to me as though there is a problem in the handling of open tty
ports when _exit() is called.
Bash doesn't seem to be the cause of the problem. I am CC:ing bug-bash
to let you know this.
The C code is about 285 lines long, not counting the lengthy opening
comment, which I attach below.
Please drop me a line if you would like to look at the source code for
debugging, curiosity, etc.
My suggested fix:
I think the kernel ought to close the tty ports forcibly if the close is
requested from within _exit(). [The written data still lying in the
buffer would be lost, but that is the program's intention. Exiting
cleanly is probably the more important concern here.]
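Until the kernel behaves that way, a program could in principle discard
its own unsent output before exiting. The following is only a sketch of
that idea, not code from the attached program, and it has not been
verified to avoid the hang on the kernels listed above. It relies on the
standard tcflush() call with TCOFLUSH, which discards data written to a
tty but not yet transmitted. The function name discard_and_close_all is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: for each descriptor, throw away output that has
 * been written but not yet transmitted, then close it.
 * Returns the number of descriptors that were closed successfully.
 */
size_t discard_and_close_all(const int *fds, size_t n)
{
    size_t closed = 0;

    assert(fds != NULL);

    for (size_t i = 0; i < n; i++) {
        /*
         * tcflush() fails with ENOTTY on descriptors that are not ttys.
         * The result is deliberately ignored so that the close still
         * happens in that case.
         */
        (void)tcflush(fds[i], TCOFLUSH);

        if (close(fds[i]) == 0)
            closed++;
    }
    return closed;
}
```

A program would call this on its serial port descriptors just before
exit(0), so that nothing is left in the output buffer when the process
exits.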
Happy Hacking!
Chiaki Ishikawa
--- begin quote ---
/*
*
* $Id: newtest.c,v 1.1 2000/05/20 14:38:32 ishikawa Exp ishikawa $
*
* You need to have two UNUSED serial ports.
* You must not connect anything to them.
*
* (Actually you can connect the two ports with
* a cross cable (null-modem cable), and then
* the resulting symptom, that the calling shell
* does not return to the shell prompt, doesn't seem
* to appear even on Linux !!!)
* On Solaris, the (original) program exits cleanly,
* without such a cable.
*
* Overview:
*
* This program opens the two serial ports for
* read/write. (/dev/ttyS[01]. You need to make these
* world-readable/writable if this program is run from a normal
* user account.)
*
* This program then sets the termios characteristics of the
* serial ports to 8 bits, even parity, one stop bit,
* in raw mode (no processing at all).
* Flow control is hardware flow control, etc.
*
* Then it enters a loop.
* For each loop step,
* it calls usleep to sleep for a short period of time.
*
* usleep() is a library function. It calls nanosleep, a system
* function. My reference to nanosleep() in previous postings might have
* been a little confusing. The source code doesn't mention nanosleep() at
* all. If you need to screen the verbose trace from strace, then you
* need to say something like, strace -e trace=\!nanosleep,read -p PID.
*
* Back to the description of the program.
* Then it calls time() to check the wall clock.
* At each of these iterations, it
* tries to read a character from each of the ports (if one is available).
* The read() doesn't block, since termios has been set up in such a
* way that read() returns immediately if no character is available, and
* returns min(available chars, requested chars) otherwise.
* It also writes one byte to the port.
* It then updates the notion of the relative time since
* the beginning of the program invocation. If one second
* has passed since the last update, it prints the duration (sec).
*
* This iteration is repeated for two minutes and
* the program calls exit(0).
*
* The problem symptom is this:
* After the program calls exit(0),
* the calling shell prompt doesn't return if no cross cable is
* connected between the serial ports when this program is executed.
* This happens on Linux.
* This problem doesn't happen on Solaris 7 for x86.
*
* On Linux, ps output shows something like this. Note that
* newtest appears inside a pair of "[]".
* I am running the shell inside an Emacs shell buffer.
*
* 378 ttyp3 S 0:00 /bin/bash -i
* 582 ttyp3 SW 0:00 [newtest] <--- here!
* 584 ttyp1 R 0:00 ps axg
*
* ps axglw showed
*
* 000 1001 378 351 0 0 2448 1280 wait4 S ttyp3 0:00 /bin/bash -i
* 004 1001 582 378 0 0 0 0 tty_wa SW ttyp3 0:00 [newtest]
*
* At this stage, the output from the program is like this, and
* the shell prompt has not returned yet.
* ...., 119, 120, 120 sec. quitting...
* By monitoring the system calls executed by this program using
* strace, I know that _exit(0) has been called by then.
*
* After a lot of experimenting, I have found out that
* if the two ports are connected via a cross cable, the
* shell prompt returns(!).
* The reason I found no problem back in March and early April was
* probably that I had a cable hooked up to these ports back then.
*
* But again, the problem didn't happen on Solaris 7 for x86 (without
* any cable at all).
* For Solaris, you need to change the name of the tty device.
*
* I am not sure what the "tty_wa" in the "ps axglw" output means.
* Waiting for something?
* But, since _exit(0) by means of exit(0) has been called,
* shouldn't the process exit immediately and SIGCHLD be
* passed to the parent immediately, too?
*
* From the Linux man page for _exit(2):
* --- begin quote ---
* DESCRIPTION
* _exit terminates the calling process immediately. Any open
* file descriptors belonging to the process are closed; any
* children of the process are inherited by process 1, init,
* and the process's parent is sent a SIGCHLD signal.
*
* status is returned to the parent process as the process's
* exit status, and can be collected using one of the wait
* family of calls.
* --- end quote ---
*
* (OK, I see there must be a problem in the
* closing of the file descriptors for ttys? Hmm... )
* Shouldn't we forcibly close the tty in this case, when _exit()
* requests such an action?
*
* [ This program is a very shortened version of
* a program written to explain event-driven programming, in
* which one event type is the arrival of a certain packet
* from a device connected to a serial port.
* The intention was to produce skeleton code that can
* be shown to programmers who might later need to port
* the skeleton code to DOS(aga!), a very simple embedded OS, and
* other OSs. (No select call, for example, for portability
* reasons. )
* ]
*/
--- end quote ---
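The termios settings described in the comment (8 bits, even parity, one
stop bit, raw mode, hardware flow control, read() returning at once when
nothing is available) might look roughly like the sketch below. This is
not the code from newtest.c. The function name set_raw_8e1_rtscts is
hypothetical, enabling CREAD is an assumption, and the line speed is left
untouched because the comment does not state it.

```c
#define _DEFAULT_SOURCE   /* for CRTSCTS and IXANY on glibc */
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Hypothetical helper: adjust *t in memory; the caller applies it. */
void set_raw_8e1_rtscts(struct termios *t)
{
    assert(t != NULL);

    /* Raw mode: no input, output, or line-discipline processing. */
    t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR |
                    IGNCR | ICRNL | IXON | IXOFF | IXANY);
    t->c_oflag &= ~OPOST;
    t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);

    /* 8 data bits, even parity (PARENB set, PARODD clear), 1 stop bit. */
    t->c_cflag &= ~(CSIZE | PARODD | CSTOPB);
    t->c_cflag |= CS8 | PARENB;

    /* Hardware (RTS/CTS) flow control. */
    t->c_cflag |= CRTSCTS;

    /* Assumption: the receiver is enabled so that read() can see data. */
    t->c_cflag |= CREAD;

    /*
     * VMIN = 0 and VTIME = 0: read() returns immediately, with 0 bytes
     * if nothing is available, else with up to the requested count.
     */
    t->c_cc[VMIN] = 0;
    t->c_cc[VTIME] = 0;
}
```

The usual pattern would be to fetch the current settings with
tcgetattr(fd, &t), call the helper, and apply the result with
tcsetattr(fd, TCSANOW, &t).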
PS: writing to the (unconnected) serial port seems to trigger the problem.
The data presumably lies waiting in the buffer associated with the serial
line.
I tested the above program on Linux 2.2.16pre3. (Alan Cox's pre-patch
didn't update the uname -a output: it still says 2.2.16pre2 when in fact
it is pre3.)
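The write in question happens once per iteration of the polling loop
described in the comment. A minimal sketch of such a loop follows. It is
not the code from newtest.c. The names open_port and poll_ports, the
O_NOCTTY flag, the byte written, and writing to both ports on every step
are all assumptions made for illustration. The step length and total
duration are parameters, where the comment describes a short sleep and a
two-minute run.

```c
#define _DEFAULT_SOURCE   /* for usleep() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open a port read/write without making it the
 * controlling terminal (O_NOCTTY is an assumption). */
int open_port(const char *path)
{
    return open(path, O_RDWR | O_NOCTTY);
}

/*
 * Hypothetical polling loop: sleep briefly, check the wall clock, try to
 * read one byte from each port, write one byte to each port, and print
 * the elapsed time whenever another second has passed.
 * Returns the elapsed seconds at the point the loop stops.
 */
long poll_ports(int fd0, int fd1, long duration_sec, unsigned int step_usec)
{
    const int fds[2] = { fd0, fd1 };
    const time_t start = time(NULL);
    long elapsed = 0, last_printed = 0;

    assert(step_usec < 1000000);   /* usleep() may reject larger values */

    while (elapsed < duration_sec) {
        usleep(step_usec);
        time_t now = time(NULL);

        for (int i = 0; i < 2; i++) {
            unsigned char in, out = 'x';   /* payload is arbitrary */

            /* With VMIN = VTIME = 0 on a tty this returns at once. */
            if (read(fds[i], &in, 1) < 0)
                perror("read");
            if (write(fds[i], &out, 1) < 0)
                perror("write");
        }

        elapsed = (long)(now - start);
        if (elapsed > last_printed) {
            printf("%ld, ", elapsed);
            fflush(stdout);
            last_printed = elapsed;
        }
    }
    printf("%ld sec. quitting...\n", elapsed);
    return elapsed;
}
```

With two descriptors obtained from open_port("/dev/ttyS0") and
open_port("/dev/ttyS1"), a call such as poll_ports(fd0, fd1, 120, 10000)
followed by exit(0) would correspond to the run described in the comment.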
This archive was generated by hypermail 2b29 : Tue May 23 2000 - 21:00:19 EST