Re: Fw: 2.5.61 oops running SDET

From: Linus Torvalds (torvalds@transmeta.com)
Date: Sun Feb 16 2003 - 13:07:36 EST

Next message: Rahul Vaidya: "Re: linux 2.5.53 not compiling"
Previous message: Rahul Vaidya: "Re: linux 2.5.53 not compiling"
Next in thread: Martin J. Bligh: "Re: Fw: 2.5.61 oops running SDET"
Reply: Martin J. Bligh: "Re: Fw: 2.5.61 oops running SDET"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 15 Feb 2003, Martin J. Bligh wrote:
>
> OK, I did the following, which is what I think you wanted, plus Zwane's
> observation that task_state acquires the task_struct lock (we're the only
> caller, so I just removed it), but I still get the same panic and this time
> the box hung.

Yeah, a closer look shows that the exit path doesn't actually take the
task lock at all around any of the signal stuff, so the lock protects
"task->mm", "task->files" and "task->fs", but it does NOT protect
"task->signal" or "task->sighand" (illogical, but true).

Oh, damn.

The core that checks for "p->sighand" takes the tasklist lock for this
reason (see "collect_sigign_sigcatch()"

So the choice seems to be either:

- make the exit path hold the task lock over the whole exit path, not
just over mm exit.

- take the "tasklist_lock" over more of "task_sig()" (not just the
collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending"
rendering too.

The latter is a two-liner. The former is the "right thing" for multiple
reasons.

The reason I'd _like_ to see the task lock held over _all_ of the fields
in the exit() path is:

- right now we actually take it and drop it multiple times in exit. See
__exit_files(), __exit_fs(), __exit_mm(). Which all take it just to
serialize setting ot "mm/files/fs" to NULL.

- task_lock is a nice local lock, no global scalability impact.

So it really would be much nicer to do something like this in do_exit():

        struct mm_struct *mm;
        struct files_struct *files;
        struct fs_struct *fs;
        struct signal_struct *signal;
        struct sighand_struct *sighand;

        task_lock(task);
        mm = task->mm;
        files = task->files;
        fs = task->fs;
        signal = task->signal;
        sighand = task->sighand;

        task->mm = NULL;
        task->files = NULL;
        task->fs = NULL;
        task->signal = NULL;
        task->sighand = NULL;
        task_unlock(task);

.. actually do the "__exit_mm(task, mm)" etc here ..

which would make things a lot more readable, and result in us taking the
lock only _once_ instead of three times, and would properly protect
"signal" and "sighand" so that the /proc code wouldn't need to take the
heavily used "tasklist_lock" just to read the signal state for a single
task.

But fixing up exit to do the above would require several (trivial) calling
convention changes, like changing

        static inline void __exit_mm(struct task_struct * tsk)
        {
                struct mm_struct *mm = tsk->mm;
                ...

into

        static inline void __exit_mm(struct task_struct * tsk,
                        struct mm_struct *mm)
        {
                ...

instead and updatign the callers.

Is anybody willing to do that (hopefully fairly trivial) fixup and test
it, or should we go with the stupid "take the 'tasklist_lock'" approach?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rahul Vaidya: "Re: linux 2.5.53 not compiling"
Previous message: Rahul Vaidya: "Re: linux 2.5.53 not compiling"
Next in thread: Martin J. Bligh: "Re: Fw: 2.5.61 oops running SDET"
Reply: Martin J. Bligh: "Re: Fw: 2.5.61 oops running SDET"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sun Feb 23 2003 - 22:00:15 EST