Re: [PATCH 1/2] fput: task_work_add() can fail if the caller has passed exit_task_work()

From: Andrew Morton
Date: Fri Jun 14 2013 - 17:58:39 EST


On Fri, 14 Jun 2013 21:09:47 +0200 Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> fput() assumes that it can't be called after exit_task_work(), but
> this is not true; for example, free_ipc_ns()->shm_destroy() can do
> this. In this case fput() silently leaks the file.
>
> Change it to fall back to delayed_fput_work if task_work_add() fails.
> The patch looks complicated but it is not; it changes the code from
>
> 	if (PF_KTHREAD) {
> 		schedule_work(...);
> 		return;
> 	}
> 	task_work_add(...)
>
> to
> 	if (!PF_KTHREAD) {
> 		if (!task_work_add(...))
> 			return;
> 		/* fallback */
> 	}
> 	schedule_work(...);
>
> As for shm_destroy() in particular, we could make another fix but I
> think this change makes sense anyway. There could be other similar
> users; it is not safe to assume that task_work_add() can't fail.
>
> ...
>
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -306,17 +306,18 @@ void fput(struct file *file)
>  {
>  	if (atomic_long_dec_and_test(&file->f_count)) {
>  		struct task_struct *task = current;
> +		unsigned long flags;
> +
>  		file_sb_list_del(file);
> -		if (unlikely(in_interrupt() || task->flags & PF_KTHREAD)) {
> -			unsigned long flags;
> -			spin_lock_irqsave(&delayed_fput_lock, flags);
> -			list_add(&file->f_u.fu_list, &delayed_fput_list);
> -			schedule_work(&delayed_fput_work);
> -			spin_unlock_irqrestore(&delayed_fput_lock, flags);
> -			return;
> +		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> +			init_task_work(&file->f_u.fu_rcuhead, ____fput);
> +			if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
> +				return;

A comment here would be useful, explaining the circumstances under
which we fall through to the delayed fput. This is particularly needed
because kernel/task_work.c is such undocumented crap.

This?

--- a/fs/file_table.c~fput-task_work_add-can-fail-if-the-caller-has-passed-exit_task_work-fix
+++ a/fs/file_table.c
@@ -313,6 +313,12 @@ void fput(struct file *file)
 			init_task_work(&file->f_u.fu_rcuhead, ____fput);
 			if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
 				return;
+			/*
+			 * After this task has run exit_task_work(),
+			 * task_work_add() will fail. free_ipc_ns()->
+			 * shm_destroy() can do this. Fall through to delayed
+			 * fput to avoid leaking *file.
+			 */
 		}
 		spin_lock_irqsave(&delayed_fput_lock, flags);
 		list_add(&file->f_u.fu_list, &delayed_fput_list);


>  		}
> -		init_task_work(&file->f_u.fu_rcuhead, ____fput);
> -		task_work_add(task, &file->f_u.fu_rcuhead, true);
> +		spin_lock_irqsave(&delayed_fput_lock, flags);
> +		list_add(&file->f_u.fu_list, &delayed_fput_list);
> +		schedule_work(&delayed_fput_work);
> +		spin_unlock_irqrestore(&delayed_fput_lock, flags);

OT: I don't think that schedule_work() needs to be inside the locked
region. Scalability improvements beckon!

>  	}
>  }
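
On the OT locking note above, a minimal sketch of the narrower critical
section in the new fallback path (untested, purely illustrative; only the
list manipulation needs delayed_fput_lock):

	spin_lock_irqsave(&delayed_fput_lock, flags);
	list_add(&file->f_u.fu_list, &delayed_fput_list);
	spin_unlock_irqrestore(&delayed_fput_lock, flags);
	/* no lock needed here: the workqueue code does its own locking */
	schedule_work(&delayed_fput_work);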
