Re: [PATCH 5/9] writeback: support > 1 flusher thread per bdi

From: Jan Kara
Date: Thu Aug 06 2009 - 16:56:36 EST


On Thu 06-08-09 09:05:46, Jens Axboe wrote:
> On Wed, Aug 05 2009, Jan Kara wrote:
> > > +static void bdi_queue_work(struct backing_dev_info *bdi, struct bdi_work *work)
> > > +{
> > > +	if (work) {
> > > +		work->seen = bdi->wb_mask;
> > > +		BUG_ON(!work->seen);
> > > +		atomic_set(&work->pending, bdi->wb_cnt);
> > I guess the idea here is that every writeback thread has to acknowledge
> > the work. But what if some thread decides to die after the work is queued
> > but before it manages to acknowledge it? We would end up waiting
> > indefinitely...
>
> The writeback thread checks on exit for work that was added in a race, so
> it should be fine.
Sorry if I'm a bit dense but I don't get it (hmm, probably I gave too few
details in my comment above). Assume there are at least two writeback
threads on bdi->wb_list:

  CPU1                                  CPU2
bdi_writeback_task()
  list_del_rcu(wb->list);
  if (!list_empty(&bdi->work_list))
    wb_do_writeback(wb, 1);
                                        bdi_queue_work()
                                          ...
                                          atomic_set(&work->pending, bdi->wb_cnt);
                                          ...
                                          bdi->wb_list isn't empty
                                            bdi_sched_work()
  bdi_put_wb(bdi, wb); -> only now the bdi->wb_cnt is decreased...

Looking at the code in more detail, it actually gets fixed once the
forker task wakes up, notices there's some work to do, and adds the default
flusher thread again, but it could still have strange effects.
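
To make the interleaving above concrete, here is a rough single-threaded
userspace model of it. This is only a sketch, not kernel code: the
model_* names and both structs are made up for illustration, and the
"threads" are just a loop. It shows that work->pending ends up one too
high when the work is queued between list_del_rcu() and bdi_put_wb(), so
nobody is left to deliver the last acknowledgement until something else
(such as the forker task) steps in.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel structures. */
struct model_bdi {
	int wb_cnt;	/* threads still counted in bdi->wb_cnt */
	int on_list;	/* threads still reachable via bdi->wb_list */
};

struct model_work {
	int pending;	/* acknowledgements still expected */
};

/* Queueing takes a snapshot of wb_cnt, like atomic_set(&work->pending, ...). */
static void model_queue_work(const struct model_bdi *bdi,
			     struct model_work *work)
{
	work->pending = bdi->wb_cnt;
}

/* Every thread that is still on the list acknowledges the work once. */
static void model_run_threads(const struct model_bdi *bdi,
			      struct model_work *work)
{
	for (int i = 0; i < bdi->on_list; i++)
		work->pending--;
}

/*
 * Returns the number of acknowledgements that are never delivered.
 * queue_before_put selects the racy ordering from the diagram.
 */
static int model_race(bool queue_before_put)
{
	struct model_bdi bdi = { .wb_cnt = 2, .on_list = 2 };
	struct model_work work = { .pending = 0 };

	bdi.on_list--;				/* CPU1: list_del_rcu() */
	if (queue_before_put) {
		model_queue_work(&bdi, &work);	/* CPU2: stale wb_cnt */
		bdi.wb_cnt--;			/* CPU1: bdi_put_wb() */
	} else {
		bdi.wb_cnt--;			/* CPU1: bdi_put_wb() */
		model_queue_work(&bdi, &work);	/* CPU2: current wb_cnt */
	}
	model_run_threads(&bdi, &work);
	return work.pending;
}
```

With the racy ordering model_race(true) leaves one acknowledgement
outstanding; with the other ordering model_race(false) leaves none.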

> Additionally, only the default thread will exit and that one will
> always have its count and mask be valid (since we auto-fork it again,
> if needed).
Ah OK, I see.

> > > +		BUG_ON(!bdi->wb_cnt);
> > > +
> > > +		/*
> > > +		 * Make sure stores are seen before it appears on the list
> > > +		 */
> > > +		smp_mb();
> > > +
> > > +		spin_lock(&bdi->wb_lock);
> > > +		list_add_tail_rcu(&work->list, &bdi->work_list);
> > > +		spin_unlock(&bdi->wb_lock);
> > > +	}
> > > +
> > >  	/*
> > > -	 * This only happens the first time someone kicks this bdi, so put
> > > -	 * it out-of-line.
> > > +	 * If the default thread isn't there, make sure we add it. When
> > > +	 * it gets created and wakes up, we'll run this work.
> > >  	 */
> > > -	if (unlikely(!bdi->wb.task))
> > > +	if (unlikely(list_empty_careful(&bdi->wb_list)))
> > >  		wake_up_process(default_backing_dev_info.wb.task);
> > > +	else
> > > +		bdi_sched_work(bdi, work);
> > > +}
> > > +
> > > +/*
> > > + * Used for on-stack allocated work items. The caller needs to wait until
> > > + * the wb threads have acked the work before it's safe to continue.
> > > + */
> > > +static void bdi_wait_on_work_clear(struct bdi_work *work)
> > > +{
> > > +	wait_on_bit(&work->state, 0, bdi_sched_wait, TASK_UNINTERRUPTIBLE);
> > > +}
> > I still feel the rules for releasing / cleaning up work are too
> > complicated.
> > 1) I believe we can bear one more "int" for flags in the struct bdi_work
> > so that you don't have to hide them in sb_data.
>
> Sure, but there's little reason to do that I think, since it's only used
> internally. Let me put it another way, why add an extra int if we can
> avoid it?
Actually, looking again, the work struct "state" field has lots of
free bits. I think the code looks nicer with the attached patch; what do
you think?

> > 2) I'd introduce a flag with the meaning: free the work when you are
> > done. Obviously this flag makes sense only with a dynamically allocated
> > work structure. There would be no "on stack" flag.
> > 3) I'd create a function:
> >      bdi_wait_work_submitted()
> > which you'd have to call whenever you didn't set the flag and want to
> > free the work (either explicitly, or via returning from a function which
> > has the structure on stack).
> > It would do:
> >      bdi_wait_on_work_clear(work);
> >      call_rcu(&work->rcu_head, bdi_work_free);
> >
> > wb_work_complete() would then, depending on the flag setting, either
> > free the work struct completely or just do bdi_work_clear().
> >
> > IMO that would make the code easier to check and also less prone to
> > errors (currently you have to think twice when you have to wait for the rcu
> > period, call bdi_work_free, etc.).
>
> Didn't we go over all that last time, too?
Well, probably about something similar. But this time I have a patch ;-)
Compile tested only... IMO it looks nicer this way as it wraps up all the
details of work freeing into one function.
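
For readers without the attachment, the cleanup rules proposed above can
be sketched in userspace roughly as follows. This is not the attached
patch: the flag names, struct and functions are hypothetical, "freeing" is
modeled by setting a bool, and the wait is a poll that returns false
instead of sleeping, so take it only as an illustration of who is
responsible for releasing the work in each case.

```c
#include <stdbool.h>

/* Hypothetical flag bits kept in the work's state field. */
#define SKETCH_WORK_PENDING	(1u << 0)	/* not yet acked by the threads */
#define SKETCH_WORK_AUTOFREE	(1u << 1)	/* "free the work when done" */

struct sketch_work {
	unsigned int state;
	bool released;	/* stands in for call_rcu(..., bdi_work_free) */
};

static void sketch_work_release(struct sketch_work *work)
{
	work->released = true;
}

/*
 * Writeback thread side: called once the work has been taken over.
 * With the autofree flag nobody waits, so the work is dropped here;
 * otherwise only the pending bit is cleared so the submitter can go on.
 */
static void sketch_work_complete(struct sketch_work *work)
{
	if (work->state & SKETCH_WORK_AUTOFREE)
		sketch_work_release(work);
	else
		work->state &= ~SKETCH_WORK_PENDING;
}

/*
 * Submitter side, for work queued without the autofree flag.
 * Returns false while the work is still pending (the kernel version
 * would sleep here); once it is clear, the submitter releases the work.
 */
static bool sketch_wait_work_submitted(struct sketch_work *work)
{
	if (work->state & SKETCH_WORK_PENDING)
		return false;
	sketch_work_release(work);
	return true;
}
```

The point of the split is that each work item has exactly one owner for
the release: the flusher side when the flag is set, the submitter
otherwise.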

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR