Re: [rfc] direct IO submission and completion scalability issues
From: Zach Brown
Date: Mon Feb 04 2008 - 13:21:30 EST
[ ugh, still jet lagged. ]
> Hi Nick,
>
> When Matthew was describing this work at an LCA presentation (not
> sure whether you were at that presentation or not), Zach came up
> with the idea that allowing the submitting application control the
> CPU that the io completion processing was occurring would be a good
> approach to try. That is, we submit a "completion cookie" with the
> bio that indicates where we want completion to run, rather than
> dictating that completion runs on the submission CPU.
>
> The reasoning is that only the higher level context really knows
> what is optimal, and that changes from application to application.
> The "complete on the submission CPU" policy _may_ be more optimal
> for database workloads, but it is definitely suboptimal for XFS and
> transaction I/O completion handling because it simply drags a bunch
> of global filesystem state around between all the CPUs running
> completions. In that case, we really only want a single CPU to be
> handling the completions.....
>
> (Zach - please correct me if I've missed anything)
Yeah, I think Nick's patch (and Jens' approach, presumably) is just the
sort of thing we were hoping for when discussing this during Matthew's talk.
I was imagining the patch a little bit differently (per-cpu tasks, do a
wake_up from the driver instead of cpu nr testing up in blk, work
queues, whatever), but we know how to iron out these kinds of details ;).
> Looking at your patch - if you turn it around so that the
> "submission CPU" field can be specified as the "completion cpu" then
> I think the patch will expose the policy knobs needed to do the
> above.
Yeah, that seems pretty straight forward.
We might need some logic for noticing that the desired cpu has been
hot-plugged away while the IO was in flight, it occurs to me.
- z
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/