Re: block: DMA alignment of IO buffer allocated from slab
From: Jens Axboe
Date: Tue Sep 25 2018 - 11:45:02 EST
On 9/25/18 1:49 AM, Dave Chinner wrote:
> On Mon, Sep 24, 2018 at 12:09:37PM -0600, Jens Axboe wrote:
>> On 9/24/18 12:00 PM, Christopher Lameter wrote:
>>> On Mon, 24 Sep 2018, Jens Axboe wrote:
>>>
>>>> The situation is making me a little uncomfortable, though. If we export
>>>> such a setting, we really should be honoring it...
>
> That's what I said up front, but you replied to this with:
>
> | I think this is all crazy talk. We've never done this, [...]
>
> Now I'm not sure what you are saying we should do....
>
>>> Various subsystems create custom slab arrays with their particular
>>> alignment requirement for these allocations.
>>
>> Oh yeah, I think the solution is basic enough for XFS, for instance.
>> They just have to error on the side of being cautious, by going full
>> sector alignment for memory...
>
> How does the filesystem find out about hardware alignment
> requirements? Isn't probing through the block device to find out
> about the request queue configurations considered a layering
> violation?
Right now it isn't a stacked property, so answering the question
isn't even possible beyond "what does the top device require".
> What if sector alignment is not sufficient? And how would this work
> if we start supporting sector sizes larger than page size? (which the
> XFS buffer cache supports just fine, even if nothing else in
> Linux does).
If sector alignment isn't sufficient, then we'd need to bounce 512b
formats... But I don't want to over-design something that isn't
relevant to real life setups. I'm not aware of anything that needs
memory aligned to that degree.
> But even ignoring sector size > page size, implementing this
> requires a bunch of new slab caches, especially for 64k page
> machines because XFS supports sector sizes up to 32k. And every
> other filesystem that uses sector sized buffers (e.g. HFS) would
> have to do the same thing. Seems somewhat wasteful to require
> everyone to implement their own aligned sector slab cache...
>
> Perhaps we should take the filesystem out of this completely - maybe
> the block layer could provide a generic "sector heap" and have all
> filesystems that use sector sized buffers allocate from it. e.g.
> something like
>
> mem = bdev_alloc_sector_buffer(bdev, sector_size)
>
> That way we don't have to rely on filesystems knowing anything about
> the alignment limitations of the devices or assumptions about DMA
> to work correctly...
I like that idea, would probably also need a mempool backing for
certain cases.
--
Jens Axboe