Re: Discard support (was Re: [PATCH] swap: send callback when swap slot is freed)

From: Richard Sharpe
Date: Thu Aug 13 2009 - 18:20:50 EST


On Thu, Aug 13, 2009 at 2:28 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
> On Thu, Aug 13, 2009 at 4:44 PM, <david@xxxxxxx> wrote:
>> On Thu, 13 Aug 2009, Greg Freemyer wrote:
>>
>>> On Thu, Aug 13, 2009 at 12:33 PM, <david@xxxxxxx> wrote:
>>>>
>>>> On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
>>>>
>>>>> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
>>>>>>
>>>>>> I am planning a complete overhaul of the discard work.  Users can send
>>>>>> down discard requests as frequently as they like.  The block layer will
>>>>>> cache them, and invalidate them if writes come through.  Periodically,
>>>>>> the block layer will send down a TRIM or an UNMAP (depending on the
>>>>>> underlying device) and get rid of the blocks that have remained
>>>>>> unwanted in the interim.
>>>>>
>>>>> That is a very good idea. I tested your original TRIM implementation
>>>>> on my Vertex yesterday and it was awful ;-). The SSD needs hundreds
>>>>> of milliseconds to digest a single TRIM command, and since your
>>>>> implementation sends a TRIM for each extent of each deleted file, the
>>>>> whole system becomes unusable after a short while.
>>>>> An optimal solution would be to consolidate the discard requests,
>>>>> bundle them, and send them to the drive as infrequently as possible.
>>>>
>>>> or queue them up and send them when the drive is idle (you would need
>>>> to keep track of them to make sure the space isn't re-used in the
>>>> meantime)
>>>>
>>>> as an example, at any point where you would consider spinning down the
>>>> drive, you won't hurt performance by sending the accumulated trim
>>>> commands.
>>>>
>>>> David Lang
>>>
>>> An alternate approach is for the block layer to maintain its own
>>> bitmap of used/unused sectors or blocks. Unmap commands from the
>>> filesystem would simply update the bitmap, with no other immediate
>>> effect.
>>
>> how does the block layer know what blocks are unused by the filesystem?
>>
>> or would it be a case of the filesystem generating discard/trim requests to
>> the block layer so that it can maintain its bitmap, and then the block
>> layer generating the requests to the drive below it?
>>
>> David Lang
>
> Yes, my thought was that the block layer would consume the discard/trim
> requests from the filesystem in realtime to maintain the bitmap, then
> at some later point in time when the system has extra resources it
> would generate the calls down to the lower layers and eventually the
> drive.
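To make sure I understand: the scheme being proposed amounts to
something like the toy model below (plain userspace C with
illustrative names, not the real block-layer API). Discards only set
bits in a bitmap, an overlapping write clears them, and a later flush,
run when the device is idle, emits one coalesced TRIM per remaining
run of unused blocks.

#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 1024
static uint8_t unused[NBLOCKS / 8];              /* 1 bit per block */

static void set_bit_(unsigned b)   { unused[b / 8] |=  (1u << (b % 8)); }
static void clear_bit_(unsigned b) { unused[b / 8] &= ~(1u << (b % 8)); }
static int  test_bit_(unsigned b)  { return unused[b / 8] & (1u << (b % 8)); }

/* Filesystem sent a discard: just remember it, no I/O yet. */
static void note_discard(unsigned start, unsigned n)
{
    while (n--)
        set_bit_(start++);
}

/* A write came through: those blocks are live again, so any
 * pending discard covering them must be invalidated. */
static void note_write(unsigned start, unsigned n)
{
    while (n--)
        clear_bit_(start++);
}

/* Later, when the device is idle: one TRIM per run of blocks that
 * are still unused, then forget them. */
static void flush_discards(void)
{
    unsigned b = 0;
    while (b < NBLOCKS) {
        unsigned start;
        if (!test_bit_(b)) { b++; continue; }
        start = b;
        while (b < NBLOCKS && test_bit_(b))
            clear_bit_(b++);
        printf("TRIM blocks %u..%u\n", start, b - 1);
    }
}

int main(void)
{
    note_discard(10, 20);   /* a file was deleted            */
    note_write(15, 2);      /* part of the range was reused  */
    flush_discards();       /* emits TRIM 10..14 and 17..29  */
    return 0;
}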

Why should the block layer be forced to maintain something that is
probably of use in only a limited number of cases? For example, the
devices I work on already maintain their own mapping of host-visible
LBAs to underlying storage, and I suspect that most such devices do.
So you would be duplicating something that we already do, and there is
no way that I am aware of to synchronise the two.

All we really need, I believe, is for the UNMAP requests to come down
to us with writes barriered until we respond; the UNMAP itself is then
a relatively cheap operation. Writes that are already in the cache but
uncommitted to disk do present some issues, though, if an UNMAP request
comes down for recently written blocks.
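
In outline, the ordering I have in mind is something like the sketch
below (toy userspace C with stub functions; the names are illustrative
and not a real driver interface):

#include <stdint.h>
#include <stdio.h>

struct range { uint64_t lba; uint64_t len; };

/* Stubs standing in for the real cache and device operations. */
static void flush_overlapping_writes(struct range r)
{
    printf("commit cached writes over %llu+%llu\n",
           (unsigned long long)r.lba, (unsigned long long)r.len);
}
static void block_new_writes(struct range r)  { (void)r; puts("barrier up"); }
static void device_unmap(struct range r)      { (void)r; puts("UNMAP sent"); }
static void wait_for_unmap_done(void)         { puts("UNMAP acked"); }
static void unblock_writes(struct range r)    { (void)r; puts("barrier down"); }

/* Commit any overlapping cached writes, fence new ones, then send
 * the UNMAP, and only release writers once the device responds. */
static void safe_unmap(struct range r)
{
    flush_overlapping_writes(r);
    block_new_writes(r);
    device_unmap(r);
    wait_for_unmap_done();
    unblock_writes(r);
}

int main(void)
{
    safe_unmap((struct range){ .lba = 4096, .len = 256 });
    return 0;
}

The expensive step is committing the overlapping cached writes, not
the UNMAP itself, which is why recently written blocks are the
awkward case.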

> I highlight the lower layers because mdraid is also going to have to
> be in the mix if raid5/6 is in use, i.e., at a minimum it will have to
> adjust the block range to align with the stripe boundaries.
>
> Greg
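
That stripe adjustment is just interval arithmetic: shrink the discard
inward so that it covers only whole stripes, since discarding part of
a stripe would leave its parity inconsistent. A toy sketch
(illustrative userspace C, not mdraid's actual code):

#include <stdint.h>
#include <stdio.h>

/* Round [start, start + len) inward to whole stripes of
 * stripe_blocks blocks; returns 0 if no whole stripe remains. */
static int align_to_stripes(uint64_t start, uint64_t len,
                            uint64_t stripe_blocks,
                            uint64_t *out_start, uint64_t *out_len)
{
    uint64_t first = (start + stripe_blocks - 1) / stripe_blocks; /* round up   */
    uint64_t last  = (start + len) / stripe_blocks;               /* round down */

    if (last <= first)
        return 0;                    /* range spans no whole stripe */
    *out_start = first * stripe_blocks;
    *out_len   = (last - first) * stripe_blocks;
    return 1;
}

int main(void)
{
    uint64_t s, l;

    /* e.g. a discard of blocks 100..999 with 128-block stripes
     * shrinks to blocks 128..895 (six whole stripes). */
    if (align_to_stripes(100, 900, 128, &s, &l))
        printf("discard %llu..%llu\n",
               (unsigned long long)s, (unsigned long long)(s + l - 1));
    return 0;
}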



--
Regards,
Richard Sharpe