Re: swap on eMMC and other flash

From: Minchan Kim
Date: Mon Apr 16 2012 - 22:05:05 EST

Next message: huang ying: "Re: [RFC PATCH] PCIe: Add PCIe runtime D3cold support"
Previous message: Jonathan Nieder: "Re: [regression] Ideapad S10-3 does not wake up from suspend"
In reply to: Minchan Kim: "Re: swap on eMMC and other flash"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Arnd,

On 04/17/2012 03:59 AM, Arnd Bergmann wrote:

On Monday 16 April 2012, Stephan Uphoff wrote:
opportunity to plant a few ideas.

In contrast to rotational disks read/write operation overhead and
costs are not symmetric.
While random reads are much faster on flash - the number of write
operations is limited by wearout and garbage collection overhead.
To further improve swapping on eMMC or similar flash media I believe
that the following issues need to be addressed:

1) Limit average write bandwidth to eMMC to a configurable level to
guarantee a minimum device lifetime
2) Aim for a low write amplification factor to maximize useable write bandwidth
3) Strongly favor read over write operations

Lowering write amplification (2) has been discussed in this email
thread - and the only observation I would like to add is that
over-provisioning the internal swap space compared to the exported
swap space significantly can guarantee a lower write amplification
factor with the indirection and GC techniques discussed.

Yes, good point.

I believe the swap functionality is currently optimized for storage
media where read and write costs are nearly identical.
As this is not the case on flash I propose splitting the anonymous
inactive queue (at least conceptually) - keeping clean anonymous pages
with swap slots on a separate queue as the cost of swapping them
out/in is only an inexpensive read operation. A variable similar to
swapiness (or a more dynamic algorithmn) could determine the
preference for swapping out clean pages or dirty pages. ( A similar
argument could be made for splitting up the file inactive queue )

I'm not sure I understand yet how this would be different from swappiness.

The problem of limiting the average write bandwidth reminds me of
enforcing cpu utilization limits on interactive workloads.
Just as with cpu workloads - using the resources to the limit produces
poor interactivity.
When interactivity suffers too much I believe the only sane response
for an interactive device is to limit usage of the swap device and
transition into a low memory situation - and if needed - either
allowing userspace to reduce memory usage or invoking the OOM killer.
As a result low memory situations could not only be encountered on new
memory allocations but also on workload changes that increase the
number of dirty pages.

While swap is just a special case for anonymous memory in writeback
rather than file backed pages, I think what you want here is a tuning
knob that decides whether we should discard a clean page or write back
a dirty page under memory pressure. I have to say that I don't know
whether we already have such a knob or whether we already treat them
differently, but it is certainly a valid observation that on hard
drives, discarding a clean page that is likely going to be needed
again has about the same overhead as writing back a dirty page
(i.e. one seek operation), while on flash the former would be much
cheaper than the latter.

It seems to make sense with considering asymmetric of flash and there is a CFLRU(Clean First LRU)[1] paper about it. You might already know it. Anyway if you don't aware of it, I hope it helps you.

[1] http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCkQFjAA&url=http%3A%2F%2Fstaff.ustc.edu.cn%2F~jpq%2Fpaper%2Fflash%2F2006-CASES-CFLRU-%2520A%2520Replacement%2520Algorithm%2520for%2520Flash%2520Memory.pdf&ei=G8-MT5jGIqnj0gGMzJyCCg&usg=AFQjCNHybc5rUvuAlMylOUNwsHoFmWegzw&sig2=Uu5LDD3suso0QHsfD7yZ9Q

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: huang ying: "Re: [RFC PATCH] PCIe: Add PCIe runtime D3cold support"
Previous message: Jonathan Nieder: "Re: [regression] Ideapad S10-3 does not wake up from suspend"
In reply to: Minchan Kim: "Re: swap on eMMC and other flash"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]