Re: [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks

From: Mikulas Patocka
Date: Mon Jul 25 2016 - 17:52:28 EST




On Sat, 23 Jul 2016, NeilBrown wrote:

> "dirtying ... from the reclaim context" ??? What does that mean?
> According to
> Commit: 26eecbf3543b ("[PATCH] vm: pageout throttling")
> From the history tree, the purpose of throttle_vm_writeout() is to
> limit the amount of memory that is concurrently under I/O.
> That seems strange to me because I thought it was the responsibility of
> each backing device to impose a limit - a maximum queue size of some
> sort.

Device mapper doesn't impose any limit for in-flight bios.

Some simple device mapper targets (such as linear or stripe) pass bios
directly to the underlying device with generic_make_request, so if the
underlying device's request limit is reached, the target's request routine
waits.

However, complex dm targets (such as dm-crypt, dm-mirror, dm-thin) pass
bios to a workqueue that processes them. And since there is no limit on
the number of workqueue entries, there is no limit on the number of
in-flight bios.

I've seen a case where I had an HPFS filesystem on dm-crypt. I wrote to the
filesystem and there was about 2GB of dirty data. The HPFS filesystem used
512-byte bios. dm-crypt allocates one temporary page for each incoming
bio. So there were 4M bios in flight, and each bio allocated a 4k temporary
page - that is an attempted 16GB allocation. It didn't trigger an OOM
condition (because mempool allocations never trigger it), but it
temporarily exhausted all of the computer's memory.
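
To make the arithmetic explicit, here is a small back-of-the-envelope
sketch (Python, not kernel code). The constants are the figures quoted
above; the function name temp_page_demand is just a label for this
illustration, and "2GB" is taken as exactly 2 GiB, which is an assumption.

```python
# Rough model of the HPFS-on-dm-crypt case: one temporary page per bio.
DIRTY_BYTES = 2 * 1024**3   # about 2GB of dirty data (assumed exactly 2 GiB)
BIO_BYTES = 512             # HPFS submitted 512-byte bios
PAGE_BYTES = 4096           # one temporary 4k page per incoming bio


def temp_page_demand(dirty_bytes, bio_bytes, page_bytes):
    """Return (number of bios, bytes of temporary pages they would need)."""
    bios = dirty_bytes // bio_bytes
    return bios, bios * page_bytes


bios_in_flight, temp_bytes = temp_page_demand(DIRTY_BYTES, BIO_BYTES, PAGE_BYTES)
print(bios_in_flight, temp_bytes // 1024**3)  # prints: 4194304 16
```

The 8x blow-up comes purely from the ratio of page size to bio size: each
512-byte bio pins a full 4k page.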

I've made some patches in the past that limit in-flight bios for device
mapper, but they were not integrated upstream.

> If a thread is only making transient allocations, ones which will be
> freed shortly afterwards (not, for example, put in a cache), then I
> don't think it needs to be throttled at all. I think this universally
> applies to mempools.
> In the case of dm_crypt, if it is writing too fast it will eventually be
> throttled in generic_make_request when the underlying device has a full
> queue and so blocks waiting for requests to be completed, and thus parts
> of them returned to the mempool.

No, it won't be throttled.

dm-crypt does:
1. pass the bio to the encryption workqueue
2. allocate the outgoing bio and allocate temporary pages for the
encrypted data
3. do the encryption
4. pass the bio to the writer thread
5. submit the write request with generic_make_request

So, if the underlying block device is throttled, it stalls the writer
thread, but it doesn't stall the encryption threads and it doesn't stall
the caller that submits the bios to dm-crypt.
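
The effect can be illustrated with a toy tick-based model (Python, not
kernel code). It is only a sketch under simplifying assumptions: the
function simulate and its rate parameters are invented for this example,
pages are counted as released as soon as the writer submits a bio (in
reality they are freed on write completion), and the rates are arbitrary
numbers, not measurements.

```python
from collections import deque


def simulate(ticks, submit_rate, encrypt_rate, write_rate):
    """Return the peak number of temporary pages held at any one time."""
    encrypt_queue = deque()  # bios handed to the encryption workqueue (step 1)
    write_queue = deque()    # encrypted bios waiting for the writer (step 4)
    pages_held = 0
    peak = 0
    next_bio = 0
    for _ in range(ticks):
        # The caller is never blocked: it queues bios without any limit.
        for _ in range(submit_rate):
            encrypt_queue.append(next_bio)
            next_bio += 1
        # Encryption allocates one temporary page per bio (steps 2 and 3).
        for _ in range(min(encrypt_rate, len(encrypt_queue))):
            write_queue.append(encrypt_queue.popleft())
            pages_held += 1
        peak = max(peak, pages_held)
        # Only the writer is limited by the underlying device (step 5).
        for _ in range(min(write_rate, len(write_queue))):
            write_queue.popleft()
            pages_held -= 1
    return peak


fast_device = simulate(1000, 100, 100, 100)
slow_device = simulate(1000, 100, 100, 10)
print(fast_device, slow_device)
```

With a device that keeps up, the number of pages held stays bounded by one
tick's worth of bios. With a throttled device, it grows for as long as the
caller keeps submitting, because nothing upstream of the writer ever waits.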

There can be a really high number of in-flight bios for dm-crypt.

Mikulas