Re: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

From: Matthias Dahl
Date: Wed Jul 13 2016 - 09:18:54 EST


Hello Michal,

many thanks for all your time and help on this issue. It is very much
appreciated and I hope we can track this down somehow.

On 2016-07-13 14:18, Michal Hocko wrote:

> So it seems we are accumulating bios and 256B objects. Buffer heads as
> well but not so much. Having over 4G worth of bios sounds really
> suspicious. Note that they pin pages to be written so this might be
> consuming the rest of the unaccounted memory! So the main question is
> why those bios do not get dispatched or finished.

Ok. So it is the block I/Os that do not get completed. Do I understand
correctly that those bio-3 objects are the already encrypted data that
should be written out but, for some reason, is not? I tried to figure
this out myself but couldn't find anything: what does the "-3" in the
name mean? Is it the position in some chain, or does it have a
different meaning?

Do you think a trace like you mentioned would help shed some more light
on this? Or would you recommend something else?
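
To watch the slab growth while the test runs, something like the
following could be used. It is only a rough sketch: it assumes the
/proc/slabinfo version 2.1 column order (name, active_objs, num_objs,
objsize, ...), the function names are made up for illustration, and the
byte figure is an estimate (num_objs * objsize) that ignores per-slab
overhead and any pages pinned by the bios.

```python
"""Estimate memory held by selected slab caches (illustrative sketch)."""


def parse_slabinfo(text):
    """Return {cache_name: (num_objs, objsize)} from slabinfo 2.1 text."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # Assumed columns: name, active_objs, num_objs, objsize, ...
        try:
            caches[fields[0]] = (int(fields[2]), int(fields[3]))
        except ValueError:
            continue
    return caches


def estimate_bytes(caches, prefix):
    """Sum num_objs * objsize over caches whose name starts with prefix."""
    return sum(n * size for name, (n, size) in caches.items()
               if name.startswith(prefix))


def report(path="/proc/slabinfo", prefix="bio-"):
    """Read the slabinfo file and return a one-line summary for prefix."""
    with open(path) as f:
        caches = parse_slabinfo(f.read())
    mib = estimate_bytes(caches, prefix) / 2**20
    return f"{prefix}*: ~{mib:.1f} MiB"
```

Calling report() periodically (reading /proc/slabinfo usually needs
root) should show whether the bio caches keep growing during the write.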

I have also cc'ed Mike Snitzer, who commented on this issue before;
maybe he can see some pattern here as well. Pity that Neil Brown is no
longer available, as I think this is also somehow related to it being an
Intel Rapid Storage RAID10... since that is the only setup on which I
can reproduce it. :(

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
services: custom software [desktop, mobile, web], server administration