Re: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
From: Michal Hocko
Date: Wed Jul 13 2016 - 09:48:33 EST
On Wed 13-07-16 15:18:11, Matthias Dahl wrote:
> Hello Michal,
>
> many thanks for all your time and help on this issue. It is very much
> appreciated and I hope we can track this down somehow.
>
> On 2016-07-13 14:18, Michal Hocko wrote:
>
> > So it seems we are accumulating bios and 256B objects. Buffer heads as
> > well but not so much. Having over 4G worth of bios sounds really suspicious.
> > Note that they pin pages to be written so this might be consuming the
> > rest of the unaccounted memory! So the main question is why those bios
> > do not get dispatched or finished.
>
> Ok. So it is the block IOs that do not get completed. Do I understand it
> correctly that those bio-3 objects are the already encrypted data that
> should be written out but, for some reason, are not?
Hard to tell. Maybe they are just allocated and waiting for encryption.
But this is just a wild guess.
> I tried to figure this out myself but
> couldn't find anything -- what does the "-3" suffix mean? Is it the
> position in some chain or does it have a different meaning?
$ git grep "kmem_cache_create.*bio"
block/bio-integrity.c: bip_slab = kmem_cache_create("bio_integrity_payload",
so there doesn't seem to be any cache like that in the vanilla kernel.
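As a rough sketch, assuming the cache names and object counts in this
thread come from /proc/slabinfo (format version 2.1), the footprint of
every "bio-*" cache could be estimated as num_objs * objsize. The function
name slab_usage and the SAMPLE text are made up for illustration; the
numbers in SAMPLE are not taken from the report.

```python
def slab_usage(slabinfo_text, prefix="bio-"):
    """Return {cache_name: approx_bytes} for caches whose name starts with prefix.

    Assumes the slabinfo 2.1 column order:
    name active_objs num_objs objsize objperslab pagesperslab ...
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version banner and the column header line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(prefix):
            continue
        num_objs, objsize = int(fields[2]), int(fields[3])
        # Approximation: ignores per-slab padding and slab metadata.
        usage[fields[0]] = num_objs * objsize
    return usage


# Hypothetical sample input, NOT data from the reported system.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
bio-3        1000 1200 320 12 1 : tunables 0 0 0 : slabdata 100 100 0
kmalloc-256  5000 5000 256 16 1 : tunables 0 0 0 : slabdata 313 313 0
bio-0          10   21 192 21 1 : tunables 0 0 0 : slabdata 1 1 0
"""

for name, nbytes in sorted(slab_usage(SAMPLE).items()):
    print(f"{name}: {nbytes / 1024:.1f} KiB")
```

Feeding it the contents of /proc/slabinfo (usually readable only by root)
instead of SAMPLE, a few seconds apart, would show whether a given bio cache
keeps growing. Note that this counts only the slab objects themselves, not
the pages the bios pin.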
> Do you think a trace like you mentioned would help shed some more light
> on this? Or would you recommend something else?
Dunno. Seeing who is allocating those bios might be helpful but it won't
tell us much about what has happened to them after allocation. The tracing
would be more helpful for a memory leak situation, which doesn't seem to be
the case here.
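For illustration only, here is a minimal sketch of how such a trace could be
summarized per allocating call site. It assumes text output from the
kmem:kmem_cache_alloc tracepoint with "call_site=" and "bytes_alloc=" fields;
the exact line layout varies between kernel versions, and the SAMPLE_TRACE
lines (addresses, task names, sizes) are invented. As said above, this only
shows who allocates, not what happens to the objects afterwards.

```python
import re
from collections import Counter

# Matches kmem_cache_alloc events only (not kmem_cache_alloc_node or frees).
_EVENT = re.compile(r"kmem_cache_alloc:.*?call_site=(\S+).*?bytes_alloc=(\d+)")


def bytes_by_call_site(trace_text, size=None):
    """Sum bytes_alloc per call_site; optionally keep only one object size."""
    totals = Counter()
    for line in trace_text.splitlines():
        m = _EVENT.search(line)
        if not m:
            continue
        call_site, nbytes = m.group(1), int(m.group(2))
        if size is not None and nbytes != size:
            continue
        totals[call_site] += nbytes
    return totals


# Invented sample lines, NOT taken from a real trace.
SAMPLE_TRACE = """\
 kworker/u8:1-123 [000] .... 100.000001: kmem_cache_alloc: call_site=ffffffff81300001 ptr=ffff880000001000 bytes_req=320 bytes_alloc=320 gfp_flags=GFP_NOIO
 kworker/u8:1-123 [000] .... 100.000002: kmem_cache_alloc: call_site=ffffffff81300001 ptr=ffff880000002000 bytes_req=320 bytes_alloc=320 gfp_flags=GFP_NOIO
 dd-456 [001] .... 100.000003: kmem_cache_alloc: call_site=ffffffff81200002 ptr=ffff880000003000 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL
 dd-456 [001] .... 100.000004: kmem_cache_free: call_site=ffffffff81200003 ptr=ffff880000003000
"""

for site, nbytes in bytes_by_call_site(SAMPLE_TRACE).most_common():
    print(site, nbytes)
```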
This is getting out of my area of expertise so I am not sure I can help
you much more, I am afraid.
--
Michal Hocko
SUSE Labs