Re: Crash kernel with 256 MB reserved memory runs into OOM condition

From: Dave Young
Date: Mon Aug 12 2019 - 22:54:56 EST


On 08/13/19 at 10:46am, Dave Young wrote:
> Add more cc.
> On 08/13/19 at 10:43am, Dave Young wrote:
> > Hi,
> >
> > On 08/12/19 at 11:50am, Michal Hocko wrote:
> > > On Mon 12-08-19 11:42:33, Paul Menzel wrote:
> > > > Dear Linux folks,
> > > >
> > > >
> > > > On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
> > > > 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
> > > >
> > > > Please find the messages of the normal and the crash kernel attached.
> > >
> > > You will need more memory to reserve for the crash kernel because ...
> > >
> > > > [ 4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > > [ 4.573612] lowmem_reserve[]: 0 125 125 125
> > > > [ 4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:15720kB writepending:0kB present:261560kB managed:133752kB mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB local_pcp:212kB free_cma:0kB
> > >
> > > ... the memory is really depleted and nothing to be reclaimed (no anon.
> > > file pages) Look how tht free memory is below min watermark (node zone DMA has
> > > lowmem protection for GFP_KERNEL allocation).
> >
> > We found similar issue on our side while working on kdump on SME enabled
> > systemd. Kairui is working on some patches.
> >
> > Actually on those SME/SEV enabled machines, swiotlb is enabled
> > automatically so at least we need extra 64M+ memory for kdump other
> > than the normal expectation.
> >
> > Can you check if this is also your case?
>
> The question is to Paul, also it would be always good to cc kexec mail
> list for kexec and kdump issues.

Looks like hardware iommu is used, maybe you do not enable SME?

Also replace maxcpus=1 with nr_cpus=1 can save some memory, can have a
try.