On Mon 14-09-20 09:57:02, Vijay Balakrishna wrote:
On 9/14/2020 7:33 AM, Michal Hocko wrote:
On Thu 10-09-20 13:47:39, Vijay Balakrishna wrote:
When memory is hot-added or hot-removed, min_free_kbytes must be
recalculated based on what khugepaged expects. Currently, after
hotplug, min_free_kbytes is reset to the lower default, and the higher
value that was set when THP was enabled is lost. This leaves the system
with a small min_free_kbytes, which is unsuitable for some systems,
especially those with network-intensive loads. Typical failure symptoms
include HW WATCHDOG reset,
soft lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process
kills.
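
For reference, a minimal sketch of how the khugepaged recommendation is derived, modelled on set_recommended_min_free_kbytes() in mm/khugepaged.c as I understand it. The Python function name and its parameters are illustrative only, and the defaults (4 KiB pages, 512-page pageblocks, three per-CPU migrate types) are assumptions that depend on architecture and kernel version.

```python
def recommended_min_free_kbytes(nr_zones, lowmem_pages,
                                pageblock_nr_pages=512,   # assumed: 2 MiB pageblocks
                                page_size_kb=4,           # assumed: 4 KiB pages
                                migrate_pcptypes=3):      # assumed: MIGRATE_PCPTYPES
    """Approximate the value khugepaged asks for, in kB."""
    # Keep two pageblocks free per populated zone to help avoid fragmentation.
    pages = pageblock_nr_pages * nr_zones * 2
    # Extra headroom so migrate types can fall back without mixing pageblocks.
    pages += pageblock_nr_pages * nr_zones * migrate_pcptypes * migrate_pcptypes
    # Never reserve more than 5% of lowmem.
    pages = min(pages, lowmem_pages // 20)
    return pages * page_size_kb


if __name__ == "__main__":
    # lowmem_pages here is an arbitrary value large enough to avoid the 5% cap.
    for zones in (1, 2):
        print(zones, recommended_min_free_kbytes(zones, lowmem_pages=2_000_000))
```

Under those assumptions, one populated zone gives 22528 kB and two give 45056 kB; the former matches the pre-hotplug value reported later in this thread.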
Care to explain some more please? The whole point of increasing
min_free_kbytes for THP is to get a larger free memory with a hope that
huge pages will be more likely to appear. While this might help
other users that need high-order pages, it is definitely not the
primary reason behind it. Could you provide an example with some more
data?
Thanks Michal. I haven't looked into THP as part of my investigation, so I
cannot comment.
In our use case we are hot-removing ~2GB of 8GB total (on our SoC)
during normal reboot/shutdown. This memory is hot-added as movable
type via a late systemd service during start-of-day.
In our stress test we first ran into HW WATCHDOG recovery. On enabling
the kernel watchdog we started seeing soft lockup/hung task notices.
Failure symptoms varied: stack traces of hung tasks sometimes showed
them trying to allocate GFP_ATOMIC memory or looping in
do_notify_resume, and we also saw NETDEVICE WATCHDOG timeouts, OOM
process kills, etc. During investigation we reran the stress test
without the hotplug use case. Surprisingly, this run didn't encounter
the said problems. This led to comparing what is different between the
two runs. While looking at various globals and studying the hotplug
code, I uncovered the issue of failing to restore min_free_kbytes. In
particular, on our 8GB SoC min_free_kbytes went down to 8703 from 22528
after hotplug add.
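
To put the lower number in context, here is a rough sketch of the generic default that the page allocator recomputes, based on my reading of init_per_zone_wmark_min() in mm/page_alloc.c: the integer square root of lowmem in kB times 16, clamped to a range. The function name is illustrative, and the upper bound is passed as a parameter because it has differed between kernel versions; the default used here is an assumption.

```python
import math


def default_min_free_kbytes(lowmem_kbytes, lower=128, upper=262144):
    """Approximate the allocator's default min_free_kbytes, in kB.

    lower/upper are the clamp bounds; upper is an assumed value and
    should be checked against the kernel version in use.
    """
    value = math.isqrt(lowmem_kbytes * 16)
    return max(lower, min(value, upper))


if __name__ == "__main__":
    eight_gib_kb = 8 * 1024 * 1024
    print(default_min_free_kbytes(eight_gib_kb))
```

With all 8 GiB counted as lowmem the sketch yields 11585 kB. Inverting the formula, 8703 would correspond to roughly 4.5 GiB being counted as lowmem; that is an inference from the arithmetic, not something measured in this thread.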
Did you try to increase min_free_kbytes manually after hot remove? Btw.
I would consider OOM killer invocation due to min_free_kbytes really
weird behavior. If anything, a higher value would cause more memory
reclaim and potentially OOM, rather than a smaller one.
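
One way to run the manual experiment suggested above is to read the sysctl after hot-add and raise it back if it has dropped. The sketch below is only an illustration: the helper names are made up, the 22528 target is simply the pre-hotplug value quoted in this thread, and writing to /proc/sys/vm/min_free_kbytes requires root.

```python
from pathlib import Path

# Standard procfs location of the sysctl.
MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def read_min_free_kbytes(path=MIN_FREE_KBYTES):
    """Return the current min_free_kbytes value as an integer (kB)."""
    return int(Path(path).read_text().strip())


def raise_min_free_kbytes(target_kb, path=MIN_FREE_KBYTES):
    """Write target_kb only if the current value is lower.

    Returns the value in effect afterwards. Never lowers the setting.
    """
    current = read_min_free_kbytes(path)
    if current < target_kb:
        Path(path).write_text(f"{target_kb}\n")
        return target_kb
    return current


if __name__ == "__main__":
    print("before:", read_min_free_kbytes())
    # Hypothetical target: the value seen before hotplug in this thread.
    print("after:", raise_min_free_kbytes(22528))
```

Running the stress test again after such a manual bump would show whether the failures track min_free_kbytes itself or something else that changes with hotplug.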