> On Wed, Jul 03, 2024 at 10:23:35AM GMT, Huan Yang wrote:
> > On 2024/7/3 3:27, Roman Gushchin wrote:
> > [...]
> > > Hello Huan,
> > >
> > > thank you for sharing your work.
> > >
> > > Some high-level thoughts:
> > > 1) Naming is hard, but it took me quite a while to realize that you're
> > > talking about free memory. Cache is obviously an overloaded term, but
> > > per-memcg-cache can mean absolutely anything (pagecache? cpu cache? ...),
> > > so maybe it's not the best choice.
> > Haha, sorry for my poor English.
> > Currently, my idea is that all memory released by processes under a memcg
> > goes into the `cache`; its original attributes are ignored, and it can be
> > freely requested again by processes under that memcg (so dma-buf, page
> > cache, heap, driver memory, and so on). Maybe the name PMP would be
> > friendlier? :)
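To make the idea concrete, here is a small userspace model of what such a per-memcg cache does. All names (`memcg_cache`, `pmc_alloc`, `pmc_free`) are illustrative only, not taken from the patch: freed blocks are parked in a per-group pool regardless of their previous use, and later allocations from the same group take from the pool first.

```c
/* Userspace model of the per-memcg cache (PMC) idea. Hypothetical names,
 * not the actual kernel patch. */
#include <stdlib.h>
#include <stddef.h>

struct pmc_block {
    struct pmc_block *next;
};

struct memcg_cache {
    struct pmc_block *free_list; /* blocks released by this group */
    size_t block_size;           /* one fixed size, for simplicity */
    size_t nr_cached;
};

/* Release: instead of returning memory to the global allocator, park it
 * in the group's cache; what it was used for (dma-buf, heap, page cache,
 * driver memory) no longer matters. */
static void pmc_free(struct memcg_cache *mc, void *p)
{
    struct pmc_block *b = p;
    b->next = mc->free_list;
    mc->free_list = b;
    mc->nr_cached++;
}

/* Allocate: the fast path takes from the group's own cache, avoiding the
 * global allocator's slow path entirely. */
static void *pmc_alloc(struct memcg_cache *mc)
{
    if (mc->free_list) {
        struct pmc_block *b = mc->free_list;
        mc->free_list = b->next;
        mc->nr_cached--;
        return b;
    }
    return malloc(mc->block_size); /* stand-in for the real slow path */
}
```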
> > > 2) Overall an idea to have a per-memcg free memory pool makes sense to me,
> > > especially if we talk 2MB or 1GB pages (or order > 0 in general).
> > I like it too :)
> > > 3) You absolutely have to integrate the reclaim mechanism with the generic
> > > memory reclaim mechanism, which is driven by memory pressure.
> > Yes, I am thinking about that too.
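One way to integrate with the generic reclaim path is to drain the cache back to the system when pressure is signalled (in the kernel this would be something like a shrinker callback). A userspace model of that drain hook, with illustrative names:

```c
/* Userspace model of draining the per-memcg cache under memory pressure.
 * Names are hypothetical; in the kernel the generic reclaim path (e.g. a
 * shrinker) would drive this. */
#include <stdlib.h>
#include <stddef.h>

struct pmc_block {
    struct pmc_block *next;
};

struct memcg_cache {
    struct pmc_block *free_list;
    size_t nr_cached;
};

/* On pressure, release up to nr_to_scan cached blocks back to the system
 * and report how many were actually freed. */
static size_t pmc_shrink(struct memcg_cache *mc, size_t nr_to_scan)
{
    size_t freed = 0;

    while (mc->free_list && freed < nr_to_scan) {
        struct pmc_block *b = mc->free_list;
        mc->free_list = b->next;
        free(b);
        mc->nr_cached--;
        freed++;
    }
    return freed;
}
```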
> > > 4) You claim a ~50% performance win in your workload, which is a lot. It's
> > > not clear to me where it's coming from. It's hard to believe the page
> > > allocation/release paths are taking 50% of the cpu time. Please, clarify.
> > Let me describe it more specifically. In our test scenario, we have 8GB of
> > RAM, and our camera application has a complex set of algorithms, with a
> > peak memory requirement of up to 3GB.
> > Therefore, in a multi-application background scenario, starting the camera
> > and taking photos creates very high memory pressure. Under this pressure,
> > any released memory is quickly taken by other processes (for file pages,
> > for example).
> > So, during the switch from camera capture to preview, DMA-BUF memory is
> > released while the memory for the preview algorithm is simultaneously
> > requested.
> > We have to go through the allocation slow path many times to obtain enough
> > memory for the preview algorithm, and the just-released DMA-BUF memory does
> > not help much.
> > But with PMC (let's call it that for now), we can quickly satisfy the
> > memory needs of the subsequent preview stage from the just-released DMA-BUF
> > memory, without going through the slow path, resulting in a significant
> > performance improvement.
> > (Of course, breaking the migrate type may not be good.)
> Please correct me if I am wrong, IIUC you have applications with
> different latency or performance requirements, running on the same
> system, but the system is memory constrained. You want applications with
> stringent performance requirements to go less often into the allocation
> slowpath, and want the lower priority (or no perf requirement)
> applications to do more slowpath work (reclaim/compaction) for
> themselves as well as for the high priority applications.
Yes, the PMC does have the idea of priority control.
> What about the allocations from softirqs or non-memcg-aware kernel
> allocations?
> An alternative approach would be something similar to the watermark
> based approach: low priority applications (or kswapds) do
> reclaim/compaction at a higher, newly defined watermark, while the higher
> priority applications are protected through the usual memcg protection.
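The watermark idea could look roughly like this: a low-priority task must start reclaim while free memory is still above the normal low watermark, so high-priority tasks rarely hit the slow path themselves. A userspace model, with illustrative numbers and names:

```c
/* Userspace model of priority-aware watermarks. All names and numbers are
 * illustrative, not from any existing kernel interface. */
#include <stdbool.h>
#include <stddef.h>

enum prio { PRIO_HIGH, PRIO_LOW };

/* Normal low watermark, and a higher one that only low-priority
 * allocations are held to. */
#define WMARK_LOW       1024 /* pages */
#define WMARK_LOW_PRIO  4096 /* pages */

/* Returns true if this allocation must do reclaim/compaction work first. */
static bool must_reclaim(enum prio p, size_t free_pages)
{
    size_t wmark = (p == PRIO_LOW) ? WMARK_LOW_PRIO : WMARK_LOW;

    return free_pages < wmark;
}
```

With free memory between the two watermarks, low-priority allocators do the reclaim work while high-priority allocators still proceed directly.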
> I can see another use-case for whatever solution we come up with, and
> that is a userspace reliable oom-killer.
Yes, LMKD is helpful for that.
>
> Shakeel