Re: [PATCH v5 0/7] psi: pressure stall monitors v5
From: Suren Baghdasaryan
Date: Tue Mar 19 2019 - 20:04:11 EST
On Tue, Mar 19, 2019 at 3:51 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Fri, Mar 08, 2019 at 10:43:04AM -0800, Suren Baghdasaryan wrote:
> > This is respin of:
> > https://lwn.net/ml/linux-kernel/20190206023446.177362-1-surenb%40google.com/
> >
> > Android is adopting psi to detect and remedy memory pressure that
> > results in stuttering and decreased responsiveness on mobile devices.
> >
> > Psi gives us the stall information, but because we're dealing with
> > latencies in the millisecond range, periodically reading the pressure
> > files to detect stalls in a timely fashion is not feasible. Psi also
> > doesn't aggregate its averages at a high-enough frequency right now.
> >
> > This patch series extends the psi interface such that users can
> > configure sensitive latency thresholds and use poll() and friends to
> > be notified when these are breached.
> >
> > As high-frequency aggregation is costly, it implements an aggregation
> > method that is optimized for fast, short-interval averaging, and makes
> > the aggregation frequency adaptive, such that high-frequency updates
> > only happen while monitored stall events are actively occurring.
> >
> > With these patches applied, Android can monitor for, and ward off,
> > mounting memory shortages before they cause problems for the user.
> > For example, using memory stall monitors in userspace low memory
> > killer daemon (lmkd) we can detect mounting pressure and kill less
> > important processes before device becomes visibly sluggish. In our
> > memory stress testing psi memory monitors produce roughly 10x less
> > false positives compared to vmpressure signals. Having ability to
> > specify multiple triggers for the same psi metric allows other parts
> > of Android framework to monitor memory state of the device and act
> > accordingly.
> >
> > The new interface is straight-forward. The user opens one of the
> > pressure files for writing and writes a trigger description into the
> > file descriptor that defines the stall state - some or full, and the
> > maximum stall time over a given window of time. E.g.:
> >
> > /* Signal when stall time exceeds 100ms of a 1s window */
> > char trigger[] = "full 100000 1000000"
> > fd = open("/proc/pressure/memory")
> > write(fd, trigger, sizeof(trigger))
> > while (poll() >= 0) {
> > ...
> > };
> > close(fd);
> >
> > When the monitored stall state is entered, psi adapts its aggregation
> > frequency according to what the configured time window requires in
> > order to emit event signals in a timely fashion. Once the stalling
> > subsides, aggregation reverts back to normal.
> >
> > The trigger is associated with the open file descriptor. To stop
> > monitoring, the user only needs to close the file descriptor and the
> > trigger is discarded.
> >
> > Patches 1-6 prepare the psi code for polling support. Patch 7 implements
> > the adaptive polling logic, the pressure growth detection optimized for
> > short intervals, and hooks up write() and poll() on the pressure files.
> >
> > The patches were developed in collaboration with Johannes Weiner.
> >
> > The patches are based on 5.0-rc8 (Merge tag 'drm-next-2019-03-06').
> >
> > Suren Baghdasaryan (7):
> > psi: introduce state_mask to represent stalled psi states
> > psi: make psi_enable static
> > psi: rename psi fields in preparation for psi trigger addition
> > psi: split update_stats into parts
> > psi: track changed states
> > refactor header includes to allow kthread.h inclusion in psi_types.h
> > psi: introduce psi monitor
> >
> > Documentation/accounting/psi.txt | 107 ++++++
> > include/linux/kthread.h | 3 +-
> > include/linux/psi.h | 8 +
> > include/linux/psi_types.h | 105 +++++-
> > include/linux/sched.h | 1 -
> > kernel/cgroup/cgroup.c | 71 +++-
> > kernel/kthread.c | 1 +
> > kernel/sched/psi.c | 613 ++++++++++++++++++++++++++++---
> > 8 files changed, 833 insertions(+), 76 deletions(-)
> >
> > Changes in v5:
> > - Fixed sparse: error: incompatible types in comparison expression, as per
> > Andrew
> > - Changed psi_enable to static, as per Andrew
> > - Refactored headers to be able to include kthread.h into psi_types.h
> > without creating a circular inclusion, as per Johannes
> > - Split psi monitor from aggregator, used RT worker for psi monitoring to
> > prevent it being starved by other RT threads and memory pressure events
> > being delayed or lost, as per Minchan and Android Performance Team
> > - Fixed blockable memory allocation under rcu_read_lock inside
> > psi_trigger_poll by using refcounting, as per Eva Huang and Minchan
> > - Misc cleanup and improvements, as per Johannes
> >
> > Notes:
> > 0001-psi-introduce-state_mask-to-represent-stalled-psi-st.patch is unchanged
> > from the previous version and provided for completeness.
>
> Please fix kbuild test bot's warning in 6/7
> Other than that, for all patches,
Thanks for the review!
Pushed v6 with the fix for the warning: https://lkml.org/lkml/2019/3/19/987
Also fixed a bug introduced in https://lkml.org/lkml/2019/3/8/686
which I discovered while testing (description in the changelog of the
new patchset).
>
> Acked-by: Minchan Kim <minchan@xxxxxxxxxx>