[RFC PATCH 00/11] mm / virtio: Provide support for paravirtual waste page treatment
From: Alexander Duyck
Date: Thu May 30 2019 - 17:57:19 EST
This series provides an asynchronous means of hinting to a hypervisor
that a guest page is no longer in use and can have the data associated
with it dropped. To do this I have implemented functionality that allows
for what I am referring to as "waste page treatment".
I have based many of the terms and functionality on waste water
treatment. The similarity occurred to me after I had started referring to
the hints as "bubbles": the hints use the same approach as the balloon
functionality, but disappear if they are touched. As a result I started
to think of the virtio device as an aerator. The general idea is that the
guest should treat its unused pages so that when they end up heading
"downstream", either to another guest or back to the host, they will not
need to be written to swap.
For a bit of background, the treatment process is based on a sequencing
batch reactor (SBR)[1] and has five stages. The first stage is fill, in
which we take the raw pages and add them to the reactor. The second stage
is react, in which we hand the pages off to the Virtio Balloon driver to
have hints attached to them and for those hints to be sent to the
hypervisor. The third stage is settle, in which we wait for the
hypervisor to process the pages; we should receive an interrupt when it
has completed. The fourth stage is decant, in which we drain the reactor
of pages. Finally we have the idle stage, which we enter if the reference
count for the reactor drops to 0 after a drain, or if a fill operation
fails to obtain any pages and the reference count has hit 0. Otherwise we
return to the first stage and start the cycle over again.
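
As a rough illustration of the cycle described above, the five stages can
be modeled as a small state machine. This is only a sketch in plain C,
not code from the series: the names sbr_stage, sbr_reactor and sbr_next()
are hypothetical, and two behaviors are assumptions the description does
not spell out (a failed fill with a non-zero reference count retries the
fill, and an idle reactor restarts once its reference count is non-zero).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the five stages of the treatment cycle. */
enum sbr_stage {
	SBR_FILL,	/* pull raw pages into the reactor */
	SBR_REACT,	/* hand pages to the balloon driver for hinting */
	SBR_SETTLE,	/* wait for the hypervisor to process the hints */
	SBR_DECANT,	/* drain the reactor of pages */
	SBR_IDLE,	/* nothing left to do */
};

/* Hypothetical reactor state; not the layout used by the patches. */
struct sbr_reactor {
	enum sbr_stage stage;
	size_t refcount;	/* outstanding requests for treatment */
	size_t nr_pages;	/* pages currently held in the reactor */
};

/*
 * Advance the reactor by one step and return the new stage.
 * 'filled' is how many raw pages a fill attempt obtained (used only in
 * the fill stage); 'hint_done' says whether the hypervisor has signaled
 * completion (used only in the settle stage).
 */
static enum sbr_stage sbr_next(struct sbr_reactor *r, size_t filled,
			       bool hint_done)
{
	switch (r->stage) {
	case SBR_FILL:
		r->nr_pages = filled;
		if (filled > 0)
			r->stage = SBR_REACT;
		else if (r->refcount == 0)
			r->stage = SBR_IDLE;
		/* Assumption: empty fill with refcount > 0 retries the fill. */
		break;
	case SBR_REACT:
		/* Hints have been queued; wait for the hypervisor. */
		r->stage = SBR_SETTLE;
		break;
	case SBR_SETTLE:
		if (hint_done)
			r->stage = SBR_DECANT;
		break;
	case SBR_DECANT:
		r->nr_pages = 0;
		r->stage = r->refcount == 0 ? SBR_IDLE : SBR_FILL;
		break;
	case SBR_IDLE:
		/* Assumption: new work restarts the cycle at fill. */
		if (r->refcount > 0)
			r->stage = SBR_FILL;
		break;
	}
	return r->stage;
}
```

A caller would invoke sbr_next() once per event (a fill attempt, a
completion interrupt, and so on) until the reactor reports SBR_IDLE.
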
This patch set is still far more intrusive than I would really like for
what it has to do. Currently I am splitting the nr_free_pages into two
values and having to add a pointer and an index to track where we are in
the treatment process for a given free_area. I'm also not sure I have
covered all possible corner cases where pages can get into the free_area
or move from one migratetype to another.
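
To make the bookkeeping cost mentioned above more concrete, here is a
minimal userspace sketch of a free area that carries two counters plus a
pointer and an index. The counter names nr_free_raw and nr_free_treated
and the term "membrane" come from the patch titles in this series;
everything else (the sketch_ prefix, the treatment_mt field, the list
type, and both helpers) is hypothetical and does not reflect the actual
layout in include/linux/mmzone.h.

```c
#include <stddef.h>

#define SKETCH_MIGRATE_TYPES 4	/* hypothetical stand-in for MIGRATE_TYPES */

/* Minimal doubly linked list node, standing in for struct list_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Hypothetical free area with the counter split into raw and treated. */
struct sketch_free_area {
	struct sketch_list_head free_list[SKETCH_MIGRATE_TYPES];
	struct sketch_list_head *membrane;	/* divider between treated and raw */
	int treatment_mt;			/* index: migratetype being treated */
	unsigned long nr_free_raw;		/* free pages not yet hinted */
	unsigned long nr_free_treated;		/* free pages already hinted */
};

/* Total free count is the sum of the two halves. */
static unsigned long sketch_nr_free(const struct sketch_free_area *area)
{
	return area->nr_free_raw + area->nr_free_treated;
}

/*
 * Move 'nr' entries from the raw count to the treated count.
 * Returns 0 on success, -1 if fewer than 'nr' raw entries exist.
 */
static int sketch_mark_treated(struct sketch_free_area *area,
			       unsigned long nr)
{
	if (nr > area->nr_free_raw)
		return -1;
	area->nr_free_raw -= nr;
	area->nr_free_treated += nr;
	return 0;
}
```

The point of the sketch is that every path which adds, removes, or moves
a free page now has to keep both counters consistent, which is where the
corner cases mentioned above come from.
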
Also I am still leaving a number of things hard-coded such as limiting the
lowest order processed to PAGEBLOCK_ORDER, and have left it up to the
guest to determine what size of reactor it wants to allocate to process
the hints.
Another consideration I am still debating is whether I really want to run
the aerator_cycle() function in interrupt context or whether I should have
it running in a thread somewhere else.
[1]: https://en.wikipedia.org/wiki/Sequencing_batch_reactor
---
Alexander Duyck (11):
      mm: Move MAX_ORDER definition closer to pageblock_order
      mm: Adjust shuffle code to allow for future coalescing
      mm: Add support for Treated Buddy pages
      mm: Split nr_free into nr_free_raw and nr_free_treated
      mm: Propogate Treated bit when splitting
      mm: Add membrane to free area to use as divider between treated and raw pages
      mm: Add support for acquiring first free "raw" or "untreated" page in zone
      mm: Add support for creating memory aeration
      mm: Count isolated pages as "treated"
      virtio-balloon: Add support for aerating memory via bubble hinting
      mm: Add free page notification hook
 arch/x86/include/asm/page.h         |   11 +
 drivers/virtio/Kconfig              |    1
 drivers/virtio/virtio_balloon.c     |   89 ++++++++++
 include/linux/gfp.h                 |   10 +
 include/linux/memory_aeration.h     |   54 ++++++
 include/linux/mmzone.h              |  100 +++++++++--
 include/linux/page-flags.h          |   32 +++
 include/linux/pageblock-flags.h     |    8 +
 include/uapi/linux/virtio_balloon.h |    1
 mm/Kconfig                          |    5 +
 mm/Makefile                         |    1
 mm/aeration.c                       |  324 +++++++++++++++++++++++++++++++++++
 mm/compaction.c                     |    4
 mm/page_alloc.c                     |  220 ++++++++++++++++++++----
 mm/shuffle.c                        |   24 ---
 mm/shuffle.h                        |   35 ++++
 mm/vmstat.c                         |    5 -
 17 files changed, 838 insertions(+), 86 deletions(-)
 create mode 100644 include/linux/memory_aeration.h
 create mode 100644 mm/aeration.c
--