Re: [PATCH 4/5] mm/memcontrol: allow to uncharge page without using page->lru field

From: Jerome Glisse
Date: Mon Jul 10 2017 - 11:32:38 EST

On Mon, Jul 10, 2017 at 10:28:06AM +0200, Michal Hocko wrote:
> On Wed 05-07-17 10:35:29, Jerome Glisse wrote:
> > On Tue, Jul 04, 2017 at 02:51:13PM +0200, Michal Hocko wrote:
> > > On Mon 03-07-17 17:14:14, Jérôme Glisse wrote:
> > > > HMM pages (private or public device pages) are ZONE_DEVICE page and
> > > > thus you can not use page->lru fields of those pages. This patch
> > > > re-arrange the uncharge to allow single page to be uncharge without
> > > > modifying the lru field of the struct page.
> > > >
> > > > There is no change to memcontrol logic, it is the same as it was
> > > > before this patch.
> > >
> > > What is the memcg semantic of the memory? Why is it even charged? AFAIR
> > > this is not a reclaimable memory. If yes how are we going to deal with
> > > memory limits? What should happen if go OOM? Does killing an process
> > > actually help to release that memory? Isn't it pinned by a device?
> > >
> > > For the patch itself. It is quite ugly but I haven't spotted anything
> > > obviously wrong with it. It is the memcg semantic with this class of
> > > memory which makes me worried.
> >
> > So i am facing 3 choices. First one not account device memory at all.
> > Second one is account device memory like any other memory inside a
> > process. Third one is account device memory as something entirely new.
> >
> > I pick the second one for two reasons. First because when migrating
> > back from device memory it means that migration can not fail because
> > of memory cgroup limit, this simplify an already complex migration
> > code. Second because i assume that device memory usage is a transient
> > state ie once device is done with its computation the most likely
> > outcome is memory is migrated back. From this assumption it means
> > that you do not want to allow a process to overuse regular memory
> > while it is using un-accounted device memory. It sounds safer to
> > account device memory and to keep the process within its memcg
> > boundary.
> >
> > Admittedly here i am making an assumption and i can be wrong. Thing
> > is we do not have enough real data of how this will be use and how
> > much of an impact device memory will have. That is why for now i
> > would rather restrict myself to either not account it or account it
> > as usual.
> >
> > If you prefer not accounting it until we have more experience on how
> > it is use and how it impacts memory resource management i am fine with
> > that too. It will make the migration code slightly more complex.
> I can see why you want to do this but the semantic _has_ to be clear.
> And as such make sure that the exiting task will simply unpin and
> invalidate all the device memory (assuming this memory is not shared
> which I am not sure is even possible).

So there is 2 differents path out of device memory:
- munmap/process exiting: memory will get uncharge from its memory
cgroup just like regular memory
- migration to non device memory, the memory cgroup charge get
transfer to the new page just like for any other page

Do you want me to document all this in any specific place ? I will
add a comment in memory_control.c and in HMM documentations for this
but should i add it anywhere else ?

Note that the device memory is not pin. The whole point of HMM is to
do away with any pining. Thought as device page are not on lru they
are not reclaim like any other page. However we expect that device
driver might implement something akin to device memory reclaim to
make room for more important data base on statistic collected by the
device driver. If there is enough commonality accross devices then
we might implement a more generic mechanisms but at this point i
rather grow as we learn.