Re: [HELP] How to get task_struct from mm

From: Michal Hocko
Date: Fri May 31 2019 - 10:00:05 EST


On Fri 31-05-19 20:51:05, Yang Shi wrote:
>
>
> On 5/30/19 11:41 PM, Michal Hocko wrote:
> > On Thu 30-05-19 14:57:46, Yang Shi wrote:
> > > Hi folks,
> > >
> > >
> > > As we discussed at LSF/MM regarding page demotion for PMEM, the demotion
> > > should respect the mempolicy and allowed mems of the process that the
> > > page (anonymous pages only for now) belongs to.
> > The cpuset memory mask (aka mems_allowed) is indeed tricky and somewhat
> > awkward. It is inherently an address space property and I never
> > understood why we have it per _thread_. It doesn't make any sense to
> > me and it just leads to weird corner cases. What should happen if
> > different threads disagree about the allocation affinity while
> > working on a shared address space?
>
> I suppose (just my guess) such a restriction should only apply to the
> first allocation. Just like the memcg charge: whoever allocates first gets
> their policy applied.

I am not really sure that was a deliberate design choice. Maybe
somebody has a different recollection though.
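
To make the corner case concrete, here is a minimal user-space sketch (not
kernel code). Every name in it (model_mm, model_thread, model_first_touch_node,
model_threads_disagree) is made up for illustration, mems_allowed is modeled
as a plain bitmask, and the "first allocation wins" rule is only the guess
quoted above, not a confirmed design.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's structures. */
struct model_mm {
    int id;                      /* stands in for a shared address space */
};

struct model_thread {
    struct model_mm *mm;         /* shared by all threads of a process */
    unsigned long mems_allowed;  /* bit n set => node n allowed, per thread */
};

/* Lowest node the faulting thread may allocate from, or -1 if none. */
static int model_first_touch_node(const struct model_thread *t)
{
    assert(t != NULL);
    for (int node = 0; node < (int)(sizeof(t->mems_allowed) * CHAR_BIT); node++)
        if (t->mems_allowed & (1UL << node))
            return node;
    return -1;
}

/* Nonzero if two threads share an mm but disagree on the allowed nodes. */
static int model_threads_disagree(const struct model_thread *a,
                                  const struct model_thread *b)
{
    assert(a != NULL && b != NULL);
    return a->mm == b->mm && a->mems_allowed != b->mems_allowed;
}
```

With one thread allowed on nodes {0,1} (mask 0x3) and another on nodes {1,2}
(mask 0x6), both sharing the same model_mm, the node picked for a page
depends purely on which thread touches it first. The mm alone cannot tell
you which mask to honor later.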

> > > The vma that the page is mapped into can be retrieved easily from an
> > > rmap walk, but we need to know the task_struct that the vma belongs to.
> > > It looks like there is no such API, and container_of does not seem to
> > > work with a pointer member.
> > I do not think this is a good idea. As you point out in the reply we
> > have that for memcgs but we really hope to get rid of mm->owner there
> > as well. It is just more tricky there. Moreover such a reverse mapping
> > would be incorrect. Just think of disagreeing yet overlapping cpusets
> > for different threads mapping the same page.
> >
> > Is it such a big deal to document that node migration is not
> > compatible with cpusets?
>
> Not only cpusets: get_vma_policy() also needs to find the task_struct from
> the vma. Currently, get_vma_policy() just uses "current", so it returns the
> current process's mempolicy if the vma doesn't have a mempolicy. For the
> node migration case, "current" is definitely not correct.
>
> It looks like there is no easy way to work around it unless we claim that
> node migration is not compatible with both cpusets and mempolicy.

Yep, it seems so.
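
For anybody following along, the "current" problem can be sketched in user
space as below. This is a simplified model, not the kernel implementation:
the model_* types and model_get_vma_policy() are hypothetical stand-ins, and
the only behavior taken from the discussion above is "use the vma policy if
there is one, otherwise fall back to the calling task's policy".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's structures. */
struct model_mempolicy {
    int preferred_node;
};

struct model_task {
    const struct model_mempolicy *mempolicy;  /* per-task policy */
};

struct model_vma {
    const struct model_mempolicy *vm_policy;  /* NULL if the vma has none */
};

/*
 * Model of the fallback described above: return the vma's own policy if
 * set, otherwise the policy of whichever task is running the lookup.
 */
static const struct model_mempolicy *
model_get_vma_policy(const struct model_vma *vma,
                     const struct model_task *current_task)
{
    assert(vma != NULL && current_task != NULL);
    if (vma->vm_policy)
        return vma->vm_policy;
    return current_task->mempolicy;
}
```

When the lookup runs in the context of the task that owns the page, the
fallback gives the expected answer. When it runs from some other context
doing the migration, a vma without its own policy silently picks up that
other task's policy, which is the mismatch described above.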
--
Michal Hocko
SUSE Labs