Re: [PATCH v2 06/33] Documentation, x86: Documentation for Intel resource allocation user interface

From: Shaohua Li
Date: Fri Sep 09 2016 - 13:54:55 EST


On Fri, Sep 09, 2016 at 12:22:46AM -0700, Fenghua Yu wrote:
> On Thu, Sep 08, 2016 at 03:45:14PM -0700, Shaohua Li wrote:
> > On Thu, Sep 08, 2016 at 06:17:47PM -0700, Fenghua Yu wrote:
> > > On Thu, Sep 08, 2016 at 03:01:20PM -0700, Shaohua Li wrote:
> > > > On Thu, Sep 08, 2016 at 02:57:00AM -0700, Fenghua Yu wrote:
> > > > > From: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > > > >
> > > > > The documentation describes user interface of how to allocate resource
> > > > > in Intel RDT.
> > > > >
> > > > > Please note that the documentation covers generic user interface. Current
> > > > > > patch set code only implements CAT L3. CAT L2 code will be sent later.
> > > > >
> > > > > Signed-off-by: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > > > > Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> > > > > ---
> > > > > Documentation/x86/intel_rdt_ui.txt | 164 +++++++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 164 insertions(+)
> > > > > create mode 100644 Documentation/x86/intel_rdt_ui.txt
> > > > >
> > > > > diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
> > > > > new file mode 100644
> > > > > index 0000000..27de386
> > > > > --- /dev/null
> > > > > +++ b/Documentation/x86/intel_rdt_ui.txt
> > > > > @@ -0,0 +1,164 @@
> > > > > +User Interface for Resource Allocation in Intel Resource Director Technology
> > > > > +
> > > > > +Copyright (C) 2016 Intel Corporation
> > > > > +
> > > > > +Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > > > > +Tony Luck <tony.luck@xxxxxxxxx>
> > > > > +
> > > > > +This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
> > > > > +X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
> > > > > +
> > > > > +To use the feature mount the file system:
> > > > > +
> > > > > + # mount -t resctrl resctrl [-o cdp,verbose] /sys/fs/resctrl
> > > > > +
> > > > > +mount options are:
> > > > > +
> > > > > +"cdp": Enable code/data prioritization in L3 cache allocations.
> > > > > +
> > > > > +"verbose": Output more info in the "info" file under info directory
> > > > > + and in dmesg. This is mainly for debug.
> > > > > +
> > > > > +
> > > > > +Resource groups
> > > > > +---------------
> > > > > +Resource groups are represented as directories in the resctrl file
> > > > > +system. The default group is the root directory. Other groups may be
> > > > > +created as desired by the system administrator using the "mkdir(1)"
> > > > > +command, and removed using "rmdir(1)".
> > > > > +
> > > > > +There are three files associated with each group:
> > > > > +
> > > > > +"tasks": A list of tasks that belongs to this group. Tasks can be
> > > > > + added to a group by writing the task ID to the "tasks" file
> > > > > + (which will automatically remove them from the previous
> > > > > + group to which they belonged). New tasks created by fork(2)
> > > > > + and clone(2) are added to the same group as their parent.
> > > > > + If a pid is not in any sub partition, it is in root partition
> > > > > + (i.e. default partition).
> > > > Hi Fenghua,
> > > >
> > > > Will you add a 'procs' interface to allow moving a process into a group? Using
> > > > the 'tasks' interface to move a process is inconvenient and has race conditions
> > > > (e.g., some new threads could escape).
> > >
> > > We don't plan to add a 'procs' interface for rdtgroup. We only use resctrl
> > > interface to allocate resources.
> > >
> > > Why is "tasks" inconvenient? If sysadmin wants to allocate a portion of L3
> > > for a pid, the operation in resctrl is to write the pid to a "tasks". While
> > > in 'procs', the operation is to write a partition to a pid. If considering
> > > convenience, they are the same, right?
> > >
> > > A thread uses either default partition (in root dir) or a sub partition (in
> > > sub-directory). Sysadmin can control that. Kernel handles race condition.
> > > Any issue with that?
> >
> > I don't mean writing the 'tasks' file is inconvenient. So to move a process to
> > a group, we do:
> > 1. get all thread pid of the process
> > 2. write every pid to 'tasks'
> >
> > this is inconvenient. And if a new thread is created between 1 and 2, we don't
> > put the thread to the group. Am I missing anything?
>
> As said in this doc, "New tasks created by fork(2) and clone(2) are added
> to the same group as their parent.". So the new thread created b/w 1 and 2
> will automatically go to the same "tasks" as the process. Later sysadmin can
> still move any pid to any group.

So if we want to move a process from group1 to group2, we do:
1. find the pids of all threads of the process
2. write each thread pid to group2's 'tasks'

I don't think this is convenient, but it's ok. Now if the process creates a new
thread between 1 and 2, the new thread is in group1. Its pid isn't in the list
we found in 1, so after 2 the new thread is still in group1. True, the sysadmin
can repeat steps 1 & 2 and move the new thread to group2, but there is always
a chance the process creates another thread between 1 and 2, and that thread
remains in group1. There is no guarantee we can safely move a process from one
group to another.
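
To make the two steps concrete, here is a rough userspace sketch in Python. It
is only an illustration: the helper names, the max_passes cap, and opening
"tasks" in append mode are my assumptions, not anything from the patch. It
repeats steps 1 and 2 until a pass finds no unseen thread, which narrows the
window but does not close it.

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Step 1: return the thread IDs currently listed under <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def move_process(pid, tasks_path, proc_root="/proc", max_passes=5):
    """Repeat steps 1 and 2 until a pass finds no thread that was not already written.

    Returns the set of thread IDs written to tasks_path. Even when the last
    pass finds nothing new, a thread created after that listing can still be
    left behind in the old group.
    """
    moved = set()
    for _ in range(max_passes):  # hypothetical cap so the loop always terminates
        new_tids = [t for t in list_thread_ids(pid, proc_root) if t not in moved]
        if not new_tids:
            break
        for tid in new_tids:
            # Step 2: write one thread ID per write to the group's "tasks" file.
            # Append mode is an assumption so the sketch also works on a plain file.
            with open(tasks_path, "a") as f:
                f.write(f"{tid}\n")
            moved.add(tid)
    return moved
```

For example, move_process(1234, "/sys/fs/resctrl/group2/tasks") would try to
move pid 1234 with all of its threads, where "group2" is a directory created
with mkdir beforehand. The sketch does not handle a thread that exits between
the listing and the write.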

Thanks,
Shaohua