Re: [PATCH 1/5] cpuset memory spread basic implementation

From: Andi Kleen
Date: Tue Feb 07 2006 - 04:40:05 EST


On Tuesday 07 February 2006 01:19, Ingo Molnar wrote:
>
> * Paul Jackson <pj@xxxxxxx> wrote:
>
> > First it might be most useful to explain a detail of your proposal
> > that I don't get, which is blocking me from considering it seriously.
> >
> > I understand mount options, but I don't know what mechanisms (at the
> > kernel-user API) you have in mind to manage per-directory and per-file
> > options.
>
> well, i thought of nothing overly complex: it would have to be a
> persistent flag attached to the physical inode. Lets assume XFS added
> this - e.g. as an extended attribute.

There used to be a patch floating around to do policy for file caches
(or rather arbitrary address spaces).
It used special ELF headers to set the policy. I thought about these policy EAs
long ago. The main reason I never liked them much is that on some EA
implementations you have to fetch a separate block to get at the EA.
And this policy EA would need to be read all the time, thus adding lots
of additional seeks. That didn't seem worth it.


> - default: the vast majority of inodes would have no flag set
>
> - some would have a 'cache:local' flag
>
> - some would have a 'cache:global' flag

If you do policy you could as well do the full policy states from
mempolicy.c. Both cache:local and cache:global can be expressed in it.
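
A rough sketch of what that mapping might look like, under stated
assumptions: the enum below only mirrors the mode names used in mempolicy.c
(its values are local to this example), cache_flag_to_mode() is a
hypothetical helper, and reading "local" as preferred and "global" as
interleave is one plausible interpretation, not something settled in this
thread.

```c
#include <stddef.h>
#include <string.h>

/* Names mirror the kernel's mempolicy modes; the numeric values are
 * private to this sketch and not taken from any kernel header. */
enum sketch_mpol {
    SK_MPOL_DEFAULT,
    SK_MPOL_PREFERRED,
    SK_MPOL_BIND,
    SK_MPOL_INTERLEAVE
};

/* Hypothetical helper: translate a per-inode flag string into a mode.
 * A missing or unknown flag falls back to the default policy. */
enum sketch_mpol cache_flag_to_mode(const char *flag)
{
    if (flag == NULL)
        return SK_MPOL_DEFAULT;
    if (strcmp(flag, "cache:local") == 0)
        return SK_MPOL_PREFERRED;   /* assumed: prefer the allocating node */
    if (strcmp(flag, "cache:global") == 0)
        return SK_MPOL_INTERLEAVE;  /* assumed: spread pages across nodes */
    return SK_MPOL_DEFAULT;
}
```

The remaining mode (bind) has no equivalent among the two proposed flags,
which is the sense in which the full set of states is more general.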

> which would result in every inode getting flagged as either 'local' or
> 'global'. When the pagecache (and inode/dentry cache) gets populated,
> the kernel will always know what the current allocation strategy is for
> any given object:

In practice it will probably only be set for a small minority of objects,
if at all. I could imagine administering this policy could be a PITA too.

> workloads may share the same object and may want to use it in different
> ways. E.g. there's one big central database file, and one job uses it in
> a 'local' way another one uses it in a 'global' way. Each job would
> have to set the attribute to the right value. Setting the flag for the
> inode results in all existing pages for that inode to be flushed. The
> jobs need to serialize their access to the object, as the kernel can
> only allocate according to one policy.

I think we are much better off with some sensible defaults for file cache:

- global or "nearby" for read/write
- global for inode/dcache
- local for mmap file data

I bet that will cover most cases quite nicely.
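
Written down as a lookup, the defaults above could look roughly like this.
It is only an illustration: the enum names and default_placement() are
made up for this sketch, and "global or nearby" is simplified to plain
global here.

```c
/* Hypothetical categories of cached data, one per default listed above. */
enum cache_kind {
    CACHE_READ_WRITE,    /* page cache filled via read/write */
    CACHE_INODE_DCACHE,  /* inode and dentry caches */
    CACHE_MMAP_FILE      /* file data mapped with mmap */
};

/* Hypothetical placement choices. */
enum placement {
    PLACE_LOCAL,   /* allocate on the node doing the access */
    PLACE_GLOBAL   /* spread across nodes */
};

/* Return the default placement for a kind of cached data. */
enum placement default_placement(enum cache_kind kind)
{
    switch (kind) {
    case CACHE_MMAP_FILE:
        return PLACE_LOCAL;
    case CACHE_READ_WRITE:   /* "global or nearby"; this sketch picks global */
    case CACHE_INODE_DCACHE:
        return PLACE_GLOBAL;
    }
    return PLACE_GLOBAL;     /* fallback for out-of-range values */
}
```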

-Andi
