Re: [patch 3/4] mempolicy: add MPOL_F_STATIC_NODES flag

From: David Rientjes
Date: Wed Feb 13 2008 - 16:58:55 EST


On Wed, 13 Feb 2008, Paul Jackson wrote:

> Yes, if an application considers nodes to be interchangeable, I'm
> trying to avoid having that application -have- to know its current
> cpuset placement, for two reasons:
>
> For one thing, it's racey. It's cpuset placement could change,
> unbeknownst to it, between the time it queried it, and the time
> that it issued the mbind or set_mempolicy call.
>
> For the other thing, it's not always possible. If the application
> is currently in a cpuset that is smaller than it's preferred
> configuration, it would not be possible to express its preferred
> memory policies using just the smaller number of memory nodes
> allowed by its current cpuset placement. How do you say "put
> this on my third node" if you don't have a third node and you
> can only speak of the nodes you currently have?
>

So let's say, like my first example from the previous email, that you have
MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES over nodes 3-4 and your cpuset's
mems is only nodes 5-7. This would interleave over no nodes. Correct?

It seems like MPOL_F_RELATIVE_NODES is primarily designed to maintain a
certain order among the nodes it effects the mempolicy over. It comes
with the premise that the task doesn't already know it's cpuset mems
(otherwise, the current implementation without MPOL_F_STATIC_NODES would
work fine for this) so it doesn't really care what nodes it allocates
pages on, it just cares about the order.

This works for MPOL_PREFERRED and MPOL_BIND as well, right?

I don't understand the use case for this (at all), but if you have
workloads that require this type of setting then I can implement this as
part of my series. I just want to confirm that there are real world cases
backing this so that we don't have flags with highly highly specialized
cornercases.

[ If a user _does_ specify MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES
as part of their syscall, then we'll simply return -EINVAL. ]

> > Well, I didn't cave on anything
>
> ;) Your simple "ok" was ambiguous enough that we were able to
> read into it whatever we wanted to.
>
> But I've made my case on that issue (involving the separate or
> packed policy flag field). So I probably won't say more, and
> I expect to live with whatever you choose, after any further
> input from Lee or others.
>

Well, there's advantages and disadvantages to either approach.

My preference (both mode and flags stored in the same member of struct
mempolicy):

Advantages:

- completely consistent with the userspace API of passing modes
and flags together in a pointer to an int, and

- does not require additional formals to be added to several
functions, including functions outside mm/mempolicy.c.

Disadvantage:

- use of mpol_mode() throughout mm/mempolicy.c code to mask
off optional mode flags for conditionals or switch statements.

Your preference (separate mode and flags members in struct mempolicy):

Advantages:

- clearer implementation when dealing with modes: all existing
statements involving pol->policy can remain unchanged.

Disadvantages:

- requires additional formals to be added to several functions,
including functions outside mm/mempolicy.c, and

- takes additional space in struct mempolicy (two bytes) which
could eventually be used for something else.

In both cases the testing of mode flags is the same as before:

if (pol->policy & MPOL_F_STATIC_NODES) {
...
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/