Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily

From: Miao Xie
Date: Tue Mar 09 2010 - 02:26:20 EST

Next message: Jim Meyering: "Re: ATA 4 KiB sector issues."
Previous message: Hitoshi Mitake: "Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursionprotection"
In reply to: David Rientjes: "Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily"
Next in thread: Nick Piggin: "Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

on 2010-3-9 5:46, David Rientjes wrote:
[snip]
>> Since task->mems_allowed changes infrequently, this patch uses two variables
>> as a tag to indicate whether task->mems_allowed needs to be updated. Before
>> setting the tag, cpuset caches the new mask for every task in its
>> task_struct.
>>
>
> So what exactly is the benefit of 58568d2 from last June that caused this
> issue to begin with? It seems like this entire patchset is a revert of
> that commit. So why shouldn't we just revert that one commit and then add
> the locking and updating necessary for configs where
> MAX_NUMNODES > BITS_PER_LONG on top?

I was worried about the consistency of task->mempolicy and task->mems_allowed
for configs where MAX_NUMNODES <= BITS_PER_LONG.

The problem I am worried about is the following:
When the kernel allocator allocates pages for a task, it first reads
task->mempolicy to get the allowed node, then checks whether that node is
allowed by task->mems_allowed.

But without this patch, ->mempolicy and ->mems_allowed are not updated at the
same time, so the kernel allocator may see inconsistent ->mempolicy and
->mems_allowed. For example, the allocator may get the allowed node from the
old mempolicy, but then check that node against the new mems_allowed, which
doesn't intersect the old mempolicy.

So I made this patchset.
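
The window described above can be sketched in user space. This is only a
model, not kernel code: the nodemask typedef, struct task_model and all
function names are made up for illustration, and a 64-bit integer stands in
for nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nodemask_t: one bit per node, at most 64 nodes. */
typedef unsigned long long nodemask;

/* Hypothetical model of the two per-task fields discussed in this thread. */
struct task_model {
	nodemask mempolicy_nodes;	/* nodes the task's mempolicy may pick */
	nodemask mems_allowed;		/* nodes the cpuset currently allows */
};

/* Return the lowest-numbered node set in mask, or -1 if mask is empty. */
static int first_node(nodemask mask)
{
	for (int n = 0; n < 64; n++)
		if (mask & (1ULL << n))
			return n;
	return -1;
}

/* Return true if node is set in mask. */
static bool node_allowed(nodemask mask, int node)
{
	assert(node >= 0 && node < 64);
	return (mask >> node) & 1ULL;
}

/*
 * Model of the two-step allocation: pick a node from the mempolicy, then
 * check it against mems_allowed. If cpuset_update_in_between is true,
 * mems_allowed is replaced between the two steps while the mempolicy is
 * left at its old value, which is the inconsistent case.
 */
static bool alloc_check(struct task_model *t, bool cpuset_update_in_between,
			nodemask new_allowed)
{
	int node = first_node(t->mempolicy_nodes);

	if (node < 0)
		return false;
	if (cpuset_update_in_between)
		t->mems_allowed = new_allowed;	/* mempolicy not rebound yet */
	return node_allowed(t->mems_allowed, node);
}
```

With mempolicy_nodes = 0x3 and mems_allowed = 0x3 the check passes. If
mems_allowed becomes 0xC between the two steps, node 0 from the old
mempolicy is rejected.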

>> +/**
>> + * cpuset_update_task_mems_allowed - update task memory placement
>> + *
>> + * If the current task's mems_allowed_for_update and mempolicy_for_update are
>> + * changed by cpuset behind our backs, update current->mems_allowed,
>> + * mems_generation and task NUMA mempolicy to the new value.
>> + *
>> + * Call WITHOUT mems_lock held.
>> + *
>> + * This routine is needed to update the per-task mems_allowed and mempolicy
>> + * within the task's context when it is trying to allocate memory.
>> + */
>> +static __always_inline void cpuset_update_task_mems_allowed(void)
>> +{
>> + struct task_struct *tsk = current;
>> + unsigned long flags;
>> +
>> + if (unlikely(tsk->mems_generation != tsk->mems_generation_for_update)) {
>> + task_mems_lock_irqsave(tsk, flags);
>> + tsk->mems_allowed = tsk->mems_allowed_for_update;
>> + tsk->mems_generation = tsk->mems_generation_for_update;
>> + task_mems_unlock_irqrestore(tsk, flags);
>
> By this synchronization, you're guaranteeing that no other kernel code
> ever reads tsk->mems_allowed when tsk != current? Otherwise, you're
> simply protecting the store to tsk->mems_allowed here and not serializing
> on the loads that can return empty nodemasks.

I guarantee that no other kernel code changes tsk->mems_allowed when
tsk != current, so every task can safely read its own tsk->mems_allowed
without a lock.

I will use mems_lock to protect it when another task reads it.
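
The lazy update in the quoted hunk can be modeled in user space roughly as
follows. This is a single-threaded sketch under stated assumptions, not the
patch itself: struct task_model and both function names are invented here,
a 64-bit integer stands in for nodemask_t, and the places where the patch
takes mems_lock are only marked with comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nodemask_t. */
typedef unsigned long long nodemask;

/* Hypothetical model of the task_struct fields added by the patch. */
struct task_model {
	nodemask mems_allowed;			/* written only by the owning task */
	nodemask mems_allowed_for_update;	/* written by cpuset */
	int mems_generation;
	int mems_generation_for_update;
};

/* cpuset side: cache the new mask and bump the generation tag. */
static void cpuset_change_mems(struct task_model *t, nodemask new_mask)
{
	/* the patch would hold mems_lock here */
	t->mems_allowed_for_update = new_mask;
	t->mems_generation_for_update++;
	/* ... and release it here */
}

/*
 * Task side: called by the owning task before it allocates. Returns true
 * if mems_allowed was reloaded, false if the generations already matched.
 */
static bool task_refresh_mems(struct task_model *t)
{
	if (t->mems_generation == t->mems_generation_for_update)
		return false;	/* fast path: nothing changed, no lock taken */

	/* the patch would hold mems_lock here */
	t->mems_allowed = t->mems_allowed_for_update;
	t->mems_generation = t->mems_generation_for_update;
	/* ... and release it here */

	assert(t->mems_generation == t->mems_generation_for_update);
	return true;
}
```

In this model only task_refresh_mems() ever stores to mems_allowed, which
corresponds to the guarantee that nothing changes tsk->mems_allowed when
tsk != current.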

>> + /* Protection of ->mems_allowed_for_update */
>> + spinlock_t mems_lock;
>> + /*
>> + * This variable (mems_allowed_for_update) is only used for caching
>> + * memory placement information.
>> + *
>> + * ->mems_allowed is used by the kernel allocator.
>> + */
>> + nodemask_t mems_allowed_for_update; /* Protected by mems_lock */
>
> Another nodemask_t in struct task_struct for this? And for all configs,
> including those that can do atomic updates to mems_allowed?

Yes, for all configs.

>
>> +
>> + /*
>> + * Increment this integer every time ->mems_allowed_for_update is
>> + * changed by cpuset. Task can compare this number with mems_generation,
>> + * and if they are not the same, mems_allowed_for_update is changed and
>> + * ->mems_allowed must be updated. In this way, tasks can avoid having
>> + * to lock and reload mems_allowed_for_update unless it is changed.
>> + */
>> + int mems_generation_for_update;
>> + /*
>> + * After updating mems_allowed, set mems_generation to
>> + * mems_generation_for_update.
>> + */
>> + int mems_generation;
>
> I don't see why you need two mems_generation numbers, one should belong in
> the task's cpuset. Then you can compare tsk->mems_generation to
> task_cs(tsk)->mems_generation at cpuset_update_task_memory_state() if you
> set tsk->mems_generation = task_cs(tsk)->mems_generation on
> cpuset_attach() or update_nodemask().

In this way, we must use rcu_read_lock() to protect the task's cs, and
performance will drop, though the cost of the RCU read lock is very small.
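
For comparison, the single-generation scheme suggested above might look
like this in a user-space model. Everything here is an assumption made for
illustration: struct cpuset_model, struct task_model and the function name
are invented, a 64-bit integer stands in for nodemask_t, and the points
where the kernel would need rcu_read_lock()/rcu_read_unlock() are only
marked with comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for nodemask_t. */
typedef unsigned long long nodemask;

/* Hypothetical model: the generation counter lives in the cpuset. */
struct cpuset_model {
	nodemask mems_allowed;
	int mems_generation;
};

struct task_model {
	struct cpuset_model *cs;	/* the task's cpuset */
	nodemask mems_allowed;
	int mems_generation;
};

/*
 * Compare the task's generation with its cpuset's generation and reload
 * mems_allowed on mismatch. Returns true if a reload happened.
 */
static bool task_refresh_from_cpuset(struct task_model *t)
{
	bool refreshed = false;
	struct cpuset_model *cs;

	/* kernel: rcu_read_lock() would be needed before looking at t->cs */
	cs = t->cs;
	assert(cs != NULL);
	if (t->mems_generation != cs->mems_generation) {
		t->mems_allowed = cs->mems_allowed;
		t->mems_generation = cs->mems_generation;
		refreshed = true;
	}
	/* kernel: rcu_read_unlock() */

	return refreshed;
}
```

The difference from the two-counter version is that every check has to
dereference the task's cpuset, even on the fast path, which is where the
RCU read lock comes in.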

Thanks!
Miao
