On Tue, Aug 22, 2023 at 01:08:52PM +0100, Matthew Wilcox wrote:
On Tue, Aug 22, 2023 at 11:41:41AM +0800, Tong Tiangen wrote:
On 2023/8/22 2:33, Matthew Wilcox wrote:
On Mon, Aug 21, 2023 at 05:13:12PM +0800, Tong Tiangen wrote:
We can see that CPU1 is waiting for CPU0 to respond to the IPI, CPU0 is waiting
for CPU2 to unlock tasklist_lock, and CPU2 is waiting for CPU1 to unlock
page->ptl. As a result, a softlockup is triggered.
In collect_procs_anon() we do not modify the tasklist; we only traverse it for
reading. Therefore, we can take the RCU read lock instead of read-locking
tasklist_lock, which breaks the softlockup chain above.
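For illustration, a minimal sketch of the conversion being described, heavily
simplified from collect_procs_anon() in mm/memory-failure.c; the anon_vma lookup
and interval-tree walk are elided, so this is not the actual patch:

	/* Sketch only: simplified collect_procs_anon(), not the real code. */
	static void collect_procs_anon(struct page *page, struct list_head *to_kill,
				       int force_early)
	{
		struct vm_area_struct *vma;
		struct task_struct *tsk;

		/* Before: read_lock(&tasklist_lock); */
		rcu_read_lock();
		for_each_process(tsk) {
			struct task_struct *t = task_early_kill(tsk, force_early);

			if (!t)
				continue;
			/* ... anon_vma interval-tree walk sets vma (elided) ... */
			add_to_kill_anon_file(t, page, vma, to_kill);
		}
		/* Before: read_unlock(&tasklist_lock); */
		rcu_read_unlock();
	}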
The only thing that's giving me pause is that there's no discussion
about why this is safe. "We're not modifying it" isn't really enough
to justify going from read_lock() to rcu_read_lock(). When you take a
normal read_lock(), writers are not permitted and so you see an atomic
snapshot of the list. With rcu_read_lock() you can see inconsistencies.
Hi Matthew:
When rcu_read_lock() is used, the task list can be modified during the
iteration, but those modifications are not visible to the iteration; the task
list is only updated, under the RCU mechanism, after the iteration is complete.
Therefore, the task list used by the iteration can also be considered a
snapshot.
No, that's not true! You are not iterating a snapshot of the list,
you're iterating the live list. It will change under you. RCU provides
you with some guarantees about that list. See Documentation/RCU/listRCU.rst
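For reference, the reader/writer pattern that document describes looks roughly
like this (mylock, head and do_something() are illustrative names, not code
from this patch):

	/* Writer: removal is serialized against other writers by a lock, and
	 * the object is only freed after all pre-existing readers are done. */
	spin_lock(&mylock);
	list_del_rcu(&p->list);
	spin_unlock(&mylock);
	synchronize_rcu();
	kfree(p);

	/* Reader: always sees a well-formed list, but concurrent additions
	 * and removals may or may not be visible to this traversal. */
	rcu_read_lock();
	list_for_each_entry_rcu(p, &head, list)
		do_something(p);
	rcu_read_unlock();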
For example, new tasks can be added to the tasklist and may not be seen by
your iteration. Is this OK?
A newly added task does not access the HWPoison page, because the HWPoison
page has already been isolated from the buddy allocator
(memory_failure()->take_page_off_buddy()). Therefore, it is safe whether or
not a newly added task is seen by the iteration.
Tasks may be removed from the
tasklist after they have been seen by the iteration. Is this OK?
If a task seen during the iteration is removed from the task list after the
iteration, its task_struct is not released, because a reference is taken in
__add_to_kill(). Therefore, the subsequent processing in kill_procs() (sending
signals to the tasks removed from the task list) is not affected, so I think
that is safe too.
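The pinning described above is roughly the following pairing (a condensed
sketch of __add_to_kill() and kill_procs() from mm/memory-failure.c; parameters
and error handling are elided):

	/* Collection phase: pin each task that may need a signal. */
	get_task_struct(tsk);		/* in __add_to_kill() */
	tk->tsk = tsk;
	list_add_tail(&tk->nd, to_kill);

	/* Kill phase: signal each pinned task, then drop the reference. */
	list_for_each_entry_safe(tk, next, to_kill, nd) {	/* in kill_procs() */
		/* ... send SIGBUS to tk->tsk (elided) ... */
		list_del(&tk->nd);
		put_task_struct(tk->tsk);
		kfree(tk);
	}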
I don't know this code, but it seems unsafe to me. Look:
collect_procs_anon:
	for_each_process(tsk) {
		struct task_struct *t = task_early_kill(tsk, force_early);
		add_to_kill_anon_file(t, page, vma, to_kill);

add_to_kill_anon_file:
	__add_to_kill(tsk, p, vma, to_kill, 0, FSDAX_INVALID_PGOFF);

__add_to_kill:
	get_task_struct(tsk);

static inline struct task_struct *get_task_struct(struct task_struct *t)
{
	refcount_inc(&t->usage);
	return t;
}
/**
 * refcount_inc - increment a refcount
 * @r: the refcount to increment
 *
 * Similar to atomic_inc(), but will saturate at REFCOUNT_SATURATED and WARN.
 *
 * Provides no memory ordering, it is assumed the caller already has a
 * reference on the object.
 *
 * Will WARN if the refcount is 0, as this represents a possible use-after-free
 * condition.
 */
I don't see anything that prevents that refcount_inc from seeing a zero
refcount. Usually that would be prevented by tasklist_lock, right?
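For comparison, when the refcount can legitimately reach zero under the reader,
the usual RCU-side pattern is a conditional increment; a sketch using the
task_struct usage counter (whether this alone would be sufficient for
task_struct is exactly the question being raised here):

	rcu_read_lock();
	for_each_process(tsk) {
		/* refcount_inc() assumes the count is already nonzero;
		 * under RCU alone, only a conditional increment is safe. */
		if (!refcount_inc_not_zero(&tsk->usage))
			continue;	/* task already on its way out, skip it */
		/* ... tsk is now pinned and usable after rcu_read_unlock() ... */
	}
	rcu_read_unlock();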
This "calling get_task_struct in for_each_process loop with read_rcu_lock"
pattern seems to be used also in mm/oom_kill.c (for example in select_bad_process()),
so this might be more generic problem.
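The mm/oom_kill.c pattern being referred to looks roughly like this (condensed
from select_bad_process() and oom_evaluate_task(); the memcg path and the
badness scoring are elided):

	rcu_read_lock();
	for_each_process(p)
		if (oom_evaluate_task(p, oc))	/* in select_bad_process() */
			break;
	rcu_read_unlock();

	/* In oom_evaluate_task(), the chosen victim is pinned with a plain
	 * get_task_struct(), i.e. the same unconditional refcount_inc(): */
	if (oc->chosen)
		put_task_struct(oc->chosen);
	get_task_struct(task);
	oc->chosen = task;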
I tried to see how the OOM code prevents the issue, but there seems to be nothing
that does. oom_kill_process(), which is responsible for terminating the victim
process, directly tries to acquire task_lock(victim), even though *victim could
already have been freed at that point.
If someone knows why the OOM code is safe, hwpoison could potentially follow
the same approach.
Thanks,
Naoya Horiguchi
Andrew, I think this patch is bad and needs to be dropped.