Re: [RFC 0/9] mce recovery for Sandy Bridge server

From: Tony Luck
Date: Wed May 25 2011 - 19:53:17 EST


2011/5/25 Ingo Molnar <mingo@xxxxxxx>:
> Well, the primary thing TIF_MCE_NOTIFY does is a roundabout way to
> iterate through repeat calls to memory_failure(), with all pfns that
> got buffered so far.
>
> We already have a generic facility to do such things at
> return-to-userspace: _TIF_USER_RETURN_NOTIFY.

This looked really promising as a way to drop one use of TIF_MCE_NOTIFY,
but it doesn't currently quite do what is needed for my new case.

What I need is a way to grab the current task just before it returns to
user space. What this code appears to do is catch the current *processor*
just before it sees a flagged process trying to return to user space.

These aren't quite the same ... if I use "user_return_notifier_register()"
in my machine check handler, what might happen is that paranoid_userspace
in entry_64.S sees _TIF_NEED_RESCHED and calls schedule(). Now my
"I don't want this to run" process could be picked up by a different
cpu that doesn't have the notifier registered.

The big clue was:
	head = &get_cpu_var(return_notifier_list);
in fire_user_return_notifiers().
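
To make the mismatch concrete, here is a toy user-space model, not kernel
code. The names toy_notifier, toy_task, toy_register(), toy_fire() and
simulate() are all made up for illustration; return_notifier_list is only
borrowed as a name and is a plain array indexed by cpu number here. The
sketch assumes the notifier sits on the list of the cpu that registered it
and is only fired from that cpu's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2

/* Toy stand-ins; none of these are the real kernel structures. */
struct toy_notifier {
	void (*on_user_return)(struct toy_notifier *n);
	struct toy_notifier *next;
	int fired;		/* how many times the callback ran */
};

struct toy_task {
	int cpu;		/* cpu the task is currently running on */
};

/* One list head per cpu, modelling a per-cpu variable. */
static struct toy_notifier *return_notifier_list[NR_CPUS];

/* Add a notifier to the list of the given cpu only. */
static void toy_register(int cpu, struct toy_notifier *n)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	n->next = return_notifier_list[cpu];
	return_notifier_list[cpu] = n;
}

/* Run the callbacks on the list of the given cpu only. */
static void toy_fire(int cpu)
{
	struct toy_notifier *n;

	assert(cpu >= 0 && cpu < NR_CPUS);
	for (n = return_notifier_list[cpu]; n != NULL; n = n->next)
		n->on_user_return(n);
}

static void mark_fired(struct toy_notifier *n)
{
	n->fired++;
}

/* Returns how many times the notifier fired for the flagged task. */
static int simulate(bool migrate)
{
	struct toy_notifier n = { .on_user_return = mark_fired };
	struct toy_task t = { .cpu = 0 };
	int i;

	for (i = 0; i < NR_CPUS; i++)
		return_notifier_list[i] = NULL;

	toy_register(t.cpu, &n);	/* "machine check handler" on cpu 0 */
	if (migrate)
		t.cpu = 1;		/* rescheduled; another cpu picked it up */
	toy_fire(t.cpu);		/* return to user space on the task's cpu */

	return n.fired;
}
```

In this model simulate(false) returns 1 because the task returns to user
space on the cpu that holds the notifier, while simulate(true) returns 0
because the task returns on a cpu whose list is empty.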


But I wonder if I'm misreading the code - I'm not quite certain
what the kvm code is trying to do when using this, but it looks
to me as though it might also be exposed to the same problem of
being rescheduled and migrating to another cpu.

-Tony