Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)

From: Ingo Molnar
Date: Thu Mar 04 2010 - 15:24:04 EST

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 4 Mar 2010, Ingo Molnar wrote:
> >
> > - SA_NOFPU: on x86 to skip the FPU/SSE save/restore, for such fast in/out special
> > purpose signal handlers? (can whip up a quick patch for you if you want)
> I'd love to do this, but it's wrong.
> It's too damn easy to use the FPU by mistake in user land, without ever
> being aware of it. memset()/memcpy() are obvious potential users of SSE, but they
> might be called in non-obvious ways implicitly by the compiler (ie structure
> copy and setup).
> And modern glibc ends up using SSE4 even for things like strstr and strlen,
> so it really is creeping into all kinds of trivial helper functions that
> might not be obvious. So SA_NOFPU is a lovely idea, but it's also an idea
> that sucks rotten eggs in practice, with quite possibly the same _binary_
> working or not working depending on what kind of CPU and what shared library
> it happens to be using.
> Too damn fragile, in other words.
> (Now, if it's accompanied by the kernel actually _testing_ that there is no
> FPU activity, by setting the TS flag and checking at fault time and causing
> a SIGFPE, then that would be better. At least you'd get a nice clear signal
> rather than random FPU state corruption. But you're still in the situation
> that now the binary might work on some machines and setups, and not on
> others.)

Perhaps NOFPU could do lazy context saving: set the TS flag so that the first
FPU instruction traps, and only save the FPU state if it's actually used by
the signal handler?

This turns it into a 'hint', not into an FPU state corruption issue.

Toggling the TS flag to disable/enable FPU instructions is still faster than
a full-blown FPU context save/restore.

Careful and lightweight signal handlers (like a GC scheme would likely be)
would thus be faster. In the worst case it incurs an extra trap and a
(measurable/profilable) slowdown.
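
To make the idea concrete, here is a rough user-space model of the lazy
scheme. It is only a sketch, not kernel code: SA_NOFPU does not exist, and
struct fpu_model and all function names below are made up for illustration.
It assumes the trap is armed by setting TS, as in the fault-time check
described in the quoted text above.

```c
#include <stdbool.h>

/* Toy model only: none of these names are real kernel interfaces. */
struct fpu_model {
	bool ts_set;       /* models CR0.TS: FPU instructions trap while set */
	bool state_saved;  /* was the interrupted context's FPU state saved? */
	int  traps;        /* device-not-available traps taken so far */
	int  full_saves;   /* full FPU context saves performed so far */
};

/* Signal delivery with the hypothetical hint: arm the trap, skip the save. */
static void deliver_signal_nofpu(struct fpu_model *m)
{
	m->ts_set = true;
	m->state_saved = false;
}

/* The handler executes an FPU/SSE instruction, e.g. via an SSE memcpy(). */
static void handler_uses_fpu(struct fpu_model *m)
{
	if (m->ts_set) {
		/* First use traps: save the state now, then let the FPU run. */
		m->traps++;
		m->full_saves++;
		m->state_saved = true;
		m->ts_set = false;
	}
	/* Later FPU instructions in the same handler run without trapping. */
}

/* Handler return: a restore is needed only if the state was saved. */
static bool return_from_signal(struct fpu_model *m)
{
	bool need_restore = m->state_saved;

	m->ts_set = false;
	m->state_saved = false;
	return need_restore;
}
```

A handler that never touches the FPU goes through deliver_signal_nofpu()
and return_from_signal() with zero traps and zero saves. A handler that does
use it pays exactly one trap plus one save/restore, no matter how many FPU
instructions it executes.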

In any case this would be a secondary optimization - the biggest difference
i'd expect to come from the 'don't wake up the world' logic:

> > - SA_RUNNING: a way to signal only running threads - as a way for user-space
> > based concurrency control mechanisms to deschedule running threads (or, like
> > in your case, to implement barrier / garbage collection schemes).
> Hmm. This sounds less fundamentally broken, but at the same time also _way_
> more invasive in the signal handling layer. It's already one of our more
> "exciting" layers out there.

Yeah, definitely. But i still tend to think it should be actively tried, at
which point we can still say 'yuck, this cannot work, let's go for the
sys_membarrier() solution'.
