Re: Shared memory not SMP safe to user-mode code.

Richard B. Johnson (root@chaos.analogic.com)
Wed, 1 Dec 1999 16:44:37 -0500 (EST)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Alexander Viro: "Re: (XFree86) mmap of /proc/XXX/mem"
Previous message: Mike Galbraith: "[patch] Re: 2.3.29: ramdisk still broken."

On Wed, 1 Dec 1999, Jamie Lokier wrote:

> Richard B. Johnson wrote:
> > On Wed, 1 Dec 1999, Jamie Lokier wrote:
> >
> > > Richard B. Johnson wrote a spinlock:
> > > > pars->spin += key;
> > > > if(pars->spin != key)
> > > > {
> > > > pars->spin -= key;
> > >
> > > Wrong. This is not safe on UP /or/ SMP because "+=" and "-=" are not
> > > atomic operations in C. `volatile' does not help either.
> > >
> > It is not necessary for the operations to be atomic! It is only essential
> > that they can be undone. Both the child and the parent have private
> > copies of their key value. The resulting arithmetic on the shared variable
> > will be wrong if any of the read/modify/write operations are interrupted
> > by another task, however, the resulting "mess" will be completely undone
> > when both of the tasks subtract their key values (in any order).
>
> Subtracting the private key values will not undo the mess. Back to the
> diagram. += and -= are not atomic in C.
>
> [ On SMP, even if the compiler happens to choose (unlocked)
> read-modify-write instructions, the processor divides them into those
> three phases using an internal register anyway. So reg below still
> makes sense ].

[SNIPPING]

You are forgetting that this is user-mode code. There is only one
'current'. This is Unix, not Multix. There is not a 'current' for CPU 0
and another 'current' for CPU N. There is only one user-mode task
executing at any one time. That task can only be switched at 'HZ'
intervals. It can be interrupted at any time, but the CPU will be
given back after the interrupt and no user-mode code is modified
by an interrupt. This is not interrupt code, nor signal code. There are
millions and millions of instructions that will be executed between these
intervals.

Given some non-atomic operations, task 1 reads from shared memory,
has added something, but has not written in back yet. A context-switch
occurs. Task 2 is guaranteed enough time to complete whatever operation
it wants to do with shared memory, including acquiring a lock, reading
and/or writing to it zillions of times before the CPU is ever returned
to task 1. To guarantee that the CPU is given up while it's safe, each
task always normalizes (subtracts its key value) from its local variable
before calling sleep. Sleep is long enough (usleep) so when the original
task gets the CPU back it will have enough time to complete the operation
before it, too, calls sleep. Task 1 doesn't even know that the shared
variable has been changed and, in fact, it hasn't. It's still zero because
task 2 has already completed its operation, returning it to zero.

This is no invention. This is the way (time-slicing) multitasking
operating systems have done 'locking' for years. Most POSIX.4 - compliant
Unix systems have sched_yield(), which can be executed instead of
usleep(). Given a number of processes which "do their thing", then give up
the CPU, there will never be any contention for the lock after the first.

In the kernel, things are different. There can be several CPUs attempting
to modify memory at the same time. In the kernel, you need single
machine-cycle granularity, not zillions. Here, spin-locks are essential
if you have two or more CPUs. Given a single CPU, you can often do locking
(which is against an interrupt because it's the only way you would
have any problems with a single CPU), by a simple 'cli' instruction.
However, it is possible to eliminate most such "dead-to-interrupt" code
by using variables which can be modified atomically as semaphores, to
protect critical regions of code.

Cheers,
Dick Johnson

Penguin : Linux version 2.3.13 on an i686 machine (400.59 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Alexander Viro: "Re: (XFree86) mmap of /proc/XXX/mem"
Previous message: Mike Galbraith: "[patch] Re: 2.3.29: ramdisk still broken."