Correction.
They only break if we cascade them, that is, use them in a lock hierarchy.
So long as we are using just a single lock and are not holding multiple
locks, we cannot reproduce it. If we are using multiple locks, then it
breaks. I guess the assumption here is that sleep_on() will only allow
one process at a time to make it through. What isn't clear is why these
primitives don't work in a nested lock hierarchy (which is what sleep
locks are for in the first damn place).
Please advise.
Jeff
----- Original Message -----
From: Jeff Merkey
To: torvalds@transmeta.com ; linux-kernel@vger.rutgers.edu
Sent: Thursday, August 26, 1999 11:23 AM
Subject: Fw: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems
Linus,
You are also doing this in locks.h, in the functions lock_super() and
unlock_super(). Am I missing something here? We used this same method
and got corrupted data on SMP systems. It is possible for two processes
to blow up here by entering the function at the same time if the lock
variable is zero. It's hard to reproduce (we have to perform cyclic
copies with 8+ processes on a 4-processor system for over two hours to
reproduce it), but there is a hole here if we use these locking
primitives the way you have defined them in locks.h.
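To make the hole concrete: the check of the lock variable and the store to it are two separate steps, so two CPUs can both see zero before either one writes. A minimal userspace sketch of what closing that gap looks like, assuming C11 atomics rather than anything in locks.h (lock_flag, try_lock() and unlock() are names made up for illustration, and this version does not sleep or queue waiters):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag standing in for the plain "lock" variable. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/*
 * Test and set in one indivisible step. Returns true only for the single
 * caller that changed the flag from clear to set; every other caller sees
 * it already set and gets false.
 */
bool try_lock(void)
{
    return !atomic_flag_test_and_set(&lock_flag);
}

/* Clear the flag so that the next try_lock() can succeed. */
void unlock(void)
{
    atomic_flag_clear(&lock_flag);
}
```

A second try_lock() while the flag is held fails instead of falling through, which is the property the check-then-set version lacks.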
Comments?
Please advise.
Jeff
----- Original Message -----
From: Jeff Merkey
To: linux-kernel@vger.rutgers.edu
Sent: Thursday, August 26, 1999 10:44 AM
Subject: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems
We had attempted to use the FAT version of locks with wait queues, but
have discovered they are non-atomic and, in fact, under very heavy load
allow shared data corruption on SMP systems. They also have some subtle
race conditions even on non-SMP systems in reentrant code. We are using
atomic semaphores now instead. Just thought we would warn folks that
what's out there appears to be busted.
The offending code is:
Lock()
{
    while (lock) sleep_on(&wait);
    lock = 1;
}
Unlock()
{
    lock = 0;
    wake_up(&wait);
}
Two processes can enter Lock() while lock is equal to 0, and both set
it. We have seen this occur, and it seems broken.
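For comparison, here is a rough userspace sketch of the semaphore approach mentioned above. It assumes POSIX threads and POSIX semaphores, not the kernel primitives we actually use; lock_sem, shared_counter, worker() and run_workers() are hypothetical names introduced only for this example.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t lock_sem;        /* hypothetical: replaces both "lock" and "wait" */
static long shared_counter;   /* hypothetical shared data guarded by the lock */

static void Lock(void)
{
    /* sem_wait() decrements atomically and blocks at zero; retry if interrupted. */
    while (sem_wait(&lock_sem) != 0)
        ;
}

static void Unlock(void)
{
    /* Increment the semaphore and wake one waiter, if there is one. */
    sem_post(&lock_sem);
}

static void *worker(void *arg)
{
    long iterations = *(long *)arg;

    for (long i = 0; i < iterations; i++) {
        Lock();
        shared_counter++;     /* critical section */
        Unlock();
    }
    return NULL;
}

/* Runs nthreads workers; returns the final counter, or -1 on setup failure. */
long run_workers(int nthreads, long iterations)
{
    pthread_t tids[64];
    int started = 0;

    if (nthreads < 1 || nthreads > 64 || sem_init(&lock_sem, 0, 1) != 0)
        return -1;
    shared_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&lock_sem);
    return started == nthreads ? shared_counter : -1;
}
```

If the lock holds, run_workers(8, 10000) should come back as exactly 80000. The difference from the code above is that the test of the count and the update to it happen as one operation inside sem_wait(), so there is no window between checking and setting.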
Jeff