Re: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems

Jeff Merkey (jmerkey@timpanogas.com)
Thu, 26 Aug 1999 11:28:55 -0600


Correction.

They only break if we cascade them, that is, use a lock hierarchy.
So long as you are using a single lock and are not holding multiple
locks, we cannot reproduce it. If we are using multiple locks, then
it breaks. I guess the assumption here is that sleep_on() will only
allow one process at a time to make it through. What isn't clear is why
these primitives don't work in a nested lock hierarchy (which is what
sleep locks are for in the first damn place).
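
To make "cascading" concrete, here is a minimal user-space sketch of a
two-level lock hierarchy, always taken outer first and released in
reverse order. It uses POSIX mutexes, not the kernel's sleep_on() locks,
and every name in it (outer_lock, inner_lock, nested_worker(),
run_nested_demo()) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NESTED_THREADS 4
#define NESTED_ITERS   10000

/* Hypothetical two-level hierarchy: outer_lock is always taken first. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;
static long outer_count;   /* protected by outer_lock */
static long inner_count;   /* protected by inner_lock */

static void *nested_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NESTED_ITERS; i++) {
        pthread_mutex_lock(&outer_lock);    /* level 1 */
        outer_count++;
        pthread_mutex_lock(&inner_lock);    /* level 2, taken while holding level 1 */
        inner_count++;
        pthread_mutex_unlock(&inner_lock);  /* release in reverse order */
        pthread_mutex_unlock(&outer_lock);
    }
    return NULL;
}

/* Runs the workers and returns the sum of both counters. */
long run_nested_demo(void)
{
    pthread_t threads[NESTED_THREADS];

    outer_count = 0;
    inner_count = 0;
    for (int i = 0; i < NESTED_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, nested_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NESTED_THREADS; i++)
        pthread_join(threads[i], NULL);
    return outer_count + inner_count;
}
```

With working locks, run_nested_demo() should return
2 * NESTED_THREADS * NESTED_ITERS; a lost update under either lock would
show up as a smaller total.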

Please advise.

Jeff
----- Original Message -----
From: Jeff Merkey
To: torvalds@transmeta.com ; linux-kernel@vger.rutgers.edu
Sent: Thursday, August 26, 1999 11:23 AM
Subject: Fw: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems

Linus,

You are also doing this in locks.h, in the functions lock_super() and
unlock_super(). Am I missing something here? We used this same method
and got corrupted data on SMP systems. It is possible for two processes
to blow up here by entering the function at the same time if the lock
variable is zero. It's hard to reproduce (we have to perform cyclic
copies with 8+ processes on a 4-processor system for over two hours to
reproduce it), but there is a hole here if we use these locking
primitives the way you have defined them in locks.h.

Comments?

Please advise.

Jeff

----- Original Message -----
From: Jeff Merkey
To: linux-kernel@vger.rutgers.edu
Sent: Thursday, August 26, 1999 10:44 AM
Subject: Locks used in the FAT file system are non-atomic and in fact, don't work on SMP systems

We had attempted to use the FAT version of locks with wait queues, but
have discovered they are non-atomic and, under very heavy load, allow
shared data corruption on SMP systems. They also have some subtle
race conditions even on non-SMP systems in reentrant code. We are using
atomic semaphores now instead. Just thought we would warn folks that
what's out there appears to be busted.

The offending code is:

Lock()
{
    while (lock) sleep_on(&wait);
    lock = 1;
}

Unlock()
{
    lock = 0;
    wake_up(&wait);
}

Two processes can enter Lock() while lock is equal to 0, and both set
it. We have seen this occur, and it seems broken.
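
For comparison, a minimal user-space sketch of a lock whose test and set
happen in a single atomic step, so two callers cannot both see 0 and both
set it. This is not the kernel code and not the semaphore implementation
mentioned above; it assumes C11 atomics and POSIX threads, and the names
lock_flag, demo_lock(), demo_unlock(), worker(), and run_demo() are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NTHREADS 8
#define ITERS    10000

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long counter;   /* shared data protected by lock_flag */

static void demo_lock(void)
{
    /* Test and set in one atomic operation: only the caller that
     * observes "clear" gets through; everyone else yields and retries. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        sched_yield();
}

static void demo_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        demo_lock();
        counter++;          /* critical section */
        demo_unlock();
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
long run_demo(void)
{
    pthread_t threads[NTHREADS];

    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

If the lock holds, run_demo() should return NTHREADS * ITERS. Unlike the
sleep_on() version, this sketch yields and retries, it does not sleep on a
wait queue; it only illustrates the atomic test-and-set step.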

Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/