ä 2013å06æ05æ 23:41, JÃrn Engel åé:We also should prevent sg_device change those parameters which are needed to copy to sfp during sfp initialization.On Thu, 6 June 2013 00:16:45 +0800, vaughan wrote:I find my patch is not enough to avoid this race condition said above.ä 2013å06æ05æ 21:27, JÃrn Engel åé:On Wed, 5 June 2013 17:18:33 +0800, vaughan wrote:It's shorter than the original line, so I just leave it like this...
Check and set sdp->exclude should be atomic when set in sg_open().
The patch is line-wrapped. More importantly, it doesn't seem to do
Sure. What I meant by line-wrapped is that your mailer mangled the
patch. Those two lines should have been one:
- ((!sfds_list_empty(sdp) || get_exclude(sdp))
? 0 : set_exclude(sdp, 1)));
what your description indicates it should do. And lastly, does thisI found it by code inspection. A race condition may happen with the
fix a bug, possibly even one you have a testcase for, or was it found
by code inspection?
old code if two threads are both trying to open the same sg with
O_EXCL simultaneously. It's possible that they both find fsds list
is empty and get_exclude(sdp) returns 0, then they both call
set_exclude() and break out from wait_event_interruptible and resume
open. So it's necessary to check again with sg_open_exclusive_lock
held to ensure only one can set sdp->exclude and return >0 to break
out from wait_event loop.
Makes sense. And reading the code again, I have to wonder what monkey
came up with the get_exclude/set_exclude functions.
Can I sucker you into a slightly larger cleanup? I think the entire
"get_exclude(sdp)) ? 0 : set_exclude(sdp, 1)" should be simplified.
And once you add the try_set_exclude(), set_exclude will only ever do
clear_exclude, so you might as well rename and simplify that as well.
Since sg_add_sfp() just do an add_to_list without check and wait_event
check don't set a sign to announce a future add_to_list is on going, the
time window between wait_event and sg_add_sfp gives others to open sg
before the prechecked sg_add_sfp() called.
The same case also happens when one shared and one exclude open occur
simultaneously. If the shared open pass the precheck stage and ready to
sg_add_sfp(). At this time another exclude open will also pass the check:
((!sfds_list_empty(sdp) || get_exclude(sdp)) ? 0 :
try_set_exclude(sdp)));
Then, both open can succeed.
I think the point is we separate the check&add routine and haven't set
an sign to let others wait until the whole actions complete. I suppose
we may change the steps a bit to avoid trouble like this. If we can
malloc&initialize sfp at first, and then check&add sfp under the
protection of sg_index_lock, everything seems to be quite simple.
--
Regards,
Vaughan
Let no good deed go unpunished.
JÃrn
--
It's just what we asked for, but not what we want!
-- anonymous