Benjamin Herrenschmidt wrote:
> That was one of the solutions proposed by Rusty, that is basically
> waiting for all CPUs to have scheduled upon exit from module_exit
> and before doing the actual removal.
No, that's not what I mean. If you have a de-registration function,
it may give you one of the following assurances:
1) none (e.g. of xxx_deregister just unlinks some ops structure,
but makes no attempt to squash concurrent accesses)
2) guarantee that no further access to ops structure will occur
after xxx_deregister returns, and
2a) one can probe for cached references/execution of callbacks,
e.g.
foo_deregister(&my_ops);
while (foo_running()) schedule();
/* can destroy local state/code now */
This is a common construct in the Linux kernel. Note that
none of the module code executes after foo_running returns,
so return-after-removal can't happen.
2b) we're guaranteed that any running callbacks or such will
have acquired their own locks/counts/etc. before
xxx_deregister returns. Modules would (also) use the use
count to protect code. Not sure if this happens anywhere
in the kernel. return-after-removal would be possible
here (unless we're going through a layer of trampolines
or such). E.g.
foo_deregister(&my_ops);
while (myself_running()) schedule();
/* may still have to execute code after unlocking */
while (MOD_IN_USE) schedule();
Note that this one still protects data !
3) guarantee that no direct effects of accesses to ops structure
occur after xxx_deregister returns, e.g.
foo_deregister(&my_ops);
/* can destroy local state/code now */
This also eliminates return-after-removal, because deregister
would wait for callbacks to return.
4a/4b) Like 3, but the callback signals back completion without
returning (i.e. to indicate that it has copied all shared data,
and is now working on a private copy), we're back to case 2a or
2b. In the case of modules, the usage count would be used to
protect code, so decrement_and_return would be sufficient here.
I think these are the most common cases. The point is that only
case 1) leaves you completely in the dark, and only cases 2b and
4b are special when it comes to modules. Cases 2a, 3, and 4a are
always safe.
Moreover, case 1, and cases 2a, 2b, 4a, and 4b without
synchronizing after de-registration, can also cause problems in
non-module code, e.g.
bar_callback(void *my_data)
{
*(int **) my_data = 1;
}
...
int whatever = 0,*ptr = &whatever;
...
foo_register(&bar_ops,&ptr);
...
foo_deregister(&bar_ops); /* case 1, 2a, 2b, 4a, or 4b */
ptr = NULL;
/* BUG: bar_callback may still be running ! */
...
Such code would only be data-safe (but not code-safe) if all shared
data is static, e.g. if the callback writes to some hardware
register, and the address is either static or constant.
AFAIK, most de-registration functions should be in case 2a, 3, or
4a. Well, we probably have a bunch of cases 1, too :-)
Now, what does this all boil down to ? Cases 2a, 3, and 4a are
non-issues. Cases 2b and 4b still need decrement_and_return, so I
was a bit too optimistic in assuming we're totally safe. Case 1 is
pathological also for non-modules, unless a few unlikely
constraints are met.
- Werner
-- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://icapeople.epfl.ch/almesber/_____________________________________/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sun Jul 07 2002 - 22:00:09 EST