Re: [LTP] [PATCH] sysctl03: sysctl returns EACCES after 2.6.33-rc1

From: Eric W. Biederman
Date: Fri Mar 05 2010 - 02:50:19 EST

Garrett Cooper <yanegomi@xxxxxxxxx> writes:

> On Thu, Mar 4, 2010 at 11:57 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Garrett Cooper <yanegomi@xxxxxxxxx> writes:
>>> Wow... that's a fair amount of code refactoring and additions to the syscall.
>>> Yes, all of the issues with opening a directory and reading/writing
>>> now apply to sysctl(2), especially if someone attempts to read from a
>>> write-only descriptor, or vice versa.
>> No mismatches of file descriptor modes and how the descriptor is
>> accessed can not occur. ÂThere is a file descriptor but the file
>> descriptor is completely internal to binary_sysctl(), and it is opened
>> with the mode of what we are trying to use. ÂThere are no user space
>> controllable parts there.
>> Looking through the old sysctl code it appears that it was a bug that
>> kept it from returning EACCES. ÂThe code has had this beautiful snippet
>> in it for ages:
>> static int test_perm(int mode, int op)
>> {
>> Â Â Â Âif (!current->euid)
>> Â Â Â Â Â Â Â Âmode >>= 6;
>> Â Â Â Âelse if (in_egroup_p(0))
>> Â Â Â Â Â Â Â Âmode >>= 3;
>> Â Â Â Âif ((mode & op & 0007) == op)
>> Â Â Â Â Â Â Â Âreturn 0;
>> Â Â Â Âreturn -EACCES;
>> }
> Wow. Took a second for me to stare and it and see what you mean,
> but yeah -- that is pretty dang awesome that it was always hardwired
> to return 0.

Sorry that wasn't clear without context. What used to happen
where all of the callers of that function did:
if (test_perm(...))
return -EPERM;

Instead of the much more conventional:

err = test_perm()
if (err)
return err;

>> I admit that the manpage doesn't document EACCES but the manpage
>> has always said don't use sysctl(2) so...
> Well, if someone bumbles across this later, it will be a confusing
> issue to work through. It's better to be documented instead of
> undocumented. I'll file the bug upstream to document this, but it
> would be nice to determine if there are any more immediate gaps which
> need to be addressed in the changes.

I think the linux test project may be nearly the only caller of sysctl(2)
at this point. At least until recently there was one caller in arm
glibc. But finding any program that uses sysctl(2) is nearly impossible.

>> You may see a slightly different error code from sysctl(2) on failure
>> but otherwise Âsysctl(2) should be unchanged, and yes I did test it.
>> Of course I was not being picky about which error code I got on failure.
> Hmmm.. ok. We just get 20 questions when something fails and it's
> not documented why it should fail in a particular manner :).

>> What exists today is simply a backwards compatibility wrapper of
>> sysctl(2) built on top of /proc/sys. Âsysctl(2) was a practically
>> unmaintained bit-rotting pile, that was never adequately maintained or
>> tested.
> Yeah, you're probably right (especially because Linux tends not to
> focus on sysctl(3) like the BSDs do).
>> At this point nothing should change again until such time as the code
>> is disabled/removed by default.
> Hmmm... ok. I assume that sysctl(2) is going completely out the
> window in the future, in favor of what (just out of curiosity)? 100%
> sysfs only tunables maybe?

/proc/sys is going to stay. Which is what people have actually used.
Even /sbin/sysctl has always used /proc/sys. Nothing anyone actually
uses is going to go away. Just the practically dead code that is the
syscall is slowly going away. Since I have written the emulation layer
the need for it to disappear is less immediate than it once was, but I will
strongly discourage anyone from using it.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at