Re: [RFC] syscalls, x86: Add __NR_kcmp syscall

From: Pavel Emelyanov
Date: Wed Jan 18 2012 - 00:09:00 EST


On 01/18/2012 01:40 AM, Eric W. Biederman wrote:
> Cyrill Gorcunov <gorcunov@xxxxxxxxx> writes:
>
>> On Tue, Jan 17, 2012 at 10:47:37AM -0800, H. Peter Anvin wrote:
>>> On 01/17/2012 06:44 AM, Cyrill Gorcunov wrote:
>>>> On Tue, Jan 17, 2012 at 04:38:14PM +0200, Alexey Dobriyan wrote:
>>>>> On 1/17/12, Cyrill Gorcunov <gorcunov@xxxxxxxxx> wrote:
>>>>>> +#define KCMP_EQ 0
>>>>>> +#define KCMP_LT 1
>>>>>> +#define KCMP_GT 2
>>>>>
>>>>> LT and GT are meaningless.
>>>>>
>>>>
>>>> I found symbolic names better than open-coded values. But sure,
>>>> if this is problem it could be dropped.
>>>>
>>>> Or you mean that in general anything but 'equal' is useless?
>>>>
>>>
>>> Why on Earth would user space need to know which order in memory certain
>>> kernel objects are?
>>>
>>> Keep in mind that this is *exactly* the kind of information which makes
>>> rootkits easier.
>>>
>>
>> Hmm, indeed this might help narrow down the target address I fear. So
>> after some conversation with Pavel I think we can try to live with just
>> one result -- whether the objects are at the same location in kernel memory or not.
>> The updated version is below. Please review if you get a chance. Thanks
>> a lot for comments!
>
> Seriously?
>
> Or is this a case where you get something in, then when people start
> seriously using it and the performance sucks badly, you go back to
> something like the current system call?
>
> How are you going to ensure the performance does not degrade badly when
> looking across a large number of processes?

We can first compare e.g. the files' target inodes (ino + dev) and positions,
and then compare each-to-each only those files for which these values are
equal. Looking at the existing large containers with tens of thousands of
fd-s, this gives us at most 6 files to compare, and performing 15 syscalls
(one per pair, 6*5/2) for this suits us for now.
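
To make the idea concrete, here is a rough user-space sketch of the
pre-filtering, not code from the patch. The struct fd_info layout, the
sim_obj field and demo_same_file() are hypothetical and exist only so the
sketch can run on its own; in real use the callback would wrap the
equality-only comparison syscall discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-descriptor record, filled in from user space. */
struct fd_info {
	unsigned long dev;	/* device of the target inode */
	unsigned long ino;	/* inode number */
	long long pos;		/* current file position */
	int sim_obj;		/* demo only: simulated kernel object identity */
	size_t group;		/* output: index of first entry sharing the object */
};

/* Stand-in for the {eq, ne} comparison; nonzero means "same object". */
typedef int (*same_file_fn)(const struct fd_info *, const struct fd_info *);

/* Demo comparator (assumption): real code would issue the syscall here. */
int demo_same_file(const struct fd_info *a, const struct fd_info *b)
{
	return a->sim_obj == b->sim_obj;
}

/* Order records by the cheap key (dev, ino, pos). */
static int key_cmp(const void *pa, const void *pb)
{
	const struct fd_info *a = pa, *b = pb;

	if (a->dev != b->dev)
		return a->dev < b->dev ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->pos != b->pos)
		return a->pos < b->pos ? -1 : 1;
	return 0;
}

/*
 * Sort by key, then compare each-to-each only inside runs of equal keys.
 * Returns the number of same_file() calls that were made.
 */
size_t find_shared_files(struct fd_info *v, size_t n, same_file_fn same_file)
{
	size_t calls = 0, start = 0;

	assert(same_file != NULL);
	qsort(v, n, sizeof(*v), key_cmp);

	while (start < n) {
		size_t end = start + 1, i, j;

		while (end < n && key_cmp(&v[start], &v[end]) == 0)
			end++;

		for (i = start; i < end; i++) {
			v[i].group = i;
			for (j = start; j < i; j++) {
				if (v[j].group != j)
					continue; /* already known duplicate */
				calls++;
				if (same_file(&v[j], &v[i])) {
					v[i].group = v[j].group;
					break;
				}
			}
		}
		start = end;
	}
	return calls;
}
```

With 6 descriptors that share (dev, ino, pos) but are all distinct objects,
the inner loop makes 0+1+2+3+4+5 = 15 calls, which is the worst case
mentioned above; descriptors with a unique key cost no calls at all.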

Of course, if you manage to persuade Peter that his memory ordering concerns
are not real problems _now_, that would be great, but, yet again -- simple
{eq, ne} suits us for now, provided we can extend this API to {eq, lt, gt}
in the future.

> Eric
> .
>
