Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations

From: Linus Torvalds
Date: Mon Jun 29 2020 - 14:37:58 EST


On Mon, Jun 29, 2020 at 11:07 AM Christoph Hellwig <hch@xxxxxx> wrote:
>
> One issue is that a lot setsockopt calls are in the fast path, and
> even have micro-optimizations like putting an int on stack for the
> fast path to avoid the memory allocation.

Yeah., An the RFC patch I posted could easily be updated to do exactly
that for small optlen values (say, avoid the kmalloc and use a stack
buffer for oplen smaller than 16 bytes or whatever).

Most of the setsockopt's I'm aware of are just a single integer, so if
that's the bulk of them, then we'd never actually need to do the
kmalloc() in those cases, and only fall back to the kmalloc for the
(hopefully quite unusual) bigger options..

> I'd love to be able to do that. And now that we want through this
> whole mess than Nth time I have another idea:
>
> - we assume optlen is correct, which should cover about 90% of
> the protocols
> - but to override that a new setsockopt_len method is added that
> returns the correct length for all the messy ones.
>
> Let me try if that works out.

Doing a quick grep, there's about 100 different ".setsockopt" function
initializers, but a quarter of them are just setting it to
'sock_no_setsockopt'.

A number of others are using 'sock_common_setsockopt'.

Which leaves something like 50 different implementations of the
.setsockopt functions. But I didn't go any deeper than that - maybe
they then have hundreds of different option cases each and this is all
a nightmare.

Looking at a couple of them, the "int val" situation does seem to be
the most common one by _far_, and is often handled by a common
"get_user()" thing, so converting them to just getting the thing as a
kernel pointer doesn't look _too_ nasty, because even when they have a
lot of subcases, the actual optval accesses are much fewer.

Which is not to say that it looks all that much fun, but it doesn't
look entirely undoable either.

The good news (I guess) is that any missed transformation will be
fairly obvious (ie somebody uses a "get_user()" on what is now a
kernel pointer, and returns -EFAULT. So it shouldn't cause any subtle
failures, and it shouldn't cause any security issues.

I didn't look at the compat cases, but if anything I'd expect those to
become simpler by having kernel pointers. And there doesn't actually
seem to be that many of them (possibly because the "int" case si so
common that it all ends up being the same?)

Linus