Re: [PATCH bpf v6 1/2] lib/strncpy_from_user.c: Don't overcopy bytes after NUL terminator

From: Daniel Xu
Date: Mon Nov 16 2020 - 23:53:11 EST


Hi Linus,

On Mon, Nov 16, 2020 at 02:15:52PM -0800, Linus Torvalds wrote:
> On Mon, Nov 16, 2020 at 1:17 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > Based on on-list discussion and some off-list discussion with Alexei,
> > I'd like to propose the v4-style patch without the `(*out & ~mask)`
> > bit.
>
> So I've verified that at least on x86-64, this doesn't really make
> code generation any worse, and I'm ok with the patch from that
> standpoint.
>
> However, this was not what the discussion actually amended at as far
> as I'm concerned.
>
> I mentioned that if BPF cares about the bytes past the end of the
> string, I want a *BIG COMMENT* about it. Yes, in strncpy_from_user()
> itself, but even more in the place that cares.

Sure, sorry. Will send another version with comments.

> And no, that does not mean bpf_probe_read_user_str(). That function
> clearly doesn't care at all, and doesn't access anything past the end
> of the string. I want a comment in whatever code that accesses past
> the end of the string.

If I'm reading things right, that place is technically in
kernel/bpf/hashtab.c:alloc_htab_elem. In line:

memcpy(l_new->key, key, key_size);

where `key_size` is the width of the hashtab key. The flow looks like:

<bpf prog code>
bpf_map_update_elem()
htab_map_update_elem()
alloc_htab_elem()

It feels a bit weird to me to add a comment about strings in there
because the hash table code is string-agnostic, as mentioned in the
commit message.

> And your ABI point is actively misleading:
>
> > We can't really zero out the rest of the buffer due to ABI issues.
> > The bpf docs state for bpf_probe_read_user_str():
> >
> > > In case the string length is smaller than *size*, the target is not
> > > padded with further NUL bytes.
>
> This comment is actively wrong and misleading.
>
> The code (after the patch) clearly *does* pad a bit with "further NUL
> bytes". It's just that it doesn't pad all the way to the end.

Right, it's a bit ugly and perhaps misleading. But it fixes the real
problem of keying a map with potentially random memory that
strncpy_from_user() might append on. If we pad a deterministic number of
zeros it should be ok.

> Where is the actual buffer zeroing done?

Usually the bpf prog does it. I believe (could be wrong) the verifier
enforces the key is initialized in some form.

For my specific use case, it's done in the bpftrace code:
https://github.com/iovisor/bpftrace/blob/0c88a1a4711a3d13e8c9475f7d08f83a5018fdfd/src/ast/codegen_llvm.cpp#L529-L530

> Because without the buffer zeroing, this whole patch is completely
> pointless. Which is why I want that comment, and I want a pointer to
> where that zeroing is done.
>
> Really. You have two cases:
>
> (a) the buffer isn't zeroed before the strncpy_from_user()
>
> (b) the buffer is guaranteed to be zero before that
>
> and in case (a), this patch is pointless, since the data after the
> string is already undefined.

See above -- I think the verifier enforces that the data is initialized.

> And in case (b), I want to see a comment and a pointer to the code
> that actually does the zeroing.

See above also.

> HOWEVER. Look at bpf_probe_read_user_str_common(), and notice how it
> ALREADY does the zeroing of the buffer past the end, it's just that it
> only does it in the error case.

I don't think the error case is very relevant here. I normally wouldn't
make any assumptions about the state of a buffer after a failed function
call.

> Why do you send this patch, instead of
>
> (a) get rid of the pointless pre-zeroing
>
> (b) change bpf_probe_read_user_str_common() to do
>
> int ret;
> u32 offset;
>
> ret = strncpy_from_user_nofault(dst, unsafe_ptr, size);
> offset = ret < 0 ? 0 : ret;
> memset(dst+offset, 0, size-offset);
> return ret;
>
> which seems to be much simpler anyway. The comment you quote about
> "target is not padded with further NUL bytes" is clearly wrong anyway,
> since that error case *does* pad the target with NUL bytes, and always
> has.

I think on the bpf side there's the same concern about performance.
You can't really dynamically size buffers in bpf land so users usually
use a larger buffer than necessary, sometimes on the order of KBs. So if
we unnecessarily zero out the rest of the buffer it could cause perf
regressions.

[...]

Thanks,
Daniel