Re: [kernel-hardening] [RFC v4 03/18] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles

From: Jann Horn
Date: Wed Oct 26 2016 - 16:16:13 EST


On Wed, Oct 26, 2016 at 10:03:09PM +0200, Mickaël Salaün wrote:
> On 26/10/2016 21:01, Jann Horn wrote:
> > On Wed, Oct 26, 2016 at 08:56:39AM +0200, Mickaël Salaün wrote:
> >> This new arraymap looks like a set and brings new properties:
> >> * strong typing of entries: the eBPF functions get the array type of
> >> elements instead of CONST_PTR_TO_MAP (e.g.
> >> CONST_PTR_TO_LANDLOCK_HANDLE_FS);
> >> * force sequential filling (i.e. replace or append-only update), which
> >> allow quick browsing of all entries.
> >>
> >> This strong typing is useful to statically check if the content of a map
> >> can be passed to an eBPF function. For example, Landlock use it to store
> >> and manage kernel objects (e.g. struct file) instead of dealing with
> >> userland raw data. This improve efficiency and ensure that an eBPF
> >> program can only call functions with the right high-level arguments.
> >>
> >> The enum bpf_map_handle_type list low-level types (e.g.
> >> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
> >> updating a map entry (handle). This handle types are used to infer a
> >> high-level arraymap type which are listed in enum bpf_map_array_type
> >> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
> >>
> >> For now, this new arraymap is only used by Landlock LSM (cf. next
> >> commits) but it could be useful for other needs.
> >>
> >> Changes since v3:
> >> * make handle arraymap safe (RCU) and remove buggy synchronize_rcu()
> >> * factor out the arraymay walk
> >>
> >> Changes since v2:
> >> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
> >> handle entries (suggested by Andy Lutomirski)
> >> * remove useless checks
> >>
> >> Changes since v1:
> >> * arraymap of handles replace custom checker groups
> >> * simpler userland API
> > [...]
> >> + case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
> >> + handle_file = fget(handle->fd);
> >> + if (IS_ERR(handle_file))
> >> + return ERR_CAST(handle_file);
> >> + /* check if the FD is tied to a user mount point */
> >> + if (unlikely(handle_file->f_path.mnt->mnt_flags & MNT_INTERNAL)) {
> >> + fput(handle_file);
> >> + return ERR_PTR(-EINVAL);
> >> + }
> >> + path_get(&handle_file->f_path);
> >> + ret = kmalloc(sizeof(*ret), GFP_KERNEL);
> >> + ret->path = handle_file->f_path;
> >> + fput(handle_file);
> >
> > You can use fdget() and fdput() here because the reference to
> > handle_file is dropped before the end of the syscall.
>
> The reference to handle_file is dropped but not the reference to the
> (inner) path thanks to path_get().

That's irrelevant. As long as you promise to fdput() any references
acquired using fdget() before any of the following can happen, using
fdget() is okay:
- the syscall exits
- the fd table is shared with a process that might write to it
- an fd is closed by the kernel
In other words, you must be able to prove that nobody can remove the
struct file * from your fd table before you fdput().

Taking a long-term reference to an object pointed to by a struct file
that was looked up with fdget() is fine.

Attachment: signature.asc
Description: Digital signature