Re: [PATCH 0/3] SG_IO command filtering via sysfs

From: Paolo Bonzini
Date: Thu Nov 15 2018 - 19:26:50 EST


On 11/11/2018 15:14, Theodore Y. Ts'o wrote:
> On Sun, Nov 11, 2018 at 02:26:45PM +0100, Paolo Bonzini wrote:
>>
>> I'm not very eBPF savvy, the question I have is: what kind of
>> information about the running process is available in an eBPF program?
>> For example, even considering only the examples you make, would it be
>> able to access the CDB, the capabilities and uid/gid of the task, the
>> SCSI device type, the WWN? Of course you also need the mode of the file
>> descriptor in order to allow SANITIZE ERASE if the disk is opened for write.
>
> The basic uid/gid of the task is certainly available. For storage
> stack specific things, it's a matter of what we make available to the
> eBPF function. For example, there is an experimental netfilter
> replacement which uses eBPF; obviously that requires making the packet
> which is being inspecting so it can be given a thumbs up or thumbs
> down result. That's going to require letting the eBPF function to
> have access to the network header, access to the connection tracking
> tables, etc.

Yeah, and there are even already helpers such as
bpf_get_current_uid_gid. So that part can be done in a sort-of generic way.

I can try and do the work, but I'd like some agreement on the design
first... For example a more important question is how would the BPF
filter be attached? Two possibilities that come to mind are:

- add it to the /dev/sg* or /dev/sd* struct file(*) via a ioctl, and use
pass the file descriptor to the unprivileged QEMU after setting up the
BPF filter, via either fork() or SCM_RIGHTS. This would be a very nice
model for privilege separation, but I'm afraid it would not work for
your use case

- add BPF programs to cgroups, in the form of a new
BPF_PROG_TYPE_CGROUP_CDB_FILTER or something like that. That would also
work for my usecase, and it seems to be in line with how the network
guys are doing things. So it would seem like the way to go.

Some other details... Registering the first cgroup-based filter would
disable the default filter; if multiple filters are attached, the
outcomes of all filters would be AND-ed, also similarly to how socket
and sockaddr cgroup BPF works. Finally, filters would be applied also
to processes with CAP_SYS_RAWIO, unlike the current filter.

Needless to say, this would not add special case code, but it would
still add a substantial amount of code, probably comparable to this series.

Christoph?

Paolo

(*) that's not immediate for /dev/sd*, because it would require using
the block device file's private_data; that's not possible yet via struct
block_device_operations, but as far as I can tell block devices
themselves do not need the private_data, so it is at least doable.