Re: [Ping^3] Re: [PATCH] sg_io: allow UNMAP and WRITE SAME withoutCAP_SYS_RAWIO

From: Paolo Bonzini
Date: Thu Sep 06 2012 - 08:37:03 EST


Il 06/09/2012 14:08, Ric Wheeler ha scritto:
>> According to the standard, the translation layer can write a
>> user-provided pattern to every sector in the disk. It's an optional
>> feature and libata doesn't do that, but it is still possible.
>
> It is not possible today with our stack though, any patch that would
> change that would also need to be vetted.

It is not possible with SATA disks, but native SCSI disks might well
interpret FORMAT UNIT destructively.

>>> I don't see allowing anyone who can open the device to zero the data as
>>> better though :)
>> Note: anyone who can open it for writing! And they can just as well
>> issue WRITE, it just takes a little more effort than with WRITE SAME. :)
>> If you only have read access, you cannot issue WRITE or FORMAT UNIT,
>> and with this patch you will not be able to issue WRITE SAME.
>
> This just seems like an argument over whether or not capabilities make
> sense. In general, anything as destructive as a single CDB that can kill
> all of your data should be tightly controlled.

In practice, a single write to the first MB of the disk is just as
destructive. For that you do not even need a SCSI command.

> Pushing more code in the data path is not where we are going - we
> routinely need to disable IO scheduling for example when driving IO to
> high speed/low latency devices and are actively looking at how to tackle
> other performance bottlenecks in the stack.

I am not talking about the regular data path, only of SG_IO.

> I don't see a strong reason that our existing scheme (root or
> CAP_SYS_RAWIO access) prevents you from doing what you need to do.

Here are three:

- CAP_SYS_RAWIO partly bypasses DAC; you can issue destructive commands
even if you only opened the disk for reading. CAP_SYS_RAWIO also gives
access to _really_ destructive commands (WRITE BUFFER and PERSISTENT
RESERVE OUT for example).

- CAP_SYS_RAWIO lets you send SCSI commands to partitions, and they will
gladly read/write the disk going outside the boundaries of the
partition. Changing this behavior was rejected upstream already.

- CAP_SYS_RAWIO also gives access to I/O ports, mmap at address 0, and
too many other insecure things.

All the above mean that:

- any application using CAP_SYS_RAWIO would have to implement its own
whitelisting, even if just to duplicate what is done in the kernel;

- exploiting a CAP_SYS_RAWIO process leads to root too easily, and it is
not possible to give the capability to anything that will run in a
hostile environment (in my case QEMU).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/