Re: device namespaces

From: Hannes Reinecke
Date: Tue Jun 08 2021 - 10:10:12 EST


On Tue, Jun 08, 2021 Greg-KH wrote:
> On Tue, Jun 08, 2021 at 02:30:50PM +0200, Christian Brauner wrote:
>> On Tue, Jun 08, 2021 at 11:38:16AM +0200, Enrico Weigelt,
>> metux IT consult wrote:
>>> Hello folks,
>>>
>>>
>>> I'm going to implement device namespaces, where containers can get
>>> an entirely different view of the devices in the machine (usually
>>> just a specific subset, but possibly additional virtual devices).
>>>
[ .. ]
>>> Is this a good way to go ? Or what would be a better one ?
>>
>> Ccing Greg. Without addressing specific problems, I should warn you
>> that this idea is not new and the plan is unlikely to go anywhere.
>> Especially not without support from Greg.
>
> Hah, yeah, this is a non-starter.
>
> Enrico, what real problem are you trying to solve by doing this? And
> have you tried anything with this yet? We almost never talk about
> "proposals" without seeing real code as it's pointless to discuss
> things when you haven't even proven that it can work.
>
> So let's see code before even talking about this...
>
> And as Christian points out, you can do this today without any kernel
> changes, so to think you need to modify the kernel means that you
> haven't even tried this at all?
>
Curious, I had been looking into this, too.
And I have to side with Greg and Christian that your proposal should
already be possible today (cf. the device cgroup, which curiously has a
near-identical interface to what you proposed).
Also, I think that a generic 'device namespace' is too broad a scope;
some subsystems like net already inherited namespace support, and it
turns out to be not exactly trivial to implement.

What I'm looking at, though, is to implement 'block' namespaces, to
restrict access to _new_ block devices to any given namespace.
Case in point: if a container creates a ramdisk it's questionable
whether other containers should even see it. iSCSI devices are a similar
case; when starting iSCSI devices from containers their use should be
restricted to that container.
And that's not only the device node in /dev, but would also entail sysfs
access, which from my understanding is not modified with the current code.
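
To illustrate the sysfs point, here is a minimal Python sketch that lists
block devices which show up under /sys/block but have no matching node in
/dev. The function name and its two directory parameters are made up for
this example; it only assumes that the container has its own /dev and can
read sysfs, and it says nothing about how a real block namespace would
behave.

```python
import os


def sysfs_only_block_devices(dev_dir="/dev", sysfs_block_dir="/sys/block"):
    """Return names listed in sysfs_block_dir that have no node under dev_dir.

    Hypothetical helper for illustration only; both paths are parameters
    so the check can be pointed at a container's view of /dev and /sys.
    """
    if not os.path.isdir(sysfs_block_dir):
        return []
    missing = []
    for name in sorted(os.listdir(sysfs_block_dir)):
        # sysfs encodes '/' in device names as '!' (e.g. cciss!c0d0).
        node = os.path.join(dev_dir, name.replace("!", "/"))
        if not os.path.lexists(node):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in sysfs_only_block_devices():
        print(name)
```

Any name printed here is a device that is hidden from /dev yet still
visible through sysfs, which is the gap described above.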

uevent redirection would help here, but from what I've seen it's only
for net devices; feels a bit awkward to have a network namespace to get
uevents for block devices, but then I'll have to test.
And, of course, that also doesn't change the sysfs layout.
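
For anyone who wants to run the same test, a rough Python sketch of a
uevent listener follows. It opens a NETLINK_KOBJECT_UEVENT socket, joins
the kernel multicast group, and prints block events. The helper names
(parse_uevent, watch_block_uevents) are invented for this example, and
whether anything arrives inside a given container depends on exactly the
network-namespace behaviour questioned above, so treat it as a probe and
not as a statement of how delivery works.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # protocol number from <linux/netlink.h>


def parse_uevent(payload: bytes) -> dict:
    """Split a kernel uevent ("action@devpath\\0KEY=value\\0...") into a dict."""
    parts = [p for p in payload.split(b"\0") if p]
    if not parts:
        return {}
    # The first field is the "action@devpath" summary, kept under "_header".
    event = {"_header": parts[0].decode(errors="replace")}
    for item in parts[1:]:
        key, sep, value = item.decode(errors="replace").partition("=")
        if sep:
            event[key] = value
    return event


def watch_block_uevents():
    """Print ACTION and DEVPATH for every block uevent seen (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    # pid 0 lets the kernel pick an address; group mask 1 is the kernel
    # uevent multicast group.
    sock.bind((0, 1))
    while True:
        event = parse_uevent(sock.recv(8192))
        if event.get("SUBSYSTEM") == "block":
            print(event.get("ACTION"), event.get("DEVPATH"))


if __name__ == "__main__":
    watch_block_uevents()
```

Running this in the container while creating a ramdisk or logging in to an
iSCSI target from another namespace should show whether block uevents
cross over.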

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer