Re: [GIT PULL] remove ksys_mount() and ksys_dup()

From: Al Viro
Date: Tue Dec 17 2019 - 18:23:50 EST


On Tue, Dec 17, 2019 at 10:57:43PM +0000, Al Viro wrote:
> On Tue, Dec 17, 2019 at 02:21:03PM -0800, Jesse Barnes wrote:
> > > and yes, that particular problem only triggers when you have some odd
> > > root filesystem without a /dev/console. Or a kernel config that
> > > doesn't have those devices enabled at all.
> > >
> > > I delayed pulling it for a couple of days, but the branch was not in
> > > linux-next, so my delay didn't make any difference, and all these
> > > things only became obvious after I pulled. And while it was all
> > > horribly buggy, it was only buggy for the "these cases don't happen in
> > > a normal distro" case, so the regular use didn't show them.
> > >
> > > My bad. I shouldn't have pulled this, but it all looked very obvious
> > > and trivial.
> >
> > Oh I should have caught that too, I was looking right at it...
> >
> > But anyway it looks like a nice cleanup with a few more fixes.
> > Hopefully we can get there soon...
>
> FWIW, this is precisely what I'd been talking about[*] - instead of
> a plain "we are reusing the damn syscall, with fixed interface and
> debugged by userland all the time" we'd got an open-coded analogue
> that will be a headache (and a source of bitrot) for years.
>
> It's not a normal part of the kernel, and I bloody well remember
> what kind of headache it had been before it got massaged to use
> plain syscalls. Constant need to remember that a change in
> VFS guts might break something in the code that is hell to
> debug - getting test coverage for it is not fun at all. As we
> are seeing right now...
>
> Seriously, these parts of init/* ought to be treated as userland code
> that runs in kernel mode mostly because it's too much PITA to arrange
> building a static ELF binary and linking it into the image.
>
>
> [*] "IMO it's not a good idea. Exposing the guts of fs/namespace.c to
> what's essentially a userland code that happens to run in kernel thread
> is asking for trouble - we'd been there and it had been hell to untangle."
>
> My fault, I guess - should've been more specific than that ;-/

PS: please, don't take that kind of stuff any further; right now all that
thing does is marshal the arguments. At that level it's just going
to be a headache while debugging that code. Take it further (e.g.
play with calling do_move_mount() et al. instead of using MS_MOVE) and
the headache will be ongoing, not just one-time. "Just use ksys_...()
in init/*.c" prevented that kind of stuff; now that this policy no
longer holds, we'd better watch out for trouble in that area.
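
For illustration only, here is roughly what "go through the syscall
interface with MS_MOVE" looks like when written as ordinary userland C.
This is a sketch, not the code in init/*.c: the helper name
move_mount_via_syscall() is hypothetical, and it assumes a Linux libc
that provides mount(2) and MS_MOVE via <sys/mount.h>. The point is that
the caller depends only on the fixed, documented interface and never
touches the guts of fs/namespace.c.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not from the kernel tree): move the mount at
 * old_root onto new_root using the plain mount(2) interface.
 * Returns 0 on success or a negative errno value on failure.
 */
int move_mount_via_syscall(const char *old_root, const char *new_root)
{
	/* With MS_MOVE, the filesystem type and data arguments are ignored. */
	if (mount(old_root, new_root, NULL, MS_MOVE, NULL) < 0)
		return -errno;
	return 0;
}
```

A caller needs CAP_SYS_ADMIN and existing mount points for this to
succeed; with bogus paths or without privileges it simply returns a
negative errno, same as any other userland caller of mount(2) would see.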

To quote the original patchset, "instead of pretending to be userspace,
... can be implemented using in-kernel functions" and the exact same
rationale would lead to a lot of trouble. That's what I'm really
worried about; let's not go there.