RE: [PATCH 0/2] nvme: Add kernel API for admin command

From: Baldyga, Robert
Date: Mon Sep 16 2019 - 08:13:29 EST


On Mon, Sep 16, 2019 at 09:34:55AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 16, 2019 at 07:16:52AM +0000, Baldyga, Robert wrote:
> > On Fri, Sep 13, 2019 at 04:37:09PM +0200, Christoph Hellwig wrote:
> > > On Fri, Sep 13, 2019 at 01:16:08PM +0200, Robert Baldyga wrote:
> > > > Hello,
> > > >
> > > > This patchset adds two functions providing kernel to kernel API
> > > > for submiting NVMe admin commands. This is for use of NVMe-aware
> > > > block device drivers stacking on top of NVMe drives. An example of
> > > > such driver is Open CAS Linux [1] which uses NVMe extended LBA
> > > > formats and thus needs to issue commands like nvme_admin_identify.
> > >
> > > We never add functionality for out of tree crap. And this shit really
> > > is a bunch of crap, so it is unlikely to ever be merged.
> >
> > So that modules which are by design out of tree have to hack around
> > lack of API allowing to use functionality implemented by driver.
> > Don't you think that this is what actually produces crap?
>
> Because you added another badly designed caching layer instead of fixing
> up one of the 2 to 3 (depending on how you count) in the kernel tree.
> And yours is amazingly bad even compared to the not very nice one in the
> tree. It basically spends more efforts on abstracting away abstraction
> that you wouldn't need without another layer of abstractions. And it
> does all that pointlessly tied to NVMe through a bunch of layering
> violations.
>
> >
> > > Why can't intel sometimes actually do something useful for a change
> > > instead of piling junk over junk?
> >
> > Proposed API is equally useful for both in tree and out of tree modules,
> > so I find your comment unrelated.
>
> It is not. We will not let random kernel modules directly issue nvme
> admin command as there is no reason for it. And even if for some weird
> reason we did we'd certainly not do it for out of tree modules.

Ok, fair enough. We want to keep things hidden behind certain layers,
and that's definitely a good thing. But there is a problem with these
layers - they do not expose all the features. For example AFAIK there
is no clear way to use 512+8 format via block layer (nor even a way
to get know that bdev is formatted to particular lbaf). It's impossible
to use it without layering violations, which nobody sees as a perfect
solution, but it is the only one that works.

> > If you don't like the way it's done, we can look for alternatives.
> > The point is to allow other drivers use NVMe admin commands, which is
> > currently not possible as neither the block layer nor the nvme driver
> > provides sufficient API.
>
> And the answer is that we are not going to allow that. And we'd not
> allow other things even if they were not a bad design for out of tree
> modules.

I'm not going to insist on merging changes dedicated for out of tree
modules. I'd be happy to get involved in improving kernel support for
NVMe so that there would be no need for that changes. What I proposed
is a quick fix for the problem that, for sure, can be solved another,
better way.

And this is not about being in tree or out of tree - a lot of out of
tree modules are doing well without any special care from the upstream.
This is about lack of support for certain features. If you have any
proposal how we should solve this problem in a better way than I
proposed in my humble patchset, I will gladly submit new patches.

Best regards,
Robert Baldyga