Re: [PATCH v2 00/10] PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
From: Koichiro Den
Date: Fri Mar 20 2026 - 11:06:01 EST
On Fri, Feb 27, 2026 at 05:57:40PM +0900, Koichiro Den wrote:
> On Fri, Feb 27, 2026 at 05:49:45PM +0900, Koichiro Den wrote:
> > This series fixes doorbell bit/vector handling for the EPF-based NTB
> > pair (ntb_hw_epf <-> pci-epf-*ntb). Its primary goal is to enable safe
> > per-db-vector handling in the NTB core and clients (e.g. ntb_transport),
> > without changing the on-the-wire doorbell mapping.
> >
> >
> > Background / problem
> > ====================
> >
> > ntb_hw_epf historically applies an extra offset when ringing peer
> > doorbells: the link event uses the first interrupt slot, and doorbells
> > start from the third slot (i.e. a second slot is effectively unused).
> > pci-epf-vntb carries the matching offset on the EP side as well.
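A minimal sketch of the slot layout described above, for readers who want it spelled out. This is plain userspace C, not the driver code, and the enum and helper names are hypothetical; only the "link event in the first slot, doorbells from the third slot" layout comes from the text.

```c
#include <assert.h>

/* Hypothetical names, for illustration only; not taken from the drivers. */
enum epf_irq_slot {
	EPF_SLOT_LINK_EVENT = 0,	/* first slot: link event */
	EPF_SLOT_UNUSED     = 1,	/* second slot: effectively unused */
	EPF_SLOT_DB_BASE    = 2,	/* third slot: doorbell bit 0 lands here */
};

/* The legacy offset: doorbells start at the third interrupt slot. */
static_assert(EPF_SLOT_DB_BASE == 2, "doorbells start at the third slot");

/* Map a 0-based doorbell bit to the interrupt slot rung on the peer. */
static unsigned int db_bit_to_slot(unsigned int db_bit)
{
	return db_bit + EPF_SLOT_DB_BASE;
}
```

Under this assumed layout, doorbell bit 0 rings slot 2 and slot 1 is never rung.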
> >
> > As long as db_vector_count()/db_vector_mask() are not implemented, this
> > mismatch is mostly masked. Doorbell events are effectively treated as
> > "can hit any QP" and the off-by-one vector numbering does not surface
> > clearly.
> >
> > However, once per-vector handling is enabled, the current state becomes
> > problematic:
> >
> > - db_valid_mask exposes bits that do not correspond to real doorbells
> > (link/unused slots leak into the mask).
> > - ntb_db_event() is fed with 1-based/shifted vectors, while NTB core
> > expects a 0-based db_vector for doorbells.
> > - On pci-epf-vntb, .peer_db_set() may be called in atomic context, but
> > it directly calls pci_epc_raise_irq(), which can sleep.
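The first two points can be sketched as follows. This is a hedged userspace illustration under the assumption that slots 0 and 1 are the link event and the unused slot; the macro and function names are hypothetical and do not appear in either driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: slots 0 and 1 are the link event and the unused slot. */
#define DB_SLOT_BASE 2u

/* Mask covering only real doorbell bits (no link/unused slots). */
static uint64_t db_valid_mask(unsigned int db_count)
{
	assert(db_count <= 64);
	return db_count == 64 ? UINT64_MAX : (UINT64_C(1) << db_count) - 1;
}

/*
 * Convert an interrupt slot to a 0-based doorbell vector.
 * Returns -1 for slots that are not doorbells (link event, unused slot,
 * or anything past the last doorbell).
 */
static int slot_to_db_vector(unsigned int slot, unsigned int db_count)
{
	if (slot < DB_SLOT_BASE || slot - DB_SLOT_BASE >= db_count)
		return -1;
	return (int)(slot - DB_SLOT_BASE);
}
```

With four doorbells in this sketch, the valid mask is 0xf and slot 2 is reported as vector 0, while slots 0 and 1 are never reported as doorbells.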
> >
> >
> > Why NOT fix the root offset?
> > ============================
> >
> > The natural "root" fix would be to remove the historical extra offset in
> > the peer_db_set() doorbell paths for ntb_hw_epf and pci-epf-vntb.
> > Unfortunately this would lead to interoperability issues when mixing old
> > and new kernel versions (old/new peers). A new side would ring a
> > different interrupt slot than what an old peer expects, leading to
> > missed or misrouted doorbells, once db_vector_count()/db_vector_mask()
> > are implemented.
> >
> > Therefore this series intentionally keeps the legacy offset, and instead
> > fixes the surrounding pieces so the mapping is documented and handled
> > consistently in masks, vector numbering, and per-vector reporting.
> >
> >
> > What this series does
> > =====================
> >
> > - pci-epf-vntb:
> >
> > - Document the legacy offset.
> > - Defer MSI doorbell raises to process context to avoid sleeping in
> > atomic context. This becomes relevant once multiple doorbells are
> > raised concurrently at a high rate.
> > - Report doorbell vectors as 0-based to ntb_db_event().
> > - Fix db_valid_mask and implement db_vector_count()/db_vector_mask().
> >
> > - ntb_hw_epf:
> >
> > - Document the legacy offset in ntb_epf_peer_db_set().
> > - Fix db_valid_mask to cover only real doorbell bits.
> > - Report 0-based db_vector to ntb_db_event() (accounting for the
> > unused slot).
> > - Keep db_val as a bitmask and fix db_read/db_clear semantics
> > accordingly.
> > - Implement db_vector_count()/db_vector_mask().
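One possible reading of "db_val as a bitmask" with matching db_read/db_clear semantics, sketched in userspace C11. The struct and function names here are hypothetical and the actual patches may differ; the only assumption taken from the list above is that each pending doorbell is one bit and that reads and clears operate on bits, not on vector numbers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical state, for illustration only. */
struct db_state {
	_Atomic uint64_t db_val;	/* pending doorbells, one bit per vector */
	uint64_t db_valid_mask;		/* bits that are real doorbells */
};

/* Record that 0-based doorbell vector 'vec' fired. */
static void db_latch(struct db_state *s, unsigned int vec)
{
	assert(vec < 64);
	atomic_fetch_or(&s->db_val, UINT64_C(1) << vec);
}

/* Return the pending doorbell bits, limited to valid doorbells. */
static uint64_t db_read(struct db_state *s)
{
	return atomic_load(&s->db_val) & s->db_valid_mask;
}

/* Clear only the requested bits; other pending doorbells stay set. */
static void db_clear(struct db_state *s, uint64_t bits)
{
	atomic_fetch_and(&s->db_val, ~bits);
}
```

The point of the sketch is that clearing one doorbell bit must not drop other doorbells that were latched concurrently.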
> >
> >
> > Compatibility
> > =============
> >
> > By keeping the legacy offset intact, this series aims to remain
> > compatible across mixed kernel versions. The observable changes are
> > limited to correct mask/vector reporting and safer execution context
> > handling.
> >
> > Patches 1-5 (PCI Endpoint) and 6-10 (NTB) are independent and can be
> > applied separately for each tree. I am sending them together in this
> > series to provide the full context and to make the cross-subsystem
> > compatibility constraints explicit. Ideally the whole series would be
> > applied in a single tree, but each subset is safe to merge on its own.
> >
> > - Patches 1-5 apply cleanly onto the latest pci/endpoint:
> > f6797680fe31 ("PCI: epf-mhi: Return 0 on success instead of positive
> > jiffies from pci_epf_mhi_edma_{read/write}")
> >
> > - Patches 6-10 apply cleanly onto the latest ntb-next:
> > 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link disable
> > path")
> >
> > Note: I don't have suitable hardware to test the ntb_hw_epf + pci-epf-ntb
> > (not vNTB) bridge scenario, but I believe no changes are needed in
> > pci-epf-ntb.c.
> >
> >
> > Changelog
> > =========
> >
> > Changes since v1:
> > - Addressed feedback from Dave (add a source code comment, introduce
> > enum to eliminate magic numbers)
> > - Updated source code comment in Patch 2.
> > - No functional changes, so I retained the Reviewed-by tags from Frank and Dave.
> > Thank you both for the review.
>
> Sorry, I accidentally used an incorrect series title.
> The correct subject should be:
>
> [PATCH v2 00/10] NTB: epf: Enable per-doorbell bit handling while keeping legacy offset
>
> For reference, v1 is:
> https://lore.kernel.org/linux-pci/20260224133459.1741537-1-den@xxxxxxxxxxxxx/
>
> Best regards,
> Koichiro

Hi Mani (cc: Jon, Dave),

This series has been sitting for a while, so I'd like to check how to proceed.

I'm thinking of the following approach:

- get the remaining acks from the NTB side
  (Dave already gave Reviewed-by for Patch 6/10)
- then route the whole series via the PCI EP tree

Does that sound reasonable?
If so, I can prepare a v3 rebased onto the latest pci/endpoint.

Best regards,
Koichiro