Re: [PATCH v4 00/12] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
From: Koichiro Den
Date: Tue May 26 2026 - 12:02:14 EST
On Tue, May 26, 2026 at 07:45:11AM -0700, Dave Jiang wrote:
>
>
> On 5/20/26 11:29 PM, Koichiro Den wrote:
> > On Tue, May 19, 2026 at 02:29:51PM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 5/12/26 7:49 PM, Koichiro Den wrote:
> >>> This series fixes doorbell bit/vector handling for the EPF-based NTB
> >>> pair (ntb_hw_epf <-> pci-epf-*ntb). Its primary goal is to enable safe
> >>> per-db-vector handling in the NTB core and clients (e.g. ntb_transport),
> >>> without changing the on-the-wire doorbell mapping.
> >>>
> >>>
> >>> Background / problem
> >>> ====================
> >>>
> >>> ntb_hw_epf historically applies an extra offset when ringing peer
> >>> doorbells: the link event uses the first interrupt slot, and doorbells
> >>> start from the third slot (i.e. a second slot is effectively unused).
> >>> pci-epf-vntb carries the matching offset on the EP side as well.
> >>>
> >>> As long as db_vector_count()/db_vector_mask() are not implemented, this
> >>> mismatch is mostly masked. Doorbell events are effectively treated as
> >>> "can hit any QP" and the off-by-one vector numbering does not surface
> >>> clearly.
> >>>
> >>> However, once per-vector handling is enabled, the current state becomes
> >>> problematic:
> >>>
> >>> - db_valid_mask exposes bits that do not correspond to real doorbells
> >>> (link/unused slots leak into the mask).
> >>> - ntb_db_event() is fed with 1-based/shifted vectors, while NTB core
> >>> expects a 0-based db_vector for doorbells.
> >>> - On pci-epf-vntb, .peer_db_set() may be called in atomic context, but
> >>> it directly calls pci_epc_raise_irq(), which can sleep.
> >>>
> >>>
> >>> Why NOT fix the root offset?
> >>> ============================
> >>>
> >>> The natural "root" fix would be to remove the historical extra offset in
> >>> the peer_db_set() doorbell paths for ntb_hw_epf and pci-epf-vntb.
> >>> Unfortunately this would lead to interoperability issues when mixing old
> >>> and new kernel versions (old/new peers). A new side would ring a
> >>> different interrupt slot than what an old peer expects, leading to
> >>> missed or misrouted doorbells, once db_vector_count()/db_vector_mask()
> >>> are implemented.
> >>>
> >>> Therefore this series intentionally keeps the legacy offset, and instead
> >>> fixes the surrounding pieces so the mapping is documented and handled
> >>> consistently in masks, vector numbering, and per-vector reporting.
> >>>
> >>>
> >>> Dependencies
> >>> ============
> >>>
> >>> This v4 is prepared on top of the current pci/endpoint:
> >>> commit f5aa97125491 ("misc: pci_endpoint_test: remove dead BAR read before
> >>> doorbell trigger")
> >>
> >> I can't seem to apply the patch set via 'b4 shazam' on top of f5aa97125491. Trying to set it up for review.
> >
> > Hi Dave,
> >
> >>
> >> DJ
> >>
> >>>
> >>> plus these patches as contextual prerequisites:
> >>>
> >>> - [PATCH 0/2] NTB: epf: Fix ntb_hw_epf ISR issues
> >>> https://lore.kernel.org/ntb/20260304083028.1391068-1-den@xxxxxxxxxxxxx/
> >>>
> >>> - [PATCH 0/2] PCI: endpoint: pci-epf-{v}ntb: A couple of fixes
> >>> https://lore.kernel.org/linux-pci/20260407124421.282766-1-mani@xxxxxxxxxx/
> >>>
> >>> (These touch the same code part as this series and would otherwise conflict
> >>> with it. They should land upstream first in my opinion.)
> >
> > This series depends on the two prerequisite series listed above.
>
> They don't apply cleanly either. Do you have a public repo somewhere that has everything applied? That may be the easiest way to do things if a series has complicated dependencies.
I rechecked this with a fresh worktree based off of f5aa97125491:
git worktree add /tmp/f5aa97125491 f5aa97125491
cd /tmp/f5aa97125491
# 1st dependency
b4 am 20260304083028.1391068-1-den@xxxxxxxxxxxxx
git am ./20260304_den_ntb_epf_fix_ntb_hw_epf_isr_issues.mbx
# 2nd dependency
b4 am 20260407124421.282766-1-mani@xxxxxxxxxx
git am ./20260407_manivannan_sadhasivam_pci_endpoint_pci_epf_v_ntb_a_couple_of_fixes.mbx
# this v4 series
b4 am 20260513024923.451765-1-den@xxxxxxxxxxxxx
git am ./v4_20260513_den_pci_endpoint_pci_epf_vntb_ntb_epf_enable_per_doorbell_bit_handling.mbx
And this gives me the following commit log:
git log --reverse --oneline f5aa97125491..
# 1d1465542cc0 NTB: epf: Fix request_irq() unwind in ntb_epf_init_isr()
# 94bd43a9e435 NTB: epf: Avoid pci_irq_vector() from hardirq context
# 3cab4db0f647 PCI: endpoint: pci-epf-vntb: Add check to detect 'db_count' value of 0
# 5449f10512ae PCI: endpoint: pci-epf-ntb: Add check to detect 'db_count' value of 0
# e3dbdcceb2fd PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
# e1c167a58888 PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
# 6c54fd0c9c9c PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event()
# 28a30b79ff03 PCI: endpoint: pci-epf-vntb: Reject unusable doorbell counts
# cb5fbd37bdc6 PCI: endpoint: pci-epf-vntb: Guard configfs writes after EPC attach
# 9ed0f4db85b4 PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
# 575bd3aa530b PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for doorbells
# 41f26aa279af NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set()
# 6b2e2f80a61a NTB: epf: Make db_valid_mask cover only real doorbell bits
# 44d1bb070521 NTB: epf: Report 0-based doorbell vector via ntb_db_event()
# 4e4d37b4c2a8 NTB: epf: Fix doorbell bitmask and IRQ vector handling
# 1e5829682361 (HEAD) NTB: epf: Implement db_vector_count/mask for doorbells
I don't have a public repo at the moment, sorry. Could you try the above? If it
still fails, please let me know.
Best regards,
Koichiro
>
> DJ
>
> >
> > Best regards,
> > Koichiro
> >
> >>>
> >>>
> >>> Practical benefit
> >>> =================
> >>>
> >>> This also enables ntb_netdev's multi-queue performance scale:
> >>>
> >>> [PATCH v3 0/4] net: ntb_netdev: Add Multi-queue support
> >>> https://lore.kernel.org/netdev/20260305155639.1885517-1-den@xxxxxxxxxxxxx/
> >>>
> >>> Each netdev queue maps to an ntb_transport queue pair. With correct
> >>> db_vector_count() and db_vector_mask() callbacks, ntb_transport can wake
> >>> only the queue pairs covered by the doorbell vector it received. Without
> >>> those callbacks, the NTB core falls back to the full valid doorbell mask,
> >>> so each doorbell interrupt can still wake all queue pairs. See "Testing /
> >>> Results" section in the above link.
> >>>
> >>>
> >>> What this series does
> >>> =====================
> >>>
> >>> - pci-epf-vntb:
> >>>
> >>> - Document the legacy offset.
> >>> - Defer MSI doorbell raises to process context to avoid sleeping in
> >>> atomic context. This becomes relevant once multiple doorbells are
> >>> raised concurrently at a high rate.
> >>> - Report doorbell vectors as 0-based to ntb_db_event().
> >>> - Reject unusable doorbell counts and guard configfs writes after EPC
> >>> attach with -EOPNOTSUPP.
> >>> - Fix db_valid_mask and implement db_vector_count()/db_vector_mask().
> >>>
> >>> - ntb_hw_epf:
> >>>
> >>> - Document the legacy offset in ntb_epf_peer_db_set().
> >>> - Fix db_valid_mask to cover only real doorbell bits.
> >>> - Report 0-based db_vector to ntb_db_event() (accounting for the
> >>> unused slot).
> >>> - Pass per-vector context as request_irq() dev_id instead of deriving the
> >>> device vector from Linux IRQ numbers.
> >>> - Keep db_val as a bitmask and fix db_read/db_clear semantics
> >>> accordingly.
> >>> - Implement db_vector_count()/db_vector_mask().
> >>>
> >>>
> >>> Compatibility
> >>> =============
> >>>
> >>> By keeping the legacy offset intact, this series aims to remain
> >>> compatible across mixed kernel versions. The observable changes are
> >>> limited to correct mask/vector reporting and safer execution context
> >>> handling.
> >>>
> >>> Patches 1-7 (PCI Endpoint) and 8-12 (NTB) are independent and can be
> >>> applied separately through the respective trees. They are sent together so
> >>> the two sides of the same legacy layout do not drift apart, and the
> >>> resulting code stays easier to follow.
> >>>
> >>> The plan is still to take the whole series through the PCI EP tree once the
> >>> remaining NTB acks are collected. See:
> >>> https://lore.kernel.org/linux-pci/rnzsnp5de4qf5w7smebkmqekpuaqckltx73rj6ha3q2nrby5yp@7hsgvdzvjkp6/
> >>>
> >>>
> >>> Note
> >>> ====
> >>>
> >>> Most v3->v4 changes are responses to Sashiko feedback:
> >>> https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp
> >>>
> >>> Sashiko also flagged a broader driver-unbind lifetime concern around the
> >>> vNTB PCI driver. That is real, but it predates this series and needs a
> >>> proper .remove() path, including NTB unregistration and work cancellation.
> >>> I am keeping it as follow-up lifecycle work:
> >>>
> >>> https://lore.kernel.org/linux-pci/20260226084142.2226875-1-den@xxxxxxxxxxxxx/
> >>>
> >>> This v4 only clears the deferred doorbell state added by this series around
> >>> EPC init/cleanup.
> >>>
> >>>
> >>> ---
> >>> Changelog
> >>> =========
> >>>
> >>> Changes since v3:
> >>> - Rebased on the dependencies listed above.
> >>> - For Sashiko feedback on state added by this series: clear
> >>> peer_db_pending around work enable/disable, mask peer_db_set() with
> >>> db_valid_mask, and use __ffs64().
> >>> - For pre-existing issues made more relevant by per-vector semantics:
> >>> stop deriving ntb_hw_epf vectors from Linux IRQ numbers, pass
> >>> per-vector request_irq() context, ignore the reserved slot, and
> >>> validate vectors before BIT_ULL()/ntb_db_event().
> >>> - Added adjacent hardening: pci-epf-vntb db_count range/configfs guards,
> >>> bounded db_vector masks, ntb_hw_epf mw_count precheck, and helper guard
> >>> cleanups.
> >>> - Dropped Reviewed-by tags from patches that changed to non-trivial extent.
> >>>
> >>> Changes since v2:
> >>> - No functional changes.
> >>> - Rebased onto current pci/endpoint
> >>> e022f0c72c7f ("selftests: pci_endpoint: Skip reserved BARs").
> >>> * Patch 2 needed a trivial context-only adjustment while rebasing, due
> >>> to commit d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop
> >>> cmd_handler work in epf_ntb_epc_cleanup").
> >>> - Picked up additional Reviewed-by tags from Frank.
> >>> - Fixed the incorrect v2 series title.
> >>>
> >>> Changes since v1:
> >>> - Addressed feedback from Dave (add a source code comment, introduce
> >>> enum to eliminate magic numbers)
> >>> - Updated source code comment in Patch 2.
> >>> - No functional changes, so retained Reviewed-by tags by Frank and Dave.
> >>> Thank you both for the review.
> >>>
> >>> v3: https://lore.kernel.org/linux-pci/20260323031544.2598111-1-den@xxxxxxxxxxxxx/
> >>> v2: https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@xxxxxxxxxxxxx/
> >>> v1: https://lore.kernel.org/linux-pci/20260224133459.1741537-1-den@xxxxxxxxxxxxx/
> >>>
> >>>
> >>> Best regards,
> >>> Koichiro
> >>>
> >>>
> >>> Koichiro Den (12):
> >>> PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
> >>> PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic
> >>> context
> >>> PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via
> >>> ntb_db_event()
> >>> PCI: endpoint: pci-epf-vntb: Reject unusable doorbell counts
> >>> PCI: endpoint: pci-epf-vntb: Guard configfs writes after EPC attach
> >>> PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
> >>> PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for
> >>> doorbells
> >>> NTB: epf: Document legacy doorbell slot offset in
> >>> ntb_epf_peer_db_set()
> >>> NTB: epf: Make db_valid_mask cover only real doorbell bits
> >>> NTB: epf: Report 0-based doorbell vector via ntb_db_event()
> >>> NTB: epf: Fix doorbell bitmask and IRQ vector handling
> >>> NTB: epf: Implement db_vector_count/mask for doorbells
> >>>
> >>> drivers/ntb/hw/epf/ntb_hw_epf.c | 133 ++++++++---
> >>> drivers/pci/endpoint/functions/pci-epf-vntb.c | 210 ++++++++++++++++--
> >>> 2 files changed, 296 insertions(+), 47 deletions(-)
> >>>
> >>
> >
>