Re: [EXTERNAL] Re: [PATCH rdma-next v2] RDMA/mana_ib: hardening: Clamp adapter capability values from MANA_IB_GET_ADAPTER_CAP
From: Leon Romanovsky
Date: Sun Mar 22 2026 - 14:50:48 EST
On Sat, Mar 21, 2026 at 12:56:39AM +0000, Long Li wrote:
> >
> > On Mon, Mar 16, 2026 at 08:50:39PM +0000, Long Li wrote:
> > > > On Thu, Mar 12, 2026 at 11:16:41AM -0700, Erni Sri Satya Vennela wrote:
> > > > > As part of MANA hardening for CVM, clamp hardware-reported adapter
> > > > > capability values from the MANA_IB_GET_ADAPTER_CAP response before
> > > > > they are used by the IB subsystem.
> > > > >
> > > > > The response fields (max_qp_count, max_cq_count, max_mr_count,
> > > > > max_pd_count, max_inbound_read_limit, max_outbound_read_limit,
> > > > > max_qp_wr, max_send_sge_count, max_recv_sge_count) are u32 but are
> > > > > assigned to signed int members in struct ib_device_attr. If
> > > > > hardware returns a value exceeding INT_MAX, the implicit
> > > > > u32-to-int conversion produces a negative value, which can cause
> > > > > incorrect behavior in the IB core and userspace applications.
> > > >
> > > > This sentence does not make sense in the context of the Linux kernel.
> > > > The fundamental assumption is that the underlying hardware behaves
> > > > correctly, and driver code should not attempt to guard against
> > > > purely hypothetical failures. The kernel only implements such
> > > > self‑protection when there is a documented hardware issue accompanied by
> > > > official errata.
> > > >
> > > > Thanks
> > >
> > > The idea is that malicious hardware should not be able to corrupt or steal
> > > other data from the kernel.
> > >
> > > The assumption is that in a public cloud environment, you can't trust the
> > > hardware 100%.
> >
> > You cannot separate functionality and claim that one line of code is trusted while
> > another is not.
> >
> > Thanks
>
> How about we rephrase it this way: the driver should not corrupt or overflow
> other parts of the kernel if its device is misbehaving (or has a bug).
It shouldn't be a theoretical claim. Do you have errata?
Thanks
>
> Long
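
For readers following the thread, the conversion problem described in the
quoted patch description (a u32 capability value assigned to a signed int
field) can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not the code from the patch: the helper names
clamp_u32_to_int() and unclamped_u32_to_int() are hypothetical, and in-kernel
code would more likely use an existing helper such as min_t().

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): clamp a device-reported
 * 32-bit unsigned capability so it fits in a signed int field, such as the
 * int members of struct ib_device_attr mentioned in the patch description.
 */
static inline int clamp_u32_to_int(uint32_t reported)
{
	return reported > (uint32_t)INT_MAX ? INT_MAX : (int)reported;
}

/*
 * The unclamped conversion, shown for contrast. Converting an out-of-range
 * value to int is implementation-defined in C; GCC and Clang wrap it, so
 * values above INT_MAX come out negative.
 */
static inline int unclamped_u32_to_int(uint32_t reported)
{
	return (int)reported;
}
```

With a reported value of 0xFFFFFFFF, the clamped helper yields INT_MAX, while
the unclamped conversion yields a negative number on GCC and Clang. That
negative number is the case the patch description is concerned about.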