Re: [PATCH] libceph: accept addrvecs with multiple entries of the same type
From: Ilya Dryomov
Date: Fri Apr 24 2026 - 14:55:29 EST
On Thu, Apr 23, 2026 at 1:44 PM Kefu Chai <k.chai@xxxxxxxxxxx> wrote:
>
> Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>
> > On Thu, Apr 23, 2026 at 12:09 PM Kefu Chai <k.chai@xxxxxxxxxxx> wrote:
> >>
> >> ceph_decode_entity_addrvec() rejects any addrvec containing more than
> >> one entry that matches the requested msgr type (LEGACY or MSGR2),
> >> logging "another match of type N in addrvec" and returning -EINVAL.
> >> This breaks legitimate deployments where a daemon advertises multiple
> >> addresses of the same type, most notably dual-stack (IPv4 + IPv6)
> >> clusters
> >
> > Hi Kefu,
> >
>
> Hi Ilya,
>
> >
> > My understanding is that dual-stack isn't supported in general:
> > https://tracker.ceph.com/issues/65631. The respective references were
> > purged from the documentation with Radoslaw (offline?) ack.
> >
>
> Yeah, you are right. I was overreaching. Dual-stack and
> heterogeneous-subnet clients are not served by multi-entry addrvecs, and
> the patch does not change that.
>
> >
> >> and multi-subnet deployments where tooling picks one address
> >> per listed public_network.
> >
> > Can you elaborate on when such tooling kicks in, what exactly does it
> > do and the use case in general? It's not immediately obvious to me how
> > having two addresses of the same type/stack and simply ignoring the
> > second one is better than insisting on having a just single address.
> >
>
> Sure. The narrow case that remains is compatibility. Admin tooling built
> around public_addrv and ceph mon set-addrs produces addrvecs with more
> than one entry of the same type on the back of that behavior, and the
Hi Kefu,
Sorry for being dense, but can you expand on _why_ the tooling in
question is doing that? With the dual-stack being explicitly not
supported and the userspace messenger unconditionally picking the
first address, what are the use cases for adding additional "dead"
addresses there?
I'm asking because Xie Xingguo's commit [1] mentions dual-stack
(already covered above) and in the ticket that I assume got you
involved [2] it seems like the user was able to implement the setup
they wanted without resorting to adding those "dead" addresses after
all.
> kernel's strict guard rejects the whole monmap. The handshake contains()
> check is the one concrete reason the extra entries need to be listed in
> the addrvec rather than dropped at advertise time.
Can you point me at this check?
>
> >
> >> Match the userspace messenger, which since Nautilus picks the first
> >> entry of the requested type and silently tolerates subsequent entries.
> >
> > Do you have a reference to a specific commit? I'm wondering if it
> > isn't on that "merged more or less accidentally" list.
> >
>
> The pick-first selector in AsyncMessenger::create_connect() landed in
> Sage's commit d1a783a5f733, and Xie Xingguo's commit 50d8c8a3cce3 fixed
> the loop to actually honor the "pick whichever is listed first" comment.
> Both shipped in Nautilus.
>
> Would you be willing to take this as a compatibility fix, with the
> commit message and the comment in ceph_decode_entity_addrvec() rewritten
> to state exactly that and nothing more? If you would rather keep the
> strict check and handle this on the tooling side instead, I am happy to
> withdraw the patch. Either way, thanks for the review.
I'm all for reducing the number of inconsistencies (whether intentional
or unintentional), but I would like to make sure I understand the
background first. I'm worried that by accepting multiple addresses we
just confuse users into thinking that dual-stack is supported and then
they go on attempting to implement "a second subnet", etc. AFAIK those
things aren't tested upstream at all.
[1] https://github.com/ceph/ceph/commit/50d8c8a3cce3bfbfc9be5acfa60bda165d59e2bc
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=7518
Thanks,
Ilya