Re: [PATCH v2] libceph: tolerate addrvecs with multiple entries of the same type

From: Viacheslav Dubeyko

Date: Thu Jun 04 2026 - 15:29:53 EST


On Thu, 2026-06-04 at 22:02 +0800, Kefu Chai wrote:
> ceph_decode_entity_addrvec() rejects any addrvec containing more than
> one entry that matches the requested msgr type (LEGACY or MSGR2),
> logging "another match of type N in addrvec" and returning -EINVAL.
>
> Some admin tooling (e.g. pveceph mon create from Proxmox VE) generates
> addrvecs with multiple same-type entries when public_network lists more
> than one CIDR: it picks one local IP per subnet and emits both a v2 and
> a v1 entry for each IP.  Monmaps shaped this way cause:
>
>   libceph: mon0 (1)10.10.10.15:6789 session established
>   libceph: another match of type 1 in addrvec
>   libceph: problem decoding monmap, -22
>
> No Ceph code uses the extra entries: since Nautilus, the userspace
> messenger (AsyncMessenger) unconditionally picks the first address of
> the requested type and ignores any subsequent matches.
>
> Match that behavior: use the first matching entry and silently skip any
> subsequent ones.  This is a compatibility fix for existing deployments
> and does not enable dual-stack or multi-subnet address selection.
>
> Link: https://bugzilla.proxmox.com/show_bug.cgi?id=7518
> Signed-off-by: Kefu Chai <k.chai@xxxxxxxxxxx>
> ---
> Changes since v1:
> - Rewrite commit message to frame as compatibility fix; drop dual-stack/
>   multi-subnet framing and the two ceph tracker links
> - Simplify comment in ceph_decode_entity_addrvec()
>
> Tested by reproducing the Proxmox BZ 7518 scenario against a vstart
> cluster whose mon addrvec was edited to contain two v1 + two v2
> entries:
>
>     ceph mon set-addrs a \
>         '[v2:$ip1:$p2/0,v1:$ip1:$p1/0,v2:$ip2:$p2/0,v1:$ip2:$p1/0]'
>
> A Debian VM booted with the patched kernel via 'qemu -kernel' then
> ran 'mount -t ceph ...:$p1:/ /mnt -o name=admin'.  Pre-patch kernels
> fail at monmap decode with "another match of type 1 in addrvec"
> (-EINVAL).  Post-patch, decode succeeds and the mount proceeds to
> the auth / MDS-discovery stages.
>
> Also verified the decoder logic on the monmap.bin attached to BZ 7518
> using a userspace port of ceph_decode_entity_addrvec(): the pre-patch
> form returns -EINVAL on both msgr1 and msgr2 lookups; the post-patch
> form returns 0 and picks the first matching entry.
>
>  net/ceph/decode.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/net/ceph/decode.c b/net/ceph/decode.c
> index bc109a1a4616..18f0a7c71950 100644
> --- a/net/ceph/decode.c
> +++ b/net/ceph/decode.c
> @@ -87,8 +87,9 @@ ceph_decode_entity_addr(void **p, void *end, struct ceph_entity_addr *addr)
>  EXPORT_SYMBOL(ceph_decode_entity_addr);
>  
>  /*
> - * Return addr of desired type (MSGR2 or LEGACY) or error.
> - * Make sure there is only one match.
> + * Return addr of desired type (MSGR2 or LEGACY) or error.  If multiple
> + * entries of the desired type are present, use the first one for
> + * compatibility with existing deployments.
>   *
>   * Assume encoding with MSG_ADDR2.
>   */
> @@ -120,13 +121,7 @@ int ceph_decode_entity_addrvec(void **p, void *end, bool msgr2,
>   return ret;
>  
>   dout("%s i %d addr %s\n", __func__, i, ceph_pr_addr(&tmp_addr));
> - if (tmp_addr.type == my_type) {
> - if (found) {
> - pr_err("another match of type %d in addrvec\n",
> -        le32_to_cpu(my_type));

Maybe we should still have some debugging output here? What about
dout()?
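
To illustrate the idea, here is a rough userspace sketch of first-match
selection that still reports the skipped duplicates. It is not kernel
code: struct entity_addr, the ADDR_* constants, pick_first_match() and
the skipped counter are hypothetical stand-ins for illustration only,
and the fprintf() merely marks the spot where a dout() might go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real libceph definitions. */
enum { ADDR_LEGACY = 1, ADDR_MSGR2 = 2 };

struct entity_addr {
	int type;
	const char *text;	/* printable form, for the debug message */
};

/*
 * Copy the first entry of type 'want' into *out.  Later entries of the
 * same type are skipped, counted in *skipped and reported at debug
 * level instead of failing the whole decode.
 *
 * Returns 0 if a match was found, -1 otherwise (a simplification; the
 * kernel function has its own handling for the no-match case).
 */
static int pick_first_match(const struct entity_addr *vec, int n, int want,
			    struct entity_addr *out, int *skipped)
{
	bool found = false;
	int i;

	assert(vec && out && skipped);
	*skipped = 0;

	for (i = 0; i < n; i++) {
		if (vec[i].type != want)
			continue;

		if (found) {
			/* Assumed placement of a dout() in the kernel version. */
			fprintf(stderr, "skipping extra match of type %d: %s\n",
				want, vec[i].text);
			(*skipped)++;
			continue;
		}

		memcpy(out, &vec[i], sizeof(*out));
		found = true;
	}

	return found ? 0 : -1;
}
```

With a four-entry vector shaped like the one in the test description
(two v2 plus two v1 entries), a lookup for either type would return the
first entry of that type and log one skipped duplicate.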

Thanks,
Slava.

> - return -EINVAL;
> - }
> -
> + if (tmp_addr.type == my_type && !found) {
>   memcpy(addr, &tmp_addr, sizeof(*addr));
>   found = true;
>   }
> --
> 2.47.3