Re: [PATCH v1 13/13] libceph: force host network namespace for kernel CephFS mounts
From: Ionut Nechita (Wind River)
Date: Fri Apr 03 2026 - 11:13:04 EST
Hi Ilya,
I've identified the root cause. You were right -- this is an
orchestration issue, not a kernel bug.
The problem is caused by the Rook "holder pod" mechanism in
Rook v1.13.7 (used in our older release). Here is the full picture:
In Rook v1.13.7, when Multus is present or CSI_ENABLE_HOST_NETWORK
is false, Rook deploys a "csi-cephfsplugin-holder" DaemonSet. This
holder pod does NOT have hostNetwork: true -- it runs in a Calico
pod network namespace. Its purpose is to expose its network namespace
via a symlink:
ln -s /proc/$$/ns/net /var/lib/kubelet/plugins/<driver>/<ns>.net.ns
Ceph-CSI then uses this network namespace when performing kernel
mounts. The holder pod template even has a comment:
"This pod is not expected to be updated nor restarted unless
the node reboots."
and the DaemonSet uses updateStrategy: OnDelete to prevent rolling updates.
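For anyone reproducing this, the handoff can be sketched roughly as below. This is a minimal sketch, not Rook's exact layout: PLUGIN_DIR and the symlink name are placeholders standing in for /var/lib/kubelet/plugins/<driver>/ and <ns>.net.ns.

```shell
# Sketch of the holder-pod handoff (paths/names are illustrative).
PLUGIN_DIR="${PLUGIN_DIR:-$(mktemp -d)}"

# Inside the holder pod: publish this pod's network namespace where
# ceph-csi can find it (same shape as the symlink quoted above).
ln -sf "/proc/$$/ns/net" "$PLUGIN_DIR/rook-ceph.net.ns"
ls -l "$PLUGIN_DIR/rook-ceph.net.ns"

# ceph-csi then performs the kernel mount from inside that namespace,
# e.g. with something along the lines of:
#   nsenter --net="$PLUGIN_DIR/rook-ceph.net.ns" mount -t ceph ...
# so the kernel client captures the holder pod's netns for the
# lifetime of the mount.
```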
The condition for enabling holder pods (controller.go:206):
holderEnabled := !csiHostNetworkEnabled || cluster.Spec.Network.IsMultus()
Our cluster uses Calico + Multus, so holderEnabled is always true
regardless of CSI_ENABLE_HOST_NETWORK.
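Restating that condition in shell makes the trap obvious (variable names are illustrative; the values reflect our cluster):

```shell
# holderEnabled := !csiHostNetworkEnabled || cluster.Spec.Network.IsMultus()
csi_host_network_enabled=false   # CSI_ENABLE_HOST_NETWORK
is_multus=true                   # cluster.Spec.Network.IsMultus()

if [ "$csi_host_network_enabled" != "true" ] || [ "$is_multus" = "true" ]; then
  holder_enabled=true
else
  holder_enabled=false
fi
echo "holderEnabled=$holder_enabled"
# With Multus, the second clause is always true, so flipping
# CSI_ENABLE_HOST_NETWORK would never have disabled holder pods.
```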
During the upgrade from Rook v1.13.7 to v1.16.6, the new Rook
version sets holderEnabled = false unconditionally and deletes the
holder DaemonSets. When the holder pod is deleted, Calico tears
down the veth interfaces in its network namespace. The kernel ceph
client still holds a reference to that namespace, but it no longer
has any network interfaces or routes, resulting in permanent
EADDRNOTAVAIL (-99).
Evidence from the live reproduction:
Kernel ceph client status:
instance: client.74244 (3)[dead:beef::a2bf:c94c:345d:bc6f]:0
The holder pod on compute-0 had the same address:
csi-cephfsplugin-holder-rook-ceph-dpnbl dead:beef::a2bf:c94c:345d:bc6f
After the upgrade, the address ...bc6f is not present in any active
CNI namespace -- the holder pod was deleted and Calico cleaned up
the veth.
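For reference, this is roughly how we scanned for the address across live namespaces on the node (the globbed paths are common CNI/holder defaults, not guaranteed on every setup; needs root):

```shell
# Scan every reachable network namespace for the client's source
# address (suffix bc6f); unmatched globs are skipped harmlessly.
matches=0
for ns in /var/run/netns/* /var/lib/kubelet/plugins/*/*.net.ns; do
  [ -e "$ns" ] || continue
  if nsenter --net="$ns" ip -6 addr show 2>/dev/null | grep -qi 'bc6f'; then
    echo "address still present in $ns"
    matches=$((matches + 1))
  fi
done
echo "namespaces holding the address: $matches"
```

A count of 0 corresponds to the broken state described above: no live namespace owns the address the kernel client is still bound to.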
dmesg shows the session was initially established successfully
(at boot time, from the holder pod namespace), then lost when
the holder pod was destroyed during upgrade:
[ 204.515008] libceph: mon0 session established
[ 959.829581] libceph: mon0 session lost, hunting for new mon
[ 959.829698] libceph: connect error -99 (permanent)
Version details:
Old release (stx.10): Rook v1.13.7, ceph-csi v3.10.2, Ceph v18.2.2
New release (stx.11): Rook v1.16.6, ceph-csi v3.13.1, Ceph v18.2.5
The new release (Rook v1.16.6) eliminates holder pods entirely and
performs kernel mounts directly from the csi-cephfsplugin DaemonSet,
which has hostNetwork: true. After the upgrade completes and the
stale mount is cleared (umount -l + kubelet restart), new mounts
work correctly from the host namespace.
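For completeness, the recovery sequence on an affected node looks roughly like this (the destructive steps are commented out; the stale globalmount path must be taken from /proc/mounts, not guessed):

```shell
# Locate the dead kernel CephFS mount (filesystem type "ceph"):
grep ' ceph ' /proc/mounts || true

# Then, as root (commented out here -- destructive):
#   umount -l <stale-globalmount-path>   # lazy-unmount the dead mount
#   systemctl restart kubelet            # kubelet re-stages the volume
# The replacement mount is made by csi-cephfsplugin, which runs with
# hostNetwork: true, so the new kernel client sits in the host netns.
```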
So to summarize: this was not a kernel bug. The kernel ceph client
correctly captured the network namespace of the mounting process
(the holder pod), as designed. The problem was that the orchestration
(Rook upgrade) destroyed the holder pod and its network namespace
while the kernel mount was still active.
I'll drop patch 13 from the series as previously agreed. Thank you
for pushing me to investigate this properly.
I've also updated the Ceph tracker:
https://tracker.ceph.com/issues/74897
Thanks,
Ionut