Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()

From: Ira Weiny
Date: Fri Jun 14 2024 - 18:38:37 EST


Zhijian Li (Fujitsu) wrote:
>
>
> On 07/06/2024 00:49, Ira Weiny wrote:
> > Li Zhijian wrote:
> >> Don't allocate devs again when it is already a valid pointer which has pointed to
> >> the memory allocated above with size (count + 2 * sizeof(dev)).
> >>
> >> A kmemleak reports:
> >> unreferenced object 0xffff88800dda1980 (size 16):
> >> comm "kworker/u10:5", pid 69, jiffies 4294671781
> >> hex dump (first 16 bytes):
> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> >> backtrace (crc 0):
> >> [<00000000c5dea560>] __kmalloc+0x32c/0x470
> >> [<000000009ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 [libnvdimm]
> >> [<000000000e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
> >> [<000000007b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
> >> [<00000000a5f3da2e>] really_probe+0xc6/0x390
> >> [<00000000129e2a69>] __driver_probe_device+0x78/0x150
> >> [<000000002dfed28b>] driver_probe_device+0x1e/0x90
> >> [<00000000e7048de2>] __device_attach_driver+0x85/0x110
> >> [<0000000032dca295>] bus_for_each_drv+0x85/0xe0
> >> [<00000000391c5a7d>] __device_attach+0xbe/0x1e0
> >> [<0000000026dabec0>] bus_probe_device+0x94/0xb0
> >> [<00000000c590d936>] device_add+0x656/0x870
> >> [<000000003d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
> >> [<000000003f4c52a4>] async_run_entry_fn+0x2e/0x110
> >> [<00000000e201f4b0>] process_one_work+0x1ee/0x600
> >> [<000000006d90d5a9>] worker_thread+0x183/0x350
> >>
> >> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")
> >> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
> >> ---
> >> drivers/nvdimm/namespace_devs.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> >> index d6d558f94d6b..56b016dbe307 100644
> >> --- a/drivers/nvdimm/namespace_devs.c
> >> +++ b/drivers/nvdimm/namespace_devs.c
> >> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
> >> /* Publish a zero-sized namespace for userspace to configure. */
> >> nd_mapping_free_labels(nd_mapping);
> >>
> >> - devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> >> + /* devs probably has been allocated */
> >
> > I don't think this is where the bug is. The loop above is processing the
> > known labels and should exit with a count > 0 if devs is not NULL.
> >
> > From what I can tell create_namespace_pmem() must be returning EAGAIN
> > which leaves devs allocated but fails to increment count. Thus there are
> > no valid labels but devs was not free'ed.
>
> Per the code below, returning either EAGAIN or ENODEV could cause this issue in theory.
>
> 1980 dev = create_namespace_pmem(nd_region, nd_mapping, nd_label);
> 1981 if (IS_ERR(dev)) {
> 1982 switch (PTR_ERR(dev)) {
> 1983 case -EAGAIN:
> 1984 /* skip invalid labels */
> 1985 continue;
> 1986 case -ENODEV:
> 1987 /* fallthrough to seed creation */
> 1988 break;
> 1989 default:
> 1990 goto err;
> 1991 }
> 1992 } else
> 1993 devs[count++] = dev;
>
>
> >
> > Can you trace the error you are seeing a bit more to see if this is the
> > case?
>
>
> I just tried, but I cannot reproduce this leak anymore.
> When it happened (100% reproducible at that time), I probably had a corrupted LSA (I saw empty
> output from 'ndctl list'). It seemed the QEMU-emulated NVDIMM device was broken
> for some reason.

I agree that it was probably a corrupted LSA. But that is where we need to fix
the bug.

AFAICS create_namespace_pmem() can no longer return ENODEV, which is why I
pointed to the EAGAIN case. Removing the dead ENODEV handling could be a
separate cleanup, as shown in [1].

But to clean this up completely one must account for the case where some labels
are ok and a later label is found to be corrupted. So the allocation of the
array should only occur when at least 1 valid label is found.
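
To illustrate the idea outside the kernel, here is a minimal userspace sketch,
not the driver code. struct label, create_item() and scan() are hypothetical
stand-ins for the label, create_namespace_pmem() and scan_labels(); calloc()
and free() stand in for kcalloc() and kfree(). The point is only the ordering:
the array grows after an item has been created, so a run of invalid labels
leaves nothing allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a label: valid != 0 means it parses cleanly. */
struct label {
	int valid;
	int id;
};

/*
 * Hypothetical stand-in for create_namespace_pmem(): returns 0 and fills
 * *out on success, or -EAGAIN when the label is invalid.
 */
static int create_item(const struct label *l, int *out)
{
	if (!l->valid)
		return -EAGAIN;
	*out = l->id;
	return 0;
}

/*
 * Collect the ids of all valid labels. The array is grown only after an
 * item has been created, so skipping invalid labels never leaves an
 * allocation behind. Returns NULL with *count == 0 when no valid label
 * was found or when an allocation failed.
 */
static int *scan(const struct label *labels, size_t n, size_t *count)
{
	int *items = NULL;
	size_t i;

	assert(labels || n == 0);
	*count = 0;

	for (i = 0; i < n; i++) {
		int item, *grown;

		if (create_item(&labels[i], &item) == -EAGAIN)
			continue; /* skip invalid labels, nothing was allocated for them */

		/* Same count + 2 sizing as the kernel code uses. */
		grown = calloc(*count + 2, sizeof(*grown));
		if (!grown) {
			free(items);
			*count = 0;
			return NULL;
		}
		if (*count)
			memcpy(grown, items, sizeof(*grown) * *count);
		free(items);
		items = grown;
		items[(*count)++] = item;
	}

	return items;
}
```

With every label invalid, scan() returns NULL and there is nothing to leak;
with a mix, only the valid entries end up in the array.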

So combining these ideas I think the fix is as shown in [2]. Could this case be
added as a test? And then the patch checked out as a fix?

Ira


[1] Because select_pmem_id() is always called with a valid pmem_id, the ENODEV
case can never happen. So from what I can see, removing that error case as
follows is ok.

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index d6d558f94d6b..7069e7267a7d 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1612,9 +1612,6 @@ static int select_pmem_id(struct nd_region *nd_region, const uuid_t *pmem_id)
{
int i;

- if (!pmem_id)
- return -ENODEV;
-
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
@@ -1790,9 +1787,6 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
case -EINVAL:
dev_dbg(&nd_region->dev, "invalid label(s)\n");
break;
- case -ENODEV:
- dev_dbg(&nd_region->dev, "label not found\n");
- break;
default:
dev_dbg(&nd_region->dev, "unexpected err: %d\n", rc);
break;
@@ -1974,9 +1968,6 @@ static struct device **scan_labels(struct nd_region *nd_region)
case -EAGAIN:
/* skip invalid labels */
continue;
- case -ENODEV:
- /* fallthrough to seed creation */
- break;
default:
goto err;
}


[2] Fix, compile tested only.

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index d6d558f94d6b..6401ebee3db2 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1612,9 +1612,6 @@ static int select_pmem_id(struct nd_region *nd_region, const uuid_t *pmem_id)
{
int i;

- if (!pmem_id)
- return -ENODEV;
-
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
@@ -1790,9 +1787,6 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
case -EINVAL:
dev_dbg(&nd_region->dev, "invalid label(s)\n");
break;
- case -ENODEV:
- dev_dbg(&nd_region->dev, "label not found\n");
- break;
default:
dev_dbg(&nd_region->dev, "unexpected err: %d\n", rc);
break;
@@ -1961,12 +1955,6 @@ static struct device **scan_labels(struct nd_region *nd_region)
goto err;
if (i < count)
continue;
- __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL);
- if (!__devs)
- goto err;
- memcpy(__devs, devs, sizeof(dev) * count);
- kfree(devs);
- devs = __devs;

dev = create_namespace_pmem(nd_region, nd_mapping, nd_label);
if (IS_ERR(dev)) {
@@ -1974,15 +1962,18 @@ static struct device **scan_labels(struct nd_region *nd_region)
case -EAGAIN:
/* skip invalid labels */
continue;
- case -ENODEV:
- /* fallthrough to seed creation */
- break;
default:
goto err;
}
- } else
- devs[count++] = dev;
+ }

+ __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL);
+ if (!__devs)
+ goto err;
+ memcpy(__devs, devs, sizeof(dev) * count);
+ kfree(devs);
+ devs = __devs;
+ devs[count++] = dev;
}

dev_dbg(&nd_region->dev, "discovered %d namespace%s\n", count,