Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend

From: Sean Nyekjaer
Date: Fri Oct 08 2021 - 06:04:34 EST


On Thu, Oct 07, 2021 at 03:14:26PM +0200, Boris Brezillon wrote:
> On Thu, 7 Oct 2021 14:39:16 +0200
> Sean Nyekjaer <sean@xxxxxxxxxx> wrote:
>
> > >
> > > wait_queue doesn't really describe what this waitqueue is used for
> > > (maybe resume_wq), and the suspended state should be here as well
> > > (actually, there's one already).
> >
> > I'll rename to something meaningful.
> > >
> > > Actually, what we need is a way to prevent the device from being
> > > suspended while accesses are still in progress, and new accesses from
> > > being queued if a suspend is pending. So, I think you need a readwrite
> > > lock here:
> > >
> > > * take the lock in read mode for all IO accesses, check the
> > > mtd->suspended value
> > > - if true, release the lock, and wait (retry on wakeup)
> > > - if false, just do the IO
> > >
> > > * take the lock in write mode when you want to suspend/resume the
> > > device and update the suspended field. Call wake_up_all() in the
> > > resume path
> >
> > Could we use the chip->lock mutex for this? It's does kinda what you
> > described above?
>
> No you can't. Remember I suggested to move all of that logic to
> mtdcore.c, which doesn't know about the nand_chip struct.
>
> > If we introduce a new lock, do we really need to have the suspended as
> > an atomic?
>
> Nope, I thought we could do without a lock, but we actually need to
> track active IO requests, not just the suspended state.

I have only added wait_queue to read and write operations.
I'll have a look into where we should add further checks.

>
> >
> > I will test with some wait and retry added to nand_get_device().
>
> Again, I think there's a misunderstanding here: if you move it to the
> mtd layer, it can't be done in nand_get_device(). But once you've
> implemented it in mtdcore.c, you should be able to get rid of the
> nand_chip->suspended field.

I have moved the suspended atomic and wake_queue to mtdcore.c. And kept
the suspended variable in nand_base as is fine for chip level suspend
status.

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index c8fd7f758938..6492071eb4da 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -42,15 +42,24 @@ static int mtd_cls_suspend(struct device *dev)
{
struct mtd_info *mtd = dev_get_drvdata(dev);

- return mtd ? mtd_suspend(mtd) : 0;
+ if (mtd) {
+ atomic_inc(&mtd->suspended);
+ return mtd_suspend(mtd);
+ }
+ + return 0;
}

static int mtd_cls_resume(struct device *dev)
{
struct mtd_info *mtd = dev_get_drvdata(dev);

- if (mtd)
+ if (mtd) {
mtd_resume(mtd);
+ atomic_dec(&mtd->suspended);
+ wake_up_all(&mtd->resume_wq);
+ }
+
return 0;
}
@@ -678,6 +687,10 @@ int add_mtd_device(struct mtd_info *mtd)
if (error)
goto fail_nvmem_add;

+ init_waitqueue_head(&mtd->resume_wq);
+
+ atomic_set(&mtd->suspended, 0);
+
mtd_debugfs_populate(mtd);

device_create(&mtd_class, mtd->dev.parent, MTD_DEVT(i) + 1, NULL,
@@ -1558,6 +1571,8 @@ int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
struct mtd_ecc_stats old_stats = master->ecc_stats;
int ret_code;

+ wait_event(mtd->resume_wq, atomic_read(&mtd->suspended) == 0);
+
ops->retlen = ops->oobretlen = 0;

ret_code = mtd_check_oob_ops(mtd, from, ops);
@@ -1597,6 +1612,8 @@ int mtd_write_oob(struct mtd_info *mtd, loff_t to,
struct mtd_info *master = mtd_get_master(mtd);
int ret;

+ wait_event(mtd->resume_wq, atomic_read(&mtd->suspended) == 0);
+
ops->retlen = ops->oobretlen = 0;

if (!(mtd->flags & MTD_WRITEABLE))
diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
index 88227044fc86..70ede36092a9 100644
--- a/include/linux/mtd/mtd.h
+++ b/include/linux/mtd/mtd.h
@@ -360,6 +360,9 @@ struct mtd_info {
int (*_get_device) (struct mtd_info *mtd);
void (*_put_device) (struct mtd_info *mtd);

+ atomic_t suspended;
+ wait_queue_head_t resume_wq;
+
/*
* flag indicates a panic write, low level drivers can take appropriate
* action if required to ensure writes go through