Re: [PATCH 1/2] mtd: spi-nor: winbond: Add support for w25q01jv
From: Pratyush Yadav
Date: Mon Jan 13 2025 - 09:08:59 EST
On Mon, Dec 30 2024, Miquel Raynal wrote:
> Hello Pratyush,
>
> On 24/12/2024 at 21:15:41 GMT, Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
>
>> On Tue, Dec 24 2024, Miquel Raynal wrote:
[...]
>>> As there are very few situations where this can actually happen, a
>>> status register write being the most likely one, another possibility
>>> might have been to use volatile writes instead of non-volatile writes,
>>> as most of the deviation comes from the action of writing the bit. But
>>> this would overlook other possible actions where both dies can be used
>>> at the same time like a chip erase (or any erase over the die boundary
>>> in general). This last approach would have the least impact but because
>>> it does not feel like it is totally safe to use and because the impact
>>> of the second solution presented above is also negligible, we keep this
>>> second approach for now (which can be further tuned later if it appears
>>> to be too impacting in the end).
>>
>> I am a bit confused by this paragraph. What do you mean by "this" in
>> the
>
> "this" = "the race condition"
>
>> first sentence? What do status register writes have to do with the ready
>> bit being racy?
>
> The bug that has been experienced followed this sequence:
> - send the write enable command (non-volatile)
> - wait for the ready/busy bit, ie. wait for the WEL bit to be set
> because it is non-volatile write
> - active die is ready, (but idle die is not!)
> - enter 4-byte address mode, only the die that is ready processes the
> command.
>
> We only observed the issue in this particular case which involves
> writing the status register, because it is one of the very few commands
> targeting all dies at the same time.
>
> I assume another sequence that might lead to a similar issue might be a
> chip erasure, as all dies are involved in parallel, but maybe there are
> other situations I did not think about which might be racy as well.
>
>> I would assume those would be nearly instant since
>> status registers are usually volatile. What do volatile writes mean in
>> this context?
>
> You are actually right. Status register bits can be volatile (in this
> case writing the bits themselves is almost instant) but currently when
> we allow this register to be writable by sending the write enable (06h)
> command, the non-volatile way is used, ie. the state of the bit itself
> is stored in non-volatile memory and write durations can vary from one
> die to another.
Okay, that is strange behaviour. Normally the status registers are
always volatile, and don't have a non-volatile counterpart.
>
> Winbond chips (maybe this is a shared capability?) accepts another
> command, "Write Enable for Volatile Status Register (50h)", which
> specifically change the status register bits to use the volatile method.
>
> Hence, if the only situation we want to solve is the status register
> access, then we may just enable this command (this is the third solution
> I tried to explain in the commit log), but if we think there are other
> racy situations, this approach is not complete and we must fallback to
> one of the approaches listed above.
I am not quite sure how you fix the write-enable-being-racy bug with
your patch. If you look at the code, spi_nor_write_enable() only calls
the write enable command (06h), and does not call
spi_nor_wait_till_ready() after that. After the write enable, it
immediately executes the program or erase operation. So you never
actually wait for all dies to be ready after a write enable.
You can see an example in spi_nor_write(). It does:
spi_nor_write_enable() -> spi_nor_write_data() -> spi_nor_wait_till_ready()
Do you have a consistent reproducer for the race? If so, does the patch
actually somehow make the race go away? If so, I would be curious to
know why.
>
>>>
>>> However, the fixup, whatever which one we pick, must be applied on
>>> multi-die chips, which hence must be properly flagged. The SFDP tables
>>> implemented give a lot of information but the die details are part of an
>>> optional table that is not implemented, hence we use a post parsing
>>> fixup hook to set the params->n_dice value manually.
>>>
[...]
--
Regards,
Pratyush Yadav