Re: [PATCH] pmem: report error on clear poison failure

From: Kani, Toshimitsu
Date: Thu Oct 13 2016 - 14:50:35 EST


On Thu, 2016-10-13 at 10:22 -0700, Dan Williams wrote:
> On Thu, Oct 13, 2016 at 9:08 AM, Kani, Toshimitsu <toshi.kani@xxxxxxx
> > wrote:
> >
> > On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
> > >
> > > On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani <toshi.kani@xxxxxxx>
> > > wrote:
> > > >
> > > >
> > > > ACPI Clear Uncorrectable Error DSM function may fail or may be
> > > > unsupported on a platform.ÂÂpmem_clear_poison() returns without
> > > > clearing badblocks in such cases, which leads to a silent data
> > > > corruption.
> > > >
> > > > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> > > > so that filesystem can log an error message.
> > >
> > > What's the silent data corruption scenario?ÂÂIf the clear poison
> > > fails I'm assuming that the poison will still be notified on the
> > > next
> > > read.
> >
> > I agree that the data is eventually read, but there is no guranteed
> > that when it is read soon enough, i.e. user might not access to the
> > data for a long time.
>
> ...but that's the same behavior for errors that we don't yet know
> about.ÂÂThat said, we indeed know that the write failed.ÂÂI'd feel
> better about this patch if the justification / impact was clearer in
> the changelog, because "silent data corruption" is not the impact.

Agreed. ÂHow about the following descritpion?

===
ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform.ÂÂpmem_clear_poison() returns without
clearing badblocks in such cases. ÂThis failure is detected at
the next read (-EIO).

This behavior can lead to an issue when user keeps writing but
does not read immedicately. ÂFor instance, flight recorder file
may be only read when it is necessary for troubleshooting.

Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
so that filesystem can log an error message on a write error.
===

Thanks,
-Toshi