Re: Linux kernel - Libata bad block error handling to user mode program

From: Robert Hancock
Date: Sun Mar 14 2010 - 00:06:47 EST


On Sat, Mar 13, 2010 at 6:12 PM, s ponnusa <foosaa@xxxxxxxxx> wrote:
> Is it the case even during the blocking operation where the write op
> waits for the call return?

Yes. Unless you're using O_DIRECT, even a blocking write generally
returns once the data is in the page cache, not once it has reached
the disk.
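
For illustration, here is a rough sketch of a write that bypasses the
cache with O_DIRECT. It is Python for brevity and Linux only. The names
aligned_buffer, write_direct and BLOCK are made up for this example, and
the 4096-byte alignment is an assumption; the real requirement depends
on the device and filesystem, and some filesystems reject O_DIRECT
entirely.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real value depends on device and filesystem


def aligned_buffer(data, block=BLOCK):
    """Copy data into page-aligned memory, zero-padded to a multiple of block."""
    size = max(block, -(-len(data) // block) * block)  # round up, at least one block
    buf = mmap.mmap(-1, size)  # anonymous mappings are page-aligned and zero-filled
    buf[:len(data)] = data
    return buf


def write_direct(path, data):
    """Write data with O_DIRECT so the transfer bypasses the page cache.

    Any OSError from open() or write() (EIO, EINVAL, ...) propagates to
    the caller instead of being dropped.
    """
    buf = aligned_buffer(data)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        written = os.write(fd, buf)
        if written != len(buf):
            raise OSError("short O_DIRECT write: %d of %d bytes" % (written, len(buf)))
        os.ftruncate(fd, len(data))  # trim the zero padding back off
    finally:
        os.close(fd)
        buf.close()
    return written
```

Note that O_DIRECT on its own does not promise the data is on stable
media (the drive's own write cache may still hold it), so an fsync is
still needed if that matters.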

> Even fsync does not catch the errors (or at least not in 2.6.27). I
> agree with you on the process flow. Will post more testing results and
> details within a couple of days.

If the drive is indeed reporting an error on writes to a file, and the
program doesn't detect an error on any calls when doing so, even when
calling fsync, that sounds like a bug somewhere.
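
To be clear about what "doesn't detect an error on any calls" should
mean, the test program needs to check every call, roughly like the
sketch below (Python for brevity; write_durably is a made-up name, not
taken from your program). Python turns a failing write(), fsync() or
close() into an OSError, so nothing here is silently ignored.

```python
import os


def write_durably(path, data):
    """Write data to path and force it out to the drive with fsync.

    Raises OSError if open, write, fsync or close reports a failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        # Until this point the data may exist only in the page cache.
        # A write error from the drive would be expected to show up
        # here (typically as EIO), not at the write() calls above.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

If a program along these lines completes without an exception while the
drive is logging write errors for the same sectors, that is the case
worth reporting.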

> -
> SP
>
> On Sat, Mar 13, 2010 at 6:44 PM, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>> On 03/13/2010 04:44 PM, s ponnusa wrote:
>>>
>>> Had some issues with the libata code in the 2.6.27 kernel, but I
>>> believe the issues were fixed in subsequent versions. At least one
>>> prominent issue was with a Western Digital HDD of 40 GB size. The
>>> manufacturer specific LBA was 78125000 and was reported correctly
>>> in Win32 and DOS applications. But the 2.6.27 kernel was reporting
>>> ~40000 sectors more. The problem disappeared with the 2.6.3x
>>> kernel and I did not bother to check the patches due to lack of time.
>>> But still, the write failure is not being seen by the application. I
>>> can understand not checking for media errors during the
>>> write operation, and had posted a request for quick suggestions on
>>> the locations which need to be changed / checked for the return
>>> value. (Should it be handled at the vfs or in the libata code?) Will
>>> surely update the testing results with the new kernel (well, not
>>> exactly, as I am not using the latest version; currently trying
>>> with 2.6.31). Thank you all for the suggestions.
>>
>> It's quite likely for write errors not to be noticed by the application.
>> Even if the drive does report a write error, the application that wrote the
>> data could have completed the write and even closed the file or exited
>> before the data actually gets written to disk. Only if fsync (or a related
>> function) is called on the file is it guaranteed that the data has been
>> written out to the drive (and any generated errors should be seen at that
>> time).
>>
>>> -
>>> SP
>>>
>>> On Thu, Mar 11, 2010 at 1:29 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
>>>>>
>>>>> But really.. isn't "hdparm --security-erase NULL /dev/sdX" good enough
>>>>> ???
>>>>>
>>>>
>>>> This thread seems to have died off.  If there is a real problem, I
>>>> hope it picks back up.
>>>>
>>>> Mark, as to your question: the few times I've tried that, the BIOS on
>>>> the test machine blocked the command.  So it may have some specific
>>>> utility, but it's not a generic solution in my mind.
>>>>
>>>> Greg
>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/