Re: Bug with "fix partial page writes" [3.2-rc regression]

From: Yongqiang Yang
Date: Mon Dec 05 2011 - 22:44:09 EST


On Tue, Dec 6, 2011 at 11:33 AM, Tao Ma <tm@xxxxxx> wrote:
> On 12/06/2011 11:08 AM, Yongqiang Yang wrote:
>> Hi Allison,
>>
>> I noticed another problem which has nothing to do with punching holes.
>> __block_write_begin does not zero buffers beyond EOF.
> Yes, that is expected.
>> (I guess you tried to zero them in your code, am I right?)  When users
>> mapread beyond EOF, they get non-zero data.  I am not sure whether the
>> data should be zero or non-zero, but fsx expects zeros and reports an
>> error.
> Why can users read data past EOF? I am also puzzled. Does punching a
> hole do this? I don't think that's right.
According to the code, filemap_fault handles this case correctly. But I hit
the error 'non-zero data beyond EOF' reported by fsx, which is strange.
It seems the uptodate status is set wrongly. Just a guess :-)
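A minimal userspace sketch of the kind of check behind that fsx error (not fsx's actual code; the helper name tail_is_zero is hypothetical): map the file rounded up to a whole page and verify that every byte between EOF and the end of the last page reads as zero, which is what POSIX requires for the partial page at the end of a mapped file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper modelled on an fsx-style check: map the whole file,
 * rounded up to a page boundary, and return 1 if every byte between EOF and
 * the end of the last page is zero, 0 if non-zero data shows up past EOF,
 * or -1 on error (including an empty file, which has no partial page).
 */
static int tail_is_zero(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0) {
        close(fd);
        return -1;
    }

    size_t size = (size_t)st.st_size;
    size_t page = (size_t)pg;
    /* Round the mapping up to whole pages so the bytes past EOF are visible. */
    size_t maplen = (size + page - 1) / page * page;
    assert(maplen >= size && maplen % page == 0);

    unsigned char *p = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    int ok = 1;
    /* Bytes past EOF inside the last page may be read; a page wholly past EOF would SIGBUS. */
    for (size_t i = size; i < maplen; i++) {
        if (p[i] != 0) {
            ok = 0;
            break;
        }
    }
    munmap(p, maplen);
    return ok;
}
```

On a correctly behaving filesystem, calling tail_is_zero() on a freshly written 5-byte file should return 1; the fsx failure above corresponds to this kind of check returning 0.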

I am guessing Allison hit this problem before and tried to fix it in the
write path by zeroing buffers beyond EOF.
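To make "zeroing buffers beyond EOF" concrete when blocksize < pagesize, here is a userspace model (not ext4 code and not Allison's patch; the helper zero_buffers_beyond_eof and its parameters are hypothetical). It walks the buffers of one page, zeroes every buffer that starts at or past i_size, and zeroes only the tail of the buffer that straddles EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model (hypothetical, not kernel code): one page cache page of
 * page_size bytes is split into page_size / blocksize buffers. Zero every
 * byte of the page at or beyond file offset i_size, buffer by buffer.
 * Returns the number of buffers that lie entirely beyond EOF.
 */
static size_t zero_buffers_beyond_eof(unsigned char *page, size_t page_size,
                                      size_t blocksize, uint64_t page_index,
                                      uint64_t i_size)
{
    assert(blocksize > 0 && page_size % blocksize == 0);

    uint64_t page_start = page_index * page_size;
    size_t whole = 0;

    for (size_t off = 0; off < page_size; off += blocksize) {
        uint64_t blk_start = page_start + off;
        uint64_t blk_end = blk_start + blocksize;

        if (blk_start >= i_size) {
            /* Buffer lies entirely past EOF: zero all of it. */
            memset(page + off, 0, blocksize);
            whole++;
        } else if (blk_end > i_size) {
            /* Buffer straddles EOF: keep the bytes before EOF, zero the rest. */
            size_t keep = (size_t)(i_size - blk_start);
            memset(page + off + keep, 0, blocksize - keep);
        }
        /* Buffers wholly below EOF are left untouched. */
    }
    return whole;
}
```

With a 4096-byte page, 1024-byte blocks, i_size = 5000 and page index 1, the page covers file offsets 4096-8191: the first buffer keeps its first 904 bytes and has the last 120 zeroed, and the other three buffers are zeroed entirely, so the function returns 3.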

Yongqiang.
>
> Thanks
> Tao
>>
>> If I understand the problem correctly, it happens more often with punch hole.
>>
>> Yongqiang.
>> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
>> <achender@xxxxxxxxxxxxxxxxxx> wrote:
>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>
>>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>>
>>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>>
>>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>>
>>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>>> It appears that there's a bug with this patch:
>>>>
>>>>
>>>> This has been outstanding for a month now, and we've heard no progress:
>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for rc5.
>>>>
>>>> The problems appear on a 1k-blocksize filesystem under memory pressure:
>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>> a page after generic_write_end() dropped our last reference to it; and
>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>> rare data corruption seen when kbuilding.
>>>>
>>>> Although I earlier reported that backing out the patch caused an fsx
>>>> test to fail earlier, I've since found great variation in how soon it
>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>> under memory pressure: you won't be surprised that that turned out to
>>>> be because 2.6.38 defaults to nomblk_io_submit but 2.6.39 to mblk_io_submit.
>>>>
>>>> Thanks,
>>>> Hugh
>>>>
>>>
>>>
>>> Hi there,
>>>
>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>> works well when blocksize < pagesize"?  I have tried it and it does seem to
>>> help, though I am still running into some failures that I am trying to
>>> debug.  Please let us know if it helps with the issues you are seeing.  Thx!
>>>
>>> Allison Henderson
>>>



--
Best Wishes
Yongqiang Yang