Badari Pulavarty wrote: Here's another data point for your consideration. I've been seeing this error since I started running 2.6.18. I assumed it was hardware, so I've tried 3 different disks (one PATA and two SATA drives) with VIA and Promise controllers, and the error has occurred on all of them. I see the error infrequently, always when downloading lots of small files from Usenet and building, copying and deleting large (200 - 300 MB) files. I have never had an oops/panic, just this error. When I run fsck, I always see a single message that "deleted inode nnn has zero dtime". I hope this will be useful.
Here is what I think is happening:
journal_unmap_buffer() cleaned the buffer, since it's outside EOF, but
it's part of the same page, so it remained on the page->buffers
list. (At this time it's not part of any transaction.)
Then ordered_commit_write() called journal_dirty_data() and we added
all these buffers to the BJ_SyncData list. (At this time the buffer is
clean, not dirty.)
Now msync() called __set_page_dirty_buffers() and dirtied *all* the
buffers attached to this page.
journal_submit_data_buffers() got around to this buffer and tried to
submit the buffer...
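
The sequence described above can be walked through with a toy userspace
model. This is only a sketch of the state transitions, not kernel code:
struct toy_buffer, the toy_* helpers and the BJ_NONE/BJ_SYNCDATA enum are
hypothetical stand-ins for the buffer_head flags and journal lists, and
all locking is left out.

```c
/* Toy userspace model of the buffer states described above.
 * Every name here is hypothetical; none of this is the real jbd code. */
#include <assert.h>
#include <stdbool.h>

enum toy_jlist { BJ_NONE, BJ_SYNCDATA };

struct toy_buffer {
    bool mapped;          /* buffer has a disk block behind it */
    bool dirty;           /* buffer needs writing out */
    enum toy_jlist list;  /* which journal list the buffer is filed on */
};

/* Step 1: a buffer beyond EOF is cleaned, but stays attached to the page. */
static void toy_unmap_buffer(struct toy_buffer *bh)
{
    bh->mapped = false;
    bh->dirty = false;
    bh->list = BJ_NONE;
}

/* Step 2: the ordered-mode commit_write path files the buffer as sync data. */
static void toy_dirty_data(struct toy_buffer *bh)
{
    bh->list = BJ_SYNCDATA;
}

/* Step 3: the msync path dirties every buffer attached to the page. */
static void toy_set_page_dirty(struct toy_buffer *bufs, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        bufs[i].dirty = true;
}

/* Step 4: report whether commit would try to submit a dirty, unmapped
 * buffer, which is the bad case in this model. */
static bool toy_commit_hits_unmapped(const struct toy_buffer *bufs, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (bufs[i].list == BJ_SYNCDATA && bufs[i].dirty && !bufs[i].mapped)
            return true;
    return false;
}
```

In this model the bad case only shows up after step 3: until the whole
page is dirtied, the unmapped buffer sits on the sync-data list but is
clean, so nothing tries to submit it.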
This seems about right, but one thing bothers me in the traces; it seems like there is some locking that is missing. In
http://people.redhat.com/esandeen/traces/eric_ext3_oops1.txt
for example, it looks like journal_dirty_data gets started, but then the buffer_head is acted on by journal_unmap_buffer, which decides this buffer is part of the running transaction, past EOF, and clears mapped, dirty, etc. Then journal_dirty_data picks up again, decides that the buffer is not on the right list (now BJ_None) and puts it back on BJ_SyncData. Then it gets picked up by journal_submit_data_buffers and submitted, and oops.
Talking with Stephen, it seemed like the page lock should synchronize these threads, but I've found that we can get to journal_dirty_data acting on the buffer heads w/o having the page locked...
I'm still digging, and, er, grasping at straws here... Am I off base?
-Eric
Andrew is right: the only option for us is to check the file size in the
writeout path and skip the buffers beyond EOF.
Thanks,
Badari
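
A minimal userspace sketch of the "skip buffers beyond EOF" check, for
illustration only. It is not the actual patch: struct toy_buffer and
toy_writeout_page() are hypothetical names, the file size is passed in as
a plain number, and clearing the dirty bit on a skipped buffer is an
assumption of this sketch rather than something stated in the thread.

```c
/* Toy model of a writeout loop that skips buffers starting at or beyond
 * the file size. Hypothetical names throughout; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_buffer {
    bool mapped;     /* buffer has a disk block behind it */
    bool dirty;      /* buffer needs writing out */
    bool submitted;  /* set when this model "submits" the buffer */
};

/* Walk the buffers of one page and submit the dirty ones, except those
 * whose first byte lies at or beyond file_size. Returns how many buffers
 * were submitted. */
static size_t toy_writeout_page(struct toy_buffer *bufs, size_t nbufs,
                                unsigned long long page_start,
                                size_t blocksize,
                                unsigned long long file_size)
{
    size_t submitted = 0;

    assert(blocksize > 0);
    for (size_t i = 0; i < nbufs; i++) {
        unsigned long long buf_start =
            page_start + (unsigned long long)i * blocksize;

        if (!bufs[i].dirty)
            continue;
        if (buf_start >= file_size) {
            /* Beyond EOF: never submit it. Dropping the dirty bit here
             * is a choice made by this sketch only. */
            bufs[i].dirty = false;
            continue;
        }
        bufs[i].dirty = false;
        bufs[i].submitted = true;
        submitted++;
    }
    return submitted;
}
```

With four 1024-byte buffers on a page starting at offset 0 and a file
size of 2500 bytes, the first three buffers start before EOF and are
submitted; the fourth starts at 3072 and is skipped even if it is dirty
and unmapped.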