Jamie,
Did you try reverting the patch I suggested?
On Thu, 11 Dec 2003, Jamie Clark wrote:
OK, no deadlock yet with 2.4.23aa1; however, it oopsed under ext3_file_write() in __mark_inode_dirty().
Just to recap: this test runs on a dual PIII, with several bonnie++ loads on an ext3+noatime+quota filesystem
mounted off
From the oops the fault happens on the last instruction of:
movl $0,8(%ebx)
movl $0,4(%edx)
movl 100(%edi),%eax
movl %edx,4(%eax) <-- here
which appears to be this code in inode.c [line 221+]:
    if (!(inode->i_state & (I_LOCK|I_FREEING|I_CLEAR)) &&
        !list_empty(&inode->i_hash)) {
            list_del(&inode->i_list);
            list_add(&inode->i_list, &sb->s_dirty);
After a quick browse of the assembler output, the zeroing appears to be part of the list_del() inline, and edi seems to equate to sb. If I have read that correctly, the oops happens at the beginning of
the list_add() inline, and eax is the head of the s_dirty list, pointing into oblivion.
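To make that reading concrete, here is a minimal userspace sketch of the two list operations. It is a stand-in written for illustration, not the kernel source: the struct layout, the helper init_list_head() and the register mapping in the comments (ebx = inode, edx = &inode->i_list, edi = sb) are assumptions based on the disassembly above.

```c
#include <stddef.h>

/* Userspace stand-in for a circular doubly linked list node.
 * With 32-bit pointers, next sits at offset 0 and prev at offset 4. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Hypothetical helper: make head an empty list pointing at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Unlink entry from its list, then zero its pointers.
 * The two zeroing stores would correspond to the movl $0,... pair. */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;   /* assumed: movl $0,8(%ebx) */
    entry->prev = NULL;   /* assumed: movl $0,4(%edx) */
}

/* Insert new_entry immediately after head. */
static void list_add(struct list_head *new_entry, struct list_head *head)
{
    struct list_head *next = head->next;  /* assumed: movl 100(%edi),%eax */
    next->prev = new_entry;               /* assumed: movl %edx,4(%eax)   */
    new_entry->next = next;
    new_entry->prev = head;
    head->next = new_entry;
}
```

Under these assumptions, the first store in list_add() writes through head->next. If that pointer is stale or corrupted, the write faults at exactly that instruction, which would be consistent with the oops location.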
__mark_inode_dirty() does not appear to take sb_lock before adding to the s_dirty list. Could that
be the culprit? I'm completely unfamiliar with the Linux kernel, so I might be way off here.
-Jamie
Andrea Arcangeli wrote:
On Tue, Nov 04, 2003 at 07:52:40PM +0800, Jamie Clark wrote:
I made the quick fix (disabling rq_mergeable) and started the load test.
Will let it run for a week or so.
Does your more recent email mean it deadlocked again even with this
disabled?
Could you try again with 2.4.23aa1, just in case?
FYI an observation from my last test: the read latency seems to be much
improved and more consistent under this kernel (2.4.23pre6aa3, before
the oops and before this fix). The maximum latency seemed steady over
the whole test without any of the longish pauses that showed up under
2.4.19. Quite a difference.
nice to hear! thanks.