Re: 2.6.16: Oops - null ptr in blk_recount_segments?
From: David Chinner
Date: Mon Mar 27 2006 - 01:13:25 EST
On Mon, Mar 27, 2006 at 06:58:23AM +0200, Peter Palfrader wrote:
> On Mon, 27 Mar 2006, David Chinner wrote:
>
> > > I've seen a few of these since upgrading to 2.6.16 on my x86_64 system:
> >
> > What did you upgrade from?
>
> 2.6.14.2.
>
>
> > > [14513.189101] Call Trace: <ffffffff80288e8d>{blk_recount_segments+125}
> > > [14513.189113] <ffffffff8045ab19>{schedule_timeout+153} <ffffffff8017df90>{__bio_clone+128}
> > > [14513.189134] <ffffffff8017dfed>{bio_clone+61} <ffffffff80144a70>{autoremove_wake_function+0}
> > > [14513.189143] <ffffffff882576c1>{:dm_crypt:crypt_alloc_buffer+65}
> > > [14513.189157] <ffffffff8825813e>{:dm_crypt:crypt_map+174} <ffffffff8823b463>{:dm_mod:__map_bio+83}
> > > [14513.189190] <ffffffff8823b6a8>{:dm_mod:__clone_and_map+168} <ffffffff8823b8de>{:dm_mod:__split_bio+174}
> > > [14513.189223] <ffffffff8823b9dc>{:dm_mod:dm_request+204} <ffffffff8028b532>{generic_make_request+258}
> > > [14513.189243] <ffffffff8028b628>{submit_bio+216} <ffffffff8017e20f>{__bio_add_page+463}
> > > [14513.189256] <ffffffff8026bd3e>{xfs_submit_ioend_bio+30} <ffffffff8026bec0>{xfs_submit_ioend+144}
> > > [14513.189269] <ffffffff8026cd03>{xfs_page_state_convert+1379} <ffffffff8026d2e0>{linvfs_writepage+176}
> >
> > This area of XFS changed in 2.6.16; it might be doing something wrong
> > with bios that is blowing up later. Then again, it might be something in dm
> > that is wrong. Does this happen on non-dmcrypt XFS filesystems you have?
>
> All the Oopses so far had dm_crypt in their backtrace, but then the
> non-crypted XFSs don't see that much action. Also, I recently added a
> new XFS filesystem to the system. Is there a way to tell which of them
> causes the Oops?
The only way I know involves prodding the decomposing corpse with a long,
pointy stick (i.e. kdb). That probably doesn't help you, though.
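
Short of kdb, one cheap check is to tally which modules show up in each
saved trace, which at least confirms whether dm_crypt really is present in
every Oops. The following is a minimal sketch, not a definitive tool: it
assumes the frame format seen in the trace quoted above
(`<address>{:module:symbol+offset}`, with the `:module:` part absent for
built-in symbols), and the function name and the "kernel" label are
hypothetical. It does not identify which filesystem issued the I/O.

```python
import re
from collections import Counter

# Matches frames such as <ffffffff882576c1>{:dm_crypt:crypt_alloc_buffer+65}
# and <ffffffff8017dfed>{bio_clone+61}; the ":module:" prefix is optional.
FRAME_RE = re.compile(
    r"<[0-9a-f]+>\{(?::(?P<module>\w+):)?(?P<symbol>[\w.]+)\+\d+\}"
)

def modules_in_trace(trace_text):
    """Count frames per module; frames without a module prefix are
    counted under the (hypothetical) label 'kernel'."""
    counts = Counter()
    for match in FRAME_RE.finditer(trace_text):
        counts[match.group("module") or "kernel"] += 1
    return counts

# Three lines copied from the trace quoted earlier in this thread.
sample = """\
[14513.189134] <ffffffff8017dfed>{bio_clone+61} <ffffffff80144a70>{autoremove_wake_function+0}
[14513.189143] <ffffffff882576c1>{:dm_crypt:crypt_alloc_buffer+65}
[14513.189157] <ffffffff8825813e>{:dm_crypt:crypt_map+174} <ffffffff8823b463>{:dm_mod:__map_bio+83}
"""

if __name__ == "__main__":
    for module, frames in modules_in_trace(sample).most_common():
        print(module, frames)
```

Running it over each saved Oops and comparing the tallies shows quickly
whether any trace lacks dm_crypt frames.
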
> > I understand that the recognised way to track down a reproducible problem
> > like this is to do a git bisect to find the mod that introduced the problem.
> > Is it possible for you to do this?
>
> I can certainly try but it usually takes several hours or at least a
> few days to manifest here on my desktop.
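
To put that time cost in perspective: a bisect over N commits needs roughly
log2(N) build-and-test rounds. The sketch below only does that arithmetic.
The commit counts and the three-days-per-round figure are hypothetical
placeholders, not measurements of the 2.6.14 to 2.6.16 range, and real
history with merges makes the count approximate.

```python
import math

def bisect_rounds(num_commits):
    """Approximate number of kernels to build and test when bisecting
    a range of num_commits candidate commits."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return math.ceil(math.log2(num_commits))

def total_days(num_commits, days_per_round):
    """Rough wall-clock cost if every round must run days_per_round
    days before it can be called good or bad."""
    return bisect_rounds(num_commits) * days_per_round

if __name__ == "__main__":
    # Hypothetical range sizes; substitute the real count for your range.
    for n in (1000, 10000):
        print(n, "commits:", bisect_rounds(n), "rounds,",
              total_days(n, 3), "days at 3 days per round")
```

With a bug that takes days to show up, even a dozen rounds adds up to
weeks, which is why narrowing the set of suspect changes first is
attractive.
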
Well, you can try ruling out the XFS changes - on this page:
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;h=9eeff2395e3cfd05c9b2e6074ff943a34b0c5c21;f=fs/xfs/linux-2.6/xfs_aops.c
These diffs:
2006-01-18
[XFS] Fix a race in xfs_submit_ioend() where we can ...
2006-01-11
[XFS] fix writeback control handling fix a reversed ...
[XFS] cluster rewrites We can cluster mapped pages ...
[XFS] pass full 64bit offsets to xfs_add_to_ioend
[XFS] consolidate some code in xfs_page_state_convert ...
[XFS] various fixes for xfs_convert_page fix various ...
[XFS] clean up the xfs_offset_to_map interface Currently ...
[XFS] use pagevec lookups This reduces the time spend ...
[XFS] Initial pass at going directly-to-bio on the ...
are the ones that change the XFS I/O path from what is in 2.6.14. Try
reverting them and see if the problem goes away....
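
One way to collect the commits to revert, as a sketch only: ask git for
everything that touched xfs_aops.c between the two releases and revert
newest first. This assumes a clone of the tree linked above with the
v2.6.14 and v2.6.16 tags present; the helper names are hypothetical. The
log may include more commits than the ones listed here, and reverts can
conflict and need fixing up by hand.

```python
import subprocess

# Path taken from the history URL above.
XFS_AOPS = "fs/xfs/linux-2.6/xfs_aops.c"

def log_command(old_tag="v2.6.14", new_tag="v2.6.16", path=XFS_AOPS):
    """Build the git command listing commits that touch `path`
    and are in new_tag but not in old_tag."""
    return ["git", "log", "--format=%H %s", f"{old_tag}..{new_tag}", "--", path]

def commits_to_revert(repo_dir):
    """Return (hash, subject) pairs, newest first, which is also the
    order in which they should be reverted."""
    result = subprocess.run(log_command(), cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return [tuple(line.split(" ", 1))
            for line in result.stdout.splitlines() if line]

def revert_commands(commits):
    """Turn the commit list into one `git revert` command per commit."""
    return [["git", "revert", "--no-edit", sha] for sha, _subject in commits]

if __name__ == "__main__":
    import sys
    # Print the commands instead of running them, so the list can be
    # compared against the diffs named above first.
    for cmd in revert_commands(commits_to_revert(sys.argv[1])):
        print(" ".join(cmd))
```
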
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group