Re: BUG: sleeping function called from invalid context in ext4_superblock_csum_set

From: Jan Kara
Date: Wed Nov 11 2020 - 09:31:10 EST


On Wed 04-11-20 14:12:35, Jan Kara wrote:
> On Tue 03-11-20 09:16:19, Costa Sapuntzakis wrote:
> > Jan, does this fixup from Hillf look ok to you? You originally argued for
> > lock_buffer/unlock_buffer.
> >
> > I think the problem here is that the ext4 code assumes that
> > ext4_commit_super will not sleep if sync == 0 (or at least
> > __ext4_grp_locked_error deos). Perhaps there should be a comment on
> > ext4_commit_super documenting this constraint.
>
> Hum, right. I forgot about that. The spinlock Hillf suggests kind of works
> but it still doesn't quite handle the case where superblock is modified in
> parallel from another place (that can still lead to sb checksum mismatch on
> next load). When we are going for a more complex solution I'd rather solve
> this as well... I'm looking into possible solutions now.

Just an update: I'm still working on this and it's like peeling an onion.
The mixing of journalled superblock updates with unjournalled one is really
evil and commit acaa532687cd fixes just a small part of the problems. So
for now I suggest to just revert acaa532687cd (to avoid sleep in atomic
context) and I'll submit larger set of fixes once they are ready.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR