Re: [BUG] rockpro64 board hangs in console_init() after commit 10e14073107d

From: Jan Kara
Date: Tue Jun 14 2022 - 09:58:59 EST


On Tue 14-06-22 14:23:32, Petr Mladek wrote:
> On Mon 2022-06-13 17:54:35, Alexandru Elisei wrote:

<snip>

> > Config can be found at [1] (expires after 6 months). I've also built the
> > kernel with gcc 10.3.1 [2] (aarch64-none-linux-gnu), same issue.
> >
> > I've bisected the build failure to commit 10e14073107d ("writeback: Fix
> > inode->i_io_list not be protected by inode->i_lock error"); I've confirmed
> > that that commit is responsible by successfully booting the board with a
> > kernel built from v5.19-rc2 + the above commit reverted.
>
> It is strange. I can't see how consoles are related to filesystem
> writeback.
>
> Anyway, the commit 10e14073107d ("writeback: Fix inode->i_io_list not
> be protected by inode->i_lock error") modifies some locking and
> might be source of possible deadlocks.

Yes, I've got other reports from ARM people that this commit causes issues
for them (kernel oops or so) so the locking changes are likely at fault...

> I am not familiar with the fs code. But I noticed the following.
> The patch adds:
>
> + if (!was_dirty) {
> + wb = locked_inode_to_wb_and_lock_list(inode);
> + spin_lock(&inode->i_lock);
>
> And locked_inode_to_wb_and_lock_list() is defined this way:
>
> /**
> * locked_inode_to_wb_and_lock_list - determine a locked inode's wb and lock it
> * @inode: inode of interest with i_lock held
> *
> * Returns @inode's wb with its list_lock held. @inode->i_lock must be
> * held on entry and is released on return. The returned wb is guaranteed
> * to stay @inode's associated wb until its list_lock is released.
> */
> static struct bdi_writeback *
> locked_inode_to_wb_and_lock_list(struct inode *inode)
> __releases(&inode->i_lock)
> __acquires(&wb->list_lock)
> {
> while (true) {
> struct bdi_writeback *wb = inode_to_wb(inode);
>
> /*
> * inode_to_wb() association is protected by both
> * @inode->i_lock and @wb->list_lock but list_lock nests
> * outside i_lock. Drop i_lock and verify that the
> * association hasn't changed after acquiring list_lock.
> */
> wb_get(wb);
> spin_unlock(&inode->i_lock);
>
> It expects that inode->i_lock is taken before. But the problematic
> commit takes it later. It might mess the lock and cause a deadlock.

No. AFAICS inode->i_lock is held on entry to
locked_inode_to_wb_and_lock_list(). The function releases it so we have to
grab it again. The locking is ugly here but correct in this regard.

It rather likely has to do something with reordering the checks and running
locked_inode_to_wb_and_lock_list() on inodes for which we previously didn't
do it but I have to yet fully understand why things crash...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR