Re: [syzbot] BUG: sleeping function called from invalid context in _copy_to_iter

From: Shoaib Rao
Date: Mon Aug 09 2021 - 18:39:08 EST

On 8/9/21 2:41 PM, Al Viro wrote:
On Mon, Aug 09, 2021 at 01:37:08PM -0700, Shoaib Rao wrote:

+ mutex_lock(&u->iolock);
+ unix_state_lock(sk);
+ err = unix_stream_recv_urg(state);
+ unix_state_unlock(sk);
+ mutex_unlock(&u->iolock);

is 100% broken, since you *are* attempting to copy data to userland between
spin_lock(&unix_sk(s)->lock) and spin_unlock(&unix_sk(s)->lock).
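
(Illustration only, not part of the patch under discussion.) The usual way out of "copying to userland under a spinlock" is to take a snapshot of the state while the lock is held and do the possibly sleeping copy after dropping it. Below is a minimal userspace sketch of that pattern, assuming a C11 atomic_flag as a stand-in for the spinlock behind unix_state_lock() and a plain memcpy() as a stand-in for the copy to userland. Every name in it (demo_sock, demo_lock, demo_recv_urg, slow_copy_out) is hypothetical and does not come from net/unix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the socket state guarded by a spinlock. */
struct demo_sock {
    atomic_flag lock;   /* plays the role of the state spinlock */
    char oob_byte;      /* pending urgent byte */
    int has_oob;        /* nonzero while oob_byte is valid */
};

void demo_sock_init(struct demo_sock *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->lock);
    s->oob_byte = 0;
    s->has_oob = 0;
}

static void demo_lock(struct demo_sock *s)
{
    /* Busy-wait until the flag is acquired; the holder must never sleep. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void demo_unlock(struct demo_sock *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Stand-in for a copy that may block (in the kernel case, on a page fault). */
static void slow_copy_out(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Returns 1 if a byte was delivered to *dst, 0 if none was pending. */
int demo_recv_urg(struct demo_sock *s, char *dst)
{
    char tmp;
    int have;

    /* Critical section: touch state only, call nothing that may block. */
    demo_lock(s);
    have = s->has_oob;
    tmp = s->oob_byte;
    s->has_oob = 0;
    demo_unlock(s);

    /* The possibly blocking copy runs with the spinlock released. */
    if (have)
        slow_copy_out(dst, &tmp, 1);
    return have;
}
```

Whether a snapshot like this is acceptable in the real code depends on what else the lock is meant to protect; the sketch only shows the ordering of "unlock first, copy afterwards".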
Yes, but why are we calling it unix_state_lock()? Why not
unix_state_spinlock()?
We'd never bothered with such naming conventions; keep in mind that
locking rules can and do change from time to time, and encoding the
nature of locking primitive into the name would result in tons of
Rules, ordering and semantics can change, but naming IMHO helps a lot. There are certain OSes where spinlocks only spin for a bit and then block. However, they are still called spinlocks.

I have tons of experience doing kernel coding and you can never ever cover
everything, that is why I wanted to root cause the issue instead of just
turning off the check.

Imagine you or Eric make a mistake and break the kernel, how would you guys
feel if I were to write a similar email?
Moderately embarrassed, at a guess, but what would that have to do with
somebody pointing the bug out? Bonehead mistakes happen, they are embarrassing
no matter who catches them - trust me, it's no less unpleasant when you end
up being one who finds your own bug months after it went into the tree. Been
there, done that...

Since you asked, as far as my reactions normally go:
* I made a mistake that ended up screwing people over => can be
hideously embarrassing, no matter what. No cause for that in your case,
AFAICS - it hadn't even gone into mainline yet.
* I made a dumb mistake that got caught (again, doesn't matter
by whom) => unpleasant; shit happens (does it ever), but that's not
a tragedy. Ought to look for the ways to catch the same kind of mistakes
and see if I have stepped into the same problem anywhere else - often
enough the blind spots strike more than once. If the method of catching
the same kind of crap ends up being something like 'grep for <pattern>,
manually check the instances to weed out the false positive'... might
be worth running over the tree; often enough the blind spots are shared.
Would be partially applicable in your case ("if using an unfamiliar locking
helper, check what it does"), but not easily greppable.
* I kept looking at bug report, missing the relevant indicators
despite the increasingly direct references to those by other people =>
mildly embarrassing (possibly more than mildly, if that persists for long).
Ought to get some coffee, wake up properly (if applicable, that is) and make
notes for myself re what to watch out for. Partially applicable here;
I'm no telepath, but at a guess you missed the list of locks in the report
_and_ missed repeated references to some spinlock being involved.
Since the call chain had not (AFAICS) been missed, the question
"which spinlock do they keep blathering about?" wouldn't have been hard.
Might be useful to make note of, for the next time you have to deal with
such reports.
* Somebody starts asking whether I bloody understand something
trivial => figure out what does that have to do with the situation at
hand, reply with the description of what I'd missed (again, quite possibly
the answer will be "enough coffee") and move on to figuring out how to
fix the damn bug. Not exactly applicable here - the closest I can see
is Eric's question regarding the difference between mutex and spinlock.
In similar situation I'd go with something along the lines of "Sorry,
hadn't spotted the spinlock in question"; your reply had been a bit
more combative than that, but that's a matter of taste. None of my
postings would fit into that class, AFAICS...
* Somebody explains (in painful details) what's wrong with the
code => more or less the same as above, only with less temptation (for
me) to get defensive. Reactions vary - some folks find it more offensive
than the previous one, but essentially it's the same thing.

The above describes my reactions, in case it's not obvious -
I'm not saying that everyone should react the same way, but you've
asked how would I (or Eric) react in such-and-such case. And I can't
speak for Eric, obviously...


I really appreciate the time you have taken to write the email. I agree with 99% of what you have stated. My displeasure is with the fact that when I asked what conditions trigger this error (I am not familiar with the checker), no one replied. As I said in the emails, I did suspect the locks but did not have time to look at the definition; your email arrived as I was looking at it. It would have been better and more polite to say, "Are you sure you are not holding a spinlock?" Would that not have solved the issue? Why do we always have to assume that the other person is not knowledgeable and is inferior to us?

Is there any documentation that lists the possible reasons the checker reports an error?

Thanks again for the email.