RE: lockdep: holding locks across syscall boundaries

From: David Laight
Date: Sun Oct 29 2023 - 18:02:35 EST


From: Peter Zijlstra
> Sent: 27 October 2023 17:00
>
> On Fri, Oct 27, 2023 at 09:14:53AM -0600, Jens Axboe wrote:
> > Hi,
> >
> > Normally we'd expect locking state to be clean and consistent across
> > syscall entry and exit, as that is always the case for sync syscalls.
>
> > We currently have a work-around for holding a lock from aio, see
> > kiocb_start_write(), which pretends to drop the lock from lockdep's
> > perspective, as it's held from submission until kiocb_end_write() is
> > called at completion time.
>
> I was not aware of this, the only such hack I knew about was the
> filesystem freezer thing.
>
> The problem with holding locks past the end of a syscall is that you'll
> nest whatever random lock hierarchies are possibly taken by every other
> syscall under that lock.
>
...
>
> Suppose syscall-a returns with your kiocb thing held, call it lock A
> Suppose syscall-b returns with your inode thing held, call it lock B
>
> Then userspace does:
>
> syscall-a
> syscall-b
>
> while it also does:
>
> syscall-b
> syscall-a
>
> and we're up a creek, no?
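
For illustration only, here is a userspace toy model of that ordering
problem. Nothing in it is kernel code: toy_lock, toy_trylock() and the
two syscall_*() helpers are made-up names, and a failed trylock stands
in for "this task would block here".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records which task (if any) holds it. Not a kernel type. */
struct toy_lock {
	int owner;	/* 0 means unlocked, otherwise a task id */
};

/* Try to take the lock for 'task'; fails instead of blocking. */
static bool toy_trylock(struct toy_lock *l, int task)
{
	assert(task != 0);	/* 0 is reserved for "unlocked" */
	if (l->owner != 0)
		return false;
	l->owner = task;
	return true;
}

/*
 * Hypothetical syscalls that return to userspace with a lock still held.
 * A false return means the real thing would have blocked.
 */
static bool syscall_a(struct toy_lock *A, int task)
{
	return toy_trylock(A, task);
}

static bool syscall_b(struct toy_lock *B, int task)
{
	return toy_trylock(B, task);
}

/*
 * Interleave two tasks: task 1 does a then b, task 2 does b then a.
 * Returns true if each task ends up waiting on the lock the other
 * one holds, i.e. the classic ABBA deadlock.
 */
static bool abba_deadlocks(void)
{
	struct toy_lock A = { 0 }, B = { 0 };
	bool t1_first  = syscall_a(&A, 1);	/* task 1 now holds A */
	bool t2_first  = syscall_b(&B, 2);	/* task 2 now holds B */
	bool t1_second = syscall_b(&B, 1);	/* would block on B */
	bool t2_second = syscall_a(&A, 2);	/* would block on A */

	return t1_first && t2_first && !t1_second && !t2_second;
}
```

Calling abba_deadlocks() from a small main() reports the inversion for
this particular interleaving; it says nothing about how likely the
interleaving is in practice.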

Isn't it also open to a massive denial-of-service attack?
syscall-a
sleep(infinity)

assuming you actually catch:
syscall-a
_exit()
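
A rough userspace sketch of both cases, under the assumption that the
lock is only ever dropped by the holder or by some exit-time cleanup.
All names here (toy_lock, toy_trylock(), toy_task_exit(), ...) are
hypothetical and only model the behaviour, they are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records which task (if any) holds it. Not a kernel type. */
struct toy_lock {
	int owner;	/* 0 means unlocked, otherwise a task id */
};

/* Try to take the lock for 'task'; fails instead of blocking. */
static bool toy_trylock(struct toy_lock *l, int task)
{
	assert(task != 0);	/* 0 is reserved for "unlocked" */
	if (l->owner != 0)
		return false;
	l->owner = task;
	return true;
}

/* Hypothetical syscall-a: returns to userspace still holding 'l'. */
static bool syscall_a(struct toy_lock *l, int task)
{
	return toy_trylock(l, task);
}

/* Hypothetical exit-time cleanup: drop the lock if the dying task holds it. */
static void toy_task_exit(struct toy_lock *l, int task)
{
	if (l->owner == task)
		l->owner = 0;
}

/*
 * Task 1 calls syscall-a and then sleeps forever. Task 2 retries
 * 'attempts' times; returns how many of those attempts succeeded.
 */
static int victim_successes_while_holder_sleeps(int attempts)
{
	struct toy_lock l = { 0 };
	int ok = 0;

	syscall_a(&l, 1);		/* task 1 holds the lock and never lets go */
	for (int i = 0; i < attempts; i++)
		if (syscall_a(&l, 2))
			ok++;
	return ok;
}

/* Task 1 calls syscall-a and exits; the cleanup lets task 2 make progress. */
static bool victim_succeeds_after_exit(void)
{
	struct toy_lock l = { 0 };

	syscall_a(&l, 1);
	toy_task_exit(&l, 1);		/* the "actually catch _exit()" case */
	return syscall_a(&l, 2);
}
```

In this model the sleeping holder starves the other task for as long
as it likes, and only the exit path gives the lock back.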

David
