Re: Questions about the patch 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it")

From: libaokun (A)
Date: Tue Jan 11 2022 - 03:35:12 EST


在 2022/1/10 17:09, Jann Horn 写道:
On Wed, Dec 22, 2021 at 11:32 AM libaokun (A) <libaokun1@xxxxxxxxxx> wrote:
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 1 Dec 2021 10:06:14 -0800
Subject: fget: check that the fd still exists after getting a ref to it

Jann Horn points out that there is another possible race wrt Unix domain
socket garbage collection, somewhat reminiscent of the one fixed in
commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").

See the extended comment about the garbage collection requirements added
to unix_peek_fds() by that commit for details.

The race comes from how we can locklessly look up a file descriptor just
as it is in the process of being closed, and with the right artificial
timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
Unix domain socket garbage collector could see the reference count
decrement of the close() happen before fget() took its reference to the
file and the file was attached onto a new file descriptor.
I analyzed this CVE and tried to reproduce it.

I guess he triggered it like the stack below.


close_fd |
pick_file |
| __fget_files
file = files_lookup_fd_rcu(files, fd); |
|
rcu_assign_pointer(fdt->fd[fd], NULL);
filp_close |
fput |
| get_file_rcu_many // ned ref>=1
fput_many(file, 1); |
file_free(file); |
| return file
| // read-after-free
The race is more complicated than that; you also need to add unix_gc()
to the race. And if you want to get to memory corruption, you need one
or two more races involving unix_stream_read_generic() on top of that.

If you want to successfully execute the get_file_rcu_many function,

the reference counting of the file is greater than or equal to 1 and

is greater than or equal to 2 after the execution.

However, close releases only one reference count and does not release
the file,

so read-after-free does not occur. So how is the race triggered here?
This bug does not lead to a UAF of the file, it leads to a locking
inconsistency between the unix stream read path and the GC.

The question has been pondered for a long time without any results.

Could I get more details (e.g. reproduction methods or stacks) from you ?
See https://bugs.chromium.org/p/project-zero/issues/detail?id=2247 for
the original bug report. I'm also working on a more detailed blog
post, but that isn't finished yet.
.

Thank you very much for your reply!

With your help, I have understood the problem and successfully reproduced it.

Thanks a million!

--
With Best Regards,
Baokun Li