Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()

From: Baokun Li
Date: Mon May 20 2024 - 05:20:11 EST


On 2024/5/20 17:10, Jingbo Xu wrote:

On 5/20/24 4:38 PM, Baokun Li wrote:
Hi Jingbo,

Thanks for your review!

On 2024/5/20 15:24, Jingbo Xu wrote:
On 5/15/24 4:45 PM, libaokun@xxxxxxxxxxxxxxx wrote:
From: Baokun Li <libaokun1@xxxxxxxxxx>

We got the following issue in a fuzz test of randomly issuing the
restore
command:

==================================================================
BUG: KASAN: slab-use-after-free in
cachefiles_ondemand_daemon_read+0x609/0xab0
Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962

CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
Call Trace:
  kasan_report+0x94/0xc0
  cachefiles_ondemand_daemon_read+0x609/0xab0
  vfs_read+0x169/0xb50
  ksys_read+0xf5/0x1e0

Allocated by task 626:
  __kmalloc+0x1df/0x4b0
  cachefiles_ondemand_send_req+0x24d/0x690
  cachefiles_create_tmpfile+0x249/0xb30
  cachefiles_create_file+0x6f/0x140
  cachefiles_look_up_object+0x29c/0xa60
  cachefiles_lookup_cookie+0x37d/0xca0
  fscache_cookie_state_machine+0x43c/0x1230
  [...]

Freed by task 626:
  kfree+0xf1/0x2c0
  cachefiles_ondemand_send_req+0x568/0x690
  cachefiles_create_tmpfile+0x249/0xb30
  cachefiles_create_file+0x6f/0x140
  cachefiles_look_up_object+0x29c/0xa60
  cachefiles_lookup_cookie+0x37d/0xca0
  fscache_cookie_state_machine+0x43c/0x1230
  [...]
==================================================================

Following is the process that triggers the issue:

      mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
  cachefiles_ondemand_init_object
   cachefiles_ondemand_send_req
    REQ_A = kzalloc(sizeof(*req) + data_len)
    wait_for_completion(&REQ_A->done)

             cachefiles_daemon_read
              cachefiles_ondemand_daemon_read
               REQ_A = cachefiles_ondemand_select_req
               cachefiles_ondemand_get_fd
               copy_to_user(_buffer, msg, n)
             process_open_req(REQ_A)
                                   ------ restore ------
                                   cachefiles_ondemand_restore
                                   xas_for_each(&xas, req, ULONG_MAX)
                                    xas_set_mark(&xas,
CACHEFILES_REQ_NEW);

                                   cachefiles_daemon_read
                                    cachefiles_ondemand_daemon_read
                                     REQ_A =
cachefiles_ondemand_select_req

              write(devfd, ("copen %u,%llu", msg->msg_id, size));
              cachefiles_ondemand_copen
               xa_erase(&cache->reqs, id)
               complete(&REQ_A->done)
    kfree(REQ_A)
                                     cachefiles_ondemand_get_fd(REQ_A)
                                      fd = get_unused_fd_flags
                                      file = anon_inode_getfile
                                      fd_install(fd, file)
                                      load = (void *)REQ_A->msg.data;
                                      load->fd = fd;
                                      // load UAF !!!

This issue is caused by issuing a restore command when the daemon is
still
alive, which results in a request being processed multiple times thus
triggering a UAF. So to avoid this problem, add an additional reference
count to cachefiles_req, which is held while waiting and reading, and
then
released when the waiting and reading is over.


Note that since there is only one reference count for waiting, we
need to
avoid the same request being completed multiple times, so we can only
complete the request if it is successfully removed from the xarray.
Sorry the above description makes me confused.  As the same request may
be got by different daemon threads multiple times, the introduced
refcount mechanism can't protect it from being completed multiple times
(which is expected).  The refcount only protects it from being freed
multiple times.
The idea here is that because the wait only holds one reference count,
complete(&req->done) can only be called when the req has been
successfully removed from the xarry, otherwise the following UAF may
occur:

"complete(&req->done) can only be called when the req has been
successfully removed from the xarry ..."

How this is done? since the following xarray_erase() following the first
xarray_erase() will fail as the xarray slot referred by the same id has
already been erased?


@@ -455,7 +459,7 @@ static int cachefiles_ondemand_send_req(struct
cachefiles_object *object,
      wake_up_all(&cache->daemon_pollwq);
      wait_for_completion(&req->done);
      ret = req->error;
-    kfree(req);
+    cachefiles_req_put(req);
      return ret;
  out:
      /* Reset the object to close state in error handling path.
Don't we need to also convert "kfree(req)" to cachefiles_req_put(req)
for the error path of cachefiles_ondemand_send_req()?

```
out:
    /* Reset the object to close state in error handling path.
     * If error occurs after creating the anonymous fd,
     * cachefiles_ondemand_fd_release() will set object to close.
     */
    if (opcode == CACHEFILES_OP_OPEN)
        cachefiles_ondemand_set_object_close(object);
    kfree(req);
    return ret;
```
When "goto out;" is called in cachefiles_ondemand_send_req(),
it means that the req is unallocated/failed to be allocated/failed to
be inserted into the xarry, and therefore the req can only be accessed
by the current function, so there is no need to consider concurrency
and reference counting.
Okay I understand. But this is indeed quite confusing. I see no cost of
also converting to cachefiles_req_put(req).


Yes, kfree(req) converts to cachefiles_req_put(req) at no cost,
but may trigger a NULL pointer dereference in cachefiles_req_put(req)
if the req has not been initialised.

--
With Best Regards,
Baokun Li