Re: Re: [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs

From: duoming
Date: Tue Apr 26 2022 - 11:55:57 EST


hello,

On Mon, Tue, 26 Apr 2022 13:17:21 +0200 Paolo wrote:

> > There are destructive operations such as nfcmrvl_fw_dnld_abort and
> > gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> > gpio and so on could be destructed while the upper layer functions such as
> > nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> > to double-free, use-after-free and null-ptr-deref bugs.
> >
> > There are three situations that could lead to double-free bugs.
> >
> > The first situation is shown below:
> >
> > (Thread 1) | (Thread 2)
> > nfcmrvl_fw_dnld_start |
> > ... | nfcmrvl_nci_unregister_dev
> > release_firmware() | nfcmrvl_fw_dnld_abort
> > kfree(fw) //(1) | fw_dnld_over
> > | release_firmware
> > ... | kfree(fw) //(2)
> > | ...
> >
> > The second situation is shown below:
> >
> > (Thread 1) | (Thread 2)
> > nfcmrvl_fw_dnld_start |
> > ... |
> > mod_timer |
> > (wait a time) |
> > fw_dnld_timeout | nfcmrvl_nci_unregister_dev
> > fw_dnld_over | nfcmrvl_fw_dnld_abort
> > release_firmware | fw_dnld_over
> > kfree(fw) //(1) | release_firmware
> > ... | kfree(fw) //(2)
> >
> > The third situation is shown below:
> >
> > (Thread 1) | (Thread 2)
> > nfcmrvl_nci_recv_frame |
> > if(..->fw_download_in_progress)|
> > nfcmrvl_fw_dnld_recv_frame |
> > queue_work |
> > |
> > fw_dnld_rx_work | nfcmrvl_nci_unregister_dev
> > fw_dnld_over | nfcmrvl_fw_dnld_abort
> > release_firmware | fw_dnld_over
> > kfree(fw) //(1) | release_firmware
> > | kfree(fw) //(2)
> >
> > The firmware struct is deallocated in position (1) and deallocated
> > in position (2) again.
> >
> > The crash trace triggered by POC is like below:
> >
> > [ 122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> > [ 122.640457] Call Trace:
> > [ 122.640457] <TASK>
> > [ 122.640457] kfree+0xb0/0x330
> > [ 122.640457] fw_dnld_over+0x28/0xf0
> > [ 122.640457] nfcmrvl_nci_unregister_dev+0x61/0x70
> > [ 122.640457] nci_uart_tty_close+0x87/0xd0
> > [ 122.640457] tty_ldisc_kill+0x3e/0x80
> > [ 122.640457] tty_ldisc_hangup+0x1b2/0x2c0
> > [ 122.640457] __tty_hangup.part.0+0x316/0x520
> > [ 122.640457] tty_release+0x200/0x670
> > [ 122.640457] __fput+0x110/0x410
> > [ 122.640457] task_work_run+0x86/0xd0
> > [ 122.640457] exit_to_user_mode_prepare+0x1aa/0x1b0
> > [ 122.640457] syscall_exit_to_user_mode+0x19/0x50
> > [ 122.640457] do_syscall_64+0x48/0x90
> > [ 122.640457] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 122.640457] RIP: 0033:0x7f68433f6beb
> >
> > What's more, there are also use-after-free and null-ptr-deref bugs
> > in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> > set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> > then, we dereference firmware, gpio or the members of priv->fw_dnld in
> > nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> >
> > This patch reorders destructive operations after nci_unregister_device
> > to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
> > is well synchronized and won't return if there is a running routine.
> > This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
> > nfc_{un,}register_device").

> It looks like the above is not enough to close all the possible races,
> specifically it looks like fw_dnld_timeout() and fw_dnld_rx_work() may
> still race one vs another.

The fw_dnld_timeout() and fw_dnld_rx_work() could not race with each other.
Because fw_dnld_rx_work() is used to receive packets from firmware download
process, the release_firmware() in fw_dnld_rx_work() will not execute unless it
has received firmware packets, the code is shown below.

fw_dnld_rx_work
while ((skb = skb_dequeue(&fw_dnld->rx_q))) {
...
fw_dnld_over
release_firmware
}

The release_firmware() is only called when firmware download process failed, and
fw_dnld_rx_work() will not receive any packets from firmware download process.
As a result, the race is non-existent. The process is shown below:

fw_dnld_rx_work | nfcmrvl_fw_dnld_start
... | ...
| release_firmware
(wait for skb) | (the following will not send any skb)
while (skb = ...) { |
... |
fw_dnld_over |
release_firmware |
} |
(receive no packet) |

> I *think* that the approach you already suggested here:
>
> https://lore.kernel.org/netdev/1d34425a0ea8a553a66dcf4f22ca55cc920dbb42.1649913521.git.duoming@xxxxxxxxxx/
>
> should be safer - but you have to protect with the same spinlock even
> every fw_dnld->fw modification.

The spinlock should not be used to protect release_firmware(), because there is vfree() in release_firmware().
The vfree() is a function that may sleep. If we use spinlock, the sleep in atomic bug will happen. The process
is shown below:

spin_lock()
release_firmware()
firmware_free_data()
vfree() //sleep in atomic
spin_unlock()

We reorders destructive operations after nci_unregister_device(),
using nci_unregister_device() to synchonize with firmware download routine.
But I found this is not enough, when nci_unregister_device() is return, there
are still firmware download routine is running.

[ 65.835462] BUG: KASAN: use-after-free in nci_fw_download+0x26/0x60
[ 65.840236] Read of size 8 at addr ffff88800c2f5008
...
[ 65.845755] Call Trace:
[ 65.856235] nci_fw_download+0x26/0x60
[ 65.856235] nfc_fw_download+0x99/0xe0
[ 65.856235] nfc_genl_fw_download+0x10b/0x170
...

In order to solve this problem, I propose the following solutions:

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
{
struct nci_dev *ndev = priv->ndev;

+ nci_unregister_device(ndev);
if (priv->ndev->nfc_dev->fw_download_in_progress)
nfcmrvl_fw_dnld_abort(priv);

@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
if (gpio_is_valid(priv->config.reset_n_io))
gpio_free(priv->config.reset_n_io);

- nci_unregister_device(ndev);
nci_free_device(ndev);
kfree(priv);
}

diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..da8199f67d4 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,7 @@
#define NFC_CHECK_PRES_FREQ_MS 2000

int nfc_devlist_generation;
+bool nfc_download;
DEFINE_MUTEX(nfc_devlist_mutex);

/* NFC device ID bitmap */
@@ -38,7 +39,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)

device_lock(&dev->dev);

- if (!device_is_registered(&dev->dev)) {
+ if (!device_is_registered(&dev->dev) || !nfc_download) {
rc = -ENODEV;
goto error;
}
@@ -1134,6 +1135,7 @@ int nfc_register_device(struct nfc_dev *dev)
dev->rfkill = NULL;
}
}
+ nfc_download = true;
device_unlock(&dev->dev);

rc = nfc_genl_device_added(dev);
@@ -1166,6 +1168,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
rfkill_unregister(dev->rfkill);
rfkill_destroy(dev->rfkill);
}
+ nfc_download = false;
device_unlock(&dev->dev);

if (dev->ops->check_presence) {
--

The new solution not only reorders destructive operations after nci_unregister_device,
but also adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

(Thread 1) | (Thread 2)
nfcmrvl_nci_unregister_dev |
nci_unregister_device |
nfc_unregister_device | nfc_fw_download
device_lock() |
... |
nfc_download = false;//(1)| ...
device_unlock() |
... | device_lock()
(destructive operations) | if(.. || !nfc_download)
| goto error; //(2)
| rc = dev->ops->fw_download();
| error:
| device_unlock()

If the device is detaching, nfc_download will be set to false in position (1) and
the download function will goto error.

(Thread 1) | (Thread 2)
nfcmrvl_nci_unregister_dev |
nci_unregister_device |
nfc_unregister_device | nfc_fw_download
| device_lock()
| rc = dev->ops->fw_download();
| dev->fw_download_in_progress = false;
| device_unlock()
device_lock() | ...
... |
nfc_download = false; | ...
device_unlock() |
... |
if (..->fw_download_in_progress) |
nfcmrvl_fw_dnld_abort |
fw_dnld_over |
release_firmware |

If the download function is executing, nci_unregister_device() will wait until
firmware download routine is finished. What`s more, when the firmware download
routine is finished, the fw_download_in_progress() in nfc_dev struct will be set
to false. the release_firmware() in nfcmrvl_nci_unregister_dev() will not execute,
which could synchonize with fw_dnld_rx_work() and fw_dnld_timeout().

I think the new solution is good, it has been tested enough. I sent "[PATCH net v3]
nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs"
just now.

Best regards,
Duoming Zhou