[BUG] scsi: megaraid_sas: possible null pointer dereference in megasas_slave_alloc()

From: Zixuan Fu
Date: Fri Sep 02 2022 - 03:51:57 EST


Hello,

Our fault injection tool finds a possible null-pointer dereference in the
megaraid_sas driver in Linux 5.10.0:

In the file drivers/scsi/megaraid/megaraid_sas_base.c:
In megasas_get_seq_num(), the call to dma_alloc_coherent() may fail:
6459: el_info = dma_alloc_coherent(&instance->pdev->dev,
sizeof(struct megasas_evt_log_info),
&el_info_h,
GFP_KERNEL);

This error is propagated to its caller megasas_start_aen().
6749: if (megasas_get_seq_num(instance, &eli))
6750: return -1;

Then it is propagated again to its caller megasas_probe_one().
7428: if (megasas_start_aen(instance)) {
7429: dev_printk(KERN_DEBUG, &pdev->dev, "start aen failed\n");
7430: goto fail_start_aen;
7431: }

In error handling code of megasas_probe_one(), it removes the pointer
`instance` from `megasas_mgmt_info.instance`:
7445: megasas_mgmt_info.instance[megasas_mgmt_info.max_index] = NULL;

But it stores the pointer `instance` in the pdev by calling pci_set_drvdata()
before and do nothing about it in error handling code:
7401: pci_set_drvdata(pdev, instance);

Then, in another thread, megasas_slave_alloc() is called. This function calls
megasas_lookup_instance() to get the pointer `instance`, which can not be
found in `megasas_mgmt_info.instance`. Therefore, NULL is returned:
2087: instance = megasas_lookup_instance(sdev->host->host_no);

This causes a null-pointer dereference bug:
2095: if ((instance->pd_list_not_supported ||
instance->pd_list[pd_index].driveState == MR_PD_STATE_SYSTEM))

If we just add a check for `instance`, another bug is found.
megasas_fault_detect_work() is called by a thread. and it retrieves the
pointer `instance` from `work`:
In the file drivers/scsi/megaraid/megaraid_sas_base.c:
1901: struct megasas_instance *instance =
container_of(work, struct megasas_instance, fw_fault_work.work);

Because the structure `instance` points to is broken, the following calls
about `instance` causes some page-faults:
1907: fw_state = instance->instancet->read_fw_status_reg(instance) &
MFI_STATE_MASK;
1911: dma_state = instance->instancet->read_fw_status_reg(instance) &
MFI_STATE_DMADONE;
...

I am not quite sure how to fix this possible bug. Any feedback would be
appreciated, thanks!

Reported-by: TOTE Robot <oslab@xxxxxxxxxxxxxxx>

Best wishes,
Zixuan Fu