Re: [syzbot] [ide?] UBSAN: shift-out-of-bounds in ata_qc_issue

From: Damien Le Moal

Date: Thu Feb 19 2026 - 20:06:26 EST


On 2/20/26 09:55, Niklas Cassel wrote:
> On Thu, Feb 19, 2026 at 10:33:22AM +0900, Damien Le Moal wrote:
>>>> UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
>>>> shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
>>>
>>> 4210818301 is 0xfafbfcfd
>>>
>>> 0xfafbfcfd is ATA_TAG_POISON.
>>>
>>> ATA_TAG_POISON is set by ata_qc_free(), so it appears that
>>> ata_scsi_deferred_qc_work() is trying to issue a QC that has
>>> already been freed.
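
For reference, the arithmetic above can be checked with a small user-space sketch. This is not kernel code: the MODEL_ names are made up for illustration, and the poison value is simply the one quoted in this thread. It confirms that the reported exponent equals the poison value and that such an exponent can never be valid for a shift of a 64-bit type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Poison value as quoted in this thread. The MODEL_ prefix marks this as a
 * local stand-in for illustration, not the kernel's own definition. */
#define MODEL_TAG_POISON 0xfafbfcfdU

/* True if the tag carries the poison value (i.e. the QC looks freed). */
static bool model_tag_is_poison(uint32_t tag)
{
	return tag == MODEL_TAG_POISON;
}

/* A shift of a 64-bit value is only defined for exponents 0..63. */
static bool model_shift_is_defined(uint32_t exponent)
{
	return exponent < 64;
}
```

With these helpers, model_tag_is_poison(4210818301U) holds and model_shift_is_defined(MODEL_TAG_POISON) does not, which matches what UBSAN reported.
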
>>
>> I checked the code but I fail to see any path that can lead to this happening.
>> I did more tests using qemu q35 machine as used by syzbot, and everything looks
>> fine. So not sure what is happening here. I will dig further.
>
> Hello Damien,
>
>
> My best guess:
> since qc->tag is ATA_TAG_POISON, ata_qc_free() must have been called
> on ap->deferred_qc.
>
> If it was an NCQ abort, ata_eh_set_pending() would have been called to
> clear ap->deferred_qc. Since ap->deferred_qc is apparently set, it
> appears that we did not get an error IRQ.
>
> To me, that leaves a timeout as the most likely scenario.

Good point. I think the timeout case was completely overlooked...
That should be fairly easy to debug: I just need to have the deferred work do
nothing so that the deferred qc times out.

Let me hack something and come up with a fix.

>
> I.e. SCSI EH is called without ata_eh_set_pending() having been called.
> (Currently ata_eh_set_pending() is the function that clears
> ap->deferred_qc)
>
>
>
> If I look at ata_scsi_cmd_error_handler() it will only break if:
>
> if (qc->flags & ATA_QCFLAG_ACTIVE && qc->scsicmd == scmd)
>
> If the deferred QC times out, flag ATA_QCFLAG_ACTIVE will not be set
> (because ATA_QCFLAG_ACTIVE is only set by qc_issue()).
>
> Since ATA_QCFLAG_ACTIVE is not set i == ATA_MAX_QUEUE, so we will enter the
> else clause which calls:
> scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
>
>
> That might potentially free the tag to the block layer to reuse,
> while ap->deferred_qc is still set (with the same tag).
>
> Possibly, next time ata_scsi_qc_issue() is called, ap->deferred_qc is still set,
> so it calls ata_qc_free(qc), which, since it wasn't cleared, might have the same
> tag? because block layer has now reused the tag (since SCSI completed the
> command).
>
> I would possibly have expected some kind of print from SCSI in this case.
> (But since the else clause finishes the command normally, perhaps not?)
>
> But perhaps it is wise to add some code to ata_scsi_cmd_error_handler()
> which clears ap->deferred_qc.
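
To make the lookup described above concrete, here is a rough user-space model of it. Everything in it is an assumption for illustration: struct model_qc, MODEL_MAX_QUEUE, MODEL_QCFLAG_ACTIVE and model_find_active_qc() are hypothetical stand-ins, and the real loop in ata_scsi_cmd_error_handler() may differ in detail. It only shows why a QC that was deferred, and therefore never marked active, is not matched even though it points at the timed out command.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the libata definitions. */
#define MODEL_MAX_QUEUE		32
#define MODEL_QCFLAG_ACTIVE	(1u << 0)

struct model_qc {
	unsigned int flags;	/* MODEL_QCFLAG_ACTIVE once issued */
	const void *scsicmd;	/* owning SCSI command, NULL if unused */
};

/*
 * Return the index of the active QC owning scmd, or MODEL_MAX_QUEUE if
 * there is none. A deferred QC has scsicmd set but was never issued, so
 * the active flag is clear and the loop runs to the end.
 */
static size_t model_find_active_qc(const struct model_qc *qcs,
				   const void *scmd)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_QUEUE; i++) {
		if ((qcs[i].flags & MODEL_QCFLAG_ACTIVE) &&
		    qcs[i].scsicmd == scmd)
			break;
	}

	return i;
}
```

In this model, a slot with scsicmd set and flags == 0 yields MODEL_MAX_QUEUE, which corresponds to the else clause that finishes the command while the deferred QC is still referenced.
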
>
>
>
> Another possibility... again, timed out commands will not have called
> ata_eh_set_pending(). scsi_timeout() will call scsi_abort_command()
> which will queue delayed work, and the worker function scmd_eh_abort_handler()
> will call scsi_eh_scmd_add(), which calls
> scsi_host_set_state(shost, SHOST_RECOVERY).
>
> We did add a guard in libata in commit e20e81a24a4d ("ata: libata-core: do not
> issue non-internal commands once EH is pending"), so that we will defer commands
> even when EH is pending. But in the case of timeout, there will be no error IRQ,
> so we will not do an early return in __ata_scsi_queuecmd(), so we could set
> ap->deferred_qc up until the worker function scmd_eh_abort_handler() has called
> scsi_host_set_state(shost, SHOST_RECOVERY).
>
> Again, adding some code to ata_scsi_cmd_error_handler() to clear ap->deferred_qc
> should handle this case.
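
Purely as an illustration of the idea above, and not the actual fix, the clearing step could look something like this in a user-space model. struct model_port, struct model_qc and model_clear_deferred_qc() are hypothetical names; the real change would also need to take the port lock and free the QC properly, which this sketch deliberately leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the libata structures. */
struct model_qc {
	const void *scsicmd;		/* owning SCSI command */
};

struct model_port {
	struct model_qc *deferred_qc;	/* QC waiting to be issued, or NULL */
};

/*
 * If the port's deferred QC belongs to the timed out command, drop the
 * reference so that no later work item can try to issue it. Returns true
 * if the reference was cleared.
 */
static bool model_clear_deferred_qc(struct model_port *ap, const void *scmd)
{
	struct model_qc *qc = ap->deferred_qc;

	if (!qc || qc->scsicmd != scmd)
		return false;

	ap->deferred_qc = NULL;
	return true;
}
```

The point of checking scsicmd before clearing is that a timeout on some other command should leave an unrelated deferred QC alone.
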
>
>
> I would probably hack some QEMU to not send a reply, so that we will get block
> layer timeouts, because right now, ata_scsi_cmd_error_handler() seems like the
> most likely problematic code to me.
>
>
> Kind regards,
> Niklas
>


--
Damien Le Moal
Western Digital Research