Re: Why is O_DSYNC on linux so slow / what's wrong with my SSD?

From: Chinmay V S
Date: Wed Nov 20 2013 - 10:23:14 EST


Hi Stefan,

> thanks for your great and detailed reply. I'm just wondering why an
> intel 520 ssd degrades the speed just by 2% in case of O_SYNC. intel 530
> the newer model and replacement for the 520 degrades speed by 75% like
> the crucial m4.
>
> The Intel DC S3500 instead delivers also nearly 98% of its performance
> even under O_SYNC.

If you have confirmed the performance numbers, then they suggest that
the Intel 530 controller is more advanced and makes better use of the
internal disk-cache to achieve its performance (as compared to the
Intel 520). Forcing a CMD_FLUSH on every I/O operation negates the
benefits of the disk write-cache and rules out any advanced
disk-controller optimisations, so it has a more pronounced effect of
degrading the performance on Intel 530 SSDs. (Could someone with
actual info on Intel SSDs kindly confirm this?)
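
To re-check such numbers on your own hardware, a rough user-space
timing sketch along the following lines may help. It assumes Linux;
the helper name time_writes(), the block size and the block count are
illustrative choices of mine and not something taken from this thread.
Be careful with the path: pointing it at a raw block device overwrites
the data on that device.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Write `count` blocks of `block_size` bytes to `path` and return the
 * elapsed wall-clock time in seconds, or -1.0 on any error.
 * Pass O_DSYNC (or O_SYNC) in `extra_flags` so that each write() only
 * returns once the data is reported stable; pass 0 for plain buffered
 * writes. The name and signature are hypothetical, for illustration.
 */
double time_writes(const char *path, int extra_flags,
                   size_t block_size, int count)
{
    char *buf = malloc(block_size);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xA5, block_size);

    int fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0600);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec start, end;
    double elapsed = -1.0;
    int ok = 1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < count; i++) {
        if (write(fd, buf, block_size) != (ssize_t)block_size) {
            ok = 0;   /* short write or error: report failure */
            break;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (ok)
        elapsed = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;

    close(fd);
    free(buf);
    return elapsed;
}
```

Comparing time_writes(path, 0, 4096, 1000) against
time_writes(path, O_DSYNC, 4096, 1000) gives a first impression of the
per-write cost of the sync. For a comparison closer to raw-device
behaviour you would also want O_DIRECT with a suitably aligned buffer,
which this sketch leaves out.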

>> To simply disable this behaviour and make the SYNC/DSYNC behaviour and
>> performance on raw block-device I/O resemble the standard filesystem
>> I/O you may want to apply the following patch to your kernel -
>> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
>>
>> The above patch simply disables the CMD_FLUSH command support even on
>> disks that claim to support it.
>
> Is this the right one? By assigning ahci_dummy_read_id we disable the
> CMD_FLUSH?
>
> What is the risk of that one?

Yes, https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba is the
right one. The dummy read_id() provides a hook into the initial
disk-properties discovery that runs when the disk is plugged in. By
explicitly clearing the bits that advertise write-cache and
flush-cache (CMD_FLUSH) support, we ensure that the block driver does
NOT issue CMD_FLUSH commands to the disk. Note that this does NOT
disable the write-cache on the disk itself, i.e. performance improves
because the on-disk write-cache keeps working while the host-PC no
longer sends any CMD_FLUSH commands.
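
As a rough illustration of what "negating the bits" means, here is a
user-space sketch that works on a copy of the 256-word IDENTIFY DEVICE
data. It is NOT the code from the gist. The word and bit positions
(word 82 bit 5 for write-cache support, word 83 bits 12 and 13 for
FLUSH CACHE and FLUSH CACHE EXT, word 85 bit 5 for write-cache
enabled) are my reading of the ATA IDENTIFY layout and should be
checked against the spec or include/linux/ata.h. All function and
macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ID_WORDS          256  /* IDENTIFY DEVICE data: 256 16-bit words */
#define ID_COMMAND_SET_1  82   /* assumed: bit 5 = write cache supported */
#define ID_COMMAND_SET_2  83   /* assumed: bit 12/13 = FLUSH CACHE (EXT) */
#define ID_CFS_ENABLE_1   85   /* assumed: bit 5 = write cache enabled   */

#define BIT_WCACHE     (1u << 5)
#define BIT_FLUSH      (1u << 12)
#define BIT_FLUSH_EXT  (1u << 13)

/* True if the data still advertises FLUSH CACHE or FLUSH CACHE EXT. */
bool id_claims_flush(const uint16_t *id)
{
    return (id[ID_COMMAND_SET_2] & (BIT_FLUSH | BIT_FLUSH_EXT)) != 0;
}

/* True if the data still advertises a supported or enabled write cache. */
bool id_claims_wcache(const uint16_t *id)
{
    return (id[ID_COMMAND_SET_1] & BIT_WCACHE) != 0 ||
           (id[ID_CFS_ENABLE_1] & BIT_WCACHE) != 0;
}

/*
 * Clear the cache and flush capability bits in the host's copy of the
 * IDENTIFY data. This only changes what the host believes about the
 * disk; the disk itself is not reconfigured.
 */
void id_hide_cache_and_flush(uint16_t *id)
{
    id[ID_COMMAND_SET_1] &= (uint16_t)~BIT_WCACHE;
    id[ID_CFS_ENABLE_1]  &= (uint16_t)~BIT_WCACHE;
    id[ID_COMMAND_SET_2] &= (uint16_t)~(BIT_FLUSH | BIT_FLUSH_EXT);
}
```

In a real driver the equivalent masking would happen inside the
read_id() hook, right after the genuine IDENTIFY data has been read
from the disk.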

Theoretically, it increases the chances of data loss, i.e. if power is
removed while a write from the app is still in progress. Personally,
though, I have found the impact of this to be minimal, because SYNC on
a raw block device with CMD_FLUSH does NOT guarantee atomicity in case
of a power loss. Hence, in the event of a power loss, applications
cannot rely on SYNC (with CMD_FLUSH) for data integrity. Rather, they
have to maintain other data-structures with redundant on-disk metadata
(which is precisely what modern file-systems do). Thus, in my
experience, removing CMD_FLUSH doesn't really result in a downside as
such.
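
To make the "redundant metadata" point a bit more concrete, here is a
minimal sketch of one common approach: each on-disk record carries a
sequence number and a checksum, so that a half-written (torn) record
can be detected after a power loss. The record layout, the names and
the use of an FNV-1a hash in place of a proper CRC are all assumptions
of mine for illustration; real file-systems use more elaborate
schemes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_SIZE 500

/* Hypothetical 512-byte on-disk record: data plus redundant metadata. */
struct record {
    uint64_t seq;                    /* increases with every write   */
    uint8_t  payload[PAYLOAD_SIZE];  /* application data             */
    uint32_t checksum;               /* covers seq and payload       */
};

/* 32-bit FNV-1a hash, standing in for a real CRC. */
static uint32_t fnv1a(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Compute and store the checksum just before the record is written. */
void record_seal(struct record *r)
{
    r->checksum = fnv1a(r, offsetof(struct record, checksum));
}

/* After a restart: false means the record was torn or corrupted. */
bool record_is_intact(const struct record *r)
{
    return r->checksum == fnv1a(r, offsetof(struct record, checksum));
}
```

On recovery the application would discard any record that fails
record_is_intact() and fall back to the intact record with the highest
sequence number.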

The main thing to consider when applying the above simple patch is
that it is system-wide. The above patch prevents the host-PC from
issuing CMD_FLUSH for ALL drives enumerated via SATA/SCSI on the
system.

If this patch works for you, then to restrict the change in behaviour
to a specific disk, you will need to:
1. Identify the disk by its model number within the dummy read_id().
2. Zero the bits ONLY for your particular disk.
3. Return without modifying anything for all other disks.
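
The three steps above can be sketched in user-space C as follows. This
is only an illustration under assumptions, not a tested kernel patch:
I am assuming that the model string sits in IDENTIFY words 27 to 46
(40 ASCII characters, two per word with the first character in the
high byte, padded with spaces) and that the capability bits are at
word 82 bit 5, word 83 bits 12 and 13, and word 85 bit 5. The function
names and the model string used in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_MODEL_WORD  27  /* assumed: first word of the model string */
#define ID_MODEL_LEN   40  /* assumed: 40 characters, space padded    */

/*
 * Step 1: copy the model string out of the IDENTIFY words and trim
 * trailing spaces. `out` must hold at least ID_MODEL_LEN + 1 bytes.
 */
void id_model(const uint16_t *id, char *out)
{
    size_t n = 0;
    for (size_t w = 0; w < ID_MODEL_LEN / 2; w++) {
        out[n++] = (char)(id[ID_MODEL_WORD + w] >> 8);
        out[n++] = (char)(id[ID_MODEL_WORD + w] & 0xff);
    }
    while (n > 0 && out[n - 1] == ' ')
        n--;
    out[n] = '\0';
}

/*
 * Steps 2 and 3: clear the cache/flush bits only if the disk's model
 * matches `target_model` exactly. Returns true if the data was changed.
 */
bool id_hide_flush_for_model(uint16_t *id, const char *target_model)
{
    char model[ID_MODEL_LEN + 1];

    id_model(id, model);
    if (strcmp(model, target_model) != 0)
        return false;                      /* other disk: untouched */

    id[82] &= (uint16_t)~(1u << 5);                 /* write cache supported */
    id[85] &= (uint16_t)~(1u << 5);                 /* write cache enabled   */
    id[83] &= (uint16_t)~((1u << 12) | (1u << 13)); /* FLUSH CACHE (EXT)     */
    return true;
}
```

For example, id_hide_flush_for_model(id, "EXAMPLE SSD 123") would mask
the bits only for a disk reporting exactly that (made-up) model string
and leave every other disk's IDENTIFY data as it was.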

Try out the above patch and let me know if you have any further issues.

regards
ChinmayVS