Re: [PATCH 10/21] block: Add fops atomic write support

From: John Garry
Date: Mon Dec 04 2023 - 08:14:48 EST

Next message: Andy Shevchenko: "Re: [PATCH v3 02/10] gpio: wm831x: use gpiochip_dup_line_label()"
Previous message: Ayush Singh: "[PATCH V3] greybus: gb-beagleplay: Ensure le for values in transport"
In reply to: Ming Lei: "Re: [PATCH 10/21] block: Add fops atomic write support"
Next in thread: Ming Lei: "Re: [PATCH 10/21] block: Add fops atomic write support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I added this here (as opposed to the caller), as I was not really worried
about speeding up the failure path. Are you saying to call even earlier in
submission path?

atomic_write_unit_min is one hardware property, and it should be checked
in blk_queue_atomic_write_unit_min_sectors() from the beginning, then you
can avoid this check everywhere else.

ok, but we still need to ensure in the submission path that the block device actually supports atomic writes - this was the initial check.

+	if (pos % atomic_write_unit_min_bytes)
+		return false;
+	if (iov_iter_count(iter) % atomic_write_unit_min_bytes)
+		return false;
+	if (!is_power_of_2(iov_iter_count(iter)))
+		return false;
+	if (iov_iter_count(iter) > atomic_write_unit_max_bytes)
+		return false;
+	if (pos % iov_iter_count(iter))
+		return false;
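
For readers who want to experiment with these rules outside the kernel, the five quoted checks can be restated as a small userspace function. This is only a sketch: atomic_write_valid() and is_pow2() are hypothetical names, pos and len replace the kiocb position and iov_iter_count(iter), and the zero guards are an addition for the sketch rather than part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Mirrors the five quoted checks. pos and len are in bytes; unit_min and
 * unit_max stand in for atomic_write_unit_min_bytes and
 * atomic_write_unit_max_bytes.
 */
static bool atomic_write_valid(uint64_t pos, uint64_t len,
			       uint64_t unit_min, uint64_t unit_max)
{
	/* Guard added for the sketch: avoid dividing by zero below. */
	if (unit_min == 0 || len == 0)
		return false;
	if (pos % unit_min)		/* start aligned to the minimum unit */
		return false;
	if (len % unit_min)		/* length a multiple of the minimum unit */
		return false;
	if (!is_pow2(len))		/* length a power of two */
		return false;
	if (len > unit_max)		/* length within the maximum unit */
		return false;
	if (pos % len)			/* naturally aligned */
		return false;
	return true;
}
```

With unit_min = 4096 and unit_max = 65536, an 8k write at offset 8k passes, while an 8k write at offset 4k fails the last (natural alignment) check.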

I am a bit confused about relation between atomic_write_unit_max_bytes and
atomic_write_max_bytes.

I think that naming could be improved. Or even just drop merging (and
atomic_write_max_bytes concept) until we show it to improve performance.

So generally atomic_write_unit_max_bytes will be the same as
atomic_write_max_bytes, however it could be different if:
a. request queue nr hw segments or other request queue limits need to
restrict atomic_write_unit_max_bytes
b. atomic_write_max_bytes does not need to be a power-of-2 and
atomic_write_unit_max_bytes does. So essentially:
atomic_write_unit_max_bytes = rounddown_pow_of_2(atomic_write_max_bytes)
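
The rounding relationship in the formula above can be illustrated with a userspace sketch. rounddown_pow2() here is a hypothetical stand-in for the kernel's rounddown_pow_of_2(), and returning 0 for an input of 0 is a choice made for the sketch only. It also ignores case (a), where other queue limits shrink the unit maximum further.

```c
#include <stdint.h>

/*
 * Largest power of two that is <= n. Written as a loop so it does not
 * depend on compiler builtins; the n / 2 comparison avoids overflow.
 */
static uint64_t rounddown_pow2(uint64_t n)
{
	uint64_t p = 1;

	if (n == 0)
		return 0;	/* sketch-only convention */
	while (p <= n / 2)
		p *= 2;
	return p;
}
```

For example, a device-wide limit of 48k (49152 bytes) would give a unit maximum of 32k, while a limit that is already a power of two is returned unchanged.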

plug merge often improves sequential IO perf, so if the hardware supports
this, I think 'atomic_write_max_bytes' should be supported from the
beginning, such as:

- user space submits sequential N * (4k, 8k, 16k, ...) atomic writes, all can
be merged to single IO request, which is issued to driver.

Or

- user space submits sequential 4k, 4k, 8k, 16K, 32k, 64k atomic writes, all can
be merged to single IO request, which is issued to driver.

Right, we do expect userspace to use a fixed block size, but we give scope in the API to use variable size.

The hardware should recognize unit size by start LBA, and check if length is
valid, so probably the interface might be relaxed to:

1) start lba is unit aligned, and this unit is in the supported unit
range(power_2 in [unit_min, unit_max])

2) length needs to be:

- N * this_unit_size
- <= atomic_write_max_bytes
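
One possible reading of the relaxed rules 1) and 2) is sketched below in userspace C. It is an interpretation, not code from the series: relaxed_atomic_write_valid() and unit_is_pow2() are hypothetical names, and the unit is passed in explicitly because the proposal does not spell out how it would be derived from the start LBA.

```c
#include <stdbool.h>
#include <stdint.h>

static bool unit_is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Rule 1: unit is a power of two in [unit_min, unit_max] and pos is
 *         aligned to it.
 * Rule 2: len is N * unit (N >= 1) and does not exceed max_bytes, which
 *         stands in for atomic_write_max_bytes.
 */
static bool relaxed_atomic_write_valid(uint64_t pos, uint64_t len,
				       uint64_t unit, uint64_t unit_min,
				       uint64_t unit_max, uint64_t max_bytes)
{
	if (!unit_is_pow2(unit) || unit < unit_min || unit > unit_max)
		return false;
	if (pos % unit)
		return false;
	if (len == 0 || len % unit)
		return false;
	if (len > max_bytes)
		return false;
	return true;
}
```

Under this reading a 24k write at offset 8k with an 8k unit would be accepted, even though 24k is not a power of two. The sketch does not model the atomic write boundary or virt boundary constraints raised in the reply.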

Please note that we also need to consider:
- any atomic write boundary (from NVMe)
- virt boundary (from NVMe)

And, as I mentioned elsewhere, I am still not 100% comfortable that we don't pay attention to regular max_sectors_kb...

Here the max IO length is limited to be <= atomic_write_unit_max_bytes,
so it looks like userspace can only submit write-atomic-unit naturally
aligned IO (such as 4k, 8k, 16k, 32k, ...),

correct

but these user IOs are
allowed to be merged into a big one if natural alignment is respected and
the merged IO size is <= atomic_write_max_bytes.

correct, but the resultant merged IO does not have to be naturally
aligned.
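
A small worked example may help here. The sketch below is userspace C with hypothetical names (struct io_range, naturally_aligned(), try_back_merge()); it models only contiguity and a size cap standing in for atomic_write_max_bytes, and is not the block layer's merge logic, which also honours segment and boundary limits.

```c
#include <stdbool.h>
#include <stdint.h>

struct io_range {
	uint64_t pos;	/* start offset in bytes */
	uint64_t len;	/* length in bytes */
};

/* Power-of-two length, with the start aligned to that length. */
static bool naturally_aligned(struct io_range r)
{
	return r.len != 0 && (r.len & (r.len - 1)) == 0 && r.pos % r.len == 0;
}

/*
 * Append b to a if b starts exactly where a ends and the combined
 * length stays within max_bytes. Returns true when the merge happened.
 */
static bool try_back_merge(struct io_range *a, struct io_range b,
			   uint64_t max_bytes)
{
	if (a->pos + a->len != b.pos)
		return false;
	if (a->len + b.len > max_bytes)
		return false;
	a->len += b.len;
	return true;
}
```

Take a 4k write at offset 4k followed by an 8k write at offset 8k. Each is naturally aligned on its own, but the merged request is 12k at offset 4k, which is neither a power of two in length nor aligned to its length.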

Is my understanding right?

Yes, but...

If yes, I'd suggest documenting the point,
and the last two checks could be changed to:

	/* naturally aligned */
	if (pos % iov_iter_count(iter))
		return false;

	if (iov_iter_count(iter) > atomic_write_max_bytes)
		return false;

... we would not be merging at this point as this is just IO submission to
the block layer, so atomic_write_max_bytes does not come into play yet. If
you check patch 7/21, you will see that we limit IO size to
atomic_write_max_bytes, which is relevant to merging.

I know the motivation of atomic_write_max_bytes, and now I am wondering
atomic_write_max_bytes may be exported to userspace for the sake of
atomic write performance.

It is available from sysfs for the request queue, but in an earlier series Dave Chinner suggested doing more to expose to the application programmer. So here that would mean a statx member. I'm still not sure... it just didn't seem like a detail which the user would need to know or be able to do much with.
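
As an illustration of what reading the value from sysfs could look like for an application, here is a minimal userspace sketch. read_queue_attr() is a hypothetical helper, and the attribute location (something like /sys/block/<dev>/queue/atomic_write_max_bytes) is an assumption based on the naming used in this thread; check the series for the exact path.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_queue_attr(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass the full attribute path for its device and treat a failure as "atomic write limits not reported", for instance on a kernel without this series applied.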

Thanks,
John
