Re: [PATCH v5 1/1] Allow non-extending parallel direct writes on the same file.

From: Bernd Schubert
Date: Tue Sep 13 2022 - 04:44:20 EST

On 6/17/22 14:43, Miklos Szeredi wrote:
> On Fri, 17 Jun 2022 at 11:25, Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
>>
>> Hi Miklos,
>>
>> On 6/17/22 09:36, Miklos Szeredi wrote:
>>> On Fri, 17 Jun 2022 at 09:10, Dharmendra Singh <dharamhans87@xxxxxxxxx> wrote:
>>>>
>>>> This patch relaxes the exclusive lock for direct non-extending writes
>>>> only. File size extending writes might not need the lock either,
>>>> but we are not entirely sure whether there is a risk of introducing
>>>> some kind of regression. Furthermore, benchmarking with fio does not
>>>> show a difference between patch versions that, on file size
>>>> extension, take a) an exclusive lock or b) a shared lock.
>>>
>>> I'm okay with this, but ISTR Bernd noted a real-life scenario where
>>> this is not sufficient. Maybe that should be mentioned in the patch
>>> header?
>>
>> The above comment is actually directly from me.
>>
>> We didn't check whether fio extends the file before the runs, but even
>> if it does, my current thinking is that where we previously serialized
>> n threads, we now alternate between
>> - "n-1 threads running in parallel" + 1 waiting thread
>> - "n-1 threads blocked" + 1 running thread
>>
>> I think we will come back to this anyway if we continue to see slow
>> IO with MPIIO. Right now we want to get our patches merged first and
>> then create an updated module for RHEL8 (+derivatives) customers.
>> Our benchmark machines also run plain RHEL8 kernels; without
>> backporting the modules first, we don't know yet what the actual
>> impact on things like io500 will be.
>>
>> Shall we still extend the commit message or are we good to go?
>
> Well, it would be nice to see the real workload on the backported
> patch. Not just because it would tell us if this makes sense in the
> first place, but also to have additional testing.


Sorry for the delay; Dharmendra and I got busy with other tasks, and Horst (in CC) took over the patches and ran the MPIIO benchmarks on 5.19.

Results with https://github.com/dchirikov/mpiio.git

                  unpatched          patched          patched
                (extending)      (extending)  (non-extending)
                       MB/s             MB/s             MB/s
-------------------------------------------------------------
 2 threads          2275.00          2497.00          5688.00
 4 threads          2438.00          2560.00         10240.00
 8 threads          2925.00          3792.00         25600.00
16 threads          3792.00         10240.00         20480.00


(For the "patched (non-extending)" column, the file was first extended to its final size manually; as far as I know, mpiio does not support that natively.)

Results with IOR (quasi-standard HPC benchmark)

ior -w -E -k -o /tmp/test/home/hbi/test/test.1 -a mpiio -s 1280 -b 8m -t 8m


              unpatched      patched
            (extending)  (extending)
                   MB/s         MB/s
------------------------------------
 2 threads      2086.10      2027.76
 4 threads      1858.94      2132.73
 8 threads      1792.68      4609.05
16 threads      1786.48      8627.96


(IOR does not allow manual file extension without changing its code.)
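To put numbers on the difference, the speedup of each patched column over the unpatched (extending) baseline can be computed directly from the two result tables. The small C helper below only divides the reported MB/s values; the array and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

#define ROWS 4

static const int threads[ROWS] = {2, 4, 8, 16};

/* MB/s values copied from the mpiio results. */
static const double mpiio_unpatched[ROWS]     = {2275.00, 2438.00, 2925.00, 3792.00};
static const double mpiio_patched_ext[ROWS]   = {2497.00, 2560.00, 3792.00, 10240.00};
static const double mpiio_patched_noext[ROWS] = {5688.00, 10240.00, 25600.00, 20480.00};

/* MB/s values copied from the IOR results. */
static const double ior_unpatched[ROWS]   = {2086.10, 1858.94, 1792.68, 1786.48};
static const double ior_patched_ext[ROWS] = {2027.76, 2132.73, 4609.05, 8627.96};

/* Ratio of patched to unpatched bandwidth; > 1.0 means the patch helps. */
static double speedup(double patched, double baseline)
{
    assert(baseline > 0.0);
    return patched / baseline;
}

/* Print one row per thread count with all three speedup ratios. */
static void print_speedups(void)
{
    printf("threads  mpiio ext  mpiio non-ext  IOR ext\n");
    for (int i = 0; i < ROWS; i++)
        printf("%7d  %8.2fx  %12.2fx  %6.2fx\n", threads[i],
               speedup(mpiio_patched_ext[i], mpiio_unpatched[i]),
               speedup(mpiio_patched_noext[i], mpiio_unpatched[i]),
               speedup(ior_patched_ext[i], ior_unpatched[i]));
}
```

In extending mode this gives about 2.70x for mpiio and 4.83x for IOR at 16 threads, while IOR with 2 threads comes out at about 0.97x.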

We can see that patched non-extending gives the best results, as Dharmendra has already posted before, but even in extending mode
the results are much better with the patches than without, especially with 8 and more threads. My assumption is that in the
patched version, instead of serializing N writers, there is an alternating run of
- 1 thread extending, N-1 waiting
- N-1 writing, 1 thread waiting
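The alternation described above can be turned into a rough, idealized model. Assuming (hypothetically) that every write takes the same time T, that each of the N writers issues one write per round, and that exactly one of those writes extends the file, an unpatched round costs N*T because all writes are serialized, while a patched round costs 2*T: one exclusive extending write, then the other N-1 writes in parallel under the shared lock. The sketch below only encodes this back-of-the-envelope argument; the function names are made up, and it ignores everything else in the IO path, so it is not a prediction of the measured numbers.

```c
#include <assert.h>

/*
 * Idealized cost of one "round" in units of T (the time of a single write),
 * assuming n writers each issue one write per round and exactly one of those
 * writes extends the file. Hypothetical model, not a measurement.
 */

/* Unpatched: every direct write takes the exclusive lock, so the n writes
 * run one after another. */
static unsigned round_cost_unpatched(unsigned n)
{
    assert(n >= 1);
    return n;
}

/* Patched: one exclusive extending write, then the other n-1 writes run in
 * parallel under the shared lock. */
static unsigned round_cost_patched(unsigned n)
{
    assert(n >= 1);
    return n == 1 ? 1 : 2;
}

/* Ideal speedup of the patched version over the unpatched one. */
static double ideal_speedup(unsigned n)
{
    return (double)round_cost_unpatched(n) / round_cost_patched(n);
}
```

Under these assumptions ideal_speedup(2) is 1.0, i.e. no gain with two writers, and the ideal gain grows as N/2 for larger N, e.g. 8.0 with 16 writers.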

Thanks,
Bernd