Re: fuse uring / wake_up on the same core

From: Bernd Schubert
Date: Fri Mar 24 2023 - 18:44:24 EST


On 3/24/23 20:50, Bernd Schubert wrote:
> Ingo, Peter,
>
> I would like to ask how to wake up from a waitq on the same core. I have
> tried __wake_up_sync()/WF_SYNC, but I do not see any effect.
>
> I'm currently working on fuse/uring communication patches; besides uring
> communication, there is also a queue per core. Basic bonnie++ benchmarks
> with a zero file size to just create/read(0)/delete show a ~3x IOPS
> difference between CPU bound bonnie++ and unbound - i.e. with these
> patches it is _not_ the fuse-daemon that needs to be bound, but the
> application doing IO to the file system. We basically have
>

[...]

> With fewer files the difference becomes a bit smaller, but is still very
> visible. Besides cache line bouncing, I'm sure that CPU frequency and
> C-states will matter - I could tune that in the lab, but in the end I
> want to test what users do (I recently checked with a large HPC center
> - Forschungszentrum Juelich - their HPC compute nodes are not tuned up,
> to save energy).
> Also, in order to really tune down latencies, I want to add a
> struct file_operations::uring_cmd_iopoll thread, which will spin for a
> short time and avoid most of the kernel/userspace communication. If
> applications (with n-threads < n-cores) then get scheduled on different
> cores, different rings will be used, resulting in
> n-threads-spinning > n-threads-application
>
>
> There was already a related thread about fuse before
>
> https://lore.kernel.org/lkml/1638780405-38026-1-git-send-email-quic_pragalla@xxxxxxxxxxx/
>
> With the fuse-uring patches that part is basically solved - the waitq
> that that thread is about is not used anymore. But as per above,
> remaining is the waitq of the incoming workq (not mentioned in the
> thread above). As I wrote, I have tried
> __wake_up_sync((x), TASK_NORMAL), but it does not make a difference for
> me - similar to Miklos' testing before. I have also tried struct
> completion / swait - does not make a difference either.
> I can see task_struct has wake_cpu, but there doesn't seem to be a good
> interface to set it.
>
> Any ideas?
>
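
For readers who want to reproduce the "CPU bound" side of the benchmark above: binding the application means restricting it to one core, which taskset does from the shell and sched_setaffinity(2) does from code. Below is a minimal userspace sketch, assuming Linux with glibc. The helper names (first_allowed_cpu, allowed_cpu_count, pin_to_cpu) are made up for illustration and are not part of the fuse patches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: lowest-numbered CPU the calling thread may run on,
 * or -1 on error. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Hypothetical helper: number of CPUs in the calling thread's affinity
 * mask, or -1 on error. */
static int allowed_cpu_count(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return CPU_COUNT(&set);
}

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE) {
		errno = EINVAL;
		return -1;
	}
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

An application would call pin_to_cpu(first_allowed_cpu()) before starting its IO loop. Threads created afterwards inherit the mask, so a multi-threaded benchmark would pin each thread separately.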

How much of a hack is this patch?

[RFC] fuse: wake on the same core / disable migrate before wait

From: Bernd Schubert <bschubert@xxxxxxx>

Avoid bouncing between cores on wake. Especially with uring, everything
is core affine, so bouncing badly decreases performance.
With read/write(/dev/fuse) bouncing is not good either, but the change
still needs to be tested there for negative impacts.
---
 fs/fuse/dev.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e82db13da8f6..d47b6a492434 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -372,12 +372,17 @@ static void request_wait_answer(struct fuse_req *req)
 	struct fuse_iqueue *fiq = &fc->iq;
 	int err;

+	/* avoid bouncing between cores on wake */
+	pr_devel("task=%p before wait on core: %u wake_cpu: %u\n",
+		 current, task_cpu(current), current->wake_cpu);
+	migrate_disable();
+
 	if (!fc->no_interrupt) {
 		/* Any signal may interrupt this */
 		err = wait_event_interruptible(req->waitq,
 					test_bit(FR_FINISHED, &req->flags));
 		if (!err)
-			return;
+			goto out;

 		set_bit(FR_INTERRUPTED, &req->flags);
 		/* matches barrier in fuse_dev_do_read() */
@@ -391,7 +396,7 @@ static void request_wait_answer(struct fuse_req *req)
 		err = wait_event_killable(req->waitq,
 					test_bit(FR_FINISHED, &req->flags));
 		if (!err)
-			return;
+			goto out;

 		spin_lock(&fiq->lock);
 		/* Request is not yet in userspace, bail out */
@@ -400,7 +405,7 @@ static void request_wait_answer(struct fuse_req *req)
 			spin_unlock(&fiq->lock);
 			__fuse_put_request(req);
 			req->out.h.error = -EINTR;
-			return;
+			goto out;
 		}
 		spin_unlock(&fiq->lock);
 	}
@@ -410,6 +415,11 @@ static void request_wait_answer(struct fuse_req *req)
 	 * Wait it out.
 	 */
 	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
+
+out:
+	migrate_enable();
+	pr_devel("task=%p after wait on core: %u\n", current, task_cpu(current));
+
 }

static void __fuse_request_send(struct fuse_req *req)
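
The patch brackets every wait in request_wait_answer() with migrate_disable()/migrate_enable() and turns each early return into "goto out" so the enable always runs. For experimenting without a kernel rebuild, roughly the same bracket can be approximated from userspace with the affinity mask. This is only a sketch under assumptions: run_on_current_cpu and count_allowed_cpus are hypothetical names, and narrowing the affinity mask is not equivalent to migrate_disable() (it is visible to other tools, costs two syscalls, and the task may migrate once between sched_getcpu() and sched_setaffinity()).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: run fn(arg) with the calling thread restricted to
 * the CPU it is currently on, then restore the previous affinity mask.
 * Returns fn's result, or -1 if the affinity calls fail. */
static int run_on_current_cpu(int (*fn)(void *), void *arg)
{
	cpu_set_t saved, pinned;
	int cpu, ret;

	assert(fn != NULL);

	if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;

	cpu = sched_getcpu();
	if (cpu < 0)
		return -1;

	CPU_ZERO(&pinned);
	CPU_SET(cpu, &pinned);
	if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0)
		return -1;

	/* The blocking call (e.g. a read() on a fuse mount) goes here. */
	ret = fn(arg);

	/* Single exit path, like the "out:" label in the patch: the old
	 * mask is restored no matter how fn() returned. */
	if (sched_setaffinity(0, sizeof(saved), &saved) != 0)
		return -1;
	return ret;
}

/* Example callback for illustration: stores the number of CPUs in the
 * current affinity mask into *(int *)arg. */
static int count_allowed_cpus(void *arg)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	*(int *)arg = CPU_COUNT(&set);
	return 0;
}
```

Wrapping the blocking call in a callback gives the single exit path for free, which is what the return-to-goto conversion achieves in the kernel function.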