RE: [PATCH v3 6/7] fs: introduce write-hint start point for in-kernel hints

From: kanchan
Date: Wed Apr 03 2019 - 10:30:40 EST


> Which means that when a new userspace hint is defined, all the kernel
> hints change numbers and, AIUI, that changes how the kernel hints are mapped
> to the underlying device.

Currently, adding a new user-space hint requires modifying the code and
installing a modified kernel, so I felt that situation would be unlikely to
arise in the middle of a production workload.


> The kernel hints need to be mapped to the highest supported number and work
> down, while userspace starts at the lowest and works up.

Actually, I initially implemented the "blk_write_hint_to_streamid" function
that way, i.e. as per the table you've put. But that code involved more
condition checks/branches than the current one.
Also, the request queue contains a statically defined array called
"write_hints", which the nvme driver updates to gather stream stats.
Snippet below -

	if (streamid < ARRAY_SIZE(req->q->write_hints))
		req->q->write_hints[streamid] += blk_rq_bytes(req) >> 9;

If kernel hints get mapped to the highest possible stream numbers, the nvme
driver would have to do a reverse conversion from streamid to array index
(some more conditional checks) before updating that array.
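
To make the extra checks concrete, here is a rough user-space sketch of what
such a reverse conversion could look like. It is not the driver code: the slot
counts, "streamid_to_slot" and "account_write" are made-up names, and it
assumes the stats array keeps user slots first and kernel slots after them.

```c
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NR_USER_SLOTS	5
#define NR_KERN_SLOTS	2
#define NR_SLOTS	(NR_USER_SLOTS + NR_KERN_SLOTS)

/*
 * Hypothetical reverse conversion: low stream ids index the array directly,
 * the top NR_KERN_SLOTS stream ids of the device fold back into the slots
 * after the user ones. Returns -1 when the stream id has no slot.
 */
static int streamid_to_slot(uint16_t streamid, uint16_t dev_max)
{
	if (streamid < NR_USER_SLOTS)
		return streamid;
	if (streamid <= dev_max && dev_max - streamid < NR_KERN_SLOTS)
		return NR_USER_SLOTS + (dev_max - streamid);
	return -1;
}

/* Account a write in 512-byte sectors, mirroring the ">> 9" in the snippet. */
static void account_write(uint64_t *write_hints, uint16_t streamid,
			  uint16_t dev_max, uint64_t bytes)
{
	int slot = streamid_to_slot(streamid, dev_max);

	if (slot >= 0)
		write_hints[slot] += bytes >> 9;
}
```

Compared with the single bounds check in the snippet, this adds one more
range test and a subtraction on every accounted request.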


Overall, would that not mean adding run-time checks to the I/O path (which
we would always execute) to handle a condition that arises only if someone
extends the user-space hint count later on?


Thanks,

-----Original Message-----
From: Dave Chinner [mailto:david@xxxxxxxxxxxxx]
Sent: Monday, April 01, 2019 10:43 AM
To: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx;
linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
linux-ext4@xxxxxxxxxxxxxxx; axboe@xxxxxx; prakash.v@xxxxxxxxxxx;
anshul@xxxxxxxxxxx; joshiiitr@xxxxxxxxx
Subject: Re: [PATCH v3 6/7] fs: introduce write-hint start point for
in-kernel hints

On Fri, Mar 29, 2019 at 01:23:51PM +0530, Kanchan Joshi wrote:
> kernel-mode components can define own write-hints using
> "WRITE_LIFE_KERN_MIN" as base.
>
> Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
> ---
> include/linux/fs.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 29d8e2c..6a2673e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -291,6 +291,8 @@ enum rw_hint {
> WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
> WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
> WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
> +/* Kernel should use write-hint starting from this */
> + WRITE_LIFE_KERN_MIN,

Which means that when a new userspace hint is defined, all the kernel hints
change numbers and, AIUI, that changes how the kernel hints are mapped to
the underlying device.
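
The renumbering can be seen with a small standalone sketch. The two enums
below are illustrative copies, not the kernel's "enum rw_hint"; the BEFORE_ and
AFTER_ names and the extra user hint are hypothetical.

```c
#include <assert.h>

/* Illustrative copy of the enum as patched: kernel hints start after EXTREME. */
enum rw_hint_before {
	BEFORE_NOT_SET = 0,
	BEFORE_NONE,
	BEFORE_SHORT,
	BEFORE_MEDIUM,
	BEFORE_LONG,
	BEFORE_EXTREME,
	BEFORE_KERN_MIN,	/* implicit value: 6 */
};

/* Same layout after a hypothetical new user hint is appended. */
enum rw_hint_after {
	AFTER_NOT_SET = 0,
	AFTER_NONE,
	AFTER_SHORT,
	AFTER_MEDIUM,
	AFTER_LONG,
	AFTER_EXTREME,
	AFTER_NEW_USER_HINT,	/* hypothetical addition */
	AFTER_KERN_MIN,		/* implicit value: 7 */
};

/* The kernel base moves by one for every user hint added in front of it. */
static_assert(AFTER_KERN_MIN == BEFORE_KERN_MIN + 1,
	      "kernel hint base shifts when a user hint is added");
```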

The kernel hints need to be mapped to the highest supported number and work
down, while userspace starts at the lowest and works up. The "kernel to
device stream id" needs to translate the kernel hints down to the upper
range of the device hints.

I think the mapping range the code uses should be:

  HINT          Type        device
  0             USER 0      0
  1             USER 1      1
  ......
  n             USER MAX    n

  {n,65535-m}   UNUSED      {n,dev_max-m}

  65535 - m     KERN_MIN    dev_max - m
  ......
  65532         KERN 3      dev_max - 3
  65533         KERN 2      dev_max - 2
  65534         KERN 1      dev_max - 1
  65535         KERN 0      dev_max

i.e. if you look at the mapping as a signed short, >= 0 are user hints, < 0
are kernel hints. This provides an obvious, simple way to map the kernel
hints to the upper range of the device hint range. It also provides a simple
way to compress both user and kernel hints into a limited device hint range
- kernel always uses the top device hint, user is limited to the rest of the
range....

This means the ranges don't overlap or change at either the code or the
device level as we add more user and kernel hint channels in the future.
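
A minimal user-space sketch of the table above, under stated assumptions: the
function names, the "nr_kern" parameter (how many top device stream ids are
set aside for kernel hints) and the clamping used for compression are all
hypothetical, and callers are assumed to pass 1 <= nr_kern <= dev_max.

```c
#include <stdint.h>

/* Kernel hints are the values that would be negative as a signed short. */
static int hint_is_kernel(uint16_t hint)
{
	return hint >= 0x8000;
}

/*
 * Hypothetical mapping from a 16-bit hint to a device stream id.
 * dev_max is the highest stream id the device supports.
 */
static uint16_t hint_to_streamid(uint16_t hint, uint16_t dev_max,
				 uint16_t nr_kern)
{
	uint16_t user_max = dev_max - nr_kern;

	if (hint_is_kernel(hint)) {
		/* 65535 is KERN 0, 65534 is KERN 1, and so on. */
		uint16_t k = UINT16_MAX - hint;

		/* Assumed compression: excess kernel hints share the lowest reserved id. */
		if (k >= nr_kern)
			k = nr_kern - 1;
		return dev_max - k;
	}

	/* Assumed compression: user hints are clamped below the kernel range. */
	return hint > user_max ? user_max : hint;
}
```

With dev_max = 16 and nr_kern = 4, hint 65535 lands on stream 16, hint 65532
on stream 13, and user hints 0..12 map to themselves, so neither range moves
when a hint is added on the other side.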

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx