Re: [PATCH 1/3] x86,mm/pat: Use generic interval trees

From: Michel Lespinasse
Date: Wed Aug 21 2019 - 17:57:13 EST


On Tue, Aug 13, 2019 at 03:46:18PM -0700, Davidlohr Bueso wrote:
> o The border cases for overlapping differ -- interval trees are closed,
> while memtype intervals are open. We need to maintain semantics such
> that conflict detection and getting the lowest match does not change.

Agree on the need to maintain semantics.

As I had commented some time ago, I wish the interval trees used [start,end)
intervals instead of [start,last] - it would be a better fit for basically
all of the current interval tree users.

I'm not sure where to go with this - would it make sense to add a new
interval tree header file that uses [start,end) intervals (with the
thought of eventually converting all current interval tree users to it)
instead of adding one more use of the less-natural [start,last]
interval trees ?

> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
> index fa16036fa592..1be4d1856a9b 100644
> --- a/arch/x86/mm/pat_rbtree.c
> +++ b/arch/x86/mm/pat_rbtree.c
> @@ -12,7 +12,7 @@
> #include <linux/seq_file.h>
> #include <linux/debugfs.h>
> #include <linux/kernel.h>
> -#include <linux/rbtree_augmented.h>
> +#include <linux/interval_tree_generic.h>
> #include <linux/sched.h>
> #include <linux/gfp.h>
>
> @@ -34,68 +34,41 @@
> * memtype_lock protects the rbtree.
> */
>
> -static struct rb_root memtype_rbroot = RB_ROOT;
> +static struct rb_root_cached memtype_rbroot = RB_ROOT_CACHED;
> +
> +#define START(node) ((node)->start)
> +#define END(node) ((node)->end)
> +INTERVAL_TREE_DEFINE(struct memtype, rb, u64, subtree_max_end,
> + START, END, static, memtype_interval)
>
> static int is_node_overlap(struct memtype *node, u64 start, u64 end)
> {
> - if (node->start >= end || node->end <= start)
> + /*
> + * Unlike generic interval trees, the memtype nodes are ]a, b[

I think the memtype nodes are [a, b) (which one could also write as [a, b[
depending on their local customs - but either way, closed on the start side
and open on the end side) ?

> + * therefore we need to adjust the ranges accordingly. Missing
> + * an overlap can lead to incorrectly detecting conflicts,
> + * for example.
> + */
> + if (node->start + 1 >= end || node->end - 1 <= start)
> return 0;
>
> return 1;
> }

All right, now I am *really* confused.

My understanding is as follows:
* the PAT code wants to use [start, end( intervals
* interval trees are defined to use [start, last] intervals with last == end-1

At first, I thought that you were handling that by removing 1 from the
end of the interval, to adjust between the PAT and interval tree
definitions. But, I don't see you doing that anywhere.

Then, I thought that you were using [start, end( intervals everywhere,
and the interval tree functions memtype_interval_iter_first and
memtype_interval_iter_next would just return too many candidate
matches as as you are passing "end" instead of "last" == end-1 as the
interval endpoint, but then you would filter out the extra intervals
using is_node_overlap(). But, if that is the case, then I don't
understand why you need to redefine is_node_overlap() here.

Could you help me out by defining if the intervals are open or closed,
both when stored in the node->start and node->end values, and when
passed as start and end arguments to the functions in this file ?



Generally, I think using the interval tree code in this file is a good idea,
but 1- I do not understand how you are handling the differences in interval
definitions in this change, and 2- I wonder if it'd be better to just have
a version of the interval tree code that handles [start,end( half-open
intervals like we do everywhere else in the kernel.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.