Hi Suzuki,
I had to refresh my mind a fair bit to be able to review this, so I
thought it may be useful to remind us all what the constraints of
this whole thing are, and to make sure we agree on them:
1. We fix the IPA max width to 40 bits
2. We don't support systems with a PARange smaller than 40 bits (do we
check or document this anywhere?)
3. We always assume we are running on a system with a PARange of 40 bits
and we are therefore constrained to use concatenation.
As an implication of (3) above, this code will attempt to allocate 256K
of physically contiguous memory for each VM on the system. That is
probably ok, but I just wanted to point it out in case it raises any
eyebrows for other people following this thread.
level:   0      1         2         3
bits : [47] [46 - 36] [35 - 25] [24 - 14] [13 - 0]
         ^      ^         ^
         |      |         |
host entry      |         x---- stage-2 entry
                |
        IPA ----x
Isn't the stage-2 entry using bits [39:25], because you resolve
more than 11 bits on the initial level of lookup when you concatenate
tables?
The following conditions hold true for all cases (with a 40-bit IPA):
1) The stage-2 entry level <= 2
2) Number of fake page-table entries is in the inclusive range [0, 2].
nit: Number of fake levels of page tables
+/*
+ * At stage-2 entry level, up to 16 tables can be concatenated and
nit: Can you rewrite the first part of this comment to be in line with
the ARM ARM, such as: "The stage-2 page tables can concatenate up to 16
tables at the initial level" ?
+ * the hardware expects us to use concatenation, whenever possible.
I think the 'hardware expects us' is a bit vague. At least I find this
whole part of the architecture incredibly confusing already, so it would
help me in the future if we put something like:
"The hardware requires that we use concatenation depending on the
supported PARange and page size. We always assume the hardware's PASize
is maximum 40 bits in this context, and with a fixed IPA width of 40
bits, we concatenate 2 tables for 4K pages, 16 tables for 16K pages, and
do not use concatenation for 64K pages."
Did I get this right?
+ * So, number of page table levels for KVM_PHYS_SHIFT is always
+ * the number of normal page table levels for (KVM_PHYS_SHIFT - 4).
+ */
+#define HYP_PGTABLE_LEVELS ARM64_HW_PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4)
I see the math lines up, but I don't find it intuitive; it's not
obvious to me why it should be the 'normal' page table layout for
KVM_PHYS_SHIFT - 4.
I see this as an architectural limitation given in the ARM ARM, and we
should just refer to that, and do:
#if PAGE_SHIFT == 12
#define S2_PGTABLE_LEVELS 3
#else
#define S2_PGTABLE_LEVELS 2
#endif
+/* Number of bits normally addressed by HYP_PGTABLE_LEVELS */
+#define HYP_PGTABLE_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(HYP_PGTABLE_LEVELS + 1)
+#define HYP_PGDIR_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(HYP_PGTABLE_LEVELS)
+#define HYP_PGTABLE_ENTRY_LEVEL (4 - HYP_PGTABLE_LEVELS)
We are introducing a huge number of defines here, which are all more or
less opaque to anyone coming back to this code.
I may be extraordinarily stupid, but I really need each define explained
in a comment to be able to follow this code (those above and the
S2_ENTRY_TABLES below).
I actually wonder from looking at this whole patch if we even want to go
here. Maybe this is really the time to say that we should get rid of
the dependency between the host page table layout and the stage-2 page
table layout.
Since the rest of this series looks pretty good, I'm wondering if you
should just disable KVM in the config system when 16K pages are
selected, and then you can move ahead with this series while we fix KVM
properly?