Re: [PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages

From: Christophe Leroy
Date: Thu Jun 18 2020 - 10:20:33 EST




Le 18/06/2020 Ã 03:00, Michael Ellerman a ÃcritÂ:
Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:
Le 17/06/2020 Ã 16:38, Peter Zijlstra a ÃcritÂ:
On Thu, Jun 18, 2020 at 12:21:22AM +1000, Michael Ellerman wrote:
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
On Mon, Jun 15, 2020 at 12:57:59PM +0000, Christophe Leroy wrote:

+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+#define __HAVE_ARCH_PTEP_GET
+static inline pte_t ptep_get(pte_t *ptep)
+{
+ pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0};
+
+ return pte;
+}
+#endif

Would it make sense to have a comment with this magic? The casual reader
might wonder WTH just happened when he stumbles on this :-)

I tried writing a helpful comment but it's too late for my brain to form
sensible sentences.

Christophe can you send a follow-up with a comment explaining it? In
particular the zero entries stand out, it's kind of subtle that those
entries are only populated with the right value when we write to the
page table.

static inline pte_t ptep_get(pte_t *ptep)
{
unsigned long val = READ_ONCE(ptep->pte);
/* 16K pages have 4 identical value 4K entries */
pte_t pte = {val, val, val, val);
return pte;
}

Maybe something like that?

This should work as well. Indeed nobody cares about what's in the other
three. They are only there to ensure that ptep++ increases the ptep
pointer by 16 bytes. Only the HW require 4 identical values, that's
taken care of in set_pte_at() and pte_update().

Right, but it seems less error-prone to have the in-memory
representation match what we have in the page table (well that's
in-memory too but you know what I mean).

So we should use the most efficient. Thinking once more, maybe what you
propose is the most efficient as there is no need to load another
register with value 0 in order to write it in the stack.

On 64-bit I'd say it makes zero difference, the only thing that's going
to matter is the load from ptep->pte. I don't know whether that's true
on the 8xx cores though.

On 8xx core, loading a register with value 0 will take one cycle unless there is some bubble left by another instruction (like a load from memory or a taken branch). But that's in the noise.

Christophe