Re: More thoughts about hwpoison and pageflags compression

From: Andi Kleen
Date: Sat May 30 2009 - 03:48:43 EST

On Sat, May 30, 2009 at 12:29:30AM -0700, Andrew Morton wrote:
> On Sat, 30 May 2009 09:27:58 +0200 Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> > On Fri, May 29, 2009 at 11:53:02PM -0700, Andrew Morton wrote:
> > > On Sat, 30 May 2009 08:37:10 +0200 Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> > >
> > > > So using a separate bit is a sensible choice imho.
> > >
> > > Could you make the feature 64-bit-only and use one of bits 32-63?
> >
> > We could, but these systems can run 32bit kernels too (although
> > it's probably not a good idea). Ok it would be probably possible
> > to make it 64bit only, but I would prefer to not do that.
> >
> > Also even 32bit has still flags free and even if we run out there's an easy
> > path to free more (see my earlier writeup)
> hm. Maybe that should be proven sooner rather than later.

The SPARSEMEM code already has some fallback. I don't know if it works, but
at least the code looks to be there.

/*
 * There are three possibilities for how page->flags get
 * laid out. The first is for the normal case, without
 * sparsemem. The second is for sparsemem when there is
 * plenty of space for node and section. The last is when
 * we have run out of space and have to fall back to an
 * alternate (slower) way of determining the node.
 *
 * No sparsemem or sparsemem vmemmap: |       NODE     | ZONE | ... | FLAGS |
 * classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS |
 * classic sparse no space for node:  | SECTION |     ZONE    | ... | FLAGS |
 */

/*
 * If we did not store the node number in the page then we have to
 * do a lookup in the section_to_node_table in order to find which
 * node the page belongs to.
 */
#if MAX_NUMNODES <= 256
static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
#else
static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;
#endif
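
To make the third layout concrete, here is a minimal userspace sketch of the fallback: the node is not stored in the flags word, so it is looked up through a per-section table. All names prefixed with sketch_ and all sizes (64 sections, 6 section bits, a 32-bit flags word) are assumptions made up for this illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; not the kernel's values. */
#define SKETCH_NR_SECTIONS   64u
#define SKETCH_SECTION_BITS  6u
#define SKETCH_FLAGS_WIDTH   32u
#define SKETCH_SECTION_SHIFT (SKETCH_FLAGS_WIDTH - SKETCH_SECTION_BITS)
#define SKETCH_SECTION_MASK  ((1u << SKETCH_SECTION_BITS) - 1u)

/* Stand-in for struct page: only the flags word matters here. */
struct sketch_page {
	uint32_t flags;
};

/* One node id per section, filled in when a section is set up. */
static uint8_t sketch_section_to_node[SKETCH_NR_SECTIONS];

/* Store the section number in the top bits, leaving the low bits alone. */
static void sketch_set_section(struct sketch_page *page, uint32_t section)
{
	assert(section < SKETCH_NR_SECTIONS);
	page->flags &= ~(SKETCH_SECTION_MASK << SKETCH_SECTION_SHIFT);
	page->flags |= section << SKETCH_SECTION_SHIFT;
}

/* Read the section number back out of the top bits. */
static uint32_t sketch_page_to_section(const struct sketch_page *page)
{
	return (page->flags >> SKETCH_SECTION_SHIFT) & SKETCH_SECTION_MASK;
}

/* The slower path: one extra table load instead of a shift and mask. */
static unsigned int sketch_page_to_nid(const struct sketch_page *page)
{
	return sketch_section_to_node[sketch_page_to_section(page)];
}
```

The cost of dropping the NODE field is the extra memory access in sketch_page_to_nid(); what it buys is the bits that NODE would otherwise occupy in every page's flags word.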

The other part that could be added is to use a separate hash to go from
page to SECTION (that would be very similar to the old discontig perfect hash
I did to go from pfn to node), then the "SECTION" part would be free for reuse too.

Then you could use the full 32bits. On 32bit we're right now at 22,
hwpoison would be 23. There's still some room.
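
The arithmetic behind those figures, as a small sketch. The constants are the numbers quoted above; the helper name is hypothetical. With 23 flag bits in a 32-bit word, 9 bits remain for whatever fields (ZONE, plus SECTION or NODE where present) are still packed above the flags.

```c
#include <assert.h>

/* Figures quoted in the mail; names are made up for this sketch. */
enum {
	SKETCH_WORD_BITS_32   = 32, /* width of page->flags on a 32bit kernel */
	SKETCH_FLAGS_IN_USE   = 22, /* flag bits in use today, per the mail */
	SKETCH_FLAGS_HWPOISON = 1,  /* the one extra bit hwpoison would take */
};

/* Bits left over for packed fields once the flag bits are accounted for. */
static int sketch_field_bits_left(int word_bits, int flag_bits)
{
	assert(flag_bits >= 0 && flag_bits <= word_bits);
	return word_bits - flag_bits;
}
```

Calling sketch_field_bits_left(SKETCH_WORD_BITS_32, SKETCH_FLAGS_IN_USE + SKETCH_FLAGS_HWPOISON) gives the remaining field width with hwpoison included.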

> Plus we haven't looked into the complexity of the external flags yet.

It would be dumb to do external flags before you actually run out.
After all what good are free bits?

ak@xxxxxxxxxxxxxxx -- Speaking for myself only.