Re: How to handle a hugepage with bad physical memory?

From: Christoph Lameter
Date: Wed Nov 16 2005 - 14:58:11 EST


On Wed, 16 Nov 2005, Robin Holt wrote:

> Russ Anderson recently introduced a patch into ia64 that changes MCA
> behavior. When the MCA is caused by a user reference to a users memory,
> we put an extra reference on the page and kill the user. This leaves
> the working memory available for other jobs while causing a leak of the
> bad page.
>
> I don't know if Russ has done any testing with hugetlbfs pages. I preface
> the remainder of my comments with a huge "I don't know anything"
> disclaimer.
>
> With the new hugepages concept, would it be possible to only mark
> the default pagesize portion of a hugepage as bad and then return the
> remainder of the hugepage for normal use? What would we basically need
> to do to accomplish this? Are there patches in the community which we
> should wait to see how they progress before we do any work on this front?

On IA64 we have one PTE for a huge page in a different region, so we
cannot unmap a page sized section. Other architectures may have PTEs for
each page sized section of a huge page. For those it may make sense
(but then the management of the page is done via the first page_struct,
which likely results in some challenging VM issues).


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/