Re: [PATCH 3/3] ia64: Call migration code on correctable errors v2
From: Russ Anderson
Date: Fri May 02 2008 - 14:50:44 EST
On Fri, May 02, 2008 at 07:45:55PM +0200, Andi Kleen wrote:
> Russ Anderson <rja@xxxxxxx> writes:
>
> > Migrate data off pages with correctable memory errors. This patch is the
> > ia64 specific piece. It connects the CPE handler to the page migration
> > code. It is implemented as a kernel loadable module, similar to the mca
> > recovery code (mca_recovery.ko). This allows the feature to be turned off
> > by uninstalling the module. Creates /proc/badram to display bad page
> > information and free bad pages.
>
> How do you know what pages have excessive errors? And how is excessive defined?
> Surely you don't keep a per page error count? It's unclear from your patch.
The code migrates on the first correctable error on a page, so "excessive"
is one. Yes, keeping a per page count gets problematic, especially as
memories get larger. The issue of the right metric for when to migrate
will always be debatable and the "right" answer likely will depend on
the physical memory technology.
> Anyways I don't think this should be ia64 specific, but generic code.
The actual migration code is generic, in mm/migrate.c.
The ia64 kernel module piece is very arch specific. It ties into the
ia64 CPE handler. It gets the page address from the CPE record, based
on the ia64 error handling spec. Each arch will have a different way of
determining the physical address of the correctable error (for example).
> I also have my doubts about making such small code subsystems modules. Modules
> always get rounded to pages so it ultimatively just wastes memory.
CONFIG_IA64_CPE_MIGRATE=m builds it as module.
CONFIG_IA64_CPE_MIGRATE=y builds it as part of the kernel.
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@xxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/