Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier

From: Jan-Bernd Themann
Date: Wed Feb 13 2008 - 10:18:29 EST


Hi Dave,

On Monday 11 February 2008 17:47, Dave Hansen wrote:
> Also, just ripping down and completely re-doing the entire mass of cards
> every time a 16MB area of memory is added or removed seems like an
> awfully big sledgehammer to me. I would *HATE* to see anybody else
> using this driver as an example to work off of? Can't you just keep
> track of which areas the driver is actually *USING* and only worry about
> changing mappings if that intersects with an area having hotplug done on
> it?


To form a basis for the discussion, here is an explanation of the current
eHEA memory add / remove concept:

Constraints imposed by HW / FW:
- eHEA has its own MMU
- eHEA Memory Regions (MRs) are used by the eHEA MMU to translate virtual
  addresses to absolute addresses (like DMA-mapped memory on a PCI bus)
- The number of MRs is limited (not enough to have one MR per packet)
- Registration of MRs is comparatively slow, as it is done via a slow
  firmware call (H_CALL)
- An MR can at most cover the amount of memory available under Linux
- An MR covers a contiguous virtual memory block (no holes)

Because of this, there is just one big MR that covers the entire kernel
memory. We also need a mapping table from kernel addresses to this
contiguous "virtual memory IO space" (here called ehea_bmap).

- When memory is added to / removed from the LPAR (and Linux), the MR has to
  be updated. This can only be done by destroying and recreating the MR;
  there is no H_CALL to modify the MR size. To find holes in the Linux
  kernel memory layout, we have to iterate over the memory sections when
  recreating the ehea_bmap (otherwise the MR would be bigger than the
  available memory, causing the registration to fail)

- DLPAR userspace tools, the kernel, the driver, firmware and the HMC are
  all involved in this process on System p

Memory add: version without an external memory notifier call:
- New memory used in a transfer_xmit will result in an "ehea_bmap
  translation miss", which triggers a rebuild and reregistration of the
  ehea_bmap based on the current kernel memory setup.
- Advantage: the number of MR rebuilds is reduced significantly compared to
  a rebuild for each 16MB chunk of memory added.
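
A rough sketch of that approach (ehea_rebuild_bmap() and ehea_rereg_mrs()
are hypothetical helpers; a possible ehea_rebuild_bmap() is sketched
further below):

/* resolve an address in the xmit path; on a translation miss, rebuild
 * ehea_bmap from the current memory layout and destroy/recreate the MR
 * (slow, but happens once per DLPAR operation instead of once per 16MB
 * section)
 */
static u64 ehea_get_io_addr(void *caddr)
{
        u64 io_addr = ehea_map_vaddr(caddr);

        if (io_addr == EHEA_INVALID_ADDR) {
                ehea_rebuild_bmap();
                ehea_rereg_mrs();
                io_addr = ehea_map_vaddr(caddr);
        }
        return io_addr;
}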

Memory add: version with external notifier call:
- We still need an ehea_bmap (whatever structure it has)

Memory remove with notifier:
- We have to rebuild the ehea_bmap immediately to remove the pages that are
  no longer available. Without doing that, the firmware (pHYP) cannot remove
  that memory from the LPAR. As we don't know if or how many additional
  sections are to be removed before the DLPAR userspace tool tells the
  firmware to remove the memory, we cannot defer the rebuild.
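
A sketch of how such a notifier callback could look from our side (which
action codes to react to is part of the question; MEM_OFFLINE / MEM_ONLINE
are just one plausible choice):

#include <linux/memory.h>
#include <linux/notifier.h>

static int ehea_mem_notifier(struct notifier_block *nb,
                             unsigned long action, void *data)
{
        switch (action) {
        case MEM_OFFLINE:
                /* pages are gone: rebuild ehea_bmap and the MR
                 * immediately so pHYP can remove the memory
                 */
                ehea_rebuild_bmap();
                ehea_rereg_mrs();
                break;
        case MEM_ONLINE:
                /* for add we could also rely on the translation miss
                 * in the xmit path (see above)
                 */
                break;
        }
        return NOTIFY_OK;
}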


Our current understanding of the Memory Hotplug System is (please correct
me if I'm wrong):

- depends on SPARSEMEM
- only whole memory sections are added / removed
- for each section a memory resource is registered


From the driver side we need:
- some kind of memory notification mechanism. For memory add we can live
  without any external memory notification event. For memory remove we do
  need an external trigger (see explanation above).
- a way to iterate over all kernel pages and a way to detect holes in the
  kernel memory layout, in order to build up our own ehea_bmap.


Memory notification trigger:
- These triggers exist; an exported "register_memory_notifier" /
  "unregister_memory_notifier" would work in this scheme

Functions to use while building ehea_bmap + MRs:
- Either use the functions that the memory hotplug system itself uses, i.e.
  the section defines + functions (section_nr_to_pfn, pfn_valid),
- or use other currently not exported functions in kernel/resource.c, like
  walk_memory_resource (where we would still need the maximum possible
  number of sections, NR_MEM_SECTIONS)
- Maybe some kind of new interface?
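
To illustrate the first variant, here is how the hypothetical
ehea_rebuild_bmap() from the sketches above could be implemented with
those helpers:

/* walk all possible sections; a present section gets the next slot in
 * the contiguous IO space, holes are marked invalid
 */
static void ehea_rebuild_bmap(void)
{
        unsigned long sec;
        u64 io_vaddr = 0;

        for (sec = 0; sec < NR_MEM_SECTIONS; sec++) {
                if (pfn_valid(section_nr_to_pfn(sec))) {
                        ehea_bmap->vaddr[sec] = io_vaddr;
                        io_vaddr += 1UL << SECTION_SIZE_BITS;
                } else {
                        ehea_bmap->vaddr[sec] = EHEA_INVALID_ADDR;
                }
        }
}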

What would you suggest?

Regards,
Jan-Bernd & Christoph