Re: [patch 1/4] mmu_notifier: Core code

From: Robin Holt
Date: Fri Jan 25 2008 - 14:36:11 EST


On Fri, Jan 25, 2008 at 11:03:07AM -0800, Christoph Lameter wrote:
> > > > Shouldn't this really be protected by the down_write(mmap_sem)? Maybe:
> > >
> > > Ok. We could switch this to mmap_sem protection for the mm_struct but the
> > > rmap notifier is not associated with an mm_struct. So we would need to
> > > keep it there. Since we already have a spinlock: Just use it for both to
> > > avoid further complications.
> >
> > But now you are putting a global lock in where it is inappropriate.
>
> The lock is only used during register and unregister. Very low level
> usage.

Seems to me that is the same argument used for lock_kernel. I am saying
we have a perfectly reasonable way to seperate the protections down to
their smallest. For the things hanging off the mm, mmap_sem, for the
other list, a list specific lock.

Keep in mind that on a 2048p SSI MPI job starting up, we have 2048 ranks
doing this at the same time 6 times withing their address range. That
seems like a lock which could get hot fairly quickly. It may be for a
short period during startup and shutdown, but it is there.


>
> > > > XPMEM, would also benefit from a call early. We could make all the
> > > > segments as being torn down and start the recalls. We already have
> > > > this code in and working (have since it was first written 6 years ago).
> > > > In this case, all segments are torn down with a single message to each
> > > > of the importing partitions. In contrast, the teardown code which would
> > > > happen now would be one set of messages for each vma.
> > >
> > > So we need an additional global teardown call? Then we'd need to switch
> > > off the vma based invalidate_range()?
> >
> > No, EXACTLY what I originally was asking for, either move this call site
> > up, introduce an additional mmu_notifier op, or place this one in two
> > locations with a flag indicating which call is being made.
>
> Add a new invalidate_all() call? Then on exit we do
>
> 1. invalidate_all()

That will be fine as long as we can unregister the ops notifier and free
the structure. Otherwise, we end up being called needlessly.

>
> 2. invalidate_range() for each vma
>
> 3. release()
>
> We cannot simply move the call up because there will be future range
> callbacks on vma invalidation.

I am not sure what this means. Right now, if you were to notify XPMEM
the process is exiting, we would take care of all the recalling of pages
exported by this process, clearing those pages cache lines from cache,
and raising memory protections. I would assume that moving the callout
earlier would expect the same of every driver.


Thanks,
Robin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/