On January 28, 2002 11:34 pm, Brian Gerst wrote:
> Rick Stevens wrote:
> >
> > Daniel Phillips wrote:
> > [snip]
> >
> > > I've been a little slow to 'publish' this on lkml because I wanted a working
> > > prototype first, as proof of concept. My efforts to dragoon one or more of
> > > the notably capable kernelnewbies crowd into coding it haven't been
> > > particularly successful, perhaps due to the opacity of the code in question
> > > (pgtable.h et al). So I've begun coding it myself, and it's rather slow
> > > going, again because of the opacity of the code. Oh, and the difficult
> > > nature of the problem itself, since it requires understanding pretty much all
> > > of Unix memory management semantics first, including the bizarre (and useful)
> > > properties of process forking.
> > >
> > > The good news is, the code changes required do fit very cleanly into the
> > > current design. Changes are required in three places I've identified so far:
> > >
> > > copy_page_range:
> > > Instead of copying the page tables, just increment their use counts.
> > >
> > > zap_page_range:
> > > If a page table is shared according to its use count, just decrement
> > > the use count and otherwise leave it alone.
> > >
> > > handle_mm_fault:
> > > If a page table is shared according to its use count and the faulting
> > > instruction is a write, allocate a new page table and do the work that
> > > would have normally been done by copy_page_range at fork time.
> > > Decrement the use count of the (perhaps formerly) shared page table.
> >
> > Perhaps I'm missing this, but I read that as the child gets a reference
> > to the parent's memory. If the child attempts a write, then new memory
> > is allocated, data copied and the write occurs to this new memory. As
> > I read this, it's only invoked on a child write.
> >
> > Would this not leave a hole where the parent could write and, since the
> > child shares that memory, the new data would be read by the child? Sort
> > of a hidden shm segment? If so, I think we've got problems brewing.
> > Now, if a parent write causes the same behaviour as a child write, then
> > my point is moot.
> >
> > Could you clarify this for me? I'm probably way off base here.
>
> Fortunately, the kernel's page mappings are shared by all processes
> (except the top level), so if you mark the page containing the user page
> table as read-only from the child, it will also be read-only in the
> parent.

Actually, the page tables don't even have to be marked read-only. It's
done with use counts instead: use count > 1 -> shared.

We don't have to take faults on page tables themselves since we are
always servicing a fault, or we are carrying out some explicit VM
operation when we encounter the page table. That means we are already
in control and don't have to get clever. Or as it's been expressed
here before, we don't have to 'play stupid fault tricks', which would
be far less efficient and likely more complex to code.
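
For illustration, here is a minimal sketch of the use-count idea as a
user-space toy in C. It is not kernel code and not the prototype itself:
toy_ptable, toy_mm, toy_share, toy_zap and toy_write_fault are all
hypothetical names, the table size is arbitrary, and the toy treats the
table entries themselves as the data being written. It does not model how
the write fault is raised or how the data pages are handled. The point it
tries to show is that the test is on the table's use count, not on which
process is writing, so a write by the parent takes the same path as a
write by the child.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_TABLE 8	/* toy size, not the real value */

/* Hypothetical stand-in for a page table page plus its use count. */
struct toy_ptable {
	int use_count;				/* > 1 means shared */
	unsigned long pte[PTRS_PER_TABLE];
};

/* Hypothetical stand-in for one page directory slot of a process. */
struct toy_mm {
	struct toy_ptable *table;
};

static struct toy_ptable *toy_alloc_table(void)
{
	struct toy_ptable *t = calloc(1, sizeof(*t));

	if (t)
		t->use_count = 1;
	return t;
}

/* Fork time (cf. copy_page_range): share the table instead of copying it. */
static void toy_share(struct toy_mm *child, const struct toy_mm *parent)
{
	child->table = parent->table;
	child->table->use_count++;
}

/* Teardown (cf. zap_page_range): drop our reference, free on the last one. */
static void toy_zap(struct toy_mm *mm)
{
	struct toy_ptable *t = mm->table;

	mm->table = NULL;
	if (t && --t->use_count == 0)
		free(t);
}

/*
 * Write fault (cf. handle_mm_fault): if the table is shared, do the copy
 * that was skipped at fork time, then write. Returns 0, or -1 if the
 * allocation fails.
 */
static int toy_write_fault(struct toy_mm *mm, int idx, unsigned long val)
{
	struct toy_ptable *old = mm->table;

	assert(old && idx >= 0 && idx < PTRS_PER_TABLE);
	if (old->use_count > 1) {
		struct toy_ptable *copy = toy_alloc_table();

		if (!copy)
			return -1;
		memcpy(copy->pte, old->pte, sizeof(copy->pte));
		old->use_count--;	/* the old table may now be unshared */
		mm->table = copy;
	}
	mm->table->pte[idx] = val;
	return 0;
}
```

In this toy, calling toy_share() and then toy_write_fault() on the parent
leaves the child pointing at the old, unmodified table with a use count of
1, and the parent pointing at a private copy.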
--
Daniel