Re: [PATCH] cowlinks v2

From: Eric W. Biederman
Date: Sun Apr 04 2004 - 03:17:22 EST


Jamie Lokier <jamie@xxxxxxxxxxxxx> writes:

> Eric W. Biederman wrote:
> > > We'd like cowlinks that are an invisible filesystem optimisation.
> > > That means you "copy" a file and it behaves the same as if you copy a file.
> >
> > Exactly so they would not share the same pages in RAM.
>
> That is one way to implement it.
>
> > > Btw, I'm not suggesting sharing page cache entries.
> >
> > It sounded like you assumed sharing of page cache entries above.
> > How do you get to step 2 if the cow copies don't share the same page
> > cache entries?
>
> Ah. A misunderstanding on my part.
>
> I mean not sharing page cache entries between different
> address_spaces, but sharing between different cowlinks which use the
> same underlying address_space.
>
> I had in mind that since each cowlink is a separate inode, but both
> inodes point to a shared data structure in the filesystem, they would
> map pages out of a shared address_space representing that data
> structure. You've pointed out that it isn't necessary to do that, and
> it's probably simpler not to.

Especially since the VM layer has no concept of COW except for private
anonymous pages, which does not map to a POSIX cow on files.
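
As a rough user-space illustration of that point (a sketch, not kernel
code): a private file mapping gives copy-on-write per mapping, and the
modified page never reaches the file, so it is not a cow between two
files. The helper name private_mapping_write is made up for this
example; ACCESS_COPY is Python's copy-on-write mapping mode, which on
Unix is a MAP_PRIVATE mapping.

```python
import mmap
import os
import tempfile


def private_mapping_write(path, data):
    """Write `data` at offset 0 through a private (copy-on-write) mapping.

    Returns (bytes seen through the mapping, bytes in the file afterwards).
    Hypothetical helper for illustration only; the file must be non-empty
    and at least len(data) bytes long.
    """
    with open(path, "r+b") as f:
        # ACCESS_COPY: writes change the mapping's memory, not the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            m[:len(data)] = data
            seen = bytes(m[:len(data)])
    with open(path, "rb") as f:
        on_disk = f.read(len(data))
    return seen, on_disk


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        os.close(fd)
        seen, on_disk = private_mapping_write(path, b"CHANGED!")
        print("through the mapping:", seen)
        print("in the file:        ", on_disk)
    finally:
        os.unlink(path)
```

The write is visible only through that one mapping; the file itself is
untouched, which is why this mechanism alone does not give cow
semantics between files.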

> Now I see your point. Page sharing could be avoided completely, if
> when mapping a cowlink the page was _copied_ from the shared
> address_space to the cowlink's own address_space. Copying also solves
> the mlock() problem. (A shared address_space is still required, because
> you may cowlink a file which has dirty pages in RAM).

Either that or you call fsync(file) as part of generating the cow
copy.
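
A minimal user-space sketch of that ordering, assuming Linux (where
fsync() on a read-only descriptor is accepted). There is no cowlink
call to use here, so a plain copy stands in for "generating the cow
copy"; flush_then_copy is a hypothetical name, not an existing API.

```python
import os
import shutil


def flush_then_copy(src, dst):
    """Flush src's dirty data to disk, then create the copy.

    The plain copy is only a stand-in for a cowlink operation; the point
    is the ordering: no dirty pages remain for src when the copy is made.
    """
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)  # write back any dirty pages of src first
    finally:
        os.close(fd)
    shutil.copyfile(src, dst)  # stand-in for the cow copy itself
```

With the data flushed first, the copy can be built from what is on
disk, so no shared address_space is needed just to hold dirty pages.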

> Copying raises a different problem: what to do when a non-cowlink file
> is mapped (PROT_READ), and then it's cowlinked while the mapping is in
> place. The non-cowlink inode gets converted to a cowlink inode. The
> pages are hashed in the original address_space, and you now have a
> mapping of a cowlink file where the mapped pages are _not copies_ of
> pages in the shared address_space.

If you don't flush things to disk first there are certainly some sharing
issues. Flushing the data to the address_space of the shared inode
should work just as well though.

What you must have is a clear state change from caching a normal file
to caching a cow file, where certain parts of the behavior change.

What this needs to do depends on the VFS at the time.

If sharing is introduced past that state change things get tricky to
manage. I just don't want to think about those issues just now...
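
To make the state change concrete, here is a toy user-space model (a
sketch only, nothing like real VFS code). Every name in it (ToyInode,
make_cowlink, the "normal"/"cow" states) is invented for illustration.
It assumes the flush-first approach: the transition pushes all cached
pages into a shared snapshot, and after that each inode only ever
copies pages out of the snapshot into its own cache.

```python
class ToyInode:
    """Toy stand-in for an inode with its own page cache."""

    def __init__(self):
        self.state = "normal"   # "normal" or "cow"
        self.cache = {}         # page index -> bytes, private to this inode
        self.dirty = set()      # indexes written since the last flush
        self.shared = None      # read-only snapshot shared by cowlinks

    def write(self, index, data):
        # Writes always land in this inode's own cache, never in the snapshot.
        self.cache[index] = data
        self.dirty.add(index)

    def read(self, index):
        if index in self.cache:
            return self.cache[index]
        if self.shared is not None and index in self.shared:
            # Copy the page into our own cache instead of sharing it.
            # (bytes are immutable, so the copy is symbolic in Python.)
            self.cache[index] = bytes(self.shared[index])
            return self.cache[index]
        return None


def make_cowlink(src):
    """Explicit state change: flush src, then create a cowlink of it."""
    snapshot = {}
    if src.shared is not None:
        snapshot.update(src.shared)
    snapshot.update(src.cache)      # the "flush": src's pages win
    src.dirty.clear()
    src.shared = snapshot           # never modified after this point
    src.state = "cow"

    clone = ToyInode()
    clone.state = "cow"
    clone.shared = snapshot
    return clone
```

After make_cowlink(a) returns b, a write through b is visible only via
b, and a still reads its old data. All sharing is set up at the state
change and none is introduced afterwards.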

Eric



