Re: [PATCH V2 0/7] Cleancache (was Transcendent Memory): overview

From: Minchan Kim
Date: Wed Jun 02 2010 - 02:03:30 EST


Hello.

I think the cleancache approach is cool. :)
I have some suggestions and questions.

On Sat, May 29, 2010 at 2:35 AM, Dan Magenheimer
<dan.magenheimer@xxxxxxxxxx> wrote:
> [PATCH V2 0/7] Cleancache (was Transcendent Memory): overview
>
> Changes since V1:
> - Rebased to 2.6.34 (no functional changes)
> - Convert to sane types (Al Viro)
> - Define some raw constants (Konrad Wilk)
> - Add ack from Andreas Dilger
>
> In previous patch postings, cleancache was part of the Transcendent
> Memory ("tmem") patchset.  This patchset refocuses not on the underlying
> technology (tmem) but instead on the useful functionality provided for Linux,
> and provides a clean API so that cleancache can provide this very useful
> functionality either via a Xen tmem driver OR completely independent of tmem.
> For example: Nitin Gupta (of compcache and ramzswap fame) is implementing
> an in-kernel compression "backend" for cleancache; some believe
> cleancache will be a very nice interface for building RAM-like functionality
> for pseudo-RAM devices such as SSD or phase-change memory; and a Pune
> University team is looking at a backend for virtio (see OLS'2010).
>
> A more complete description of cleancache can be found in the introductory
> comment in mm/cleancache.c (in PATCH 2/7) which is included below
> for convenience.
>
> Note that an earlier version of this patch is now shipping in OpenSuSE 11.2
> and will soon ship in a release of Oracle Enterprise Linux.  Underlying
> tmem technology is now shipping in Oracle VM 2.2 and was just released
> in Xen 4.0 on April 15, 2010.  (Search news.google.com for Transcendent
> Memory)
>
> Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
> Reviewed-by: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
>
>  fs/btrfs/extent_io.c       |    9 +
>  fs/btrfs/super.c           |    2
>  fs/buffer.c                |    5 +
>  fs/ext3/super.c            |    2
>  fs/ext4/super.c            |    2
>  fs/mpage.c                 |    7 +
>  fs/ocfs2/super.c           |    3
>  fs/super.c                 |    8 +
>  include/linux/cleancache.h |   90 +++++++++++++++++++
>  include/linux/fs.h         |    5 +
>  mm/Kconfig                 |   22 ++++
>  mm/Makefile                |    1
>  mm/cleancache.c            |  203 +++++++++++++++++++++++++++++++++++++++++++++
>  mm/filemap.c               |   11 ++
>  mm/truncate.c              |   10 ++
>  15 files changed, 380 insertions(+)
>
> Cleancache can be thought of as a page-granularity victim cache for clean
> pages that the kernel's pageframe replacement algorithm (PFRA) would like
> to keep around, but can't since there isn't enough memory.  So when the
> PFRA "evicts" a page, it first attempts to put it into a synchronous
> concurrency-safe page-oriented pseudo-RAM device (such as Xen's Transcendent
> Memory, aka "tmem", or in-kernel compressed memory, aka "zmem", or other
> RAM-like devices) which is not directly accessible or addressable by the
> kernel and is of unknown and possibly time-varying size.  And when a
> cleancache-enabled filesystem wishes to access a page in a file on disk,
> it first checks cleancache to see if it already contains it; if it does,
> the page is copied into the kernel and a disk access is avoided.
> This pseudo-RAM device links itself to cleancache by setting the
> cleancache_ops pointer appropriately and the functions it provides must
> conform to certain semantics as follows:
>
> Most important, cleancache is "ephemeral".  Pages which are copied into
> cleancache have an indefinite lifetime which is completely unknowable
> by the kernel and so may or may not still be in cleancache at any later time.
> Thus, as its name implies, cleancache is not suitable for dirty pages.  The
> pseudo-RAM has complete discretion over what pages to preserve and what
> pages to discard and when.
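To make the put-on-evict / get-before-disk flow and the "ephemeral" rule concrete, here is a minimal user-space sketch in C. It is not the cleancache API: the one-page store and the names toy_evict, toy_read_page and toy_disk_read are hypothetical stand-ins for the PFRA eviction hook, a filesystem read path and a real disk read. Because the store holds only one page, a later eviction silently discards the earlier one, so the read path must always be ready to fall back to disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical one-page "pseudo-RAM": a new put silently replaces the old page. */
static struct {
    bool valid;
    unsigned long ino, index;
    unsigned char data[TOY_PAGE_SIZE];
} toy_cache;

static int disk_reads;   /* counts simulated disk accesses */

/* PFRA side: before dropping a clean page, offer a copy to the cache. */
static void toy_evict(unsigned long ino, unsigned long index, const unsigned char *page)
{
    toy_cache.valid = true;
    toy_cache.ino = ino;
    toy_cache.index = index;
    memcpy(toy_cache.data, page, TOY_PAGE_SIZE);
}

/* Simulated disk read: fills the page with the low byte of its index. */
static void toy_disk_read(unsigned long ino, unsigned long index, unsigned char *page)
{
    (void)ino;
    memset(page, (int)(index & 0xff), TOY_PAGE_SIZE);
    disk_reads++;
}

/*
 * Filesystem side: try the cache first; on a miss (never put, or already
 * discarded by the cache) fall back to the disk.
 */
static void toy_read_page(unsigned long ino, unsigned long index, unsigned char *page)
{
    assert(page != NULL);
    if (toy_cache.valid && toy_cache.ino == ino && toy_cache.index == index) {
        memcpy(page, toy_cache.data, TOY_PAGE_SIZE);   /* hit: disk access avoided */
        return;
    }
    toy_disk_read(ino, index, page);                   /* miss: go to disk */
}
```

The caller never learns why a lookup missed; it only needs the fallback, which is exactly what makes an ephemeral cache safe for clean pages and unsafe for dirty ones.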
>
> A filesystem calls "init_fs" to obtain a pool id which, if positive, must be
> saved in the filesystem's superblock; a negative return value indicates
> failure.  A "put_page" will copy a (presumably about-to-be-evicted) page into
> pseudo-RAM and associate it with the pool id, the file inode, and a page
> index into the file.  (The combination of a pool id, an inode, and an index
> is called a "handle".)  A "get_page" will copy the page, if found, from
> pseudo-RAM into kernel memory.  A "flush_page" will ensure the page no longer
> is present in pseudo-RAM; a "flush_inode" will flush all pages associated
> with the specified inode; and a "flush_fs" will flush all pages in all
> inodes specified by the given pool id.
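As a concrete reading of the operations just described, here is a hypothetical in-memory backend in C keyed by the (pool id, inode, index) handle. The toy_* names, the fixed table size, the 64-byte "pages" and the 0/-1 return convention are assumptions for illustration, not the signatures in include/linux/cleancache.h. When the table is full, toy_put_page silently drops the page, which the ephemeral semantics permit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64   /* tiny "pages" keep the example small */
#define TOY_SLOTS     8    /* fixed capacity: puts beyond this are dropped */
#define TOY_MAX_POOLS 4

/* One cached page, identified by its handle (pool, ino, index). */
struct toy_entry {
    bool used;
    int pool;
    unsigned long ino, index;
    unsigned char data[TOY_PAGE_SIZE];
};

static struct toy_entry store[TOY_SLOTS];
static int next_pool;

/* init_fs: hand out a new pool id (>= 0), or a negative value on failure. */
static int toy_init_fs(void)
{
    return next_pool < TOY_MAX_POOLS ? next_pool++ : -1;
}

static struct toy_entry *toy_lookup(int pool, unsigned long ino, unsigned long index)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (store[i].used && store[i].pool == pool &&
            store[i].ino == ino && store[i].index == index)
            return &store[i];
    return NULL;
}

/* put_page: copy the page in, reusing an existing entry or a free slot. */
static void toy_put_page(int pool, unsigned long ino, unsigned long index,
                         const unsigned char *page)
{
    struct toy_entry *e = toy_lookup(pool, ino, index);

    assert(page != NULL);
    for (int i = 0; !e && i < TOY_SLOTS; i++)
        if (!store[i].used)
            e = &store[i];
    if (!e)
        return;                 /* full: discard; the caller never knows */
    e->used = true;
    e->pool = pool;
    e->ino = ino;
    e->index = index;
    memcpy(e->data, page, TOY_PAGE_SIZE);
}

/* get_page: copy the page out if present; 0 on hit, -1 on miss. */
static int toy_get_page(int pool, unsigned long ino, unsigned long index,
                        unsigned char *page)
{
    struct toy_entry *e = toy_lookup(pool, ino, index);

    assert(page != NULL);
    if (!e)
        return -1;
    memcpy(page, e->data, TOY_PAGE_SIZE);
    return 0;
}

/* flush_page / flush_inode / flush_fs invalidate progressively wider sets. */
static void toy_flush_page(int pool, unsigned long ino, unsigned long index)
{
    struct toy_entry *e = toy_lookup(pool, ino, index);

    if (e)
        e->used = false;
}

static void toy_flush_inode(int pool, unsigned long ino)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (store[i].used && store[i].pool == pool && store[i].ino == ino)
            store[i].used = false;
}

static void toy_flush_fs(int pool)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (store[i].used && store[i].pool == pool)
            store[i].used = false;
}
```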
>
> An "init_shared_fs", like "init_fs", obtains a pool id but tells the pseudo-RAM
> to treat the pool as shared using a 128-bit UUID as a key.  On systems
> that may run multiple kernels (such as hard partitioned or virtualized
> systems) that may share a clustered filesystem, and where the pseudo-RAM
> may be shared among those kernels, calls to init_shared_fs that specify the
> same UUID will receive the same pool id, thus allowing the pages to
> be shared.  Note that any security requirements must be imposed outside
> of the kernel (e.g. by "tools" that control the pseudo-RAM).  Or a
> pseudo-RAM implementation can simply disable init_shared_fs by always
> returning a negative value.
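The UUID-to-pool-id rule can be sketched as a small lookup table. This is a single-process illustration only: in the setting described above the table would live in the shared pseudo-RAM (e.g. the hypervisor), not inside any one kernel. toy_init_fs, toy_init_shared_fs and the toy_allow_shared switch are hypothetical names; setting toy_allow_shared to false models a backend that disables shared pools by always returning a negative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_POOLS 8

struct toy_pool {
    bool used;
    bool shared;
    unsigned char uuid[16];   /* 128-bit key, meaningful only when shared */
};

static struct toy_pool pools[TOY_MAX_POOLS];
static bool toy_allow_shared = true;   /* false: backend refuses shared pools */

/* Allocate the first free pool id, or return -1 if none is left. */
static int toy_alloc_pool(bool shared, const unsigned char *uuid)
{
    for (int id = 0; id < TOY_MAX_POOLS; id++) {
        if (!pools[id].used) {
            pools[id].used = true;
            pools[id].shared = shared;
            if (shared)
                memcpy(pools[id].uuid, uuid, 16);
            return id;
        }
    }
    return -1;
}

/* init_fs: a private pool always gets a fresh id. */
static int toy_init_fs(void)
{
    return toy_alloc_pool(false, NULL);
}

/* init_shared_fs: callers presenting the same UUID get the same pool id. */
static int toy_init_shared_fs(const unsigned char uuid[16])
{
    assert(uuid != NULL);
    if (!toy_allow_shared)
        return -1;
    for (int id = 0; id < TOY_MAX_POOLS; id++)
        if (pools[id].used && pools[id].shared &&
            memcmp(pools[id].uuid, uuid, 16) == 0)
            return id;
    return toy_alloc_pool(true, uuid);
}
```

Note that nothing here checks who is asking for a UUID; as the text says, any access control has to be imposed outside the kernel.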
>
> If a get_page is successful on a non-shared pool, the page is flushed (thus
> making cleancache an "exclusive" cache).  On a shared pool, the page

Is there a particular reason to force "exclusive" behavior on a non-shared pool?
To free memory in pseudo-RAM?
I'd like to make it "inclusive" for some reasons, but unfortunately I can't
say why just yet.

Although you said it's "exclusive", __cleancache_get_page in the code below
doesn't flush the page.
Is that left to whoever implements cleancache_ops->get_page?

+int __cleancache_get_page(struct page *page)
+{
+	int ret = 0;
+	int pool_id = page->mapping->host->i_sb->cleancache_poolid;
+
+	if (pool_id >= 0) {
+		ret = (*cleancache_ops->get_page)(pool_id,
+				page->mapping->host->i_ino,
+				page->index,
+				page);
+		if (ret == CLEANCACHE_GET_PAGE_SUCCESS)
+			succ_gets++;
+		else
+			failed_gets++;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(__cleancache_get_page);
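If the flush is indeed left to the backend, its get_page could provide the "exclusive" behavior itself. A hypothetical user-space sketch (toy_put, toy_get and the shared_pool flag are illustrative names, not the patch's API; pool ids are omitted for brevity): a hit in a non-shared pool invalidates the entry before returning, while a hit in a shared pool leaves it in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_SLOTS     4

struct toy_entry {
    bool used;
    bool shared_pool;          /* was the page put into a shared pool? */
    unsigned long ino, index;
    unsigned char data[TOY_PAGE_SIZE];
};

static struct toy_entry store[TOY_SLOTS];

/* put: overwrite an existing copy, else take a free slot, else drop (ephemeral). */
static void toy_put(bool shared_pool, unsigned long ino, unsigned long index,
                    const unsigned char *page)
{
    struct toy_entry *slot = NULL;

    assert(page != NULL);
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (store[i].used && store[i].ino == ino && store[i].index == index) {
            slot = &store[i];          /* same handle already cached */
            break;
        }
        if (!store[i].used && !slot)
            slot = &store[i];          /* remember the first free slot */
    }
    if (!slot)
        return;
    slot->used = true;
    slot->shared_pool = shared_pool;
    slot->ino = ino;
    slot->index = index;
    memcpy(slot->data, page, TOY_PAGE_SIZE);
}

/*
 * get: copy the page out on a hit. For a non-shared pool the entry is
 * invalidated here, inside the backend, so the generic wrapper needs
 * no explicit flush call to make the cache exclusive.
 */
static int toy_get(unsigned long ino, unsigned long index, unsigned char *page)
{
    assert(page != NULL);
    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_entry *e = &store[i];

        if (e->used && e->ino == ino && e->index == index) {
            memcpy(page, e->data, TOY_PAGE_SIZE);
            if (!e->shared_pool)
                e->used = false;       /* page now lives only in the page cache */
            return 0;
        }
    }
    return -1;
}
```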

If the backing device is RAM, could we _move_ pages from the page cache
to cleancache?
That is, I'd rather not copy the page on every get/put operation; with a
"ram" backing device we could just move the page. Is that possible?
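A rough user-space illustration of move versus copy (hypothetical names toy_put_move and toy_get_move, not part of the posted patches): the store keeps the caller's buffer pointer instead of copying its contents, and get hands that same buffer back. Since the quoted description says the pseudo-RAM is not directly addressable by the kernel, a tmem or compressing backend would still have to copy (or compress) the data; a move only makes sense for a plain RAM backend.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096
#define TOY_SLOTS     4

/* A move-based store keeps the caller's buffer itself, not a copy of it. */
struct toy_moved {
    unsigned long ino, index;
    unsigned char *page;   /* owned by the store while non-NULL */
};

static struct toy_moved slots[TOY_SLOTS];

/*
 * put: take ownership of the buffer (no memcpy). Returns 0 if stored, or -1
 * if the store is full, in which case the caller still owns the buffer.
 */
static int toy_put_move(unsigned long ino, unsigned long index, unsigned char *page)
{
    assert(page != NULL);
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (slots[i].page == NULL) {
            slots[i].ino = ino;
            slots[i].index = index;
            slots[i].page = page;
            return 0;
        }
    }
    return -1;
}

/*
 * get: hand the very same buffer back (no memcpy), or NULL on a miss.
 * The entry is necessarily removed, so the store is exclusive by construction.
 */
static unsigned char *toy_get_move(unsigned long ino, unsigned long index)
{
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (slots[i].page != NULL && slots[i].ino == ino && slots[i].index == index) {
            unsigned char *page = slots[i].page;

            slots[i].page = NULL;
            return page;
        }
    }
    return NULL;
}
```

One consequence worth noting: a moved page cannot also stay in the cache after a get, so a move-based backend could not offer the "inclusive" behavior mentioned above.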

You sent the patches that form the core of cleancache, but I don't see any use case.
Could you include use-case patches in this series?
They would help us understand cleancache's benefit.

--
Kind regards,
Minchan Kim