RE: [PATCH V8 2/4] mm: frontswap: core code

From: Dan Magenheimer
Date: Thu Sep 08 2011 - 11:01:53 EST

Next message: Kautuk Consul: "[PATCH 1/1] Trivial: devtmpfs.c: No need to set the task state to TASK_RUNNING"
Previous message: Greg KH: "Re: [PATCH 25/62] staging: irq: Remove IRQF_DISABLED"
Next in thread: Andrew Morton: "Re: [PATCH V8 2/4] mm: frontswap: core code"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> From: Andrew Morton [mailto:akpm@xxxxxxxxxxxxxxxxxxxx]
> Subject: Re: [PATCH V8 2/4] mm: frontswap: core code

Thanks very much for taking the time for this feedback!

Please correct me if I am presumptuous or misreading
SubmittingPatches, but after making the changes below,
I am thinking this constitutes a "Reviewed-by"?

> > From: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
> > Subject: [PATCH V8 2/4] mm: frontswap: core code
> >
> > This second patch of four in this frontswap series provides the core code
> > for frontswap that interfaces between the hooks in the swap subsystem and
> > +
> > +struct frontswap_ops {
> > + void (*init)(unsigned);
> > + int (*put_page)(unsigned, pgoff_t, struct page *);
> > + int (*get_page)(unsigned, pgoff_t, struct page *);
> > + void (*flush_page)(unsigned, pgoff_t);
> > + void (*flush_area)(unsigned);
> > +};
>
> Please don't use the term "flush". In both the pagecache code and the
> pte code it is interchangably used to refer to both writeback and
> invalidation. The way to avoid this ambiguity and confusion is to use
> the terms "writeback" and "invalidate" instead.
>
> Here, you're referring to invalidation.

While the different name is OK, changing this consistently would now
require simultaneous patches in cleancache, zcache, and xen (not
to mention lots of docs inside and outside the kernel). I suspect
it would be cleaner to do this later across all affected code
with a single commit. Hope that's OK.

(Personally, I find "invalidate" to be inaccurate because common
usage of the term doesn't imply that the space used in the cache
is recovered, i.e. garbage collection, which is the case here.
To me, "flush" implies invalidate PLUS recover space.)

> > +static struct frontswap_ops frontswap_ops;
>
> __read_mostly?

Yep. Will fix.

> > +/*
> > + * This global enablement flag reduces overhead on systems where frontswap_ops
> > + * has not been registered, so is preferred to the slower alternative: a
> > + * function call that checks a non-global.
> > + */
> > +int frontswap_enabled;
>
> __read_mostly?

Yep. Will fix.

> > +/*
> > + * Useful stats available in /sys/kernel/mm/frontswap. These are for
> > + * information only so are not protected against increment/decrement races.
> > + */
> > +static unsigned long frontswap_gets;
> > +static unsigned long frontswap_succ_puts;
> > +static unsigned long frontswap_failed_puts;
> > +static unsigned long frontswap_flushes;
>
> If they're in /sys/kernel/mm then they rather become permanent parts of
> the exported kernel interface. We're stuck with them. Plus they're
> inaccurate and updating them might be inefficient, so we don't want to
> be stuck with them.
>
> I suggest moving these to debugfs from where we can remove them if we
> feel like doing so.

The style (and code) for this was mimicked from ksm and hugepages, which
expose the stats the same way... as does cleancache now. slub is also
similar. I'm OK with using a different approach (e.g. debugfs), but
think it would be inconsistent and confusing to expose these stats
differently than cleancache (or ksm and hugepages). I'd support
and help with a massive cleanup commit across all of mm later though.
Hope that's OK for now.

> > +/*
> > + * Frontswap, like a true swap device, may unnecessarily retain pages
> > + * under certain circumstances; "shrink" frontswap is essentially a
> > + * "partial swapoff" and works by calling try_to_unuse to attempt to
> > + * unuse enough frontswap pages to attempt to -- subject to memory
> > + * constraints -- reduce the number of pages in frontswap
> > + */
> > +void frontswap_shrink(unsigned long target_pages)
>
> It's unclear whether `target_pages' refers to the number of pages to
> remove or to the number of pages to retain. A comment is needed.

OK. Will fix.

> > +{
> > + int wrapped = 0;
> > + bool locked = false;
> > +
> > + /* try a few times to maximize chance of try_to_unuse success */
>
> Why? Is this necessary? How often does try_to_unuse fail?
>
> > + for (wrapped = 0; wrapped < 3; wrapped++) {
>
> `wrapped' seems an inappropriate identifier.

Hmmm... this loop was mimicking some swap code that now
seems to be long gone. I agree it's not really necessary
and will remove the loop (which also removes the identifier).

> > +
>
> unneeded newline

Doh! Will fix.

> > + if (security_vm_enough_memory_kern(pages))
>
> What's this doing here? Needs a comment please.
>
> > + continue;
> > + vm_unacct_memory(pages);
>
> hm, is that accurate? Or should we account for the pages which
> try_to_unuse() actually unused?

These are mimicking code in swapoff. I think the code is
correct but agree a clarifying comment or two is in order.

> > +/*
> > + * Count and return the number of pages frontswap pages across all
>
> s/pages//

Doh! Will fix.

> > + * swap devices. This is exported so that a kernel module can
> > + * determine current usage without reading sysfs.
>
> Which kernel module might want to do this?

The tmem backends (currently zcache and xen tmem). I'll change
the wording in the comment to clarify.

> > +#ifdef CONFIG_SYSFS
>
> Has the code been tested with CONFIG_SYSFS=n?

Yes, in a previous version. I'll doublecheck though.

Thanks again for the feedback! I'll publish a V9 with
the corrections no later than early next week.

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Kautuk Consul: "[PATCH 1/1] Trivial: devtmpfs.c: No need to set the task state to TASK_RUNNING"
Previous message: Greg KH: "Re: [PATCH 25/62] staging: irq: Remove IRQF_DISABLED"
Next in thread: Andrew Morton: "Re: [PATCH V8 2/4] mm: frontswap: core code"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]