[PATCH 10/10] staging: ramster: add how-to for ramster
From: Wanpeng Li
Date: Tue Apr 09 2013 - 20:27:37 EST
Add how-to for ramster.
Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
Signed-off-by: Wanpeng Li <liwanp@xxxxxxxxxxxxxxxxxx>
drivers/staging/zcache/ramster/HOWTO.txt | 257 ++++++++++++++++++++++++++++++
1 file changed, 257 insertions(+)
create mode 100644 drivers/staging/zcache/ramster/HOWTO.txt
diff --git a/drivers/staging/zcache/ramster/HOWTO.txt b/drivers/staging/zcache/ramster/HOWTO.txt
new file mode 100644
@@ -0,0 +1,257 @@
+ Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
+This is a how-to document for RAMster. It applies to the March 9, 2013
+version of RAMster, re-merged with the new zcache codebase, built and tested
+on the 3.9 tree and submitted to the staging tree for 3.9.
+Note that this document was created from notes taken earlier. I would
+appreciate any feedback from anyone who follows the process as described
+to confirm that it works and to clarify any possible misunderstandings,
+or to report problems.
+A. PRELIMINARY
+1) Install two or more Linux systems that are known to work when upgraded
+ to a recent upstream Linux kernel version (e.g. v3.9). I used Oracle
+ Linux 6 ("OL6") on two Dell Optiplex 790s. Note that it should be possible
+ to use ocfs2 as a filesystem on your systems but this hasn't been
+ tested thoroughly, so if you do use ocfs2 and run into problems, please
+ report them. Up to eight nodes should work, but not much testing has
+ been done with more than three nodes.
+On each system:
+2) Configure, build, install, and then boot Linux (e.g. 3.9), just to ensure
+ it can be done with an unmodified upstream kernel. Confirm that you booted
+ the upstream kernel with "uname -a".
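As a small sketch of the check in step 2, the shell function below compares the running kernel's release string with the version you built. The name `kernel_is` is hypothetical (it is not part of ramster-tools), and it looks at `uname -r` rather than the full `uname -a` output.

```shell
#!/bin/sh
# kernel_is PREFIX [RELEASE]
# Hypothetical helper: succeeds if RELEASE (default: output of `uname -r`)
# is exactly PREFIX or starts with PREFIX followed by "." or "-",
# so "3.9" matches "3.9.0" and "3.9-rc6" but not "3.90".
kernel_is() {
    prefix=$1
    release=${2:-$(uname -r)}
    case "$release" in
        "$prefix" | "$prefix".* | "$prefix"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, after booting the freshly installed upstream kernel:
# kernel_is 3.9 || echo "not running a 3.9 kernel: $(uname -r)" >&2
```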
+3) Install ramster-tools. The src.rpm and an OL6 rpm are available
+ in this directory. I'm not very good at userspace stuff and
+ would welcome any help in turning ramster-tools into more
+ distributable rpms/debs for a wider range of distros.
+B. BUILDING RAMSTER INTO THE KERNEL
+Do the following on each system:
+1) Ensure you have the new codebase for drivers/staging/zcache in your source.
+2) Change your .config to have:
+ You may have to reconfigure your kernel multiple times to ensure
+ all of these are set properly. I use:
+ # yes "" | make oldconfig
+ and then manually check the .config file to ensure my selections
+ have "taken".
+ Do not bother to build the kernel until you are certain all of
+ the above config selections will stick for the build.
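Checking the .config by hand is easy to get wrong, so a helper along these lines can confirm that each option is built in ("=y", not "=m" or unset). This is a sketch: `config_has` is a hypothetical name, and the option list in the commented example is an assumption drawn from the zcache/RAMster Kconfig dependencies (staging drivers need STAGING, RAMster needs ZCACHE and CONFIGFS_FS built in, zcache needs CLEANCACHE and FRONTSWAP), so verify it against drivers/staging/zcache/Kconfig in your tree.

```shell
#!/bin/sh
# config_has CONFIG_FILE OPTION...
# Hypothetical helper: checks that each OPTION appears as the exact line
# "OPTION=y" in CONFIG_FILE. Prints every option that is missing or not
# built in, and returns non-zero if there are any.
config_has() {
    file=$1
    shift
    missing=0
    for opt in "$@"; do
        if ! grep -qxF "${opt}=y" "$file"; then
            echo "not set to y: $opt" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (option list is an assumption; check your tree's Kconfig):
# config_has .config CONFIG_STAGING CONFIG_CLEANCACHE CONFIG_FRONTSWAP \
#     CONFIG_CONFIGFS_FS CONFIG_ZCACHE CONFIG_RAMSTER
```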
+3) Build this kernel and run "make install" so that the new kernel has an
+ entry in /etc/grub.conf.
+4) Add "ramster" to the kernel boot line in /etc/grub.conf.
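Editing the boot line by hand works fine. As an alternative, the sketch below appends a parameter to the legacy-GRUB `kernel` line of one specific kernel image and does nothing if the parameter is already there. `add_boot_param` is a hypothetical helper, and it assumes the OL6-style /etc/grub.conf layout in which each entry has a `kernel /vmlinuz-RELEASE ...` line.

```shell
#!/bin/sh
# add_boot_param GRUB_CONF RELEASE PARAM
# Hypothetical helper: appends PARAM to each legacy-GRUB "kernel" line
# that boots vmlinuz-RELEASE, unless PARAM is already one of its words.
# The file is rewritten through a temporary copy; writing back with cat
# keeps the original file's owner and permissions.
add_boot_param() {
    conf=$1 release=$2 param=$3
    tmp=$(mktemp) || return 1
    awk -v img="vmlinuz-$release" -v p="$param" '
        $1 == "kernel" {
            # Does the image path ($2) equal or end in "/vmlinuz-RELEASE"?
            n = length($2) - length(img)
            if ($2 == img || (n > 0 && substr($2, n) == "/" img)) {
                found = 0
                for (i = 3; i <= NF; i++)
                    if ($i == p) found = 1
                if (!found) $0 = $0 " " p
            }
        }
        { print }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (back up the file first; 3.9.0 stands for the release string
# of the newly installed, not yet booted, kernel):
# cp /etc/grub.conf /etc/grub.conf.bak
# add_boot_param /etc/grub.conf 3.9.0 ramster
```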
+5) Reboot and check dmesg to ensure there are some messages from ramster
+ and that "ramster_enabled=1" appears.
+ # dmesg | grep ramster
+ You should also see a lot of files in:
+ # ls /sys/kernel/debug/zcache
+ # ls /sys/kernel/debug/ramster
+ and a few files in:
+ # ls /sys/kernel/mm/ramster
+ RAMster will now act as a single-system zcache, but it doesn't yet
+ know anything about the cluster, so it can't do anything remotely.
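The checks in step 5 can be collected into one function. This is a sketch with a hypothetical name, `ramster_boot_check`. It reads kernel messages on standard input and takes an optional root-directory prefix, an assumption added only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# ramster_boot_check [ROOT]
# Hypothetical helper: reads kernel messages on stdin (e.g. from dmesg)
# and succeeds only if "ramster_enabled=1" was logged and the debugfs and
# sysfs directories named in this HOWTO exist under ROOT (default: "",
# i.e. the real filesystem).
ramster_boot_check() {
    root=${1:-}
    if ! grep -q 'ramster_enabled=1'; then
        echo "no ramster_enabled=1 in kernel log" >&2
        return 1
    fi
    for d in /sys/kernel/debug/zcache /sys/kernel/debug/ramster \
             /sys/kernel/mm/ramster; do
        if [ ! -d "$root$d" ]; then
            echo "missing directory: $root$d" >&2
            return 1
        fi
    done
    return 0
}

# Usage on a freshly booted node (debugfs must be mounted):
# dmesg | ramster_boot_check
```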
+C. BUILDING THE RAMSTER CLUSTER
+This is the error-prone part unless you are a clustering expert. We need
+to describe the cluster in the /etc/ramster.conf file, and the init scripts
+that parse it are extremely picky about the syntax.
+1) Create the /etc/ramster.conf file and ensure it is identical
+ on all systems. There is a good amount of documentation for the similar
+ ocfs2 cluster description file (/etc/ocfs2/cluster.conf) that can be
+ found online, but I use:
+ cluster:
+     name = ramster
+     node_count = 2
+ node:
+     name = system1
+     cluster = ramster
+     number = 0
+     ip_address = my.ip.ad.r1
+     ip_port = 7777
+ node:
+     name = system2
+     cluster = ramster
+     number = 1
+     ip_address = my.ip.ad.r2
+     ip_port = 7777
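+ You must ensure that the "name" field in the file exactly matches
+ the output of "hostname" on each system, that each node has a unique
+ "number", and that my.ip.ad.r1 and my.ip.ad.r2 are replaced with the
+ real IP addresses of your systems. The following assumes you use
+ "ramster" as the name of your cluster.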
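Because the file must be identical everywhere and each node needs its own "number", generating it from a single list of hosts and copying the result to every node avoids typos. The helpers below are hypothetical, and the output assumes the ocfs2 cluster.conf layout (stanza headers such as `cluster:` and `node:` in the first column, parameters indented with a tab). Node numbers are assigned in argument order starting at 0.

```shell
#!/bin/sh
# gen_ramster_conf CLUSTER PORT HOST=IP...
# Hypothetical helper: prints an ocfs2-style cluster description with one
# node stanza per HOST=IP argument, numbering nodes 0, 1, 2, ... in order.
# Each HOST must match the output of `hostname` on that node exactly.
gen_ramster_conf() {
    cluster=$1 port=$2
    shift 2
    printf 'cluster:\n\tname = %s\n\tnode_count = %d\n' "$cluster" "$#"
    num=0
    for arg in "$@"; do
        printf 'node:\n\tname = %s\n\tcluster = %s\n\tnumber = %d\n' \
            "${arg%%=*}" "$cluster" "$num"
        printf '\tip_address = %s\n\tip_port = %s\n' "${arg#*=}" "$port"
        num=$((num + 1))
    done
}

# conf_lists_host FILE [HOST]
# Hypothetical helper: succeeds if a "node:" stanza in FILE has
# "name = HOST" (HOST defaults to the output of `hostname`).
conf_lists_host() {
    host=${2:-$(hostname)}
    awk -v h="$host" '
        /^[a-z]+:$/ { stanza = $0 }
        stanza == "node:" && $1 == "name" && $3 == h { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example: generate once, copy the same file to every node, then run
# conf_lists_host /etc/ramster.conf on each node.
# gen_ramster_conf ramster 7777 system1=my.ip.ad.r1 system2=my.ip.ad.r2 \
#     > /etc/ramster.conf
```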
+2) Enable the ramster service and configure it:
+ # chkconfig --add ramster
+ # service ramster configure
+ Set "load on boot" to "y", cluster to start is "ramster" (or whatever
+ name you chose in ramster.conf), heartbeat dead threshold as "500",
+ network idle timeout as "1000000". Leave the others as default.
+4) Reboot. After reboot, try:
+ # service ramster status
+ You should see "Checking ramster cluster ramster: Online". If you do
+ not, something is wrong and RAMster will not work. Note that you
+ should also see that the driver for "configfs" is loaded and mounted,
+ the driver for ocfs2_dlmfs is not loaded, and some numbers for network
+ parameters. You will also see "Checking ramster heartbeat: Not active".
+ That's all OK.
+5) Now you need to start the cluster heartbeat; the cluster is not "up"
+ until all nodes detect a heartbeat. Normally this is done via
+ a cluster filesystem, but you don't have one. Some hack-y
+ code in RAMster can start it for you, though, if you tell it which
+ nodes are "up". To enable it for nodes 0 and 1, do:
+ # echo 0 > /sys/kernel/mm/ramster/manual_node_up
+ # echo 1 > /sys/kernel/mm/ramster/manual_node_up
+ This must be done on ALL nodes. I usually put these lines
+ in /etc/rc.local, as otherwise I forget. To confirm that
+ the cluster is now up, on all systems do:
+ # dmesg | grep ramster
+ You should see "Accepted connection" messages in dmesg after this.
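For /etc/rc.local, the node_up writes and the dmesg check can be wrapped as below. This is a sketch: `ramster_nodes_up`, `count_accepted` and the `RAMSTER_SYSFS` override (defaulting to /sys/kernel/mm/ramster) are hypothetical conveniences, not part of RAMster or ramster-tools. Only the "Accepted connection" text comes from this step.

```shell
#!/bin/sh
# Directory holding RAMster's sysfs knobs; overridable for testing.
RAMSTER_SYSFS=${RAMSTER_SYSFS:-/sys/kernel/mm/ramster}

# ramster_nodes_up NODE...
# Hypothetical helper: writes each node number, in order, to
# manual_node_up, stopping at the first failed write.
ramster_nodes_up() {
    for n in "$@"; do
        echo "$n" > "$RAMSTER_SYSFS/manual_node_up" || return 1
    done
}

# count_accepted
# Hypothetical helper: reads kernel messages on stdin and prints how many
# lines contain "Accepted connection".
count_accepted() {
    grep -c 'Accepted connection'
}

# Example for nodes 0 and 1, run on every node:
# ramster_nodes_up 0 1
# dmesg | count_accepted
```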
+6) You must tell each node the node to which it should "remotify" pages.
+ For example, if you have a three-node cluster and you want nodes
+ 1 and 2 to be "clients" and node 0 to be the "memory server", then
+ on nodes 1 and 2, you do:
+ # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum
+ You should see "ramster: node N set as remotification target"
+ in dmesg. Again, /etc/rc.local is a good place to put this
+ so you don't forget to do it at each boot.
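Setting the remotification target and confirming the kernel's acknowledgement can be scripted as follows. The function names and the `RAMSTER_SYSFS` override (default /sys/kernel/mm/ramster) are hypothetical; the matched message is the one quoted in this step, with N replaced by the target node.

```shell
#!/bin/sh
# Directory holding RAMster's sysfs knobs; overridable for testing.
RAMSTER_SYSFS=${RAMSTER_SYSFS:-/sys/kernel/mm/ramster}

# ramster_set_target NODE
# Hypothetical helper: writes NODE to remote_target_nodenum.
ramster_set_target() {
    echo "$1" > "$RAMSTER_SYSFS/remote_target_nodenum"
}

# target_acknowledged NODE
# Hypothetical helper: reads kernel messages on stdin and succeeds if they
# contain "ramster: node NODE set as remotification target".
target_acknowledged() {
    grep -qF "ramster: node $1 set as remotification target"
}

# Example on nodes 1 and 2 of the three-node cluster in this step:
# ramster_set_target 0 && dmesg | target_acknowledged 0
```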
+7) One more step: By default, the RAMster code does not "remotify" any
+ pages; this is primarily for testing purposes, but sometimes it is
+ useful. This may change in the future, but for now, you must:
+ # echo 1 > /sys/kernel/mm/ramster/pers_remotify_enable
+ # echo 1 > /sys/kernel/mm/ramster/eph_remotify_enable
+ The first enables remotifying swap (persistent, aka frontswap) pages;
+ the second enables remotifying page cache (ephemeral, aka cleancache) pages.
+ These lines can also be put in /etc/rc.local (AFTER the node_up
+ lines), or I often just put them at the beginning of my script that
+ runs a workload.
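Enabling both remotification knobs can be wrapped in one function for rc.local or a workload script. This is a sketch: the name `ramster_remotify_enable` and the `RAMSTER_SYSFS` override (default /sys/kernel/mm/ramster) are hypothetical, and it only writes 1, the value this HOWTO documents.

```shell
#!/bin/sh
# Directory holding RAMster's sysfs knobs; overridable for testing.
RAMSTER_SYSFS=${RAMSTER_SYSFS:-/sys/kernel/mm/ramster}

# ramster_remotify_enable
# Hypothetical helper: writes 1 to both remotify knobs, persistent
# (frontswap) first, then ephemeral (cleancache).
ramster_remotify_enable() {
    for knob in pers_remotify_enable eph_remotify_enable; do
        echo 1 > "$RAMSTER_SYSFS/$knob" || return 1
    done
}

# In /etc/rc.local, call this only AFTER the manual_node_up writes
# (and, on client nodes, after setting remote_target_nodenum).
```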
+8) Most testing has been done with both/all machines booted roughly
+ simultaneously. Ideally, you should do this too unless you are
+ trying to break RAMster rather than just use it. ;-)
+D. TESTING RAMSTER
+1) Note that RAMster has no value unless pages get "remotified". For
+ swap/frontswap/persistent pages, this doesn't happen unless/until
+ the workload would cause swapping to occur, at which point pages
+ are put into frontswap/zcache, and the remotification thread starts
+ working. To get to the point where the system swaps, you either
+ need a workload for which the working set exceeds the RAM in the
+ system, or you need to somehow reduce the amount of RAM that one of
+ the systems sees. The latter is easy when testing in a VM, but
+ harder on physical systems. In some cases, "mem=xxxM" on the
+ kernel command line restricts memory, but for some values of xxx
+ my kernel fails to boot. I may also try creating a fixed RAMdisk,
+ doing nothing with it, but ensuring that it eats up a fixed
+ amount of RAM.
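When trying "mem=xxxM", it can help to derive xxx from the memory the system actually reports. The sketch below (hypothetical name `mem_param`) prints a value that is roughly a given percentage of MemTotal from /proc/meminfo. MemTotal is somewhat smaller than installed RAM and mem= does not map exactly onto it, so treat the result as a starting point; as noted above, some values may still keep the kernel from booting.

```shell
#!/bin/sh
# mem_param PERCENT [MEMINFO]
# Hypothetical helper: prints "mem=NNNM", where NNN is PERCENT of the
# MemTotal value (in kB) found in MEMINFO (default /proc/meminfo),
# converted to MiB and rounded down. Fails if MemTotal is not found.
mem_param() {
    pct=$1
    meminfo=${2:-/proc/meminfo}
    awk -v pct="$pct" '
        $1 == "MemTotal:" {
            printf "mem=%dM\n", int($2 * pct / 100 / 1024)
            found = 1
        }
        END { exit !found }
    ' "$meminfo"
}

# Example: a kernel parameter leaving roughly half of this system's RAM.
# mem_param 50
```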
+2) To see if RAMster is working, on the remote system, I do:
+ # watch -d 'cat /sys/kernel/debug/ramster/foreign_*'
+ to monitor the number (and maximum) of ephemeral and persistent pages
+ that RAMster has sent to it from other nodes. If these stay at 0, RAMster is not working
+ either because the workload isn't creating enough memory pressure
+ or because "remotifying" isn't working. On the system with the
+ workload, you can watch lots of useful information also, but beware
+ that you may be affecting the workload and performance. I use
+ # watch ./watchme
+ where the watchme file (made executable with "chmod +x watchme") contains:
+ #!/bin/sh
+ # Print each zcache/RAMster debugfs counter as "path: value".
+ for i in /sys/kernel/debug/zcache/evicted_buddied_pages \
+ /sys/kernel/debug/zcache/evicted_raw_pages \
+ /sys/kernel/debug/zcache/evicted_unbuddied_pages \
+ /sys/kernel/debug/zcache/zbud_curr_raw_pages \
+ /sys/kernel/debug/zcache/zbud_curr_zbytes \
+ /sys/kernel/debug/zcache/zbud_curr_zpages \
+ /sys/kernel/debug/ramster/eph_pages_remoted \
+ /sys/kernel/debug/ramster/remote_eph_pages_succ_get \
+ /sys/kernel/debug/ramster/remote_pers_pages_succ_get
+ do
+     echo "$i: $(cat "$i")"
+ done
+ Since debugfs is mounted at /sys/kernel/debug (the counters above live
+ there too), you can also add to the watchme script some interesting
+ counters in /sys/kernel/debug/cleancache/* and /sys/kernel/debug/frontswap/*.
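On the remote system, the `cat` in the watch command prints the counter values without their names. The hypothetical helper below prints each foreign_* counter with its file name and reports whether any is non-zero, which is the quick "is RAMster working?" test described in this step. It assumes each counter file holds a single integer.

```shell
#!/bin/sh
# foreign_pages_seen [DIR]
# Hypothetical helper: prints every foreign_* counter in DIR (default
# /sys/kernel/debug/ramster) as "name: value" and succeeds if at least
# one counter is greater than 0, i.e. foreign pages have arrived.
foreign_pages_seen() {
    dir=${1:-/sys/kernel/debug/ramster}
    status=1
    for f in "$dir"/foreign_*; do
        [ -r "$f" ] || continue     # also skips an unmatched glob
        v=$(cat "$f")
        echo "$(basename "$f"): $v"
        if [ "$v" -gt 0 ] 2>/dev/null; then
            status=0
        fi
    done
    return "$status"
}

# Example, on the remote system after starting the workload elsewhere:
# foreign_pages_seen || echo "no foreign pages yet" >&2
```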
+3) In v4, there are known issues in counting certain values. As a result,
+ you may see periodic warnings from the kernel. Almost always you
+ will see "ramster: bad accounting for XXX". There are also "WARN_ONCE"
+ messages. If you see kernel warnings with a tombstone, please report
+ them. They are harmless but reflect bugs that need to be fixed eventually.
+AUTOMATIC SWAP REPATRIATION
+You may notice that while the systems are idle, the foreign persistent
+page count on the remote machine slowly decreases. This is because
+RAMster implements "frontswap selfshrinking": When possible, swap
+pages that have been remotified are slowly repatriated to the local
+machine. This is so that local RAM can be used when possible and
+so that, in case of remote machine crash, the probability of loss
+of data is reduced.
+REBOOTING / POWEROFF
+If a system is shut down while some of its swap pages still reside
+on a remote system, the system may lock up partway through the shutdown
+sequence. This is because the network is shut down before the
+swap mechanism is shut down. To avoid this annoying problem, simply
+shut off the swap subsystem before starting the shutdown sequence, e.g.:
+ # swapoff -a
+ # reboot
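Because a failed swapoff leaves the lock-up risk in place, it may be worth rebooting only when swap is verified to be off. A sketch with hypothetical names; it assumes that once `swapoff -a` succeeds and /proc/swaps lists no swap areas (only its header line), no swap pages remain on a remote node.

```shell
#!/bin/sh
# swap_is_off [SWAPS_FILE]
# Hypothetical helper: succeeds if SWAPS_FILE (default /proc/swaps)
# contains at most its header line, i.e. no active swap areas.
swap_is_off() {
    [ "$(wc -l < "${1:-/proc/swaps}")" -le 1 ]
}

# safe_reboot
# Hypothetical helper: reboots only if swapoff succeeded and no swap
# area is still active; otherwise returns non-zero without rebooting.
safe_reboot() {
    swapoff -a && swap_is_off && reboot
}
```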
+v5-120214->120817: updated for merge into new zcache codebase
+v4-120126->v5-120214: updated for V5
+111227->v4-120126: added info on selfshrinking and rebooting
+111227->v4-120126: added more info for tracking RAMster stats
+111227->v4-120126: CONFIG_PREEMPT_NONE no longer necessary
+111227->v4-120126: cleancache now works completely so no need to disable it