[RFC 00/31] Current state of MARS

From: Thomas Schoebel-Theuer
Date: Thu Dec 31 2015 - 06:38:50 EST


MARS Light is an asynchronous replication system for block storage over
long distances [1,2,3]. It is the base for HA over long distances (more than
50km) and over network bottlenecks, e.g. high / varying packet loss rates.

The out-of-tree (OOT) version of MARS (branch light0.1.y) is in production
at 1&1 Internet SE, currently on more than 1600 servers of various types,
with more than 6 petabytes of total storage, and has accumulated more than
6 million operating hours with fewer than 10 customer-visible incidents caused by
MARS (most of them in the early beta phase).

From my last LKML posting in 2014, I got the following major TODOs
from you:

1) no kernel prepatch anymore, no additional EXPORT_SYMBOL()s anywhere
in the rest of the kernel.

=> accomplished in the enclosed patchset (however not yet fully bug-free).
See also branches WIP-PORTABLE, WIP-BASE and WIP-PROPOSED-UPSTREAM at [1]
for some out-of-tree (OOT) versions of MARS.

=> this means there will be zero impact from MARS on the rest of the kernel
when MARS is not used. Additionally, I placed it into drivers/staging/
(configurable via the migration script ./rework-mars-for-upstream.pl).

2) replace the symlink tree by something else.

=> not yet done, due to lack of time. Sorry. I am planning to work on it
in the forthcoming year, and hope to get enough time for it.

I think the attached patchset is not yet ready for submission, but please
help me by giving early feedback. I would be very glad if some experienced
upstream hacker would mentor me and help me at various points.

Here are some of my thoughts about further development:

In the future, I would like to store the old symlink information (a list of
key => value pairs) in status files instead, one instance for each resource,
and one for global configuration data (notice that the information _needs_
to be persistent for coping with node failures, and that dynamic storage
is needed anyway for huge masses of transaction logfiles).
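To make the proposed key => value representation a bit more concrete, here
is a minimal userspace C sketch of one possible encoding: one "key=value"
line per entry. The line format, struct status_entry, status_format() and
status_lookup() are hypothetical illustrations, not part of MARS or of this
patchset. A real implementation would likely also need to replace the file
atomically (e.g. write to a temporary file, then rename) so that the status
survives node failures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One key => value pair (hypothetical layout, not MARS code). */
struct status_entry {
	const char *key;
	const char *value;
};

/*
 * Serialize @n entries into @buf as "key=value\n" lines (hypothetical format).
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if @buf is too small.
 */
int status_format(char *buf, size_t size, const struct status_entry *e, size_t n)
{
	size_t used = 0;

	assert(e != NULL || n == 0);
	if (size == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int len = snprintf(buf + used, size - used, "%s=%s\n",
				   e[i].key, e[i].value);
		if (len < 0 || (size_t)len >= size - used)
			return -1;
		used += (size_t)len;
	}
	return (int)used;
}

/*
 * Find @key in a buffer produced by status_format() and copy its value
 * into @out. Returns 0 on success, -1 if the key is absent or @out is
 * too small.
 */
int status_lookup(const char *buf, const char *key, char *out, size_t outsize)
{
	size_t klen = strlen(key);
	const char *line = buf;

	assert(buf && key && out);
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t llen = eol ? (size_t)(eol - line) : strlen(line);

		/* Match only the whole key, followed directly by '='. */
		if (llen > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
			size_t vlen = llen - klen - 1;

			if (vlen >= outsize)
				return -1;
			memcpy(out, line + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

Keeping one file per resource plus one global file, as proposed above, would
mean that each resource's status can be rewritten independently of the others.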

This should enhance scalability in the distributed system. Here is a
use case: almost our entire 1&1 webhosting machine park has been migrated
from DRBD to MARS, retaining the conventional pair structure. Thus more
than 800 DRBD clusters have been turned into MARS clusters.

In the future, the whole 1&1 machine park could collapse into a _single_ big
cluster consisting of thousands of machines. This has some advantages,
not limited to greater flexibility with dynamic join-resource /
leave-resource operations throughout that big cluster. For example,
userspace can more easily detect failures cluster-wide and datacenter-wide,
because the Lamport Clock algorithm used by MARS [2,3] is already a kind
of "heartbeat", even remembering the timestamps of last successful
information exchange. Thus it should be relatively easy to implement
quorum algorithms etc. on top of it in userspace.
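The heartbeat idea can be illustrated with a textbook Lamport clock plus a
per-peer "last heard from" record. This is a minimal userspace C sketch, not
the code in lamport.c: struct node_clock, lamport_send(), lamport_receive(),
have_quorum(), MAX_PEERS and the wall-clock parameter now_sec are hypothetical
names, and the strict-majority rule is only one possible quorum policy.
Each node stamps outgoing messages with its incremented clock, merges incoming
stamps via max() + 1, and remembers when it last heard from each peer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on peers for this sketch; not a MARS parameter. */
#define MAX_PEERS 8

/* Per-node state (hypothetical layout). */
struct node_clock {
	uint64_t clock;                   /* Lamport logical clock */
	uint64_t last_contact[MAX_PEERS]; /* wall-clock seconds of last message per peer, 0 = never */
};

/* Local event or outgoing message: tick and return the stamp to attach. */
uint64_t lamport_send(struct node_clock *n)
{
	return ++n->clock;
}

/*
 * Incoming message from @peer carrying @stamp: move the clock past both the
 * local and the sender's value, and record when this peer was last heard
 * from (the "heartbeat" information).
 */
void lamport_receive(struct node_clock *n, size_t peer, uint64_t stamp,
		     uint64_t now_sec)
{
	assert(peer < MAX_PEERS);
	if (stamp > n->clock)
		n->clock = stamp;
	n->clock++;
	n->last_contact[peer] = now_sec;
}

/*
 * Hypothetical userspace policy: this node plus every peer heard from within
 * @timeout_sec counts as alive; require a strict majority of the
 * (npeers + 1)-node cluster. Assumes now_sec never goes backwards.
 */
bool have_quorum(const struct node_clock *n, size_t npeers,
		 uint64_t now_sec, uint64_t timeout_sec)
{
	size_t alive = 1; /* this node */

	assert(npeers <= MAX_PEERS);
	for (size_t i = 0; i < npeers; i++)
		if (n->last_contact[i] != 0 &&
		    now_sec - n->last_contact[i] <= timeout_sec)
			alive++;
	return 2 * alive > npeers + 1;
}
```

For example, in a three-node cluster (npeers = 2), a node that has heard from
one peer within the timeout still has quorum (2 of 3), while a node cut off
from both peers does not.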

Note that I am currently paid by 1&1 to ensure a seamless upgrade path
for servers to any new data format, without downtime (other than
temporarily switching the primary roles).

If you don't want the old symlink format in the upstream kernel, I would
first implement the new format OOT (and strip out the old code for
submission). Otherwise, I would agree to do this in-tree, hopefully with
help and advice from some interested kernel hackers.

Cheers and a happy new year,

Thomas

[1] https://github.com/schoebel/mars
[2] https://github.com/schoebel/mars/blob/master/docu/MARS_Froscon2015.pdf
[3] https://github.com/schoebel/mars/blob/master/docu/mars-manual.pdf


Thomas Schoebel-Theuer (31):
mars: add new module lamport
mars: add new module brick_say
mars: add new module brick_mem
mars: add new module brick_checking
mars: add new module meta
mars: add new module brick
mars: add new module lib_pairing_heap
mars: add new module lib_queue
mars: add new module lib_rank
mars: add new module lib_limiter
mars: add new module lib_timing
mars: add new module vfs_compat
mars: add new module xio
mars: add new module xio_net
mars: add new module lib_mapfree
mars: add new module lib_log
mars: add new module xio_bio
mars: add new module xio_sio
mars: add new module xio_client
mars: add new module xio_if
mars: add new module xio_copy
mars: add new module xio_trans_logger
mars: add new module xio_server
mars: add new module light_strategy
mars: add new module light_net
mars: add new module light_server_strategy
mars: add new module mars_proc
mars: add new module mars_light
mars: add new module Makefile
mars: add new module Kconfig
mars: activate build

drivers/staging/Kconfig | 2 +
drivers/staging/Makefile | 1 +
drivers/staging/mars/Kconfig | 266 +
drivers/staging/mars/Makefile | 61 +
drivers/staging/mars/brick.c | 728 +++
drivers/staging/mars/brick_mem.c | 1081 ++++
drivers/staging/mars/brick_say.c | 916 +++
drivers/staging/mars/lamport.c | 61 +
drivers/staging/mars/lib/lib_limiter.c | 129 +
drivers/staging/mars/lib/lib_rank.c | 87 +
drivers/staging/mars/lib/lib_timing.c | 71 +
drivers/staging/mars/mars_light/light_net.c | 109 +
.../mars/mars_light/light_server_strategy.c | 403 ++
drivers/staging/mars/mars_light/light_strategy.c | 2132 +++++++
drivers/staging/mars/mars_light/mars_light.c | 5880 ++++++++++++++++++++
drivers/staging/mars/mars_light/mars_proc.c | 369 ++
drivers/staging/mars/xio_bricks/lib_log.c | 505 ++
drivers/staging/mars/xio_bricks/lib_mapfree.c | 380 ++
drivers/staging/mars/xio_bricks/xio.c | 161 +
drivers/staging/mars/xio_bricks/xio_bio.c | 845 +++
drivers/staging/mars/xio_bricks/xio_client.c | 1055 ++++
drivers/staging/mars/xio_bricks/xio_copy.c | 1005 ++++
drivers/staging/mars/xio_bricks/xio_if.c | 961 ++++
drivers/staging/mars/xio_bricks/xio_net.c | 1830 ++++++
drivers/staging/mars/xio_bricks/xio_server.c | 486 ++
drivers/staging/mars/xio_bricks/xio_sio.c | 571 ++
drivers/staging/mars/xio_bricks/xio_trans_logger.c | 3309 +++++++++++
include/linux/brick/brick.h | 642 +++
include/linux/brick/brick_checking.h | 104 +
include/linux/brick/brick_mem.h | 218 +
include/linux/brick/brick_say.h | 96 +
include/linux/brick/lamport.h | 26 +
include/linux/brick/lib_limiter.h | 49 +
include/linux/brick/lib_pairing_heap.h | 110 +
include/linux/brick/lib_queue.h | 166 +
include/linux/brick/lib_rank.h | 135 +
include/linux/brick/lib_timing.h | 181 +
include/linux/brick/meta.h | 106 +
include/linux/brick/vfs_compat.h | 45 +
include/linux/mars_light/light_strategy.h | 236 +
include/linux/mars_light/mars_proc.h | 34 +
include/linux/xio/lib_log.h | 329 ++
include/linux/xio/lib_mapfree.h | 84 +
include/linux/xio/xio.h | 313 ++
include/linux/xio/xio_bio.h | 85 +
include/linux/xio/xio_client.h | 105 +
include/linux/xio/xio_copy.h | 115 +
include/linux/xio/xio_if.h | 108 +
include/linux/xio/xio_net.h | 171 +
include/linux/xio/xio_server.h | 91 +
include/linux/xio/xio_sio.h | 68 +
include/linux/xio/xio_trans_logger.h | 263 +
52 files changed, 27284 insertions(+)
create mode 100644 drivers/staging/mars/Kconfig
create mode 100644 drivers/staging/mars/Makefile
create mode 100644 drivers/staging/mars/brick.c
create mode 100644 drivers/staging/mars/brick_mem.c
create mode 100644 drivers/staging/mars/brick_say.c
create mode 100644 drivers/staging/mars/lamport.c
create mode 100644 drivers/staging/mars/lib/lib_limiter.c
create mode 100644 drivers/staging/mars/lib/lib_rank.c
create mode 100644 drivers/staging/mars/lib/lib_timing.c
create mode 100644 drivers/staging/mars/mars_light/light_net.c
create mode 100644 drivers/staging/mars/mars_light/light_server_strategy.c
create mode 100644 drivers/staging/mars/mars_light/light_strategy.c
create mode 100644 drivers/staging/mars/mars_light/mars_light.c
create mode 100644 drivers/staging/mars/mars_light/mars_proc.c
create mode 100644 drivers/staging/mars/xio_bricks/lib_log.c
create mode 100644 drivers/staging/mars/xio_bricks/lib_mapfree.c
create mode 100644 drivers/staging/mars/xio_bricks/xio.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_bio.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_client.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_copy.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_if.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_net.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_server.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_sio.c
create mode 100644 drivers/staging/mars/xio_bricks/xio_trans_logger.c
create mode 100644 include/linux/brick/brick.h
create mode 100644 include/linux/brick/brick_checking.h
create mode 100644 include/linux/brick/brick_mem.h
create mode 100644 include/linux/brick/brick_say.h
create mode 100644 include/linux/brick/lamport.h
create mode 100644 include/linux/brick/lib_limiter.h
create mode 100644 include/linux/brick/lib_pairing_heap.h
create mode 100644 include/linux/brick/lib_queue.h
create mode 100644 include/linux/brick/lib_rank.h
create mode 100644 include/linux/brick/lib_timing.h
create mode 100644 include/linux/brick/meta.h
create mode 100644 include/linux/brick/vfs_compat.h
create mode 100644 include/linux/mars_light/light_strategy.h
create mode 100644 include/linux/mars_light/mars_proc.h
create mode 100644 include/linux/xio/lib_log.h
create mode 100644 include/linux/xio/lib_mapfree.h
create mode 100644 include/linux/xio/xio.h
create mode 100644 include/linux/xio/xio_bio.h
create mode 100644 include/linux/xio/xio_client.h
create mode 100644 include/linux/xio/xio_copy.h
create mode 100644 include/linux/xio/xio_if.h
create mode 100644 include/linux/xio/xio_net.h
create mode 100644 include/linux/xio/xio_server.h
create mode 100644 include/linux/xio/xio_sio.h
create mode 100644 include/linux/xio/xio_trans_logger.h

--
2.6.4
