[PATCH] dm pref-path: provides preferred path load balance policy

From: Ravikanth Nalla
Date: Thu Dec 03 2015 - 12:47:19 EST


In a dm multipath environment, providing end user with an option of
selecting preferred path for an I/O in the SAN based on path speed,
health status and user preference is found to be useful. This allows
a user to select a reliable path over flakey/bad paths thereby
achieving higher I/O success rate. The specific scenario in which
it is found to be useful is where a user has a need to eliminate
the paths experiencing frequent I/O errors due to SAN failures and
use the best performing path for I/O whenever it is available.
Another scenario where it is found to be useful is in providing
option for user to select a high speed path (say 16GB/8GB FC)
over alternative low speed paths (4GB/2GB FC).

A new dm path selector kernel loadable module named "dm_pref_path"
is introduced to handle preferred path load balance policy
(pref-path) operations. The key operations of this policy is to
select and return user specified path from the current discovered
online/ healthy paths. If the user specified path do not exist in
the online/ healthy paths list due to path being currently in
failed state or user has mentioned wrong device information, it
will fall back to round-robin policy, where all the online/ healthy
paths are given equal preference.

Functionality provided in this module is verified on wide variety
of servers ( with 2 CPU sockets, 4 CPU sockets and 8 CPU sockets).

Signed-off-by: Ravikanth Nalla <ravikanth.nalla@xxxxxxx>
---
Documentation/device-mapper/dm-pref-path.txt | 52 ++++++
drivers/md/Makefile | 6 +-
drivers/md/dm-pref-path.c | 249 +++++++++++++++++++++++++++
3 files changed, 304 insertions(+), 3 deletions(-)
create mode 100644 Documentation/device-mapper/dm-pref-path.txt
create mode 100644 drivers/md/dm-pref-path.c

diff --git a/Documentation/device-mapper/dm-pref-path.txt b/Documentation/device-mapper/dm-pref-path.txt
new file mode 100644
index 0000000..0efb156b
--- /dev/null
+++ b/Documentation/device-mapper/dm-pref-path.txt
@@ -0,0 +1,52 @@
+dm-pref-path
+============
+
+dm-pref-path is a path selector module for device-mapper targets, which
+selects a user specified path for the incoming I/O.
+
+The key operations of this policy to select and return user specified
+path from the current discovered online/ healthy paths. If the user
+specified path do not exist in the online/ healthy path list due to
+path being currently in failed state or user has mentioned wrong device
+information, it will fall back to round-robin policy, where all the
+online/ healthy paths are given equal preference.
+
+The path selector name is 'pref-path'.
+
+Table parameters for each path: [<repeat_count>]
+
+Status for each path: <status> <fail-count>
+ <status>: 'A' if the path is active, 'F' if the path is failed.
+ <fail-count>: The number of path failures.
+
+Algorithm
+=========
+User is provided with an option to specify preferred path in DM
+Multipath configuration file (/etc/multipath.conf) under multipath{}
+section with a syntax "path_selector "pref-path 1 <device major>:<device minor>"".
+
+ 1. The pref-path selector would search and return the matching user
+ preferred path from the online/ healthy path list for incoming I/O.
+
+ 2. If the user preferred path do not exist in the online/ healthy
+ path list due to path being currently in failed state or user
+ has mentioned wrong device information, it will fall back to
+ round-robin policy, where all the online/ healthy paths are given
+ equal preference.
+
+ 3. If the user preferred path comes back online/ healthy, pref-path
+ selector would find and return this path for incoming I/O.
+
+Examples
+========
+Consider 4 paths sdq, sdam, sdbh and sdcc, if user prefers path sdbh
+with major:minor number 67:176 which has throughput of 8GB/s over other
+paths of 4GB/s, pref-path policy will chose this sdbh path for all the
+incoming I/O's.
+
+# dmsetup table Test_Lun_2
+0 20971520 multipath 0 0 1 1 pref-path 0 4 1 66:80 10000 67:160 10000
+68:240 10000 8:240 10000
+
+# dmsetup status Test_Lun_2
+0 20971520 multipath 2 0 0 0 1 1 A 0 4 0 66:80 A 0 67:160 A 0 68:240 A
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f34979c..5c9f4e9 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -20,8 +20,8 @@ md-mod-y += md.o bitmap.o
raid456-y += raid5.o raid5-cache.o

# Note: link order is important. All raid personalities
-# and must come before md.o, as they each initialise
-# themselves, and md.o may use the personalities when it
+# and must come before md.o, as they each initialise
+# themselves, and md.o may use the personalities when it
# auto-initialised.

obj-$(CONFIG_MD_LINEAR) += linear.o
@@ -41,7 +41,7 @@ obj-$(CONFIG_DM_BIO_PRISON) += dm-bio-prison.o
obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
obj-$(CONFIG_DM_DELAY) += dm-delay.o
obj-$(CONFIG_DM_FLAKEY) += dm-flakey.o
-obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o
+obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o dm-pref-path.o
obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o
obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o
obj-$(CONFIG_DM_SWITCH) += dm-switch.o
diff --git a/drivers/md/dm-pref-path.c b/drivers/md/dm-pref-path.c
new file mode 100644
index 0000000..6bf1c76
--- /dev/null
+++ b/drivers/md/dm-pref-path.c
@@ -0,0 +1,249 @@
+/*
+ * (C) Copyright 2015 Hewlett Packard Enterprise Development LP.
+ *
+ * dm-pref-path.c
+ *
+ * Module Author: Ravikanth Nalla
+ *
+ * This program is free software; you can redistribute it
+ * and/or modify it under the terms of the GNU General Public
+ * License, version 2 as published by the Free Software Foundation;
+ * either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * dm-pref-path path selector:
+ * Handles preferred path load balance policy operations. The key
+ * operations of this policy is to select and return user specified
+ * path from the current discovered online/ healthy paths(valid_paths).
+ * If the user specified path do not exist in the valid_paths list due
+ * to path being currently in failed state or user has mentioned wrong
+ * device information, it will fall back to round-robin policy, where
+ * all the valid-paths are given equal preference.
+ *
+ */
+
+#include "dm.h"
+#include "dm-path-selector.h"
+
+#include <linux/slab.h>
+#include <linux/ctype.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/atomic.h>
+
+#define DM_MSG_PREFIX "multipath pref-path"
+#define PP_MIN_IO 10000
+#define PP_VERSION "1.0.0"
+#define BUFF_LEN 16
+
+/* Flag for pref_path enablement */
+unsigned pref_path_enabled;
+
+/* pref_path major:minor number */
+char pref_path[BUFF_LEN];
+
+struct selector {
+ struct list_head valid_paths;
+ struct list_head failed_paths;
+};
+
+struct path_info {
+ struct list_head list;
+ struct dm_path *path;
+ unsigned repeat_count;
+};
+
+static struct selector *alloc_selector(void)
+{
+ struct selector *s = kmalloc(sizeof(*s), GFP_KERNEL);
+
+ if (s) {
+ INIT_LIST_HEAD(&s->valid_paths);
+ INIT_LIST_HEAD(&s->failed_paths);
+ }
+
+ return s;
+}
+
+static int pf_create(struct path_selector *ps, unsigned argc, char
+**argv) {
+ struct selector *s = alloc_selector();
+
+ if (!s)
+ return -ENOMEM;
+
+ if ((argc == 1) && strlen(argv[0]) < BUFF_LEN) {
+ pref_path_enabled = 1;
+ snprintf(pref_path, (BUFF_LEN-1), "%s", argv[0]);
+ }
+
+ ps->context = s;
+ return 0;
+}
+
+static void pf_free_paths(struct list_head *paths)
+{
+ struct path_info *pi, *next;
+
+ list_for_each_entry_safe(pi, next, paths, list) {
+ list_del(&pi->list);
+ kfree(pi);
+ }
+}
+
+static void pf_destroy(struct path_selector *ps)
+{
+ struct selector *s = ps->context;
+
+ pf_free_paths(&s->valid_paths);
+ pf_free_paths(&s->failed_paths);
+ kfree(s);
+ ps->context = NULL;
+}
+
+static int pf_status(struct path_selector *ps, struct dm_path *path,
+ status_type_t type, char *result, unsigned maxlen) {
+ unsigned sz = 0;
+ struct path_info *pi;
+
+ /* When called with NULL path, return selector status/args. */
+ if (!path)
+ DMEMIT("0 ");
+ else {
+ pi = path->pscontext;
+
+ if (type == STATUSTYPE_TABLE)
+ DMEMIT("%u ", pi->repeat_count);
+ }
+
+ return sz;
+}
+
+static int pf_add_path(struct path_selector *ps, struct dm_path *path,
+ int argc, char **argv, char **error) {
+ struct selector *s = ps->context;
+ struct path_info *pi;
+
+ /*
+ * Arguments: [<pref-path>]
+ */
+ if (argc > 1) {
+ *error = "pref-path ps: incorrect number of arguments";
+ return -EINVAL;
+ }
+
+ /* Allocate the path information structure */
+ pi = kmalloc(sizeof(*pi), GFP_KERNEL);
+ if (!pi) {
+ *error = "pref-path ps: Error allocating path information";
+ return -ENOMEM;
+ }
+
+ pi->path = path;
+ pi->repeat_count = PP_MIN_IO;
+
+ path->pscontext = pi;
+
+ list_add_tail(&pi->list, &s->valid_paths);
+
+ return 0;
+}
+
+static void pf_fail_path(struct path_selector *ps, struct dm_path
+*path) {
+ struct selector *s = ps->context;
+ struct path_info *pi = path->pscontext;
+
+ list_move(&pi->list, &s->failed_paths); }
+
+static int pf_reinstate_path(struct path_selector *ps, struct dm_path
+*path) {
+ struct selector *s = ps->context;
+ struct path_info *pi = path->pscontext;
+
+ list_move_tail(&pi->list, &s->valid_paths);
+
+ return 0;
+}
+
+/*
+ * Return user preferred path for an I/O.
+ */
+static struct dm_path *pf_select_path(struct path_selector *ps,
+ unsigned *repeat_count, size_t nr_bytes) {
+ struct selector *s = ps->context;
+ struct path_info *pi = NULL, *best = NULL;
+
+ if (list_empty(&s->valid_paths))
+ return NULL;
+
+ if (pref_path_enabled) {
+ /* search for preferred path in the
+ * valid list and then return.
+ */
+ list_for_each_entry(pi, &s->valid_paths, list) {
+ if (!strcmp(pi->path->dev->name, pref_path)) {
+ best = pi;
+ *repeat_count = best->repeat_count;
+ break;
+ }
+ }
+ }
+
+ /* If preferred path is not enabled/ not available/
+ * offline chose the next path in the list.
+ */
+ if (best == NULL && !list_empty(&s->valid_paths)) {
+ pi = list_entry(s->valid_paths.next,
+ struct path_info, list);
+ list_move_tail(&pi->list, &s->valid_paths);
+ best = pi;
+ *repeat_count = best->repeat_count;
+ }
+
+ return best ? best->path : NULL;
+}
+
+static struct path_selector_type pf_ps = {
+ .name = "pref-path",
+ .module = THIS_MODULE,
+ .table_args = 1,
+ .info_args = 0,
+ .create = pf_create,
+ .destroy = pf_destroy,
+ .status = pf_status,
+ .add_path = pf_add_path,
+ .fail_path = pf_fail_path,
+ .reinstate_path = pf_reinstate_path,
+ .select_path = pf_select_path,
+};
+
+static int __init dm_pf_init(void)
+{
+ int r = dm_register_path_selector(&pf_ps);
+
+ if (r < 0) {
+ DMERR("register failed %d", r);
+ return r;
+ }
+
+ DMINFO("version " PP_VERSION " loaded");
+ return r;
+}
+
+static void __exit dm_pf_exit(void)
+{
+ dm_unregister_path_selector(&pf_ps);
+}
+
+module_init(dm_pf_init);
+module_exit(dm_pf_exit);
+
+MODULE_DESCRIPTION(DM_NAME "pref-path multipath path selector");
+MODULE_AUTHOR("ravikanth.nalla@xxxxxxx");
+MODULE_LICENSE("GPL");
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/