[PATCH] staging: lustre: update the TODO list

From: James Simmons
Date: Sun Feb 11 2018 - 18:10:59 EST


As more people become involved with the progression of the lustre
client, it needs to be clearer what must be done to leave
staging. Update the TODO list with the various bugs and changes
needed to accomplish this. Some are simple bugs and others are far
more complex tasks that will change many lines of code. Some even cover
updating the user land utilities to meet the kernel requirements.
Several bugs have already been addressed and just need to be
pushed to the staging tree.

Signed-off-by: James Simmons <jsimmons@xxxxxxxxxxxxx>
---
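The LU-9091 entry in the TODO below asks for a way to turn a string such
as "16MiB" into a byte count. What follows is only a minimal user space
sketch of that conversion, under stated assumptions: size_str_to_bytes()
is a hypothetical name and is not an existing Lustre or kernel function,
and only binary suffixes (K, M, G, T, optionally followed by "iB") are
handled.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Lustre or the kernel: convert a string
 * such as "16MiB" into a byte count. Only binary (power of two) suffixes
 * are handled. Returns 0 on success or a negative errno value.
 */
static int size_str_to_bytes(const char *str, uint64_t *bytes)
{
	static const char units[] = "KMGT";	/* KiB, MiB, GiB, TiB */
	unsigned long long val;
	unsigned int shift = 0;
	const char *unit;
	char *end;

	/* strtoull() would silently accept a sign, so insist on a digit */
	if (!isdigit((unsigned char)str[0]))
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno)
		return -ERANGE;

	if (*end != '\0') {
		/* map the unit letter to a shift: K = 10, M = 20, ... */
		unit = strchr(units, toupper((unsigned char)*end));
		if (!unit)
			return -EINVAL;
		shift = 10 * (unsigned int)(unit - units + 1);
		end++;

		/* accept both "M" and "MiB", reject any other trailer */
		if (strcmp(end, "iB") == 0)
			end += 2;
		if (*end != '\0')
			return -EINVAL;
	}

	/* reject values that would overflow 64 bits once scaled */
	if (shift && val > (UINT64_MAX >> shift))
		return -ERANGE;

	*bytes = (uint64_t)val << shift;
	return 0;
}
```

A tunable counted in pages, such as max_pages_per_rpc, would presumably
still need the result divided by the platform page size; that step is left
out of the sketch.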
drivers/staging/lustre/TODO | 310 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 300 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/TODO b/drivers/staging/lustre/TODO
index f194417..94446487 100644
--- a/drivers/staging/lustre/TODO
+++ b/drivers/staging/lustre/TODO
@@ -1,12 +1,302 @@
-* Possible remaining coding style fix.
-* Remove deadcode.
-* Separate client/server functionality. Functions only used by server can be
- removed from client.
-* Clean up libcfs layer. Ideally we can remove include/linux/libcfs entirely.
-* Clean up CLIO layer. Lustre client readahead/writeback control needs to better
- suit kernel providings.
-* Add documents in Documentation.
-* Other minor misc cleanups...
+Currently all the work directed toward the lustre upstream client is tracked
+at the following link:
+
+https://jira.hpdd.intel.com/browse/LU-9679
+
+Under this ticket you will see the following work items that need to be
+addressed:
+
+******************************************************************************
+* libcfs cleanup
+*
+* https://jira.hpdd.intel.com/browse/LU-9859
+*
+* Track all the cleanups and simplification of the libcfs module. Remove
+* functions the kernel provides. Possibly integrate some of the functionality
+* into the kernel proper.
+*
+******************************************************************************
+
+https://jira.hpdd.intel.com/browse/LU-100086
+
+LNET_MINOR conflicts with USERIO_MINOR
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8130
+
+Fix and simplify libcfs hash handling
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8703
+
+The current way we handle SMP is wrong. Platforms like ARM and KNL can have
+core and NUMA layouts that include NUMA nodes with no cores. We need to
+handle such cases. This work also greatly simplified the lustre SMP code.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9019
+
+Replace libcfs time API with standard kernel APIs. Also migrate away from
+jiffies. We found jiffies can vary on nodes which can lead to corner cases
+that can break the file system due to nodes having inconsistent behavior.
+So move to time64_t and ktime_t as much as possible.
+
+******************************************************************************
+* Proper IB support for ko2iblnd
+******************************************************************************
+https://jira.hpdd.intel.com/browse/LU-9179
+
+Poor performance for the ko2iblnd driver. This is related to many of the
+patches below that are missing from the linux client.
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9886
+
+Crash in upstream kiblnd_handle_early_rxs()
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10394 / LU-10526 / LU-10089
+
+Default to using MEM_REG
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10459
+
+throttle tx based on queue depth
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9943
+
+correct WR fast reg accounting
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10291
+
+remove concurrent_sends tunable
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10213
+
+calculate qp max_send_wrs properly
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9810
+
+use less CQ entries for each connection
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10129 / LU-9180
+
+rework map_on_demand behavior
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10129
+
+query device capabilities
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10015
+
+fix race at kiblnd_connect_peer
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9983
+
+allow for discontiguous fragments
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9500
+
+Don't Page Align remote_addr with FastReg
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9448
+
+handle empty CPTs
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9507
+
+Don't Assert On Reconnect with MultiQP
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9472
+
+Fix FastReg map/unmap for MLX5
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9425
+
+Turn on 2 sges by default
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8943
+
+Enable Multiple OPA Endpoints between Nodes
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-5718
+
+multiple sges for work request
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9094
+
+kill timedout txs from ibp_tx_queue
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9094
+
+reconnect peer for REJ_INVALID_SERVICE_ID
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8752
+
+Stop MLX5 triggering a dump_cqe
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8874
+
+Move ko2iblnd to latest RDMA changes
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8875 / LU-8874
+
+Change to new RDMA done callback mechanism
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9164 / LU-8874
+
+Incorporate RDMA map/unmap APIs into ko2iblnd
+
+******************************************************************************
+* sysfs/debugfs fixes
+*
+* https://jira.hpdd.intel.com/browse/LU-8066
+*
+* The original migration to sysfs was done in haste without properly working
+* utilities to test the changes. This covers the work to restore the proper
+* behavior. Huge project to make this right.
+*
+******************************************************************************
+
+https://jira.hpdd.intel.com/browse/LU-9431
+
+The function class_process_proc_param was used for our mass updates of proc
+tunables. It didn't work with sysfs and it was just ugly so it was removed.
+In the process the ability to mass update thousands of clients was lost. This
+work restores this in a sane way.
+
+------------------------------------------------------------------------------
+https://jira.hpdd.intel.com/browse/LU-9091
+
+One of the major requests from users is the ability to pass parameters to a
+sysfs file in various units. For example, we can set max_pages_per_rpc, but
+its meaning varies across platforms because page sizes differ. So you can
+set this like max_pages_per_rpc=16MiB. The original code to handle this was
+written before the string helpers existed, so it doesn't follow that format,
+but it would be easy to convert. Currently the string helpers do the reverse
+of what we need: bytes to a string. We need to change a string to bytes.
+
+******************************************************************************
+* Proper user land to kernel space interface for Lustre
+*
+* https://jira.hpdd.intel.com/browse/LU-9680
+*
+******************************************************************************
+
+https://jira.hpdd.intel.com/browse/LU-8915
+
+Don't use linux list structure as user land arguments for lnet selftest.
+This code is pretty poor quality and really needs to be reworked.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8834
+
+The lustre ioctl LL_IOC_FUTIMES_3 is very generic. Need to either work with
+other file systems with similar functionality and make a common syscall
+interface or rework our server code to automagically do it for us.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-6202
+
+Clean up ioctl handling. We have many obsolete ioctls. Also the way we do
+ioctls can be changed over to netlink. This also has the benefit of working
+better with HPC systems that do IO forwarding. Such systems don't like ioctls
+very well.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9667
+
+More cleanups by making our utilities use sysfs instead of ioctls for LNet.
+Also it has been requested to move the remaining ioctls to the netlink API.
+
+******************************************************************************
+* Misc
+******************************************************************************
+
+------------------------------------------------------------------------------
+https://jira.hpdd.intel.com/browse/LU-9855
+
+Clean up obdclass preprocessor code. One of the major eyesores is the various
+pointer redirections and macros used by the obdclass. This makes the code very
+difficult to understand. It was requested by Al Viro to clean this up before
+we leave staging.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9633
+
+Migrate to sphinx kernel-doc style comments. Add documents in Documentation.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-6142
+
+Possible remaining coding style fixes. Remove dead code. Enforce kernel code
+style. Other minor misc cleanups...
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8837
+
+Separate client/server functionality. Functions only used by server can be
+removed from client. Most of this has been done but we need an inspection of
+the code to make sure.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-8964
+
+Lustre client readahead/writeback control needs to better suit what the kernel
+provides. This is currently being explored. We could end up replacing the CLIO
+readahead abstraction with the kernel proper version.
+
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9862
+
+Patch that landed for LU-7890 leads to static checker errors
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-9868
+
+dcache/namei fixes for lustre
+------------------------------------------------------------------------------
+
+https://jira.hpdd.intel.com/browse/LU-10467
+
+Use the standard linux wait_event macros. Work by Neil Brown.
+
+------------------------------------------------------------------------------

Please send any patches to Greg Kroah-Hartman <greg@xxxxxxxxx>, Andreas Dilger
-<andreas.dilger@xxxxxxxxx>, and Oleg Drokin <oleg.drokin@xxxxxxxxx>.
+<andreas.dilger@xxxxxxxxx>, James Simmons <jsimmons@xxxxxxxxxxxxx> and
+Oleg Drokin <oleg.drokin@xxxxxxxxx>.
--
1.8.3.1