Possible bug with passive FTP and ip_vs_ftp module

From: Shawn Heisey
Date: Mon Jul 13 2015 - 11:43:32 EST


I hope I can get everything right for this bug report in the first
message. As this is the first time I've done this here, the chances of
getting it wrong may be high. I haven't included everything that's
mentioned in the LKML docs for reporting bugs, but some of that info
isn't really relevant.

One-line summary:
Passive FTP doesn't work through a load balancer using the ip_vs_ftp
kernel module.

Full description:
==============
I have a setup on CentOS 5 (kernel 2.6.18-128.1.6.el5.centos.plus,
ipvsadm v1.24, ldirectord v1.186-ha-2.1.3) that handles this perfectly.
I'm migrating because the software on that system is very old.

After migrating the config to Ubuntu 14, fully updated with aptitude,
only active FTP works. The kernel is 3.13.0-52-generic, ipvsadm is
v1.26, and ldirectord is v1.186-ha -- all are installed from Ubuntu
packages.

root@lb1:~# lsb_release -rd
Description: Ubuntu 14.04.2 LTS
Release: 14.04
root@lb1:~# uname -a
Linux lb1 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Passive FTP, which should be handled by the ip_vs_ftp module, doesn't
work properly. The control channel works, but data connections are
never established. The ip_vs_ftp module is loaded from /etc/rc.local and
the system has been rebooted a number of times. The ldirectord process
is started by pacemaker, not by upstart.
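
For anyone trying to reproduce this, one way to narrow the problem down is to look at the 227 reply the client receives on the control channel. With masq forwarding, the helper is expected to rewrite the address in that reply from the real server's private address to the virtual address; if the client still sees a 10.100.2.x address, the rewrite presumably did not happen. The sketch below only decodes a reply string. The function names and the 203.0.113.71 address are placeholders for illustration, not anything taken from the affected systems.

```python
import re

# Matches the "(h1,h2,h3,h4,p1,p2)" tuple in a 227 reply.
_PASV_TUPLE = re.compile(r"\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)")


def parse_pasv_reply(reply):
    """Return (ip, port) advertised in a '227 Entering Passive Mode' reply."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply: %r" % reply)
    match = _PASV_TUPLE.search(reply)
    if match is None:
        raise ValueError("no address tuple in reply: %r" % reply)
    fields = [int(n) for n in match.groups()]
    if any(n > 255 for n in fields):
        raise ValueError("field out of range in reply: %r" % reply)
    ip = ".".join(str(n) for n in fields[:4])
    # The data port is sent as two bytes, high byte first.
    port = fields[4] * 256 + fields[5]
    return ip, port


def advertises_virtual_ip(reply, virtual_ip):
    """True if the reply carries the virtual address (hypothetical helper)."""
    return parse_pasv_reply(reply)[0] == virtual_ip


if __name__ == "__main__":
    # Placeholder reply, as a client would see it if nothing was rewritten.
    sample = "227 Entering Passive Mode (10,100,2,61,195,80)"
    print(parse_pasv_reply(sample))
    print(advertises_virtual_ip(sample, "203.0.113.71"))
```

The reply text can be captured with any FTP client that shows server responses, or with a packet capture on the client side of the load balancer.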

The LVS load balancer is being configured by ldirectord. This is the
ldirectord config:

checktimeout=5
checkinterval=10
negotiatetimeout=20
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no

virtual=XX.XXX.XXX.71:21
        fallback=127.0.0.1:21
        real=10.100.2.61:21 masq 65535
        real=10.100.2.60:21 masq 1
        service=ftp
        request="monitortest.txt"
        receive="good"
        login="lbtest"
        passwd="PASSWD"
        scheduler=wrr
        protocol=tcp
        checktype=negotiate
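
As a note on the weights above: with the wrr scheduler, new connections are distributed in proportion to each real server's weight, so 65535 against 1 sends nearly everything to 10.100.2.61. A rough sketch of that arithmetic follows; `wrr_shares` is a made-up helper for illustration and is not part of ldirectord or ipvsadm.

```python
def wrr_shares(weights):
    """Return each server's expected share of new connections under
    weighted round-robin, assuming every server is up and healthy."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive weight")
    return {server: weight / total for server, weight in weights.items()}


if __name__ == "__main__":
    # Weights copied from the real= lines in the config above.
    shares = wrr_shares({"10.100.2.61:21": 65535, "10.100.2.60:21": 1})
    for server, share in sorted(shares.items()):
        print("%s %.6f" % (server, share))
```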

On both CentOS 5 and Ubuntu 14, the machine has actual public IP
addresses on it, and that virtual address is a public IP. The firewall
is disabled.
==============

Additional relevant details: The FTP servers use the load balancer as a
default gateway.

CentOS 6 (my temporary band-aid host) -- works:
Linux lb5 2.6.32-504.16.2.el6.centos.plus.x86_64 #1 SMP Wed Apr 22 00:59:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 14 -- doesn't work:
Linux lb1 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I also tried a 4.1 test kernel package from Ubuntu, and it didn't help.

Pacemaker is used to make the load balancer redundant across two
hosts.

Multiple hosts with different CPU types, NIC types, and storage
subsystems have been tried.

Other details, possibly helpful but only indirectly relevant:
--------------
I first tried handling this problem through the distribution bug
tracking mechanism:

https://bugs.launchpad.net/bugs/1453180

This bug expired because nobody touched it for two months.

I will be switching to a solution based on haproxy instead of the
kernel, but I think this is worth fixing for other users.
--------------

Thanks,
Shawn