On Sat, 31 Dec 2011 20:23:31 -0500
Jeff Layton<jlayton@xxxxxxxxxx> wrote:
On Sat, 31 Dec 2011 15:41:32 +0200
Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote:
On 31/12/2011 2:49 ÎÎ, Konstantinos Skarlatos wrote:On ÎÎÎÎÎÏÎ, 31 ÎÎÎÎÎÎÏÎÎÏ 2011 1:59:22 ÎÎ, Jeff Layton wrote:http://pastebin.com/scgyDjhT and attached tcpdumpOn Fri, 30 Dec 2011 20:00:46 +0200
Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote:
On 30/12/2011 3:11 ÎÎ, Jeff Layton wrote:On Fri, 30 Dec 2011 11:04:59 +0200I did not get a cFYI output from that test, but i redid a
Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote:
On 29/12/2011 3:54 ÎÎ, Konstantinos Skarlatos wrote:I tried reproducing this here, but wasn't able to. Testing against myOn ÎÎÎÏÏÎ, 29 ÎÎÎÎÎÎÏÎÎÏ 2011 3:39:30 ÎÎ, Jeff Layton wrote:Attached is a tcp dump of my session.On Thu, 29 Dec 2011 12:30:18 +0200140 folders and 20 files
Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote:
On 29/12/2011 4:04 ÏÎ, Jeff Layton wrote:On Thu, 29 Dec 2011 02:08:57 +0200Windows XP service pack 2 (greek)
Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx> wrote:
I mount via cifs a windows XP share, df gives me correct sizes,Hmmm, maybe. What makes you think that it's related? What sort of
but when
I ls the mount point i get input/output error.
strace: http://pastebin.com/WXf8M1nu
mount --verbose -t cifs -o
username=administrator,password=blahblah
//192.168.0.11/jobs /mnt/backups/montaz/jobs
mount.cifs kernel mount options:
ip=192.168.0.11,unc=\\192.168.0.11\jobs,,ver=1,user=administrator,pass=********
df
//192.168.0.11/jobs 114464
105196 9268 92% /mnt/backups/montaz/jobs
ls /mnt/backups/montaz/jobs/
ls: reading directory /mnt/backups/montaz/jobs/: Input/output
error
total 0
the fun thing is that i can cd to a lower level directory, and ls
works
fine there! only the mount point has the problem
ls /mnt/backups/montaz/jobs/test
total 44K
drwxr-xr-x 1 root root 0 Apr 30 2010 blah blah/
......
kernel version 3.2rc7
this seems to be related to :
https://lkml.org/lkml/2011/8/1/427
Re: [3.0.0+][Regression][Bisected] CIFS: getdents() broken for
large dirs
server are you seeing this against?
How many files are in the directory?
xp box worked fine.
Most likely, the FIND_FILE responses are falling afoul of the code in
coalesce_t2 or check2ndT2. Unfortunately that code is pretty
complicated and I'm not certain what the problem actually is...
One thing that's interesting is that the total data being sent in the
request is rather large (16336 bytes). I think that's legit, but maybe
it's exceeding the end of the buffer once we try to coalesce it.
Would it be possible to get the cFYI output from this test?
mount-ls-umount and am attaching the tcpdump
Also here http://pastebin.com/J20uC6kU you can find the cifsFYI and the
contents of /proc/fs/cifs/DebugData form the same test
I do not know, and i am a bit afraid to downgrade this machine below 3.0
Is this a regression? Did it work with earlier kernels and only
recently start failing?
due to some changes arch linux has introduced recently. I can always set
up a few virtual machines though, and i can even request permission from
my company to give you shell access if you like. Which kernel versions
would you like me to test?
Ok, that tells us a little:
-------------------[snip]---------------------
[96268.787078] fs/cifs/cifssmb.c: In FindFirst for
[96268.787083] fs/cifs/transport.c: For smb_command 50
[96268.787086] fs/cifs/transport.c: Sending smb: total_len 88
...FIND_FIRST command is sent
[96268.787690] fs/cifs/connect.c: RFC1002 header 0x1104
[96268.787697] fs/cifs/connect.c: missing 12048 bytes from transact2,
check next response
[96268.787865] fs/cifs/connect.c: RFC1002 header 0x1104
[96268.787870] fs/cifs/connect.c: missing 12036 bytes from transact2,
check next response
[96268.788037] fs/cifs/connect.c: RFC1002 header 0x1104
[96268.788042] fs/cifs/connect.c: missing 12036 bytes from transact2,
check next response
[96268.788371] fs/cifs/connect.c: RFC1002 header 0xdb0
[96268.788375] fs/cifs/connect.c: missing 12888 bytes from transact2,
check next response
...all four parts of the response are collected here
[96268.788391] fs/cifs/transport.c: cifs_sync_mid_result: cmd=50
mid=12 state=16
...but the state at this point is MID_RESPONSE_MALFORMED
[96268.788395] fs/cifs/cifssmb.c: Error in FindFirst = -5
[96268.788397] fs/cifs/readdir.c: initiate cifs search rc -5
[96268.788398] fs/cifs/readdir.c: CIFS VFS: leaving cifs_readdir (xid
= 737644) rc = -5
...which makes readdir return -EIO
-------------------[snip]---------------------
Based on that, it looks like something in one of these frames caused
coalesce_t2() to return an error. I don't see the problem right offhand
in the capture, but T2 response handling is pretty complex so it can be
hard to see.
Would it be possible for you to rebuild your kernel (or just cifs.ko)
with this patch? Once you do that, rerun the test with cFYI turned up,
and it should help point out what the problem is.
Thanks,
Ok i am now rebuilding the kernel and will report when i have results.
Very helpful -- I think I see the bug now. Can you test this patch and
let me know if it fixes the problem? You'll need to back out the debug
patch first, and then apply this one. If it works, I'll plan to clean
this up and send to Steve for inclusion.
Thanks...
Actually...that patch is off by 4 bytes. Since we're so late in the
cycle too, we should probably keep extraneous changes to a minimum.
I'll probably propose this patch for 3.2 once you've tested it, and
will spin up a patch to clean up coalesce_t2 for 3.3.
If you're already testing the other one, then it's probably good enough
to verify the fix. If you haven't tested it yet though, then you should
probably test this one instead. Either way, let me know soon if you can
so we can get this into 3.2.
---------------------------[snip]----------------------------
[PATCH] cifs: fix bad buffer length check in coalesce_t2
The current check looks to see if the RFC1002 length is larger than
CIFSMaxBufSize and fails if it is. The buffer is actually larger than
that by MAX_CIFS_HDR_SIZE. Fix the check so that it doesn't fail due
to this.
This bug has been around for a long time, but the fact that we used to
cap the clients MaxBufferSize at the same level as the server tended
to paper over it. Commit c974befa changed that however.
Reported-by: Konstantinos Skarlatos<k.skarlatos@xxxxxxxxx>
Signed-off-by: Jeff Layton<jlayton@xxxxxxxxxx>
---
fs/cifs/connect.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 8cd4b52..27c4f25 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -282,7 +282,7 @@ static int coalesce_t2(struct smb_hdr *psecond, struct smb_hdr *pTargetSMB)
byte_count = be32_to_cpu(pTargetSMB->smb_buf_length);
byte_count += total_in_buf2;
/* don't allow buffer to overflow */
- if (byte_count> CIFSMaxBufSize)
+ if (byte_count> CIFSMaxBufSize + MAX_CIFS_HDR_SIZE - 4)
return -ENOBUFS;
pTargetSMB->smb_buf_length = cpu_to_be32(byte_count);