Re: [PATCH] vfs: fix statfs64() does not handle errors

From: Li Wang
Date: Mon Nov 14 2016 - 06:08:46 EST


On Mon, Nov 07, 2016 at 11:03:11AM -0700, Andreas Dilger wrote:
> On Nov 7, 2016, at 3:21 AM, Li Wang <liwang@xxxxxxxxxx> wrote:
> >
> > statfs64() does NOT return -1 and setting errno to EOVERFLOW when some
> > variables(like: f_bsize) overflowed in the returned struct.
> >
> > reproducer:
> > step1. mount hugetlbfs with two different pagesize on ppc64 arch.
> >
> > $ hugeadm --pool-pages-max 16M:0
> > $ hugeadm --create-mount
> > $ mount | grep -i hugetlbfs
> > none on /var/lib/hugetlbfs/pagesize-16MB type hugetlbfs (rw,relatime,seclabel,pagesize=16777216)
> > none on /var/lib/hugetlbfs/pagesize-16GB type hugetlbfs (rw,relatime,seclabel,pagesize=17179869184)
> >
> > step2. compile & run this C program.
> >
> > $ cat statfs64_test.c
> >
> > #define _LARGEFILE64_SOURCE
> > #include <stdio.h>
> > #include <sys/statfs.h>
> >
> > int main()
> > {
> > struct statfs64 sb;
> > int err;
> >
> > err = statfs64("/var/lib/hugetlbfs/pagesize-16GB", &sb);
> > if (err)
> > return -1;
> >
> > printf("sizeof f_bsize = %d, f_bsize=%ld\n", sizeof(sb.f_bsize), sb.f_bsize);
> >
> > return 0;
> > }
> >
> > $ gcc -m32 statfs64_test.c
> > $ ./a.out
> > sizeof f_bsize = 4, f_bsize=0
> >
> > Signed-off-by: Li Wang <liwang@xxxxxxxxxx>
> > ---
> >
> > Notes:
> > This is my first patch to kernel fs part, I'm not sure if
> > this one useful, but just want someone have a look.
> >
> > thanks~
> >
> > fs/statfs.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/fs/statfs.c b/fs/statfs.c
> > index 083dc0a..849dde95 100644
> > --- a/fs/statfs.c
> > +++ b/fs/statfs.c
> > @@ -151,6 +151,23 @@ static int do_statfs64(struct kstatfs *st, struct statfs64 __user *p)
> > if (sizeof(buf) == sizeof(*st))
> > memcpy(&buf, st, sizeof(*st));
> > else {
> > + if (sizeof buf.f_bsize == 4) {
>
> Linux CodingStyle says this should be used like sizeof(buf.f_bsize).

agree.

>
> > + if ((st->f_blocks | st->f_bfree | st->f_bavail |
> > + st->f_bsize | st->f_frsize) &
> > + 0xffffffff00000000ULL)
> > + return -EOVERFLOW;
>
> I'm not sure I agree with this check. Sure, if sizeof(buf.f_bsize) == 4
> then the large st->f_bsize will overflow this field, and that is valid.

After thinking over, I feel that my fix in this patch is not right.

The reproducer.c running on ppc64 arch was build in 32bit, but it does
not call SYS_statfs64 in kernel. It calls compat_sys_statfs64 indeed.

# cat reproducer.c

#define _LARGEFILE64_SOURCE
#include <stdio.h>
#include <sys/statfs.h>
#include <sys/syscall.h>

int main()
{
struct statfs64 sb;
int err;

err = syscall(SYS_statfs64, "/var/lib/hugetlbfs/pagesize-16GB", sizeof(sb), &sb);
if (err)
return -1;

printf("sizeof f_bsize = %d, f_bsize=%ld\n", sizeof(sb.f_bsize), sb.f_bsize);
return 0;
}

# gcc reproducer.c -m32

# stap -e 'probe kernel.function("compat_sys_statfs64") {printf ("%s",
$$parms);}' -vvv &

# ./a.out
sizeof f_bsize = 4, f_bsize=0
# pathname=0x100006c4 sz=0x58 buf=0xff8a20b0


Guess the fix should be like:

diff --git a/fs/compat.c b/fs/compat.c
index bd064a2..3d923fd 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -253,7 +253,7 @@ static int put_compat_statfs(struct compat_statfs __user *ubuf, struct kstatfs *

static int put_compat_statfs64(struct compat_statfs64 __user *ubuf,
struct kstatfs *kbuf)
{
- if (sizeof ubuf->f_blocks == 4) {
+ if (sizeof ubuf->f_bsize == 4) {
if ((kbuf->f_blocks | kbuf->f_bfree | kbuf->f_bavail |
kbuf->f_bsize | kbuf->f_frsize) & 0xffffffff00000000ULL)
return -EOVERFLOW;


I will test it and send a new patch.

Regards,
Li Wang

>
> However, that doesn't mean that large values for f_blocks, f_bfree, f_bavail
> should return an error. I assume you are concerned that the product of the
> large f_bsize and one of those values would overflow a 64-bit bytes value,
> but that is for userspace to worry about, since the values in the individual
> fields themselves are OK.
>
> We're already over 100PiB Lustre filesystems (2^57 bytes) today, and I
> don't want statfs() failing prematurely because userspace feels the need
> to multiply out the blocks and blocksize into bytes, instead of shifting
> the values to KB (which would allow filesystems up to 2^74-1024 bytes to
> be handled correctly in userspace).
>
> > + /*
> > + * f_files and f_ffree may be -1; it's okay to stuff
> > + * that into 32 bits
> > + */
> > + if (st->f_files != -1 &&
> > + (st->f_files & 0xffffffff00000000ULL))
> > + return -EOVERFLOW;
>
> > + if (st->f_ffree != -1 &&
> > + (st->f_ffree & 0xffffffff00000000ULL))
> > + return -EOVERFLOW;
>
> Why does sizeof(f_bsize) have anything to do with the number of free files?
> These two checks are just plain wrong, since f_files and f_ffree are 64-bit
> fields in struct statfs64.

right.

>
> Cheers, Andreas
>
> > + }
> > +
> > buf.f_type = st->f_type;
> > buf.f_bsize = st->f_bsize;
> > buf.f_blocks = st->f_blocks;
> > --
> > 1.8.3.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>