Re: Kernel crash in free_pipe_info()

From: Linus Torvalds
Date: Tue Oct 31 2017 - 15:00:28 EST

Next message: Long Li: "[PATCH] hv: kvp: Avoid reading past allocated blocks from KVP file"
Previous message: Waiman Long: "[PATCH v8 1/6] lib/dlock-list: Distributed and lock-protected lists"
In reply to: Al Viro: "Re: Kernel crash in free_pipe_info()"
Next in thread: Cong Wang: "Re: Kernel crash in free_pipe_info()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Oct 30, 2017 at 9:44 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Oct 30, 2017 at 07:08:46PM -0700, Linus Torvalds wrote:
>>
>> Well, they're at 8(%rax), except for that last case.
>
> 0x10(%rax)?

Duh, yes.

>> Except the offset is that %r12*0x28+0x10, so we're talking a byte
>> offset of 330 bytes into the allocation, and apparently the eight
>> previous (0-7) iterations were fine.
>>
>> Which is really odd.
>
> I wonder what pipe->buffers is equal to here...

Sadly, we never even bother loading it so it doesn't show up in the
register state, we just iterate over the whole table.

The (one) ppc oops that looks like it might be the same issue has a
totally different pattern. Instead of having that "error number"
looking thing as the invalid pointer base, it has the magic number
from a spinlock. And rather than being about "pipe->bufs[]" array,
it's the pipe pointer itself that seems corrupted, and thus the oops
happens in the account_pipe_buffers() code instead of in the loop over
the buffers.

Of course, both are consistent with that "pipe_inode_info" simply
having been overwritten by something else (possibly, but not
necessarily, due to a use-after-free).

> FWIW, I would try to slap
> if (buf->ops && (unsigned long)buf->ops <= 0xffffffff)
> dump the living hell out of that thing
> and see what it catches...

Actually, I'm looking at *another* error path - the one in named pipes.

Named pipes are why we do that nasty "inode->i_pipe" thing - if it was
for just the regular pipes, we'd be able to just do
file->f_private_data and be done with it. But named pipes have to have
the pipe data associated with a particular inode.

And that code actually does look wrong.

Look at fifo_open(): it increments the pipe->files as it sets
filp->private_data to point to the pipe_inode_info. Good so far.

But look at the error case. It does that put_pipe_info() to release it
again, but filp->private_data still contains the pipe_inode_info
pointer.

So what happens on a failed open of a named pipe? The *normal* code
all is very careful to _not_ use "fput()", but instead use
"put_filp(f)", which will just free the file pointer.

But what if somebody does "vfs_open()" on one of those things, and
then does "fput()" in the failure case?

In "do_last()" we have that FILE_OPENED protection:

if (unlikely(error) && (*opened & FILE_OPENED))
fput(file);

and path_openat() is again very careful to then use put_filp(file); if
FILE_OPENED was never set. And do_o_path() does the same.

I'm not seeing anybody who does the wrong thing, but there's a number
of ways to get this entirely wrong, and I worry some path does.

I would be a *lot* happier if we didn't have that very subtle
fput()-vs-put_filp() issue going on.

Again, I cannot see anything wrong, but this feels very very fragile to me.

Linus

Next message: Long Li: "[PATCH] hv: kvp: Avoid reading past allocated blocks from KVP file"
Previous message: Waiman Long: "[PATCH v8 1/6] lib/dlock-list: Distributed and lock-protected lists"
In reply to: Al Viro: "Re: Kernel crash in free_pipe_info()"
Next in thread: Cong Wang: "Re: Kernel crash in free_pipe_info()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]