Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"

From: David Hildenbrand
Date: Thu Oct 22 2020 - 05:19:36 EST


On 22.10.20 11:01, Greg KH wrote:
> On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
>> On 22.10.20 10:40, David Laight wrote:
>>> From: David Hildenbrand
>>>> Sent: 22 October 2020 09:35
>>>>
>>>> On 22.10.20 10:26, Greg KH wrote:
>>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>>>>>> From: David Laight <David.Laight@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> This lets the compiler inline it into import_iovec() generating
>>>>>>>> much better code.
>>>>>>>>
>>>>>>>> Signed-off-by: David Laight <david.laight@xxxxxxxxxx>
>>>>>>>> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
>>>>>>>> ---
>>>>>>>> fs/read_write.c | 179 ------------------------------------------------
>>>>>>>> lib/iov_iter.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> 2 files changed, 176 insertions(+), 179 deletions(-)
>>>>>>>
>>>>>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>>>>>
>>>>>>> I can't really figure out what the regression is, only that this commit
>>>>>>> triggers a "large Android system binary" from working properly. There's
>>>>>>> no kernel log messages anywhere, and I don't have any way to strace the
>>>>>>> thing in the testing framework, so any hints that people can provide
>>>>>>> would be most appreciated.
>>>>>>
>>>>>> It's a pure move - modulo changed line breaks in the argument lists
>>>>>> the functions involved are identical before and after that (just checked
>>>>>> that directly, by checking out the trees before and after, extracting two
>>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>>>>>> after, resp.) and checking the diff between those.
>>>>>>
>>>>>> How certain is your bisection?
>>>>>
>>>>> The bisection is very reproducable.
>>>>>
>>>>> But, this looks now to be a compiler bug. I'm using the latest version
>>>>> of clang and if I put "noinline" at the front of the function,
>>>>> everything works.
>>>>
>>>> Well, the compiler can do more invasive optimizations when inlining. If
>>>> you have buggy code that relies on some unspecified behavior, inlining
>>>> can change the behavior ... but going over that code, there isn't too
>>>> much action going on. At least nothing screamed at me.
>>>
>>> Apart from all the optimisations that get rid off the 'pass be reference'
>>> parameters and strange conditional tests.
>>> Plenty of scope for the compiler getting it wrong.
>>> But nothing even vaguely illegal.
>>
>> Not the first time that people blame the compiler to then figure out
>> that something else is wrong ... but maybe this time is different :)
>
> I agree, I hate to blame the compiler, that's almost never the real
> problem, but this one sure "feels" like it.
>
> I'm running some more tests, trying to narrow things down as just adding
> a "noinline" to the function that got moved here doesn't work on Linus's
> tree at the moment because the function was split into multiple
> functions.
>
> Give me a few hours...

I might be wrong but

a) import_iovec() uses:
- unsigned nr_segs -> int
- unsigned fast_segs -> int
b) rw_copy_check_uvector() uses:
- unsigned long nr_segs -> long
- unsigned long fast_seg -> long

So when calling rw_copy_check_uvector(), we have to zero-extend the
registers used for passing the arguments. That's definitely done when
calling the function explicitly. Maybe when inlining something is messed up?

Just a thought ...

--
Thanks,

David / dhildenb