Re: WARNING in sysfs_remove_group
From: Dmitry Vyukov
Date: Fri Oct 27 2017 - 05:30:52 EST
On Fri, Oct 27, 2017 at 11:23 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Oct 27, 2017 at 11:10:08AM +0200, Dmitry Vyukov wrote:
>> On Fri, Oct 27, 2017 at 10:43 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Fri, Oct 27, 2017 at 01:29:31AM -0700, syzbot wrote:
>> >> Hello,
>> >>
>> >> syzkaller hit the following crash on
>> >> 4ed590271a65b0fbe3eb1cf828ad5af16603c8ce
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> >> compiler: gcc (GCC) 7.1.1 20170620
>> >> .config is attached
>> >> Raw console output is attached.
>> >>
>> >> sysfs group 'loop' not found for kobject 'loop1'
>> >> ------------[ cut here ]------------
>> >> WARNING: CPU: 0 PID: 21947 at fs/sysfs/group.c:237
>> >> sysfs_remove_group+0x156/0x1a0 fs/sysfs/group.c:235
>> >> Kernel panic - not syncing: panic_on_warn set ...
>> >
>> > It's a warning that someone did something wrong with their code, any
>> > hint as to how you are triggering this?
>>
>>
>> There are some hints in the log. The warning says "Comm:
>> syz-executor1", which usually means that the warning was triggered by
>> the first preceding program labeled "executing program 1:". In this
>> case it's this one:
>>
>>
>> 2017/10/26 21:00:50 executing program 1:
>> getsockopt$inet_sctp6_SCTP_AUTO_ASCONF(0xffffffffffffffff, 0x84, 0x1e,
>> &(0x7f0000728000)=0x0, &(0x7f0000cc8000)=0x4)
>> mmap(&(0x7f0000000000/0xfe1000)=nil, 0xfe1000, 0x3, 0x32,
>> 0xffffffffffffffff, 0x0)
>> perf_event_open(&(0x7f0000a07000)={0x2, 0x78, 0xdb, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0xfe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x0, 0xffffffffffffffff,
>> 0xffffffffffffffff, 0x0)
>> mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32,
>> 0xffffffffffffffff, 0x0)
>> perf_event_open(&(0x7f00008a8000-0x78)={0x0, 0x78, 0xdc, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x0, 0xffffffffffffffff,
>> 0xffffffffffffffff, 0x0)
>> perf_event_open(&(0x7f000002f000-0x78)={0x0, 0x78, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0xd34, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
>> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x0, 0x0, 0xffffffffffffffff,
>> 0x0)
>> r0 = syz_open_dev$loop(&(0x7f000060f000-0xb)="2f6465762f6c6f6f702300",
>> 0x200000000000001, 0x800000102)
>> r1 = memfd_create(&(0x7f0000000000)="fff8", 0x0)
>> writev(r1, &(0x7f000005f000-0x90)=[{&(0x7f000008d000)="479ac0310b7c6b1fbf668c4ea110c3148a20ce155f1e909e4e6d0170137fd7e54f1516bc7a12a912f5e03a9264b0db2ec5e38d302349761abe4a49e3908adc",
>> 0x3f}, {&(0x7f0000842000-0x1000)="a88ea16465a8bb8f7edf2f5df1acb3fdfe4cc92f98b85d50410879df687a0348c62ec96530ce8f33c84088abff5e302fb8e9e90a4a812993d385b709b7d0ff42369ca37886eadbf5c8288383c02ba5c20f9bc216532d99ea7752c6263a143acc961d63eb08b7d19eaa86384883d333ca4d6e0955a3fbe38adf1b845f8eb9675afd785ed47f66237a6f7c96cae0336b27a4ec51557cdafffb70952daa2d781d6add47a5e015954586a97941f561b10ad8f348601db77d27d7724be43ece2fe6ae243ec1370f870110330f8926c781c0e388c68ce64263ce603e16741ed17c1048f71f614f2846ccaad9c8f295d89d3d3999548f4d0d35c955b5ba497291febd4f04209553c83da7215489f834887d4df09e99dcc6acc42b2ee73e0d07d6b7b17ceae2db3ca64b1b5b719127bc8de8ec3318faf1b42cd00ba90aabf52364118ae7faa38bb61748f92bc59cedc11e8f94fd59be548697d6adc63d67905c4700cb18aa5516270af8a55c076a2a5c28923bc50645fad5f0e0603c2838fcd956e9adc4aa65974a313b68aec078ec0991a344608c5c19425074147e1877c024d0fd51232e7b7bbaf49a310b724f7d2ac8c5c85b1222382121efc20bc138fbb5b097f5b7ec1ad0",
>> 0x1c3}], 0x2)
>> ioctl$LOOP_CHANGE_FD(r0, 0x4c00, r1)
>> sendfile(r0, r0, &(0x7f0000e59000)=0x0, 0x200000000000006)
>> ioctl$LOOP_CLR_FD(r0, 0x4c01)
>> ioctl$sock_inet_tcp_SIOCINQ(0xffffffffffffffff, 0x541b,
>> &(0x7f000099a000-0x4)=0x0)
>> getsockopt$inet_mtu(0xffffffffffffffff, 0x0, 0xa,
>> &(0x7f00001b0000-0x4)=0x0, &(0x7f0000d5a000-0x4)=0x4)
>> pipe(&(0x7f0000fc4000)={0x0, <r2=>0x0})
>> setsockopt$ALG_SET_KEY(0xffffffffffffffff, 0x117, 0x1,
>> &(0x7f0000316000-0x25)="", 0x0)
>> ioctl$EVIOCGSW(r1, 0x8040451b,
>> &(0x7f000048d000-0xb)="000000000000000000000000000000000000")
>> setsockopt(r2, 0x1, 0x9e7,
>> &(0x7f0000a82000)="",
>> 0x1000)
>>
>>
>> Syscalls related to loop device are:
>>
>> r0 = syz_open_dev$loop(&(0x7f000060f000-0xb)="2f6465762f6c6f6f702300",
>> 0x200000000000001, 0x800000102)
>> r1 = memfd_create(&(0x7f0000000000)="fff8", 0x0)
>> writev(r1, &(0x7f000005f000-0x90)=[{&(0x7f000008d000)="...", 0x3f},
>> {&(0x7f0000842000-0x1000)="...", 0x1c3}], 0x2)
>> ioctl$LOOP_CHANGE_FD(r0, 0x4c00, r1)
>> sendfile(r0, r0, &(0x7f0000e59000)=0x0, 0x200000000000006)
>> ioctl$LOOP_CLR_FD(r0, 0x4c01)
>>
>>
>> But there were other processes messing with loop 0 around the same
>> time. The preceding "executing program 3" and "executing program 2"
>> also mention syz_open_dev$loop.
>> And note that some syscalls in a program can be executed in parallel.
>> So unfortunately there were lots of things happening with loop 0
>> (/dev/loopN is quite an unfortunate interface; say, memfd_create easily
>> allows each process to mess with its own instance, but loops are global
>> and there are only 8 of them, which does not even allow us to partition
>> them across processes).
>
> Ok, that's a mess.
>
> This kernel warning is there for the developer to say "hey, you did
> something dumb, fix your code", but then sysfs recovers and moves on as
> it's not a fatal error at all.
>
> I think this goes back to your other email about "how do we detect a
> real error warning or not". Right now we don't have a way to
> distinguish between warnings like this (where the kernel core is telling
> higher layers to not do foolish things, but it is a recoverable issue),
> or where a "real" warning happens that can cause real problems with the
> continued operation of the kernel.
>
> I don't know what to suggest here, maybe some way to distinguish between
> the two would make it easier for automated tests like what you are
> doing? Some different type of WARN_ON_DEVLOPER_IS_STUPID() message?
The overall problem is still an issue for us, and I would like to
resolve it in the future.
But I think that if the problem is with higher levels of kernel code
(rather than with user inputs, including insane concurrent sequences of
syscalls), it is still a problem with kernel code, for which a WARNING
is a reasonable thing. The action item for such cases would be to
forward the report to the maintainers of the higher-level code, so that
they can fix it to not do stupid things with lower-level APIs. If a
WARNING fires in a source file, that does not mean the root problem is
also in that file; it is just that this code detected the problem.
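
For anyone who wants to repeat the log triage described in the quoted part above (take the "Comm: syz-executorN" line, find the nearest preceding "executing program N:", then narrow it down to the loop-related calls), here is a rough sketch. It is not part of syzkaller. The function names are made up for illustration, and it assumes that each syscall sits on a single line in the raw console output and that a blank line ends a program body. The loop filter is only a heuristic: it keeps every call that mentions "loop" or uses a resource variable (r0, r1, ...) that appears in such a call, so it may keep a few more calls than a manual selection would.

```python
import re

# Patterns taken from the strings quoted in the report; everything else
# here (function names, parsing rules) is an assumption for illustration.
HEADER = re.compile(r"executing program (\d+):")
COMM = re.compile(r"Comm: syz-executor(\d+)")
RESOURCE = re.compile(r"\br\d+\b")


def find_suspect_program(log_text):
    """Return (proc_id, program_lines) for the program that most recently
    preceded the first 'Comm: syz-executorN' line, or None if there is none."""
    latest = {}      # proc id -> lines of its most recent program
    current = None   # proc id whose program body is being read
    for line in log_text.splitlines():
        comm = COMM.search(line)
        if comm:
            proc = comm.group(1)
            return proc, latest.get(proc, [])
        header = HEADER.search(line)
        if header:
            current = header.group(1)
            latest[current] = []
        elif current is not None:
            if line.strip():
                latest[current].append(line.strip())
            else:
                current = None  # assumed: a blank line ends the program
    return None


def loop_related(program_lines):
    """Keep calls that mention 'loop' or share a resource variable with
    a call that does (heuristic, may over-select)."""
    shared = set()
    for line in program_lines:
        if "loop" in line.lower():
            shared.update(RESOURCE.findall(line))
    return [
        line for line in program_lines
        if "loop" in line.lower() or shared & set(RESOURCE.findall(line))
    ]
```

Typical use would be to read the attached raw console output into a string, pass it to find_suspect_program(), and then pass the returned program lines to loop_related(). As noted above, programs from other executors may touch the same loop device concurrently, so the result is a starting point rather than a reproducer.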