Re: recent -git: BUG in free_thread_xstate

From: Vegard Nossum
Date: Wed Jul 23 2008 - 17:36:28 EST


On Wed, Jul 23, 2008 at 10:28 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> On Wed, Jul 23, 2008 at 10:23 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
>> My test is basically stressing the network and running CPU hotplug at
>> the same time.
>
> FWIW, a third run gives us this additional clue before going down with
> the first error I posted in this thread:
>
> =============================================================================
> BUG task_struct: Poison overwritten
> -----------------------------------------------------------------------------
> INFO: 0xf3d00000-0xf3d0006b. First byte 0x1 instead of 0x6b

Note that the overwritten range ends at offset 0x6b, the same value as
the poison byte (the range is inclusive, so that is 0x6c bytes in
total). This sounds VERY much like a use-after-free, e.g. maybe
something loaded 0x6b into the "size" parameter for memcpy().
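
A quick way to double-check the arithmetic on that range, as a minimal Python sketch. It assumes the two addresses in the INFO line are inclusive (the dump shows the first untouched 0x6b poison byte at offset 0x6c), and the function and variable names are made up for illustration:

```python
import re

# Line copied from the SLUB report quoted in this mail.
report = "INFO: 0xf3d00000-0xf3d0006b. First byte 0x1 instead of 0x6b"

def overwritten_span(line):
    """Return (start, end, length), treating the reported range as inclusive."""
    m = re.search(r"0x([0-9a-f]+)-0x([0-9a-f]+)", line)
    start, end = (int(g, 16) for g in m.groups())
    return start, end, end - start + 1

start, end, length = overwritten_span(report)
# Offset of the last overwritten byte, then the total byte count.
print(hex(end - start), hex(length))
```

So the last overwritten offset matches the poison value, while the byte count is one more than that.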

> INFO: Allocated in copy_process+0x68/0x1130 age=4 cpu=0 pid=4338
> INFO: Freed in free_task+0x2c/0x30 age=2 cpu=0 pid=4

Pid 4 seems to always be ksoftirqd/0 on this machine.

> INFO: Slab 0xc1c25c00 objects=8 used=3 fp=0xf3d00000 flags=0x400020c3
> INFO: Object 0xf3d00000 @offset=0 fp=0xf3d03fc0
> Object 0xf3d00000: 01 40 66 00 00 16 ec ee ad b9 00 1c 26 8a 70 f8
> .@f...<EC><EE><AD><B9>..&.p<F8>

That's the "magic number": 0x00664001.

Why would this always get written in this position of the task struct?
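
For reference, 0x00664001 is simply the first four bytes of the object (01 40 66 00) read as one little-endian 32-bit word, the way x86 would load them. A minimal sketch, with variable names chosen only for illustration:

```python
import struct

# First row of the object dump, copied from the report.
first_row = bytes.fromhex("01 40 66 00 00 16 ec ee ad b9 00 1c 26 8a 70 f8")

# "<I" = little-endian unsigned 32-bit integer, read at offset 0.
(magic,) = struct.unpack_from("<I", first_row, 0)
print(f"{magic:#010x}")
```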

> Object 0xf3d00010: 08 00 45 00 00 54 00 00 40 00 40 01 b7 e8 c0 a8
> ..E..T..@.@.<B7><E8><C0><A8>
> Object 0xf3d00020: 00 c4 c0 a8 00 ac 08 00 6e c0 df 24 55 33 75 af
> .<C4><C0><A8>.<AC>..n<C0><DF>$U3u<AF>
> Object 0xf3d00030: 87 48 69 ec 03 00 08 09 0a 0b 0c 0d 0e 0f 10 11
> .Hi<EC>............
> Object 0xf3d00040: 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21
> ...............!
> Object 0xf3d00050: 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31
> "#$%&'()*+,-./01
> Object 0xf3d00060: 32 33 34 35 36 37 89 e0 c8 4a fb e0 6b 6b 6b 6b
> 234567.<E0><C8>J<FB><E0>kkkk

Why is it writing the sequence of numbers from 0x08 to 0x37 here?
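
To pin down exactly where that sequence starts and how long it is, here is a rough Python sketch that scans the dump for the longest run of bytes that each increase by one. The dump bytes are copied from the report; the helper name is hypothetical:

```python
# Object bytes 0x00..0x6f, copied from the SLUB dump.
dump = bytes.fromhex(
    "01 40 66 00 00 16 ec ee ad b9 00 1c 26 8a 70 f8 "
    "08 00 45 00 00 54 00 00 40 00 40 01 b7 e8 c0 a8 "
    "00 c4 c0 a8 00 ac 08 00 6e c0 df 24 55 33 75 af "
    "87 48 69 ec 03 00 08 09 0a 0b 0c 0d 0e 0f 10 11 "
    "12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 "
    "22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 "
    "32 33 34 35 36 37 89 e0 c8 4a fb e0 6b 6b 6b 6b"
)

def longest_incrementing_run(data):
    """Return (offset, length) of the longest run where each byte is previous + 1."""
    best_start, best_len = 0, 0
    run_start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the data or where the +1 pattern breaks.
        if i == len(data) or data[i] != data[i - 1] + 1:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = i
    return best_start, best_len

offset, length = longest_incrementing_run(dump)
print(hex(offset), length, hex(dump[offset]), hex(dump[offset + length - 1]))
```

On this dump the run begins at offset 0x36 and covers 48 bytes, 0x08 through 0x37.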

Also, the tail of the last line (the six bytes after 0x37) disassembles
to this:

   0:   89 e0           mov    %esp,%eax
   2:   c8 4a fb e0     enterq $0xfb4a,$0xe0
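
The decoding can be reproduced by hand. The sketch below is a deliberately tiny decoder, not a real disassembler: it handles only the two encodings involved (89 /r with a register operand, and C8 iw ib), assumes 32-bit operand size with no prefixes, and uses the bytes exactly as they appear in the dump (89 e0 c8 4a fb e0). All names in it are hypothetical:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode(code):
    """Decode only the two instruction forms needed here; anything else raises."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x89 and code[i + 1] >> 6 == 3:
            # 89 /r with mod == 3: register-to-register mov.
            # ModRM.reg is the source, ModRM.rm is the destination.
            modrm = code[i + 1]
            src, dst = REG32[(modrm >> 3) & 7], REG32[modrm & 7]
            out.append(f"mov %{src},%{dst}")
            i += 2
        elif op == 0xC8:
            # C8 iw ib: enter imm16, imm8 (imm16 is stored little-endian).
            imm16 = int.from_bytes(code[i + 1:i + 3], "little")
            imm8 = code[i + 3]
            out.append(f"enter ${imm16:#x},${imm8:#x}")
            i += 4
        else:
            raise ValueError(f"unhandled opcode {op:#x}")
    return out

tail = bytes.fromhex("89 e0 c8 4a fb e0")
print(decode(tail))
```

Whether these six bytes are really code rather than data is a separate question; the sketch only shows how they decode if treated as instructions.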


...Additional clues may be found... maybe :-)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036