Re: 2.2.15pre5: still very unstable

From: Andrea Arcangeli (andrea@suse.de)
Date: Thu Feb 03 2000 - 20:27:34 EST


[ answering to Chip too ]

On Thu, 3 Feb 2000, Manfred Spraul wrote:

>Andrea Arcangeli wrote:
>>
>> On Thu, 3 Feb 2000, Igor Mozetic wrote:
>>
>> >2.2.15pre5 keeps freezing when memory is near exhaustion.
>>
>> Try again with 2.2.14aa6 (with my alternate approch to the two
>> atomic-allocation-failed and GFP-called-in-non-runnable-state problems):
>>
>
>__pollwait() also calls GFP with !TASK_RUNNING.

The code and the basic design is robust against that.

As I just said there was a problem is GFP that could deadlock in schedule
and that can't happen anymore with this patch applyed on the top of
2.2.1[45] (see my posts on l-k for detailed explanation of why):

        ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.15pre4/GFP_hangs-1.gz

Then some other place that was relying on the dirty behaviour of GFP, has
to be fixed to move the schedule_yeild in the caller of course:

        ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.15pre4/FIN-GFP-loop-2.gz
        ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.15pre4/raid1-GFP-loop-1.gz

With the three above patches applyed the deadlock in GFP will go away and
all the patches floating around become automatically useless.

NOTE: the trap code in GFP it's still a bit interesting also for my way to
fix the deadlock, since since I am examining bit by bit all the places
found by the trap code to verify they are correct as I am expecting from
them.

Using GFP in the wrong way instead would mean to call GFP with __GFP_WAIT
set and assuming that the state of the task can't be clobbered after GFP
returned. But none of the case trapped by 2.2.15pre are been found by me
to be wrong and so none bug is been found by the trap code in GFP so far.

(BTW, the trap code is obviously wrong since it should simply make the
state a not clobbered information instead of running GFP in atomic mode
(see other my post on l-k too on this).)

The three mentioned patches are included into 2.2.14aa6 and I have not
included into my tree the other-(IMHO-wrong)-direction-patches that are
been instead merged in 2.2.15pre[45]. Thus 2.2.14aa6 has the potential
deadlock fixed (unlike the 2.2.15pre), and it also has fixed it in the
right way IMHO ;).

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Feb 07 2000 - 21:00:10 EST