Re: [PATCH V2] MIPS: change type of asid_cache to unsigned long

From: Libin
Date: Fri May 30 2014 - 03:09:32 EST


On 2014/5/29 4:09, Aaro Koskinen wrote:
> Hi,
>
> On Tue, May 27, 2014 at 12:16:30PM +0800, Li Zefan wrote:
>> On 2014/5/21 13:36, Yong Zhang wrote:
>>> asid_cache must be unsigned long otherwise on 64bit system
>>> it will become 0 if the value in get_new_mmu_context()
>>> reaches 0xffffffff and in the end the assumption of
>>> ASID_FIRST_VERSION is not true anymore thus leads to
>>> more dangerous things.
>>
>> We should describe what problem this bug can lead to, which
>> will help people who encounter the same problem and google it.
>
> Please describe it, then. Even if the patch is already committed,
> googling would probably still find this e-mail thread.
>
> Thanks,
>
> A.
>
>

Problem description:
On our MIPS product, after our business services have been running for a long
time, the problem appears on a random CPU: from then on, running test cases
that contain the following code on that CPU triggers a bus error or
segmentation fault:
...
pid = fork();
if (pid < 0)
        return 1;
if (0 == pid)
        exit(0);
else
        exit(0);
...

Root cause:
After many fork/mmap/munmap operations, the ASID value computed in
get_new_mmu_context() exceeds 0xffffffff. Storing it back into asid_cache,
which is only 32 bits wide (unsigned int), truncates it to 0:
|-get_new_mmu_context(struct mm_struct *mm, unsigned long cpu)
        unsigned long asid = asid_cache(cpu);     // suppose asid_cache(cpu) is 0xffffffff now
        if (!((asid += ASID_INC) & ASID_MASK)) {  // asid becomes 0x100000000
                ...
                local_flush_tlb_all();  /* start new asid cycle */
                if (!asid)              /* fix version if needed */  // not taken: asid is 0x100000000, not 0
                        asid = ASID_FIRST_VERSION;
        }
        cpu_context(cpu, mm) = asid_cache(cpu) = asid;  // the 32-bit asid_cache truncates asid to 0, so cpu_context also gets 0

In do_fork()->dup_mmap(), writable pages are marked write-protected, but the
subsequent TLB flush does not take effect, which breaks normal copy-on-write (COW):
do_fork()
 |-copy_process()
    |-copy_mm()
       ...
       |-dup_mmap()
          |-copy_page_range()
             ...
             |-copy_one_pte()
                   ...
                   if (is_cow_mapping(vm_flags)) {
                           ptep_set_wrprotect(src_mm, addr, src_pte);
                           pte = pte_wrprotect(pte);
                   }
                   ...
          |-flush_tlb_mm(oldmm)
             |-local_flush_tlb_mm()
                   if (cpu_context(cpu, mm) != 0) {  // cpu_context is 0, so no TLB flush
                           drop_mmu_context(mm, cpu);
                   }

In addition, for such an mm the condition ((cpu_context(cpu, next) ^ asid_cache(cpu))
& ASID_VERSION_MASK) in switch_mm() cannot be met, because both values are 0, so the
mm never gets a new ASID and its stale TLB entries are not invalidated during the
process switch:
|-switch_mm()
        ...
        /* Check if our ASID is of an older version and thus invalid */
        if ((cpu_context(cpu, next) ^ asid_cache(cpu)) & ASID_VERSION_MASK)
                get_new_mmu_context(next, cpu);
        write_c0_entryhi(cpu_asid(cpu, next));
        ...

In short, storing the ASID computed in an unsigned long into the 32-bit asid_cache
truncates it to 0. That makes TLB flushes fail, which breaks COW and triggers bus
errors or segmentation faults.

Thanks,
Libin
