preempt_count overflow in CONFIG_PREEMPT

From: Minchan Kim
Date: Tue Apr 19 2016 - 02:58:08 EST


Hello Ingo, Peter.

I am implementing non-lru page migration and preparing v4 to resend.
https://lkml.org/lkml/2016/3/30/56

Although the design has changed since v3, the issue I'm going to describe
is still the same, so I think the problem is easy to understand from v3
even though I haven't sent v4 yet. :)

My problem is in the zsmalloc part of the page migration support.
zsmalloc stores several compressed pages in a single page. Let's call
a compressed page an 'object'.
If we are lucky, we can store 113 objects in a page (the minimum slot
size is 36 bytes, and 4096 / 36 ~= 113).

If a page has internal fragmentation, zsmalloc tries to migrate an
object from page A to page B. We call this 'object migration'.
To prevent user access during object migration, we use a spin lock
in the atomic path to save memory space. The lock has per-object
granularity, so users can still access the other objects in the page.
(Strictly speaking, it's not a spin_lock but a home-grown, weird
bit spin-lock built from test_and_set_bit in a while loop. I know it's
a buggy mess, so I will change it to the regular bit_spin_lock, but the
issue is still there.) During object migration, the spin lock is
nested twice: once for the source object and once for the destination
object.
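
For reference, here is roughly what the two lock flavors look like.
The names (obj_pin/obj_lock, OBJ_LOCK_BIT) and the handle layout are
illustrative, not the exact zsmalloc code:

#include <linux/bitops.h>
#include <linux/bit_spinlock.h>

#define OBJ_LOCK_BIT	0	/* illustrative bit in the handle word */

/* current home-grown lock: test_and_set_bit() spun in a loop */
static void obj_pin(unsigned long *handle)
{
	while (test_and_set_bit(OBJ_LOCK_BIT, handle))
		cpu_relax();
}

static void obj_unpin(unsigned long *handle)
{
	clear_bit_unlock(OBJ_LOCK_BIT, handle);	/* release semantics */
}

/*
 * the planned replacement: note that bit_spin_lock() disables
 * preemption internally, so every acquisition bumps preempt_count
 */
static void obj_lock(unsigned long *handle)
{
	bit_spin_lock(OBJ_LOCK_BIT, handle);
}

static void obj_unlock(unsigned long *handle)
{
	bit_spin_unlock(OBJ_LOCK_BIT, handle);
}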

Let's get back to the issue.
This time it's not object migration but page migration; the steps are
as follows (a C sketch of the freeze/unfreeze loops comes after the
list). We try to migrate page A to page B, where B is newly allocated
and therefore empty.

1. freeze every object in page A

   for each object in the page
       bit_spin_lock(object)

2. memcpy(B, A, PAGE_SIZE);

3. unfreeze every object in page A

   for each object in the page
       bit_spin_unlock(object)

4. put_page(A);
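
Concretely, the freeze/unfreeze loops would look roughly like the
sketch below. first_obj_handle()/next_obj_handle() are made-up
iteration helpers standing in for zsmalloc's real object-walking
code, not actual API:

static void freeze_page(struct page *page)
{
	unsigned long *handle;

	/* take every per-object bit lock in the page */
	for (handle = first_obj_handle(page); handle;
	     handle = next_obj_handle(page, handle))
		bit_spin_lock(OBJ_LOCK_BIT, handle);
}

static void unfreeze_page(struct page *page)
{
	unsigned long *handle;

	/* drop every per-object bit lock in the page */
	for (handle = first_obj_handle(page); handle;
	     handle = next_obj_handle(page, handle))
		bit_spin_unlock(OBJ_LOCK_BIT, handle);
}

Since bit_spin_lock() calls preempt_disable() on every acquisition,
freeze_page() bumps preempt_count up to 113 times back to back.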

The logic is rather straightforward, I guess. :)
Here the problem is that, unlike object migration, page migration
needs to block access to all objects in the page at once before step 2.
So, if we are lucky, we can increase preempt_count by 113 on every CPU,
and preempt_count_add easily triggers the spinlock count overflow
warning in DEBUG_LOCKS_WARN_ON when there are multiple CPUs (my
machine has 12 CPUs).
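
For reference, this is the check I'm hitting. Quoting from memory of
include/linux/preempt.h and kernel/sched/core.c around v4.5, so the
exact lines may differ:

/* include/linux/preempt.h: the low 8 bits of preempt_count hold the
 * preempt-disable/spinlock nesting depth */
#define PREEMPT_BITS	8
#define PREEMPT_MASK	0x000000ff

/* kernel/sched/core.c, preempt_count_add(), under CONFIG_DEBUG_PREEMPT:
 * warn when the spinlock count is about to overflow the 8-bit field */
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
			PREEMPT_MASK - 10);

So the warning fires once the nesting depth on a CPU reaches 245, and
each frozen page alone contributes up to 113.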

I think there are several ways to fix it, but I'm not sure which is
best, so I want to hear your opinion.

1. increase the preempt_count size?
2. support bit_spin_lock_no_preempt/bit_spin_unlock_no_preempt?
3. redesign the locking granularity of zsmalloc page migration?

I want to avoid 3 if possible because such a design would make the
code very complicated and may hurt scalability and performance, I guess.
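
For option 2, here is roughly what I have in mind. These helpers do
not exist today; the sketch just mirrors bit_spin_lock()/
bit_spin_unlock() from include/linux/bit_spinlock.h with the
preempt_count manipulation dropped, so the caller must already run
with preemption disabled (e.g. via the first regular bit_spin_lock()
in the freeze loop):

/* hypothetical: bit_spin_lock() without preempt_disable() */
static inline void bit_spin_lock_no_preempt(int bitnum, unsigned long *addr)
{
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	while (unlikely(test_and_set_bit_lock(bitnum, addr)))
		while (test_bit(bitnum, addr))
			cpu_relax();
#endif
	__acquire(bitlock);
}

/* hypothetical: bit_spin_unlock() without preempt_enable() */
static inline void bit_spin_unlock_no_preempt(int bitnum, unsigned long *addr)
{
#ifdef CONFIG_DEBUG_SPINLOCK
	BUG_ON(!test_bit(bitnum, addr));
#endif
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
	clear_bit_unlock(bitnum, addr);
#endif
	__release(bitlock);
}

Then freeze_page() could take the first object's lock with the regular
bit_spin_lock() (disabling preemption once) and the remaining ones with
the _no_preempt variant, keeping preempt_count at 1 no matter how many
objects the page holds.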

I guess 8 bits for PREEMPT_BITS is too small considering the number
of CPUs in recent systems?
I hope I'm not the only one to have hit this issue. :)

Thanks.