Re: Regression due to "mm: fix-up zone present pages"

From: Bob Liu
Date: Wed Nov 14 2012 - 08:24:35 EST


On Wed, Nov 14, 2012 at 6:09 PM, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> Hi all,
>
> Our QA noticed a regression in one of our i915/GEM testcases in 3.7:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=56859
>
> Direct link to dmesg of the machine:
> https://bugs.freedesktop.org/attachment.cgi?id=70052
>
> Note that the machine is 32bit, which seems to be important since Chris
> Wilson confirmed the bug on his 32bit Sandybridge machine, whereas mine
> here with a 64bit kernel works flawlessly.
>
> The testcase is gem_tiled_swapping:
>
> http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/tree/tests/gem_tiled_swapping.c
>
> Quick high-level description of the workload:
>
> It allocates a working set larger than available memory, then fills it
> by writing it through the gpu gart (required to get a linear view of
> tiled buffers) and afterwards reads it to check whether anything got
> corrupted. Since the working set is too large to fit into ram, this
> will force all buffers through swap. We've written this testcase to
> exercise the reswizzle swapin path since some platforms have a tiling
> layout depending upon physical pfn (awesome feature btw), but not snb.
> So within the kernel this workload simply grabs the backing storage
> from shmemfs with shmem_read_mapping_page_gfp and then binds them into
> the gpu pagetables (the GTT). This happens in the i915_gem_fault
> function. Unbinding in this workload happens either directly (if the
> gem code can't get enough memory) or through our shrinker
> (i915_gem_inactive_shrink). Swapout is then left to shmemfs to handle.
> All the above stuff is in drivers/gpu/drm/i915/i915_gem.c
>
> Testcase fails because it detects a mismatch between what has been
> written and what has been read back.
>
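
For anyone following along without the i-g-t source at hand, the
write-then-verify pattern described above can be sketched roughly as
below. This is only a user-space illustration under my own assumptions,
not the actual gem_tiled_swapping code: it uses plain malloc'ed buffers
instead of GEM objects mapped through the GTT, and NUM_BUFFERS,
WORDS_PER_BUFFER, pattern(), fill_buffer(), count_mismatches() and
run_write_verify() are all hypothetical names and sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. The real test sizes
 * its working set to exceed available RAM so buffers go through swap. */
#define NUM_BUFFERS 16
#define WORDS_PER_BUFFER 1024

/* Value expected at a given word of a given buffer. */
uint32_t pattern(size_t buf, size_t word)
{
    return (uint32_t)(buf * WORDS_PER_BUFFER + word);
}

/* Write the expected pattern into one buffer. */
void fill_buffer(uint32_t *data, size_t buf)
{
    for (size_t i = 0; i < WORDS_PER_BUFFER; i++)
        data[i] = pattern(buf, i);
}

/* Read one buffer back and count words that differ from the pattern. */
size_t count_mismatches(const uint32_t *data, size_t buf)
{
    size_t bad = 0;

    for (size_t i = 0; i < WORDS_PER_BUFFER; i++)
        if (data[i] != pattern(buf, i))
            bad++;
    return bad;
}

/* Fill every buffer first, then verify every buffer. Returns the total
 * number of mismatching words, or SIZE_MAX if an allocation failed. */
size_t run_write_verify(void)
{
    uint32_t *bufs[NUM_BUFFERS] = { NULL };
    size_t bad = 0;
    size_t b;

    for (b = 0; b < NUM_BUFFERS; b++) {
        bufs[b] = malloc(WORDS_PER_BUFFER * sizeof(uint32_t));
        if (!bufs[b]) {
            bad = SIZE_MAX;
            goto out;
        }
        fill_buffer(bufs[b], b);
    }

    for (b = 0; b < NUM_BUFFERS; b++)
        bad += count_mismatches(bufs[b], b);

out:
    for (b = 0; b < NUM_BUFFERS; b++)
        free(bufs[b]);
    return bad;
}
```

A nonzero return from run_write_verify() corresponds to the failure mode
reported here: data read back does not match what was written.
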
> Our QA people bisected the regression to
>
> commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
> Author: Jianguo Wu <wujianguo@xxxxxxxxxx>
> Date: Mon Oct 8 16:33:06 2012 -0700
>
> mm: fix-up zone present pages
>
> and confirmed the revert on top of the latest drm-intel-nightly branch
> (which is based on top of 3.7-rc2 and contains the -next stuff for
> 3.8). They've also tested the for-QA branch which had latest Linus
> upstream merged in, which did not fix the problem. For reference the
> intel trees are at (but I don't think it matters really that it's not
> plain upstream, nothing really changed in the relevant i915/gem paths
> compared to upstream):
>
> http://cgit.freedesktop.org/~danvet/drm-intel
>
> I have no idea how that early boot zone init fix could even corrupt
> swapping in such a fashion, so ideas highly welcome. QA people are
> cc'ed, and hopefully I haven't missed anyone else on the cc list.
>

You can take a look at this thread:
[PATCH] mm: fix a regression with HIGHMEM introduced by changeset 7f1290f2f2a4d
http://lkml.org/lkml/2012/11/5/866

I think it's the same problem.

--
Regards,
--Bob