On Mon, Dec 29, 2014 at 10:34:32AM +0100, Christian KÃnig wrote:
Am 29.12.2014 um 09:16 schrieb Laurent Pinchart:
Hi Oded,
On Sunday 28 December 2014 13:36:50 Oded Gabbay wrote:
On 12/26/2014 11:19 AM, Laurent Pinchart wrote:
On Thursday 25 December 2014 14:20:59 Thierry Reding wrote:My thoughts exactly. If this was some extreme use case, than it would be
On Mon, Dec 22, 2014 at 01:07:13PM +0200, Oded Gabbay wrote:While I agree with you that probe deferral is the way to go, I believe
This small patch-set, was created to solve the bug described atI strongly object to this patch set. This makes assumptions about how
https://bugzilla.kernel.org/show_bug.cgi?id=89661 (Kernel panic when
trying use amdkfd driver on Kaveri). It replaces the previous patch-set
called [PATCH 0/3] Use workqueue for device init in amdkfd
(http://lists.freedesktop.org/archives/dri-devel/2014-December/074401.ht
ml)
That bug appears only when radeon, amdkfd and amd_iommu_v2 are compiled
inside the kernel (not as modules). In that case, the correct loading
order, as determined by the exported symbol used by each driver, is
not enforced anymore and the kernel loads them based on who was linked
first. That makes radeon load first, amdkfd second and amd_iommu_v2
third.
Because the initialization of a device in amdkfd is initiated by radeon,
and can only be completed if amdkfd and amd_iommu_v2 were loaded and
initialized, then in the case mentioned above, this initalization fails
and there is a kernel panic as some pointers are not initialized but
used nontheless.
To solve this bug, this patch-set moves iommu/ before gpu/ in
drivers/Makefile and also moves amdkfd/ before radeon/ in
drivers/gpu/drm/Makefile.
The rationale is that in general, AMD GPU devices are dependent on AMD
IOMMU controller functionality to allow the GPU to access a process's
virtual memory address space, without the need for pinning the memory.
That's why it makes sense to initialize the iommu/ subsystem ahead of
the gpu/ subsystem.
the build system influences probe order. That's bad because seemingly
unrelated changes could easily break this in the future.
We already have ways to solve this kind of dependency (driver probe
deferral), and I think you should be using it to solve this particular
problem rather than some linking order hack.
linkage ordering can still be used as an optimization to avoid deferring
probe in the most common cases. I'm thus not opposed to moving iommu/
earlier in link order (provided we can properly test for side effects, as
the jump is pretty large), but not as a replacement for probe deferral.
justified to solve it with probe deferral. But I think that for most common
cases, GPU are dependent on IOMMU and *not* vice-versa.
Fixing this through deferred probing sounds like the correct long term
solution to me as well.
But what Thierry is referring to here is probably the approach of returning
-EAGAIN from the probe method (at least that was the last status when I
looked into this).
-EPROBE_DEFER would be the one that I was referring to.
The problem with this approach is the interface design between radeon and
amdkfd. amdkfd simply doesn't have a probe method which gets called when the
hardware is detected and can return -EAGAIN. Instead amdkfd is called by
radeon after hardware initialization when it is way to late for such a
thing.
That sounds like a pretty brittle design. It sounds like nowhere in the
code you've encoded this dependency and you rely on symbols only to
ensure probe ordering.
Couldn't you simply make radeon check for availability of the IOMMU
early and defer probing if it's not there yet? Or if radeon depends on
the IOMMU via amdkfd, then perhaps calling into amdkfd to check for the
availability of the IOMMU would be a more correct representation of the
dependency.
BTW, my first try at solving this was to use probe deferral (usingTo clarify my position, I believe changing the link order can be a worthwhile
workqueue), but the feedback I got from Christian and Dave was that moving
iommu/ linkage before gpu/ was a much more simpler solution.
optimization, but I'm uncertain about the long term viability of that change
as a fix. Probe deferral has been introduced because not all probe ordering
issues can be fixed through link ordering, so we should fix the problem
properly.
This being said, if modifying the link order can help for now without
introducing negative side effects, it would only postpone the real fix, so I'm
not opposed to it.
Yeah, that sounds like the right approach to me as well.
I don't think that's a good approach at all. It doesn't encode the
dependency anywhere. All you have is some Makefile that lists a bunch of
directories. What happens if anybody was to change that order for some
other reason? If you're not Cc'ed on the patch and nobody else NAK's it
because they are accidentally aware of the dependency that patch is
going to break radeon.
In general I would prefer that modules compiled into the kernel load
by the order of their symbol dependency just like standalone modules
do.
That I generally agree with. But it can't be a replacement for properly
describing the runtime dependencies between modules. There can be other
reasons than link order that influence probe ordering. What if the IOMMU
driver for some reason gains deferred probe support and therefore
doesn't succeed on the first probe? Or only sometimes.
Properly describing the dependency is the only way to prevent any of
that.
Thierry