Re: [PATCH v4 1/2] cxl/region: Find free cxl decoder by device_for_each_child()

From: Dan Williams
Date: Tue Sep 10 2024 - 12:02:19 EST


Zijun Hu wrote:
[..]
> > So I wanted to write a comment here to stop the next person from
> > tripping over this dependency on decoder 'add' order, but there is a
> > problem. For this simple version to work it needs 3 things:
> >
> > 1/ decoders are added in hardware id order: done,
> > devm_cxl_enumerate_decoders() handles that
> >
>
> I do not know how you would achieve that; perhaps it is not simpler than
> my solution below:
>
> finding a free switch cxl decoder with minimal ID
> https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@xxxxxxxxxxx/
>
> which has simple logic and also has no limitation related to how a
> decoder is added, allocated, or de-allocated.
>
> I am curious why this solution was not considered?

Because it leaves the region shutdown ordering bug in place.

> > 2/ search for decoders in their added order: done, device_find_child()
> > guarantees this, although it is not obvious without reading the internals
> > of device_add().
> >
> > 3/ regions are de-allocated from decoders in reverse decoder id order.
> > This is not enforced; in fact, it is impossible to enforce. Consider that
> > any memory device can be removed at any time and may not be removed in
> > the order in which the device allocated switch decoders in the topology.
> >
>
> Sorry, I don't understand. Could you give an example?
>
> IMO, the simple change in question will always get a free decoder with
> the minimal ID once 1/ is ensured, regardless of the de-allocation approach.

No, you are missing the fact that CXL hardware requires that decoders
cannot be sparsely allocated. They must be allocated consecutively and
in increasing address order.
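As a minimal userspace model of that rule (not the cxl driver's code; `struct model_decoder`, `decoders_are_dense()` and `alloc_lowest()` are hypothetical names), the invariant is that busy decoders always occupy a contiguous run of ids starting at 0, so the only legal allocation is the lowest free id. Address ordering is not modeled here, only id ordering.

```c
#include <stdbool.h>

/* Hypothetical model: one entry per switch decoder, indexed by hardware id. */
struct model_decoder {
	bool busy; /* committed to a region */
};

/* True if the busy decoders occupy ids 0..k-1 with no holes. */
static bool decoders_are_dense(const struct model_decoder *d, int n)
{
	bool seen_free = false;

	for (int i = 0; i < n; i++) {
		if (!d[i].busy)
			seen_free = true;
		else if (seen_free)
			return false; /* busy decoder above a free one: sparse */
	}
	return true;
}

/*
 * Allocate the lowest free id, or return -1 if none is free. This keeps
 * the set dense only if it was already dense before the call.
 */
static int alloc_lowest(struct model_decoder *d, int n)
{
	for (int i = 0; i < n; i++) {
		if (!d[i].busy) {
			d[i].busy = true;
			return i;
		}
	}
	return -1;
}
```

With three decoders, successive `alloc_lowest()` calls hand out ids 0, 1 and 2, then -1; marking id 1 free afterwards makes `decoders_are_dense()` report false, which is exactly the sparse state the hardware does not allow.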

Imagine a scenario with a switch port with three decoders,
decoder{A,B,C} allocated to 3 respective regions region{A,B,C}.

If regionB is destroyed due to device removal, that does not make
decoderB free to be reallocated in hardware. The destruction of regionB
requires regionC to be torn down first. As it stands, the driver does not
force regionC to shut down, and it falsely clears @decoderB->region, making
it appear free prematurely.

So, while decoderB would be the next decoder to allocate after regionC is
torn down, it is not free while decoderC and regionC have not been
reclaimed.
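The decoder{A,B,C} scenario can be sketched as a small userspace model (hypothetical, not driver code; `ids_dense()`, `model_release_naive()` and `model_release_ordered()` are invented names). The naive release corresponds to clearing @decoderB->region on its own, which leaves a hole below a still-busy decoderC; the ordered release refuses to free a decoder until every higher-id decoder has been reclaimed.

```c
#include <stdbool.h>

/* Hypothetical model: busy[i] is true while decoder i is committed to a region. */

/* True if busy decoders form a contiguous run starting at id 0. */
static bool ids_dense(const bool *busy, int n)
{
	for (int i = 1; i < n; i++)
		if (busy[i] && !busy[i - 1])
			return false; /* free decoder below a busy one */
	return true;
}

/* Naive release: mark the decoder free regardless of higher ids. */
static void model_release_naive(bool *busy, int id)
{
	busy[id] = false;
}

/*
 * Ordered release: decoder @id may only be freed once every decoder with
 * a higher id is already free. Returns false and changes nothing if a
 * higher decoder (e.g. decoderC) still has to be torn down first.
 */
static bool model_release_ordered(bool *busy, int n, int id)
{
	for (int i = id + 1; i < n; i++)
		if (busy[i])
			return false;
	busy[id] = false;
	return true;
}
```

Starting from all three decoders busy, `model_release_naive(busy, 1)` leaves the set sparse, while `model_release_ordered(busy, 3, 1)` fails until decoder 2 has been released. The model only refuses the out-of-order release; how the driver should actually force regionC's teardown is not modeled.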