From: Nate Watterson <nwatterson@xxxxxxxxxx>
NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware,
which extends the standard ARM SMMU v3 IP to support multiple
VCMDQs with virtualization capabilities. In the host OS kernel,
they are used to reduce contention on a single queue. As command
queues, they are very similar to the standard CMDQ/ECMDQs, but
they only support CS_NONE in the CS field of the CMD_SYNC command.
This patch adds a new nvidia-grace-cmdqv file and inserts its
structure pointer into the existing arm_smmu_device, and then
adds related function calls in the arm-smmu-v3 driver.
In the CMDQV driver itself, this patch only adds the minimal part
needed for host kernel support. Upon probe(), VINTF0 is reserved
for in-kernel use, and some of the VCMDQs are assigned to it. To
issue commands, the driver then selects one of the VCMDQs in
VINTF0 based on the CPU that is currently executing.
+struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+{
+	struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
+	struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
+	u16 qidx;
+
+	/* Check vintf0 status; use the default cmdq if the field reads zero */
+	if (!FIELD_GET(VINTF_STATUS, vintf0->status))
+		return &smmu->cmdq;
+
+	/*
+	 * Select a vcmdq to use. This is a temporary solution to
+	 * balance out traffic on cmdq issuing: each cmdq has its own
+	 * lock, so if all CPUs issue cmdlists using the same cmdq, only
+	 * one CPU at a time can enter the process, while the others
+	 * spin on the same lock.
+	 */
+	qidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;
+	return &vintf0->vcmdqs[qidx];
+}
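
For readers who want to try the selection policy outside the kernel, here is
a minimal user-space sketch of the same idea: map the CPU number onto a queue
with a modulo, and fall back to a default queue when the interface is not
usable. The names struct queue, struct vintf and select_queue() are
hypothetical stand-ins for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a command queue; not the driver's type. */
struct queue {
	const char *name;
};

/* Hypothetical stand-in for a virtual interface that owns some queues. */
struct vintf {
	int usable;              /* nonzero when the interface can take commands */
	unsigned int num_queues; /* number of queues assigned to this interface */
	struct queue *queues;    /* array of num_queues entries */
};

/*
 * Pick the queue for the given CPU. CPUs are spread over the queues with
 * a modulo, so two CPUs contend on one lock only when they map to the
 * same index. If the interface is unusable or has no queues, the caller
 * gets the fallback queue instead.
 */
struct queue *select_queue(struct vintf *v, struct queue *fallback,
			   unsigned int cpu)
{
	assert(v != NULL && fallback != NULL);

	if (!v->usable || v->num_queues == 0)
		return fallback;

	return &v->queues[cpu % v->num_queues];
}
```

With two queues assigned, CPUs 0 and 2 would share the first queue and CPUs
1 and 3 the second, so the spread is only even when CPU numbers are evenly
distributed over the queue count.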