Re: [PATCH] virtio_ring: Fix the stale index in available ring
From: Michael S. Tsirkin
Date: Mon Mar 18 2024 - 03:51:02 EST
On Mon, Mar 18, 2024 at 09:41:45AM +1000, Gavin Shan wrote:
> On 3/18/24 02:50, Michael S. Tsirkin wrote:
> > On Fri, Mar 15, 2024 at 09:24:36PM +1000, Gavin Shan wrote:
> > >
> > > On 3/15/24 21:05, Michael S. Tsirkin wrote:
> > > > On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote:
> > > > > > > Yes, I guess smp_wmb() ('dmb') is buggy on NVidia's grace-hopper platform. I tried
> > > > > to reproduce it with my own driver where one thread writes to the shared buffer
> > > > > and another thread reads from the buffer. I don't hit the out-of-order issue so
> > > > > far.
> > > >
> > > > Make sure the 2 areas you are accessing are in different cache lines.
> > > >
> > >
> > > Yes, I already put those 2 areas to separate cache lines.
> > >
> > > >
> > > > > My driver may be not correct somewhere and I will update if I can reproduce
> > > > > the issue with my driver in the future.
> > > >
> > > > Then maybe your change is just making virtio slower and masks the bug
> > > > that is actually elsewhere?
> > > >
> > > > You don't really need a driver. Here's a simple test: without barriers
> > > > assertion will fail. With barriers it will not.
> > > > (Warning: didn't bother testing too much, could be buggy.
> > > >
> > > > ---
> > > >
> > > > #include <pthread.h>
> > > > #include <stdio.h>
> > > > #include <stdlib.h>
> > > > #include <assert.h>
> > > >
> > > > #define FIRST values[0]
> > > > #define SECOND values[64]
> > > >
> > > > volatile int values[100] = {};
> > > >
> > > > void* writer_thread(void* arg) {
> > > > while (1) {
> > > > FIRST++;
> > > > // NEED smp_wmb here
> > > __asm__ volatile("dmb ishst" : : : "memory");
> > > > SECOND++;
> > > > }
> > > > }
> > > >
> > > > void* reader_thread(void* arg) {
> > > > while (1) {
> > > > int first = FIRST;
> > > > // NEED smp_rmb here
> > > __asm__ volatile("dmb ishld" : : : "memory");
> > > > int second = SECOND;
> > > > assert(first - second == 1 || first - second == 0);
> > > > }
> > > > }
> > > >
> > > > int main() {
> > > > pthread_t writer, reader;
> > > >
> > > > pthread_create(&writer, NULL, writer_thread, NULL);
> > > > pthread_create(&reader, NULL, reader_thread, NULL);
> > > >
> > > > pthread_join(writer, NULL);
> > > > pthread_join(reader, NULL);
> > > >
> > > > return 0;
> > > > }
> > > >
> > >
> > > Had a quick test on NVidia's grace-hopper and Ampere's CPUs. I hit
> > > the assert on both of them. After replacing 'dmb' with 'dsb', I can
> > > hit assert on both of them too. I need to look at the code closely.
> > >
> > > [root@virt-mtcollins-02 test]# ./a
> > > a: a.c:26: reader_thread: Assertion `first - second == 1 || first - second == 0' failed.
> > > Aborted (core dumped)
> > >
> > > [root@nvidia-grace-hopper-05 test]# ./a
> > > a: a.c:26: reader_thread: Assertion `first - second == 1 || first - second == 0' failed.
> > > Aborted (core dumped)
> > >
> > > Thanks,
> > > Gavin
> >
> >
> > Actually this test is broken. No need for ordering it's a simple race.
> > The following works on x86 though (x86 does not need barriers
> > though).
> >
> >
> > #include <pthread.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <assert.h>
> >
> > #if 0
> > #define x86_rmb() asm volatile("lfence":::"memory")
> > #define x86_mb() asm volatile("mfence":::"memory")
> > #define x86_smb() asm volatile("sfence":::"memory")
> > #else
> > #define x86_rmb() asm volatile("":::"memory")
> > #define x86_mb() asm volatile("":::"memory")
> > #define x86_smb() asm volatile("":::"memory")
> > #endif
> >
> > #define FIRST values[0]
> > #define SECOND values[640]
> > #define FLAG values[1280]
> >
> > volatile unsigned values[2000] = {};
> >
> > void* writer_thread(void* arg) {
> > while (1) {
> > /* Now synchronize with reader */
> > while(FLAG);
> > FIRST++;
> > x86_smb();
> > SECOND++;
> > x86_smb();
> > FLAG = 1;
> > }
> > }
> >
> > void* reader_thread(void* arg) {
> > while (1) {
> > /* Now synchronize with writer */
> > while(!FLAG);
> > x86_rmb();
> > unsigned first = FIRST;
> > x86_rmb();
> > unsigned second = SECOND;
> > assert(first - second == 1 || first - second == 0);
> > FLAG = 0;
> >
> > if (!(first %1000000))
> > printf("%d\n", first);
> > }
> > }
> >
> > int main() {
> > pthread_t writer, reader;
> >
> > pthread_create(&writer, NULL, writer_thread, NULL);
> > pthread_create(&reader, NULL, reader_thread, NULL);
> >
> > pthread_join(writer, NULL);
> > pthread_join(reader, NULL);
> >
> > return 0;
> > }
> >
>
> I tried it on host and VM of NVidia's grace-hopper. Without the barriers, I
> can hit assert. With the barriers, it's working fine without hitting the
> assert.
>
> I also had some code to mimic virtio vring last weekend, and it's just
> working well. Back to our original issue, __smb_wmb() is issued by guest
> while __smb_rmb() is executed on host. The VM and host are running at
> different exception level: EL2 vs EL1. I'm not sure it's the cause. I
> need to modify my code so that __smb_wmb() and __smb_rmb() can be executed
> from guest and host.
It is thinkably possible that on grace-hopper barriers work
differently somehow. We need to find out more though.
Anyone from Nvidia can chime in?
--
MST