RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

From: Long Li
Date: Sat Aug 07 2021 - 14:29:19 EST


> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> access to Microsoft Azure Blob for Azure VM
>
> On Thu, Aug 05, 2021 at 06:24:57PM +0000, Long Li wrote:
> > > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > > accelerated access to Microsoft Azure Blob for Azure VM
> > >
> > > On 8/5/21 12:00 AM, longli@xxxxxxxxxxxxxxxxx wrote:
> > > > From: Long Li <longli@xxxxxxxxxxxxx>
> > > >
> > > > Azure Blob storage [1] is Microsoft's object storage solution for
> > > > the cloud. Users or client applications can access objects in Blob
> > > > storage via HTTP, from anywhere in the world. Objects in Blob
> > > > storage are accessible via the Azure Storage REST API, Azure
> > > > PowerShell, Azure CLI, or an Azure Storage client library. The
> > > > Blob storage interface is not designed to be a POSIX compliant
> interface.
> > > >
> > > > Problem: When a client accesses Blob storage via HTTP, it must go
> > > > through the Blob storage boundary of Azure and get to the storage
> > > > server through multiple servers. This is also true for an Azure VM.
> > > >
> > > > Solution: For an Azure VM, the Blob storage access can be
> > > > accelerated by having Azure host execute the Blob storage requests
> > > > to the backend storage server directly.
> > > >
> > > > This driver implements a VSC (Virtual Service Client) for
> > > > accelerating Blob storage access for an Azure VM by communicating
> > > > with a VSP (Virtual Service
> > > > Provider) on the Azure host. Instead of using HTTP to access the
> > > > Blob storage, an Azure VM passes the Blob storage request to the
> > > > VSP on the Azure host. The Azure host uses its native network to
> > > > perform Blob storage requests to the backend server directly.
> > > >
> > > > This driver doesn't implement Blob storage APIs. It acts as a fast
> > > > channel to pass user-mode Blob storage requests to the Azure host.
> > > > The user-mode program using this driver implements Blob storage
> > > > APIs and packages the Blob storage request as structured data to
> > > > VSC. The request data is modeled as three user provided buffers
> > > > (request, response and data buffers), that are patterned on the
> > > > HTTP model used by existing Azure Blob clients. The VSC passes
> > > > those buffers to VSP for Blob
> > > storage requests.
> > > >
> > > > The driver optimizes Blob storage access for an Azure VM in two ways:
> > > >
> > > > 1. The Blob storage requests are performed by the Azure host to
> > > > the Azure Blob backend storage server directly.
> > > >
> > > > 2. It allows the Azure host to use transport technologies (e.g.
> > > > RDMA) available to the Azure host but not available to the VM, to
> > > > reach to Azure Blob backend servers.
> > > >
> > > > Test results using this driver for an Azure VM:
> > > > 100 Blob clients running on an Azure VM, each reading 100GB Block
> Blobs.
> > > > (10 TB total read data)
> > > > With REST API over HTTP: 94.4 mins Using this driver: 72.5 mins
> > > > Performance (measured in throughput) gain: 30%.
> > > >
> > > > [1]
> > > >
> > >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdo
> > > cs
> > > > .microsoft.com%2Fen-us%2Fazure%2Fstorage%2Fblobs%2Fstorage-
> blobs-
> > > intro
> > > >
> > >
> duction&amp;data=04%7C01%7Clongli%40microsoft.com%7C6ba60a78f4e74
> > > aeb0b
> > > >
> > >
> b108d95833bf53%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6376
> > > 378015
> > > >
> > >
> 92577579%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi
> > > V2luMzIiL
> > > >
> > >
> CJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=ab5Zl2cQdmUhdT3l
> > > SotDwMl
> > > > DQuE0JaY%2B1REPQ0%2FjXa4%3D&amp;reserved=0
> > >
> > > Is the ioctl interface the only user space interface provided by
> > > this kernel driver? If so, why has this code been implemented as a
> > > kernel driver instead of e.g. a user space library that uses vfio to
> > > interact with a PCIe device? As an example, Qemu supports many
> different virtio device types.
> >
> > The Hyper-V presents one such device for the whole VM. This device is
> > used by all processes on the VM. (The test benchmark used 100
> > processes)
> >
> > Hyper-V doesn't support creating one device for each process. We cannot
> use VFIO in this model.
>
> I still think this "model" is totally broken and wrong overall. Again, you are
> creating a custom "block" layer with a character device, forcing all userspace
> programs to use a custom library (where is it at?) just to get their data.

The Azure Blob library (with source code) is available in the following languages:
Java: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/storage/azure-storage-blob
JavaScript: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/storage/storage-blob
Python: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
Go: https://github.com/Azure/azure-storage-blob-go
.NET: https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/storage/Azure.Storage.Blobs
PHP: https://github.com/Azure/azure-storage-php/tree/master/azure-storage-blob
Ruby: https://github.com/azure/azure-storage-ruby/tree/master/blob
C++: https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/storage#azure-storage-client-library-for-c

>
> There's a reason the POSIX model is there, why are you all ignoring it?

The Azure Blob APIs are not designed to be POSIX compatible. This driver is used
to accelerate Blob access for a Blob client running in an Azure VM. It doesn't attempt
to modify the Blob APIs. Changing the Blob APIs will break the existing Blob clients.

Thanks,
Long