Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
From: Greentime Hu
Date: Tue Nov 09 2021 - 04:50:17 EST
Darius Rad <darius@xxxxxxxxxxxx> 於 2021年10月27日 週三 下午8:58寫道:
>
> On Tue, Oct 26, 2021 at 12:44:31PM +0800, Greentime Hu wrote:
> > Darius Rad <darius@xxxxxxxxxxxx> 於 2021年10月26日 週二 上午12:22寫道:
> > >
> > > On Mon, Oct 25, 2021 at 12:47:49PM +0800, Greentime Hu wrote:
> > > > Darius Rad <darius@xxxxxxxxxxxx> 於 2021年10月22日 週五 下午6:40寫道:
> > > > >
> > > > > On Fri, Oct 22, 2021 at 11:52:01AM +0800, Vincent Chen wrote:
> > > > > > On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius@xxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > > > > > > > Hello Darius,
> > > > > > > >
> > > > > > > > On Tue, 5 Oct 2021, Darius Rad wrote:
> > > > > > > >
> > > > > > > > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > > > > > > > Darius Rad <darius@xxxxxxxxxxxx> 於 2021年9月29日 週三 下午9:28寫道:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > > > > > > > Darius Rad <darius@xxxxxxxxxxxx> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> > > > > > > >
> > > > > > > > [ ... ]
> > > > > > > >
> > > > > > > > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > > > > > > > instructions are used?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > > > > > > > choose to enable and allocate memory for user space program is because
> > > > > > > > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > > > > > > > vector version and it is called very often by every process. So that
> > > > > > > > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > > > > > > > would like to use vector by default. If we disable it by default and
> > > > > > > > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > > > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > > > > > > > that.
> > > > > > > > > > >
> > > > > > > > > > > Do you have any evidence to support the assertion that almost every process
> > > > > > > > > > > would use vector operations? One could easily argue that the converse is
> > > > > > > > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > > > > > > > process will not be using it.
> > > > > > > > > >
> > > > > > > > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > > > > > > > vector extension now and this patchset is testing based on those
> > > > > > > > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > > > > > > > support upstreaming and we will also upstream the vector version glibc
> > > > > > > > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > > > > > > > see platform with vector support can use vector version mem* and str*
> > > > > > > > > > functions automatically based on ifunc and platform without vector
> > > > > > > > > > will use the original one automatically. These could be done to select
> > > > > > > > > > the correct optimized glibc functions by ifunc mechanism.
> > > > > > > >
> > > > > > > > In your reply, I noticed that you didn't address Greentime's response
> > > > > > > > here. But this looks like the key issue. If common library functions are
> > > > > > > > vector-accelerated, wouldn't it make sense that almost every process would
> > > > > > > > wind up using vector instructions? And thus there wouldn't be much point
> > > > > > > > to skipping the vector context memory allocation?
> > > > > > > >
> > > > > > >
> > > > > > > This issue was addressed in the thread regarding Intel AMX I linked to in a
> > > > > > > previous message. I don't agree that this is the key issue; it is one of a
> > > > > > > number of issues. What if I don't want to take the potential
> > > > > > > power/frequency hit for the vector unit for a workload that, at best, uses
> > > > > > > it for the occasional memcpy? What if the allocation fails, how will that
> > > > > >
> > > > > > Hi Darius,
> > > > > > The memcpy function seems not to be occasionally used in the programs
> > > > > > because many functions in Glibc use memcpy() to complete the memory
> > > > > > copy. I use the following simple case as an example.
> > > > > > test.c
> > > > > > void main(void) {
> > > > > > return;
> > > > > > }
> > > > > > Then, we compile it by "gcc test.c -o a.out" and execute it. In the
> > > > > > execution, the memcpy() has been called unexpectedly. It is because
> > > > > > many libc initialized functions will be executed before entering the
> > > > > > user-defined main function. One of the example is __libc_setup_tls(),
> > > > > > which is called by __libc_start_main(). The __libc_setup_tls() will
> > > > > > use memcpy() during the process of creating the Dynamic Thread Vector
> > > > > > (DTV).
> > > > > >
> > > > > > Therefore, I think the memcpy() is widely used in most programs.
> > > > > >
> > > > >
> > > > > You're missing my point. Not every (any?) program spends a majority of the
> > > > > time doing memcpy(), and even if a program did, all of my points are still
> > > > > valid.
> > > > >
> > > > > Please read the discussion in the thread I referenced and the questions in
> > > > > my prior message.
> > > > >
> > > >
> > > > Hi Darius,
> > > >
> > > > As I mentioned before, we want to treat vector ISA like a general ISA
> > > > instead of a specific IP. User program should be able to use it
> > > > transparently just like FPU.
> > > > It seems that the use case you want is asking user to use vector like
> > > > a specific IP, user program should ask kernel before they use it and
> > > > that is not what we want to do in this patchset.
> > > >
> > >
> > > Hi Greentime,
> > >
> > > Right.
> > >
> > > But beyond what I want to do or what you want to do, is what *should* Linux
> > > do? I have attempted to provide evidence to support my position. You have
> > > not responded to or addressed the majority of my questions, which is
> > > concerning to me.
> >
> > Hi Darius,
> >
> > What is your majority questions?
> >
>
> 1. How will memory allocation failures for context state memory be reported
> to user space?
it will return -ENOMEM for some cases or show the warning messages for
some cases.
We know it's not perfect, we should enhance that in the future, but
let's take an example: 256 bits vector length system. 256 bits * 32
registers /8 = 1KB.
> 2. How will a system administrator (i.e., the user) be able to effectively
> manage a system where the vector unit, which could have a considerable area
> and/or power impact to the system, has one or more of the following
> properties:
As I mentioned before,
We would like to let user use vector transparently just like FPU or
other extensions.
If user knows that this system supports vector and user uses intrinsic
functions or assembly codes or compiler generating vector codes, user
can just use it just like FPU.
If user doesn't know that whether this system support vector or not,
user can just use the glibc or ifunc in his own libraries to detect
vector support dynamically.
> a. A single vector unit shared among two or more harts,
>
> b. Additional power consumption when the vector unit is enabled and idle
> versus not being enabled at all,
>
> c. For a system which supports variable operating frequency, a reduction
> in the maximum frequency when the vector unit is enabled, and/or
>
> d. The inability to enter low power states and/or delays to low power
> states transitions when the vector unit is enabled.
We also don't support this kind of system(a single vector unit shared
among 2 or more harts) in this implementation. I'll add more
assumptions in the next version patches.
For the frequency or power issues, I'll also not treat them as a
special case since we want to treat vector like an normal extension of
ISA instead of a specific IP.
> 3. You contend that the RISC-V V-extension resembles ARM SVE/SVE2, at least
> more than Intel AMX. I do not agree, but nevertheless, why then does this
> patchset not behave similar to SVE? On arm64, SVE is only enabled and
> memory is only allocated on first use, *not* unconditionally for all tasks.
As we mentioned before, almost every user space program will use glibc
ld.so/libc.so and these libraries will use the vector version
optimization porting in a system with vector support.
That's why we don't let it trigger the first illegal instruction
exception because vector registers will be used at very beginning of
the program and the illegal instruction exception handling is also
heavier because we need to go to M-mode first than S-mode which means
we will need to save context and restore context twice in both M-mode
and S-mode. Since the vstate will be allocated eventually, why not
save these 2 context save/restore overhead in both M-mode and S-mode
for handling this illegal instruction exception. And if the system
doesn't support vector, glibc won't use the vector porting and the
Linux kernel won't allocate vstate for every process too.