Re: [PATCH] lseek.2: SYNOPSIS: Use correct types

From: Michael Kerrisk (man-pages)
Date: Sun Nov 22 2020 - 17:38:51 EST


Hi Alex,

On Sat, 21 Nov 2020 at 18:45, Alejandro Colomar (man-pages)
<alx.manpages@xxxxxxxxx> wrote:
>
> Hi Michael,
>
> I'm a bit lost in all the *lseek* pages.
>
> You had a good read some months ago, so you may know it better.
> I don't know which of those functions come from the kernel,
> and which come from glibc (if any).

It always takes me too long to remind myself of the details here :-(.

This time, I'll try to write what I (re)learned.

Inside the kernel (5.9 sources), in fs/read_write.c, we have:

[[
SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
{
return ksys_lseek(fd, offset, whence);
}

#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset,
unsigned int, whence)
{
return ksys_lseek(fd, offset, whence);
}
#endif

#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
defined(__ARCH_WANT_SYS_LLSEEK)
SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
unsigned long, offset_low, loff_t __user *, result,
unsigned int, whence)
{
...
}
#endif
]]

The main pieces of interest here are the first and last
SYSCALL_DEFINEn. The first is the "standard" lseek() system call that
exists on 64-bit and 32-bit architectures.

The problem on 32-bit architectures is that the off_t type is a 32-bit
type, but files can be bigger than 2GB (2**32-1). That's why 32-bit
kernels also provide the llseek() system call. It receives the new
offset in two 32-bit pieces (offset_high, offset_low), and returns the
new offset via a 64-bit off_t argument (result). (I forget the
reason why there are 32-bit and 64-bit "offset" args in the syscall.)

One more thing... In arch/x86/entry/syscalls/syscall_32.tbl,
we see the following line:

[[
140 i386 _llseek sys_llseek
]]

This is essentially telling us that 'sys_llseek' (the name generated
by SYSCALL_DEFINE5(llseek...)) is exposed to user-space as system call
number 140, and that system call number will (IIUC) be exposed in
autogenerated headers with the name "__NR__llseek" (i.e., "_llseek").
The "i386" is
telling us that this happens in i386 (32-bit Intel). There is nothing
equivalent on x86-64, because 64 bit systems don't need an _llseek
system call.

Now, in ancient times (let's say Linux 2.2), there was a more
transparent situation (but the effect was the same):

#define __NR__llseek 140

and that system call number was tied to the implementation by this definition
linux-2.2.26/arch/i386/kernel/entry.S:

.long SYMBOL_NAME(sys_llseek) /* 140 */

==

lseek64() is a C library function. It takes and returns a 64-bit
offset. It exists to support seeking in large (>2GB) files. Its
implementation is in the glibc source file
sysdeps/unix/sysv/linux/lseek64.c, where it calls _llseek(2)

Returning to the <unistd.h> header file, we have:

[[
#ifndef __USE_FILE_OFFSET64
extern __off_t lseek (int __fd, __off_t __offset, int __whence) __THROW;
#else
# ifdef __REDIRECT_NTH
extern __off64_t __REDIRECT_NTH (lseek,
(int __fd, __off64_t __offset, int __whence),
lseek64);
# else
# define lseek lseek64
# endif
#endif
#ifdef __USE_LARGEFILE64
extern __off64_t lseek64 (int __fd, __off64_t __offset, int __whence)
__THROW;
#endif
]]

The name "lseek64" is exposed if _LARGEFILE64_SOURCE (which triggers
__USE_LARGEFILE64) is defined. That name was part of the so-called
Transitional Large FIle Systems (LFS) API (see page 105 in my book),
which existed to support the use of 64-bit file offsets on 32 bit
systems. It provided a set of interfaces with names of the form
"xxxxx64()" (e.g., "lseek64")) which provided for 64-bit offsets;
those names coexisted with the traditional 32-bit APIs (e.g.,
"lseek").

Alternatively, the LFS specified a macro, _FILE_OFFSET_BITS=64 (which
triggers __USE_FILE_OFFSET64) as another way of exposing 64-bit-offset
functionality on 32 bit systems. In this case, the traditional API
names (e.g., "lseek") are redirected to the 64-bit implementations
(e.g., "lseek64");

> In the kernel I only found the lseek, llseek, and 32_llseek

I'd ignore 32_llseek -- I guess that's an arch-specific equivalent of
_llseek/llseek.

> (as you can see in the patch).
> So if any other prototype needs to be updated, please do so.
> Especially, have a look at lseek64(3),
> which I suspect needs the same changes I propose in that patch.

I think that no changes to the types are needed in lseek64(3). But
maybe some of the info in this mail should be captured in that manual
page.

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/