On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:
在 2024/11/11 22:39, Chuck Lever III 写道:
On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
I'm in the cc list ,so I assume you saw my set, then I don't know why
you're ignoring my concerns.
1) next_offset is 32-bit and can overflow in a long-time running
machine.
2) Once next_offset overflows, readdir will skip the files that offset
is bigger.
I'm sorry, I'm a little busy these days, so I haven't responded to this
series of emails.
In that case, that entry won't be visible via getdents(3)
until the directory is re-opened or the process does an
lseek(fd, 0, SEEK_SET).
Yes.
That is the proper and expected behavior. I suspect you
will see exactly that behavior with ext4 and 32-bit
directory offsets, for example.
Emm...
For this case like this:
1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
4. loop 4~5 for 2^32 times
5. readdir /tmp/dir with fd1
For tmpfs now, we may see no /tmp/dir/file2, since the offset has been
overflow, for ext4 it is ok... So we think this will be a problem.
I constructed a simple test program using the above steps:
/*
* 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
* 2. open /tmp/dir with fd1
* 3. readdir and get /tmp/dir/file1
* 4. rm /tmp/dir/file2
* 5. touch /tmp/dir/file2
* 6. loop 4~5 for 2^32 times
* 7. readdir /tmp/dir with fd1
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
static void list_directory(DIR *dirp)
{
struct dirent *de;
errno = 0;
do {
de = readdir(dirp);
if (!de)
break;
printf("d_off: %lld\n", de->d_off);
printf("d_name: %s\n", de->d_name);
} while (true);
if (errno)
perror("readdir");
else
printf("EOD\n");
}
int main(int argc, char **argv)
{
unsigned long i;
DIR *dirp;
int ret;
/* 1. */
ret = mkdir("/tmp/dir", 0755);
if (ret < 0) {
perror("mkdir");
return 1;
}
ret = creat("/tmp/dir/file1", 0644);
if (ret < 0) {
perror("creat");
return 1;
}
close(ret);
ret = creat("/tmp/dir/file2", 0644);
if (ret < 0) {
perror("creat");
return 1;
}
close(ret);
/* 2. */
errno = 0;
dirp = opendir("/tmp/dir");
if (!dirp) {
if (errno)
perror("opendir");
else
fprintf(stderr, "EOD\n");
closedir(dirp);
return 1;
}
/* 3. */
errno = 0;
do {
struct dirent *de;
de = readdir(dirp);
if (!de) {
if (errno) {
perror("readdir");
closedir(dirp);
return 1;
}
break;
}
if (strcmp(de->d_name, "file1") == 0) {
printf("Found 'file1'\n");
break;
}
} while (true);
/* run the test. */
for (i = 0; i < 10000; i++) {
/* 4. */
ret = unlink("/tmp/dir/file2");
if (ret < 0) {
perror("unlink");
closedir(dirp);
return 1;
}
/* 5. */
ret = creat("/tmp/dir/file2", 0644);
if (ret < 0) {
perror("creat");
fprintf(stderr, "i = %lu\n", i);
closedir(dirp);
return 1;
}
close(ret);
}
/* 7. */
printf("\ndirectory after test:\n");
list_directory(dirp);
/* cel. */
rewinddir(dirp);
printf("\ndirectory after rewind:\n");
list_directory(dirp);
closedir(dirp);
return 0;
}
Does that not directly address your concern? Or do you
mean that Erkun's patch introduces a new issue?
Yes, to be honest, my personal feeling is a problem. But for 64bit, it may
never been trigger.
I ran the test program above on this kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-testing
Note that it has a patch to restrict the range of directory offset
values for tmpfs to 2..4096.
I did not observe any unexpected behavior after the offset values
wrapped. At step 7, I can always see file2, and its offset is always
4. At step "cel" I can see all expected directory entries.
I tested on v6.12-rc7 with the same range restriction but using
Maple tree and 64-bit offsets. No unexpected behavior there either.
So either we're still missing something, or there is no problem. My
only theory is maybe it's an issue with an implicit integer sign
conversion, and we should restrict the offset range to 2..S32_MAX.
I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).
If there is a problem here, please construct a reproducer
against this patch set and post it.
Invitation still stands: if you have a solid reproducer, please post
it.