Re: 4.7.0-rc7 ext4 error in dx_probe

From: TÃrÃk Edwin
Date: Tue Aug 09 2016 - 03:19:47 EST


On 2016-08-09 05:37, Darrick J. Wong wrote:
> On Tue, Aug 09, 2016 at 12:13:01AM +0300, Török Edwin wrote:
>> On 2016-08-08 19:55, Darrick J. Wong wrote:
>>> On Mon, Aug 08, 2016 at 12:08:18PM -0400, Theodore Ts'o wrote:
>>>> On Sun, Aug 07, 2016 at 11:28:10PM -0700, Darrick J. Wong wrote:
>>>>>
>>>>> I have one lingering concern -- is it a bug that two processes could be
>>>>> computing the checksum of a buffer simultaneously? I would have thought ext4
>>>>> would serialize that kind of buffer_head access...
>>>>
>>>> Do we know how this is happening? We've always depended on the VFS to
>>>> provide this exclusion. The only way we should be modifying the
>>>> buffer_head at the same time if two CPU's are trying to modify the
>>>> directory at the same time, and that should _never_ be happening, even
>>>> with the new directory parallism code, unless the file system has
>>>> given permission and intends to do its own fine-grained locking.
>>>
>>> It's a combination of two things, I think. The first is that the
>>> checksum calculation routine (temporarily) set the checksum field to
>>> zero during the computation, which of course is a no-no. The patch
>>> fixes that problem and should go in.
>>
>> Thanks a lot for the patch.
>> I wrote a small testcase (see below) that triggers the problem quite soon on
>> my box with kernel 4.7.0, and seems to have survived so far with kernel
>> 4.7.0+patch.
>> When it failed it printed something like "readdir: Bad message".
>>
>> The drop caches part is quite important for triggering the bug, and might
>> explain why this bug was hard to reproduce: IIUC this race condition can
>> happen only if 2+ threads/processes try to access the same directory, and the
>> directory's inode is not in the cache (i.e. was never cached, or got kicked
>> out of the cache).
>
> Could you formulate this into an xfstest, please? It would be very useful to
> have this as a regression test.
>
> (Or attach a Signed-off-by and I'll take care of it eventually.)

I've attached a signed-off-by line and a copyright header (feel free to add yourself in the copyright header too):

Signed-off-by: Török Edwin <edwin@xxxxxxxxxx>

>> /*
>> $ gcc trigger.c -o trigger -pthread
>> $ ./trigger
>> */

/*
* Test concurrent readdir on uncached inodes
*
* Copyright (C) 2016 Skylable Ltd.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/

>>
>> #include <stdio.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <dirent.h>
>> #include <string.h>
>> #include <stdlib.h>
>> #include <errno.h>
>> #include <pthread.h>
>> #include <unistd.h>
>> #include <fcntl.h>
>>
>> #define FILES 100000
>> #define THREADS 16
>> #define LOOPS 1000
>>
>> static void die(const char *msg)
>> {
>> perror(msg);
>> exit(EXIT_FAILURE);
>> }
>>
>> static void* list(void* arg)
>> {
>> for(int i=0;i<LOOPS;i++) {
>> DIR *d = opendir(".");
>> if (!d) {
>> die("opendir");
>> }
>> errno = 0;
>> while(readdir(d)) {}
>> if (errno) {
>> die("readdir");
>> }
>> closedir(d);
>> FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
>> if (f) {
>> fputs("3", f);
>> fclose(f);
>> }
>> }
>> return NULL;
>> }
>>
>> int main()
>> {
>> pthread_t t[THREADS];
>>
>> if(mkdir("ext4test", 0755) < 0 && errno != EEXIST)
>> die("mkdir");
>> if(chdir("ext4test") < 0)
>> die("chdir");
>> for (unsigned i=0;i < FILES;i++) {
>> char name[16];
>> snprintf(name, sizeof(name), "%d", i);
>> int fd = open(name, O_WRONLY|O_CREAT, 0600);
>> if (fd < 0)
>> die("open");
>> close(fd);
>> }
>> for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) {
>> pthread_create(&t[i], NULL,list, NULL);
>> }
>> for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) {
>> pthread_join(t[i], NULL);
>> }
>> return 0;
>> }
>>
>>
>>
>> --
>> Edwin Török | Co-founder and Lead Developer
>>
>> Skylable open-source object storage: reliable, fast, secure
>> http://www.skylable.com
>


--
Edwin Török | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com