Re: [PATCH 0/5] *** Introduce new space allocation algorithm ***
From: Stephen Zhang
Date: Tue Dec 17 2024 - 20:31:53 EST
On Tue, Nov 26, 2024 at 08:51, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> is simply restating what you said in the previous email that I
> explicitly told you didn't answer the question I was asking you.
>
> Please listen to what I'm asking you to do. You don't need to
> explain anything to me, I just want you to run an experiment and
> report the results.
>
> This isn't a hard thing to do: the inode32 filesystem should fill to
> roughly 50% before it really starts to spill to the lower AGs.
> Record and paste the 'xfs_spaceman -c "freesp -a X"' histograms for
> each AG when the filesystem is a little over half full.
>
> That's it. I don't need you to explain anything to me, I simply want
> to know if the inode32 allocation policy does, in fact, work the way
> it is expected to under your problematic workload.
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
Hi, sorry for the delay.
Others (now added to the CC list) have also run into this issue:
https://lore.kernel.org/all/20241216130551.811305-1-txpeng@xxxxxxxxxxx/
For reference, here are the results we have so far:
+----------------+--------+---------+-------+
| Space used (%) | Normal | inode32 | AF    |
+----------------+--------+---------+-------+
| 30             |  35.11 |   35.25 | 35.11 |
| 41             |  57.35 |   57.58 | 55.96 |
| 46             |  71.48 |   71.74 | 54.04 |
| 51             |  88.40 |   88.68 | 49.48 |
| 56             | 100.00 |  100.00 | 43.91 |
| 62             |        |         | 37.00 |
| 67             |        |         | 28.12 |
| 72             |        |         | 16.32 |
| 77             |        |         | 19.51 |
+----------------+--------+---------+-------+
The raw data is appended at the end of this mail.
The first column is the percentage of space used.
The remaining three columns show the fragmentation of the free space,
i.e. the "pct" value of the [1,1] bucket (free extents of exactly one
block) in the output of "xfs_db -c 'freesp' $test_dev".
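To make the metric concrete, here is a minimal sketch of how the [1,1] share can be recomputed from a freesp histogram. The function name is only illustrative and is not part of the test scripts; it assumes the five-column "from to extents blocks pct" layout shown in the attachments, and the sample is the first histogram from Attachment 1.

```python
def single_block_free_pct(freesp_output: str) -> float:
    """Return the share (in %) of free blocks that sit in 1-block extents."""
    single = 0
    total = 0
    for line in freesp_output.splitlines():
        fields = line.split()
        # Data rows look like: from to extents blocks pct
        # Headers, df output and mount lines are skipped.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        lo, hi, _extents, blocks = (int(f) for f in fields[:4])
        total += blocks
        if lo == 1 and hi == 1:
            single += blocks
    return 100.0 * single / total if total else 0.0


# First histogram from Attachment 1 (Normal, 30% used).
sample = """\
   from      to extents  blocks    pct
      1       1   63630   63630  35.11
   2048    4095       1    2923   1.61
  32768   65536       2  114672  63.28
"""

print(f"{single_block_free_pct(sample):.2f}")
```

Recomputing from the "blocks" column instead of reading "pct" directly is a deliberate choice here: it also works if the histogram is filtered or merged before being summarised.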
How to test Normal vs AF yourself?
Apply the patches and follow the commands in:
https://lore.kernel.org/linux-xfs/20241104014439.3786609-1-zhangshida@xxxxxxxxxx/
How to test inode32 yourself?
1. First, we need to hack the kernel a little:
============
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 09dc44480d16..69fa9f8867df 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -253,9 +253,10 @@ xfs_set_inode_alloc_perag(
 	}
 
 	set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
-	if (pag->pag_agno < max_metadata)
+	if (pag->pag_agno < max_metadata) {
+		pr_info("%s===agno:%d\n", __func__, pag->pag_agno);
 		set_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
-	else
+	} else
 		clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
 	return true;
 }
@@ -312,7 +313,7 @@ xfs_set_inode_alloc(
 	 * sufficiently large, set XFS_OPSTATE_INODE32 if we must alter
 	 * the allocator to accommodate the request.
 	 */
-	if (xfs_has_small_inums(mp) && ino > XFS_MAXINUMBER_32)
+	if (xfs_has_small_inums(mp))
 		xfs_set_inode32(mp);
 	else
 		xfs_clear_inode32(mp);
==========
so that we can test inode32 on a small disk image and observe it in a
controlled way.
2. Run the same test we used for Normal vs AF, with a small change.
2.1. Create a 1 GiB image file and format it as XFS:
dd if=/dev/zero of=test.img bs=1M count=1024
mkfs.xfs -f test.img
sync
2.2. Make a mount directory:
mkdir mnt
2.3. Run the auto_frag.sh script, which calls the other scripts.
To enable inode32, change the mount option in frag.sh:
==========
- mount -o af1=1 $test_dev $test_mnt
+ mount -o inode32 $test_dev $test_mnt
==========
run:
./auto_frag.sh 1
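The real scripts are in the thread linked above. Purely as an illustration of what an alternate-punching workload does, here is a minimal sketch that punches a hole in every other block of a file, so that each freed block becomes a separate one-block free extent. Both function names are hypothetical, the 4096-byte block size is an assumption (it is the mkfs.xfs default), and this is not a copy of frag.sh.

```python
import subprocess


def alternate_ranges(file_size: int, block_size: int = 4096):
    """Yield (offset, length) for every other block of a file."""
    for offset in range(0, file_size, 2 * block_size):
        yield offset, min(block_size, file_size - offset)


def punch_alternate_blocks(path: str, file_size: int,
                           block_size: int = 4096) -> None:
    """Punch a hole in every other block using fallocate(1) from util-linux.

    Spawning one process per block is slow; this only shows the pattern.
    """
    for offset, length in alternate_ranges(file_size, block_size):
        subprocess.run(
            ["fallocate", "--punch-hole",
             "--offset", str(offset), "--length", str(length), path],
            check=True,
        )


# Ranges that would be punched in a 16 KiB file with 4 KiB blocks.
print(list(alternate_ranges(16384)))
```

Under these assumptions, a 512000KB file has 128000 blocks and would leave about 64000 single-block holes, which is in the same range as the 63630 single-block free extents seen after the first step in the attachments.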
We are still hesitant about whether we should report these results, since:
1. They rely on the assumption that our hack to inode32 has no impact on
how the metadata preference mechanism behaves.
2. They come from an alternate-punching script rather than a real MySQL
workload.
I am also afraid that Dave will blame us for not running exactly the test
he asked for. Sorry. :p
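For reference, the per-AG histograms Dave asked for could be collected with something like the following. This is only a sketch: the function names are made up for illustration, it assumes xfs_info prints an "agcount=N" field for the mounted filesystem, and it simply runs 'xfs_spaceman -c "freesp -a X"' once per AG.

```python
import re
import subprocess


def parse_agcount(xfs_info_output: str) -> int:
    """Extract the AG count from xfs_info output (the 'agcount=N' field)."""
    match = re.search(r"agcount=(\d+)", xfs_info_output)
    if match is None:
        raise ValueError("agcount not found in xfs_info output")
    return int(match.group(1))


def per_ag_freesp_commands(mount_point: str, agcount: int) -> list:
    """Build one xfs_spaceman freesp command line per allocation group."""
    return [
        ["xfs_spaceman", "-c", f"freesp -a {agno}", mount_point]
        for agno in range(agcount)
    ]


def collect_per_ag_histograms(mount_point: str) -> dict:
    """Run the commands on a mounted filesystem; returns {agno: output}."""
    info = subprocess.run(["xfs_info", mount_point], check=True,
                          capture_output=True, text=True).stdout
    commands = per_ag_freesp_commands(mount_point, parse_agcount(info))
    return {
        agno: subprocess.run(cmd, check=True,
                             capture_output=True, text=True).stdout
        for agno, cmd in enumerate(commands)
    }
```

Calling collect_per_ag_histograms("mnt") once the filesystem is a little over half full would give one histogram per AG to paste into a reply.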
Maybe we should port the algorithm to a release kernel and test it for a
few months with some users or database people, covering both inode32 and
the new algorithm as a whole.
We could then report back at that point.
And Tianxiang, would you mind working with us on the problem? Teamwork
will be quite efficient. We'll do our best to find a way for everyone to
play an important role in this work.
Cheers,
Shida
===============Attachment 1: Normal=====================
test_dev:test.img test_mnt:mnt/ fize_size:512000KB
mount test.img mnt/
file:mnt//frag size:500MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 285M 676M 30% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 63630 63630 35.11
2048 4095 1 2923 1.61
32768 65536 2 114672 63.28
test_dev:test.img test_mnt:mnt/ fize_size:204800KB
mount test.img mnt/
file:mnt//frag2 size:200MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 386M 575M 41% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 89127 89127 57.35
2048 4095 1 2923 1.88
8192 16383 1 14226 9.15
32768 65536 1 49144 31.62
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount test.img mnt/
file:mnt//frag3 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 436M 525M 46% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 101877 101877 71.48
2048 4095 1 2923 2.05
8192 16383 1 14226 9.98
16384 32767 1 23492 16.48
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount test.img mnt/
file:mnt//frag4 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 486M 475M 51% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 114579 114579 88.40
512 1023 1 811 0.63
8192 16383 1 14226 10.98
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount test.img mnt/
file:mnt//frag5 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 537M 424M 56% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 116730 116730 100.00
===============Attachment 2: inode32=====================
test_dev:test.img test_mnt:mnt/ fize_size:512000KB
mount -o af1=1 test.img mnt/
file:mnt//frag size:500MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 285M 676M 30% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 63887 63887 35.25
2048 4095 1 2931 1.62
32768 65536 2 114407 63.13
test_dev:test.img test_mnt:mnt/ fize_size:204800KB
mount -o af1=1 test.img mnt/
file:mnt//frag2 size:200MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 386M 575M 41% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 89487 89487 57.58
2048 4095 1 2931 1.89
8192 16383 1 13858 8.92
32768 65536 1 49144 31.62
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag3 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 435M 526M 46% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 102235 102235 71.74
2048 4095 1 2931 2.06
8192 16383 1 13858 9.72
16384 32767 1 23492 16.48
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag4 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 486M 475M 51% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 114937 114937 88.68
512 1023 1 819 0.63
8192 16383 1 13858 10.69
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag5 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 537M 424M 56% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 116730 116730 100.00
===============Attachment 3: AF=====================
test_dev:test.img test_mnt:mnt/ fize_size:512000KB
mount -o af1=1 test.img mnt/
file:mnt//frag size:500MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 285M 676M 30% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 63630 63630 35.11
2048 4095 1 2923 1.61
32768 65536 2 114672 63.28
test_dev:test.img test_mnt:mnt/ fize_size:204800KB
mount -o af1=1 test.img mnt/
file:mnt//frag2 size:200MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 385M 576M 41% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 86974 86974 55.96
2048 4095 1 2923 1.88
32768 65536 1 65528 42.16
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag3 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 436M 525M 46% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 77038 77038 54.04
32768 65536 1 65528 45.96
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag4 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 486M 475M 51% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 64186 64186 49.48
32768 65536 1 65528 50.52
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag5 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 536M 425M 56% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 51312 51312 43.91
2 3 11 22 0.02
32768 65536 1 65528 56.07
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag6 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 586M 375M 62% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 38486 38486 37.00
32768 65536 1 65528 63.00
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag7 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 637M 324M 67% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 25633 25633 28.12
32768 65536 1 65528 71.88
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag8 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 687M 274M 72% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 12783 12783 16.32
32768 65536 1 65528 83.68
test_dev:test.img test_mnt:mnt/ fize_size:102400KB
mount -o af1=1 test.img mnt/
file:mnt//frag9 size:100MB
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 960M 737M 224M 77% /data/proj/frag_test/mnt
umount test.img
from to extents blocks pct
1 1 12768 12768 19.51
8192 16383 1 16370 25.02
32768 65536 1 36295 55.47