[RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache

From: Kirill A. Shutemov
Date: Mon Feb 17 2014 - 13:39:36 EST


I've rewrote it from scratch since v1. Locking multiple pages at once was
not that great idea. lock_page() is mess. It seems we don't have good
locking ordering rules for lock_page() vs. lock_page() and it triggers
deadlocks.

Now we have ->fault_nonblock() to ask filesystem for a page, if it's
reachable without blocking. We request one page a time. It's not terribly
efficient and I will probably re-think the interface once again to expose
iterator or something...

Synthetic tests shows pretty impressive numbers: time to complete is
reduced by 80% on sequential multi-threaded access to a file.

Unfortunately, scripting use-case doesn't show any change. I've tried git
test-suite, kernel build and kernel rebuild: no significant difference.

I will try reduce FAULT_AROUND_PAGES to see if it makes any difference.

One important use-case which is not covered is shmem/tmpfs: we need to
implement separate ->fault_nonblock() for it.

Yet another feature to implement is MAP_POPULATE|MAP_NONBLOCK on top of
->fault_nonblock(). It should be useful for dynamic linker for example.

Benchmark data is below.

Any comments?

=========================================================================

Git test-suite make -j60 test:

Base:
1,529,278,935,718 cycles ( +- 0.23% ) [100.00%]
799,283,349,594 instructions # 0.52 insns per cycle
# 3.14 stalled cycles per insn ( +- 0.12% ) [100.00%]
2,509,889,276,415 stalled-cycles-frontend # 164.12% frontend cycles idle ( +- 0.18% )
115,493,976 minor-faults ( +- 0.00% ) [100.00%]
1 major-faults

59.686645587 seconds time elapsed ( +- 0.30% )
Patched:
1,613,998,434,527 cycles ( +- 0.24% ) [100.00%]
865,776,308,589 instructions # 0.54 insns per cycle
# 3.07 stalled cycles per insn ( +- 0.18% ) [100.00%]
2,661,812,494,310 stalled-cycles-frontend # 164.92% frontend cycles idle ( +- 0.17% )
47,428,068 minor-faults ( +- 0.00% ) [100.00%]
1 major-faults ( +- 50.00% )

60.241766430 seconds time elapsed ( +- 0.85% )

Run make -j60 on clean allmodconfig kernel tree:

Base:
19,511,808,321,876 cycles [100.00%]
17,551,075,994,147 instructions # 0.90 insns per cycle
# 1.55 stalled cycles per insn [100.00%]
27,227,785,020,522 stalled-cycles-frontend # 139.55% frontend cycles idle
268,039,365 minor-faults [100.00%]
0 major-faults

132.830612471 seconds time elapsed
Patched:
19,421,069,663,037 cycles [100.00%]
17,587,643,648,161 instructions # 0.91 insns per cycle
# 1.54 stalled cycles per insn [100.00%]
27,063,152,011,598 stalled-cycles-frontend # 139.35% frontend cycles idle
193,550,437 minor-faults [100.00%]
0 major-faults

132.851823758 seconds time elapsed

Run make -j60 on already built allmodconfig kernel tree:

Base:
286,539,116,594 cycles ( +- 0.05% ) [100.00%]
386,447,067,412 instructions # 1.35 insns per cycle
# 0.96 stalled cycles per insn ( +- 0.00% ) [100.00%]
372,549,722,846 stalled-cycles-frontend # 130.02% frontend cycles idle ( +- 0.04% )
4,967,540 minor-faults ( +- 0.06% ) [100.00%]
0 major-faults

27.215434226 seconds time elapsed ( +- 0.18% )
Patched:
286,435,472,207 cycles ( +- 0.07% ) [100.00%]
388,217,937,147 instructions # 1.36 insns per cycle
# 0.96 stalled cycles per insn ( +- 0.01% ) [100.00%]
373,681,951,018 stalled-cycles-frontend # 130.46% frontend cycles idle ( +- 0.10% )
2,285,563 minor-faults ( +- 0.26% ) [100.00%]
0 major-faults

27.292854546 seconds time elapsed ( +- 0.29% )

Access 16g file sequential:

Threads Base, seconds Patched, seconds Diff, %
1 8.284232583 5.424205564 -34,52
8 18.338648302 6.330400940 -65.48
32 54.004043136 10.774241948 -80.04
60 88.928810991 16.571960471 -81.36
120 144.729413453 28.875873029 -80.04

Access 1g file random:

Threads Base, seconds Patched, seconds Diff, %
1 18.247095738 15.003616508 -17.77
8 15.226716773 14.239478275 -6.48
32 22.080582106 16.797777162 -23.92
60 25.759900660 22.768611526 -11.61
120 44.443953873 38.197095497 -14.05

Below is `perf stat' output for the tests above.

=========================================================================

# usemem -t <threads> -f testfile -o 16g #

## 1 thread ##
### base ###
28,284,111,426 cycles ( +- 1.53% ) [100.00%]
44,120,803,329 instructions # 1.56 insns per cycle
# 1.02 stalled cycles per insn ( +- 0.00% ) [100.00%]
45,148,666,958 stalled-cycles-frontend # 159.63% frontend cycles idle ( +- 1.89% )
4,194,709 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

8.284232583 seconds time elapsed ( +- 1.67% )
### patched ###
18,615,784,628 cycles ( +- 8.46% ) [100.00%]
41,590,187,000 instructions # 2.23 insns per cycle
# 0.66 stalled cycles per insn ( +- 0.02% ) [100.00%]
27,374,671,499 stalled-cycles-frontend # 147.05% frontend cycles idle ( +- 11.48% )
131,260 minor-faults ( +- 0.03% ) [100.00%]
0 major-faults

5.424205564 seconds time elapsed ( +- 8.54% )

## 8 threads ##
### base ###
447,114,910,645 cycles ( +- 3.22% ) [100.00%]
350,993,124,470 instructions # 0.79 insns per cycle
# 2.29 stalled cycles per insn ( +- 0.01% ) [100.00%]
803,040,468,490 stalled-cycles-frontend # 179.60% frontend cycles idle ( +- 3.59% )
33,556,008 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

18.338648302 seconds time elapsed ( +- 3.20% )
### patched ###
136,437,995,805 cycles ( +- 0.87% ) [100.00%]
328,096,731,149 instructions # 2.40 insns per cycle
# 0.60 stalled cycles per insn ( +- 0.02% ) [100.00%]
197,237,317,698 stalled-cycles-frontend # 144.56% frontend cycles idle ( +- 1.19% )
1,338,884 minor-faults ( +- 3.11% ) [100.00%]
0 major-faults

6.330400940 seconds time elapsed ( +- 1.50% )

## 32 threads ##
### base ###
4,844,333,104,028 cycles ( +- 2.17% ) [100.00%]
1,401,233,609,645 instructions # 0.29 insns per cycle
# 6.61 stalled cycles per insn ( +- 0.02% ) [100.00%]
9,265,429,354,789 stalled-cycles-frontend # 191.26% frontend cycles idle ( +- 1.81% )
134,228,040 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

54.004043136 seconds time elapsed ( +- 2.03% )
### patched ###
512,226,985,443 cycles ( +- 2.21% ) [100.00%]
1,307,350,894,141 instructions # 2.55 insns per cycle
# 0.52 stalled cycles per insn ( +- 0.02% ) [100.00%]
677,259,628,341 stalled-cycles-frontend # 132.22% frontend cycles idle ( +- 1.56% )
4,820,979 minor-faults ( +- 3.88% ) [100.00%]
0 major-faults

10.774241948 seconds time elapsed ( +- 2.40% )
## 60 threads ##
### base ###
14,878,749,780,310 cycles ( +- 1.24% ) [100.00%]
2,633,166,794,060 instructions # 0.18 insns per cycle
# 10.75 stalled cycles per insn ( +- 0.01% ) [100.00%]
28,301,405,498,867 stalled-cycles-frontend # 190.21% frontend cycles idle ( +- 0.88% )
251,687,643 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

88.928810991 seconds time elapsed ( +- 1.44% )
### patched ###
1,156,172,977,262 cycles ( +- 3.24% ) [100.00%]
2,450,915,465,588 instructions # 2.12 insns per cycle
# 0.63 stalled cycles per insn ( +- 0.01% ) [100.00%]
1,534,996,614,819 stalled-cycles-frontend # 132.77% frontend cycles idle ( +- 4.37% )
9,239,706 minor-faults ( +- 1.66% ) [100.00%]
0 major-faults

16.571960471 seconds time elapsed ( +- 2.64% )
## 120 threads ##
### base ###
47,350,885,336,341 cycles ( +- 2.28% ) [100.00%]
5,282,921,722,672 instructions # 0.11 insns per cycle
# 8.96 stalled cycles per insn ( +- 0.04% ) [100.00%]
47,332,933,971,860 stalled-cycles-frontend # 99.96% frontend cycles idle ( +- 2.40% )
503,357,749 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

144.729413453 seconds time elapsed ( +- 1.75% )
### patched ###
3,791,855,572,557 cycles ( +- 2.15% ) [100.00%]
4,901,681,561,338 instructions # 1.29 insns per cycle
# 0.59 stalled cycles per insn ( +- 0.01% ) [100.00%]
2,880,236,854,539 stalled-cycles-frontend # 75.96% frontend cycles idle ( +- 3.69% )
18,080,707 minor-faults ( +- 1.22% ) [100.00%]
0 major-faults

28.875873029 seconds time elapsed ( +- 1.19% )

# usemem -t <threads> -R -f testfile -o 1g #

## 1 thread ##
### base ###
62,357,670,556 cycles ( +- 0.58% ) [100.00%]
11,612,293,315 instructions # 0.19 insns per cycle
# 10.31 stalled cycles per insn ( +- 0.03% ) [100.00%]
119,734,125,613 stalled-cycles-frontend # 192.01% frontend cycles idle ( +- 0.58% )
262,723 minor-faults ( +- 0.01% ) [100.00%]
0 major-faults

18.247095738 seconds time elapsed ( +- 0.58% )
### patched ###
51,421,735,331 cycles ( +- 11.53% ) [100.00%]
11,446,081,379 instructions # 0.22 insns per cycle
# 8.55 stalled cycles per insn ( +- 0.19% ) [100.00%]
97,900,718,100 stalled-cycles-frontend # 190.39% frontend cycles idle ( +- 12.09% )
8,629 minor-faults ( +- 1.20% ) [100.00%]
0 major-faults ( +-100.00% )

15.003616508 seconds time elapsed ( +- 11.56% )

## 8 threads ##
### base ###
395,785,250,840 cycles ( +- 0.60% ) [100.00%]
93,850,278,507 instructions # 0.24 insns per cycle
# 8.02 stalled cycles per insn ( +- 0.03% ) [100.00%]
752,951,585,171 stalled-cycles-frontend # 190.24% frontend cycles idle ( +- 0.61% )
2,097,988 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

15.226716773 seconds time elapsed ( +- 0.41% )
### patched ###
Performance counter stats for 'system wide' (5 runs):

369,032,514,042 cycles ( +- 0.42% ) [100.00%]
92,357,503,760 instructions # 0.25 insns per cycle
# 7.59 stalled cycles per insn ( +- 0.01% ) [100.00%]
700,555,940,601 stalled-cycles-frontend # 189.84% frontend cycles idle ( +- 0.44% )
89,186 minor-faults ( +- 1.88% ) [100.00%]
0 major-faults

14.239478275 seconds time elapsed ( +- 0.48% )

## 32 threads ##
### base ###
1,791,516,702,606 cycles ( +- 1.57% ) [100.00%]
369,936,392,907 instructions # 0.21 insns per cycle
# 9.07 stalled cycles per insn ( +- 0.07% ) [100.00%]
3,357,017,162,189 stalled-cycles-frontend # 187.38% frontend cycles idle ( +- 1.77% )
8,390,395 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

22.080582106 seconds time elapsed ( +- 9.50% )
### patched ###
1,517,335,667,819 cycles ( +- 0.63% ) [100.00%]
363,401,817,392 instructions # 0.24 insns per cycle
# 7.45 stalled cycles per insn ( +- 0.01% ) [100.00%]
2,705,627,050,641 stalled-cycles-frontend # 178.31% frontend cycles idle ( +- 1.15% )
269,287 minor-faults ( +- 0.16% ) [100.00%]
0 major-faults

16.797777162 seconds time elapsed ( +- 7.69% )

## 60 threads ##
### base ###
3,983,862,013,747 cycles ( +- 1.42% ) [100.00%]
693,310,458,766 instructions # 0.17 insns per cycle
# 10.75 stalled cycles per insn ( +- 0.01% ) [100.00%]
7,454,138,580,163 stalled-cycles-frontend # 187.11% frontend cycles idle ( +- 1.27% )
15,735,826 minor-faults ( +- 0.02% ) [100.00%]
1 major-faults ( +-100.00% )

25.759900660 seconds time elapsed ( +- 0.64% )
### patched ###
2,871,988,372,901 cycles ( +- 1.17% ) [100.00%]
681,452,765,745 instructions # 0.24 insns per cycle
# 7.61 stalled cycles per insn ( +- 0.03% ) [100.00%]
5,186,017,674,808 stalled-cycles-frontend # 180.57% frontend cycles idle ( +- 1.24% )
498,198 minor-faults ( +- 0.05% ) [100.00%]
0 major-faults

22.768611526 seconds time elapsed ( +- 3.13% )

## 120 threads ##
### base ###
14,891,491,866,813 cycles ( +- 0.83% ) [100.00%]
1,391,880,325,881 instructions # 0.09 insns per cycle
# 10.51 stalled cycles per insn ( +- 0.02% ) [100.00%]
14,629,621,106,104 stalled-cycles-frontend # 98.24% frontend cycles idle ( +- 1.04% )
31,468,057 minor-faults ( +- 0.00% ) [100.00%]
0 major-faults

44.443953873 seconds time elapsed ( +- 0.91% )
### patched ###
12,450,645,304,818 cycles ( +- 2.94% ) [100.00%]
1,367,989,003,638 instructions # 0.11 insns per cycle
# 8.98 stalled cycles per insn ( +- 0.04% ) [100.00%]
12,282,274,614,247 stalled-cycles-frontend # 98.65% frontend cycles idle ( +- 3.34% )
996,361 minor-faults ( +- 0.02% ) [100.00%]
0 major-faults

38.197095497 seconds time elapsed ( +- 2.

Kirill A. Shutemov (2):
mm: introduce vm_ops->fault_nonblock()
mm: implement ->fault_nonblock() for page cache

Documentation/filesystems/Locking | 8 ++++++++
fs/9p/vfs_file.c | 2 ++
fs/btrfs/file.c | 1 +
fs/cifs/file.c | 1 +
fs/ext4/file.c | 1 +
fs/f2fs/file.c | 1 +
fs/fuse/file.c | 1 +
fs/gfs2/file.c | 1 +
fs/nfs/file.c | 1 +
fs/nilfs2/file.c | 1 +
fs/ubifs/file.c | 1 +
fs/xfs/xfs_file.c | 1 +
include/linux/mm.h | 3 +++
mm/filemap.c | 35 +++++++++++++++++++++++++++++++++++
mm/memory.c | 38 +++++++++++++++++++++++++++++++++++++-
15 files changed, 95 insertions(+), 1 deletion(-)

--
1.9.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/