Re: + pid-delete-reserved_pids.patch added to -mm tree

From: Alexey Dobriyan
Date: Wed Oct 04 2017 - 16:12:55 EST


On Wed, Oct 04, 2017 at 06:36:31PM +0200, Oleg Nesterov wrote:
> On 10/04, Alexey Dobriyan wrote:
> >
> > On Tue, Oct 03, 2017 at 05:53:15PM +0200, Oleg Nesterov wrote:
> > > On 10/02, Andrew Morton wrote:
> > > >
> > > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > > Subject: pid: delete RESERVED_PIDS
> > > >
> > > > RESERVED_PIDS had a noble goal: to protect root from PID exhaustion since
> > > > at least ~2.5.40
> > >
> > > I am just curious, where did you find the change which documents this goal?
> >
> > Now that you asked, I'm not exactly sure. :-( Please don't tell me it is for
> > some kind of stupid userspace which assumed low numbers are kernel threads.
>
> Not necessarily kernel threads,
>
> > > > Allow small pids to be allocated after rollover, there is nothing sacred
> > > > about them.
> > > >
> > > > Resource exhaustion should be handled by rlimits and/or kernel memory
> > > > accounting.
> > >
> > > I won't argue, but I always thought that the only purpose of RESERVED_PIDS
> > > is to make the system/kernel daemons started at boot time more "visible" in
> > > /usr/bin/ps output.
> >
> > They will be first in line naturally: kthreadd + init execute first and
> > rarely exit.
>
> Exactly.
>
> But, with your patch, only until ->last_pid overlaps.

I think that's OK. Here is how the beginning of the process tree looks on my
machine. Pids below 300 aren't special (anymore):

PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
3 ? S 0:00 \_ [ksoftirqd/0]
5 ? S< 0:00 \_ [kworker/0:0H]
7 ? S 0:02 \_ [rcu_preempt]
8 ? S 0:00 \_ [rcu_sched]
9 ? S 0:00 \_ [rcu_bh]
10 ? S 0:00 \_ [migration/0]
11 ? S< 0:00 \_ [lru-add-drain]
12 ? S 0:00 \_ [cpuhp/0]
13 ? S 0:00 \_ [cpuhp/1]
14 ? S 0:00 \_ [migration/1]
15 ? S 0:00 \_ [ksoftirqd/1]
17 ? S< 0:00 \_ [kworker/1:0H]
18 ? S 0:00 \_ [cpuhp/2]
19 ? S 0:00 \_ [migration/2]
20 ? S 0:00 \_ [ksoftirqd/2]
22 ? S< 0:00 \_ [kworker/2:0H]
23 ? S 0:00 \_ [cpuhp/3]
24 ? S 0:00 \_ [migration/3]
25 ? S 0:00 \_ [ksoftirqd/3]
27 ? S< 0:00 \_ [kworker/3:0H]
28 ? S 0:00 \_ [cpuhp/4]
29 ? S 0:00 \_ [migration/4]
30 ? S 0:00 \_ [ksoftirqd/4]
32 ? S< 0:00 \_ [kworker/4:0H]
33 ? S 0:00 \_ [cpuhp/5]
34 ? S 0:00 \_ [migration/5]
35 ? S 0:00 \_ [ksoftirqd/5]
37 ? S< 0:00 \_ [kworker/5:0H]
38 ? S 0:00 \_ [cpuhp/6]
39 ? S 0:00 \_ [migration/6]
40 ? S 0:00 \_ [ksoftirqd/6]
41 ? S 0:00 \_ [kworker/6:0]
42 ? S< 0:00 \_ [kworker/6:0H]
43 ? S 0:00 \_ [cpuhp/7]
44 ? S 0:00 \_ [migration/7]
45 ? S 0:00 \_ [ksoftirqd/7]
47 ? S< 0:00 \_ [kworker/7:0H]
48 ? S 0:00 \_ [cpuhp/8]
49 ? S 0:00 \_ [migration/8]
50 ? S 0:00 \_ [ksoftirqd/8]
52 ? S< 0:00 \_ [kworker/8:0H]
53 ? S 0:00 \_ [cpuhp/9]
54 ? S 0:00 \_ [migration/9]
55 ? S 0:00 \_ [ksoftirqd/9]
57 ? S< 0:00 \_ [kworker/9:0H]
58 ? S 0:00 \_ [cpuhp/10]
59 ? S 0:00 \_ [migration/10]
60 ? S 0:00 \_ [ksoftirqd/10]
61 ? S 0:00 \_ [kworker/10:0]
62 ? S< 0:00 \_ [kworker/10:0H]
63 ? S 0:00 \_ [cpuhp/11]
64 ? S 0:00 \_ [migration/11]
65 ? S 0:00 \_ [ksoftirqd/11]
67 ? S< 0:00 \_ [kworker/11:0H]
68 ? S 0:00 \_ [cpuhp/12]
69 ? S 0:00 \_ [migration/12]
70 ? S 0:00 \_ [ksoftirqd/12]
72 ? S< 0:00 \_ [kworker/12:0H]
73 ? S 0:00 \_ [cpuhp/13]
74 ? S 0:00 \_ [migration/13]
75 ? S 0:00 \_ [ksoftirqd/13]
76 ? S 0:00 \_ [kworker/13:0]
77 ? S< 0:00 \_ [kworker/13:0H]
78 ? S 0:00 \_ [cpuhp/14]
79 ? S 0:00 \_ [migration/14]
80 ? S 0:00 \_ [ksoftirqd/14]
82 ? S< 0:00 \_ [kworker/14:0H]
83 ? S 0:00 \_ [cpuhp/15]
84 ? S 0:00 \_ [migration/15]
85 ? S 0:00 \_ [ksoftirqd/15]
87 ? S< 0:00 \_ [kworker/15:0H]
88 ? S 0:00 \_ [kdevtmpfs]
89 ? S< 0:00 \_ [netns]
90 ? S 0:00 \_ [oom_reaper]
91 ? S< 0:00 \_ [writeback]
92 ? S 0:00 \_ [kcompactd0]
93 ? SN 0:00 \_ [khugepaged]
94 ? S< 0:00 \_ [crypto]
95 ? S< 0:00 \_ [bioset]
96 ? S< 0:00 \_ [kblockd]
98 ? S 0:00 \_ [kworker/1:1]
99 ? S< 0:00 \_ [edac-poller]
100 ? S 0:00 \_ [kworker/2:1]
101 ? S 0:00 \_ [kswapd0]
102 ? S< 0:00 \_ [vmstat]
112 ? S 0:00 \_ [kworker/u32:1]
143 ? S 0:00 \_ [kworker/3:1]
146 ? S 0:00 \_ [kworker/6:1]
147 ? S 0:00 \_ [kworker/7:1]
150 ? S 0:00 \_ [kworker/10:1]
151 ? S 0:00 \_ [kworker/11:1]
153 ? S 0:00 \_ [kworker/13:1]
156 ? S< 0:00 \_ [acpi_thermal_pm]
157 ? S< 0:00 \_ [bioset]
158 ? S< 0:00 \_ [bioset]
159 ? S< 0:00 \_ [bioset]
160 ? S< 0:00 \_ [bioset]
161 ? S< 0:00 \_ [bioset]
162 ? S< 0:00 \_ [bioset]
163 ? S< 0:00 \_ [bioset]
164 ? S< 0:00 \_ [bioset]
165 ? S 0:00 \_ [scsi_eh_0]
166 ? S< 0:00 \_ [scsi_tmf_0]
167 ? S 0:00 \_ [scsi_eh_1]
168 ? S< 0:00 \_ [scsi_tmf_1]
169 ? S 0:00 \_ [scsi_eh_2]
170 ? S< 0:00 \_ [scsi_tmf_2]
171 ? S 0:00 \_ [scsi_eh_3]
172 ? S< 0:00 \_ [scsi_tmf_3]
173 ? S 0:00 \_ [scsi_eh_4]
174 ? S< 0:00 \_ [scsi_tmf_4]
175 ? S 0:00 \_ [scsi_eh_5]
176 ? S< 0:00 \_ [scsi_tmf_5]
182 ? S< 0:00 \_ [bioset]
184 ? S< 0:00 \_ [bioset]
185 ? S 0:00 \_ [kworker/1:2]
186 ? S< 0:00 \_ [bioset]
194 ? S 0:00 \_ [kworker/4:2]
198 ? S< 0:00 \_ [kworker/0:1H]
199 ? S 0:00 \_ [jbd2/sda4-8]
200 ? S< 0:00 \_ [ext4-rsv-conver]
201 ? S 0:00 \_ [scsi_eh_6]
202 ? S< 0:00 \_ [scsi_tmf_6]
203 ? S 0:00 \_ [usb-storage]
227 ? S< 0:00 \_ [bioset]
361 ? S< 0:00 \_ [kworker/9:1H]
441 ? S< 0:00 \_ [kworker/6:1H]
716 ? S< 0:00 \_ [kworker/12:1H]
732 ? S< 0:00 \_ [kworker/10:1H]
759 ? S< 0:00 \_ [kworker/5:1H]
760 ? S< 0:00 \_ [kworker/3:1H]
770 ? S 0:00 \_ [kworker/2:2]
874 ? S 0:00 \_ [jbd2/sda2-8]
875 ? S< 0:00 \_ [ext4-rsv-conver]
903 ? S 0:00 \_ [kworker/15:2]
958 ? S 0:00 \_ [kworker/11:2]
980 ? S< 0:00 \_ [kworker/14:1H]
1313 ? S 0:00 \_ [kworker/3:2]
1486 ? S 0:00 \_ [kworker/0:2]
1514 ? S< 0:00 \_ [kworker/11:1H]
1523 ? S< 0:00 \_ [kworker/1:1H]
1676 ? S< 0:00 \_ [kworker/2:1H]
1703 ? S< 0:00 \_ [kworker/7:1H]
1780 ? S< 0:00 \_ [kworker/4:1H]
2222 ? S< 0:00 \_ [kworker/8:1H]
2286 ? S 0:00 \_ [kworker/7:2]
2320 ? S 0:00 \_ [kworker/15:0]
2337 ? S 0:00 \_ [kworker/14:0]
2343 ? S 0:00 \_ [kworker/14:3]
2613 ? S< 0:00 \_ [kworker/13:1H]
2635 ? S 0:00 \_ [kworker/5:2]
2806 ? S 0:00 \_ [kworker/u32:0]
2995 ? S< 0:00 \_ [kworker/15:1H]
3238 ? S 0:00 \_ [kworker/12:0]
3336 ? S 0:01 \_ [kworker/0:0]
4424 ? S 0:00 \_ [kworker/4:1]
4477 ? S 0:00 \_ [kworker/8:2]
5163 ? S 0:00 \_ [kworker/5:0]
5571 ? S 0:00 \_ [kworker/9:0]
5573 ? S 0:00 \_ [kworker/9:2]
5664 ? S 0:00 \_ [kworker/8:1]
5743 ? S 0:00 \_ [kworker/12:2]
5843 ? S 0:00 \_ [kworker/5:1]
5845 ? S 0:00 \_ [kworker/7:0]
5925 ? S 0:00 \_ [irq/43-nvidia]
5926 ? S 0:00 \_ [nvidia]
5958 ? S 0:00 \_ [kworker/0:1]
5959 ? S 0:00 \_ [kworker/0:3]
5960 ? S 0:00 \_ [kworker/0:4]
5961 ? S 0:00 \_ [kworker/0:5]
5962 ? S 0:00 \_ [kworker/4:0]
5963 ? S 0:00 \_ [kworker/4:3]
5964 ? S 0:00 \_ [kworker/4:4]
5965 ? S 0:00 \_ [kworker/4:5]
5966 ? S 0:00 \_ [kworker/4:6]
5967 ? S 0:00 \_ [kworker/4:7]
6122 ? S 0:00 \_ [kworker/12:1]
1 ? Ss 0:01 init [3]
543 ? Ss 0:00 /sbin/udevd --daemon
...
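
To make the rollover behaviour under discussion concrete, here is a toy
userspace model. It is only a sketch, not the kernel's alloc_pidmap(): the
toy_* names are invented for illustration, 32768 assumes the default pid_max,
and 300 is the RESERVED_PIDS value referred to above. The only thing it shows
is where the search restarts once ->last_pid wraps.

```c
#include <stdbool.h>

#define TOY_PID_MAX   32768  /* assumption: default pid_max */
#define TOY_RESERVED  300    /* models RESERVED_PIDS */

/* Toy allocator state; hypothetical, unrelated to the real pidmap layout. */
struct toy_pidmap {
	bool used[TOY_PID_MAX];
	int last_pid;
};

/*
 * Return the next free pid after last_pid, or -1 if every pid is taken.
 * On wraparound the search restarts at `floor`:
 *   floor == TOY_RESERVED models the current behaviour,
 *   floor == 1            models the behaviour with RESERVED_PIDS deleted.
 */
static int toy_alloc_pid(struct toy_pidmap *map, int floor)
{
	int pid = map->last_pid + 1;
	int tries;

	for (tries = 0; tries < TOY_PID_MAX; tries++, pid++) {
		if (pid >= TOY_PID_MAX)
			pid = floor;	/* rollover */
		if (!map->used[pid]) {
			map->used[pid] = true;
			map->last_pid = pid;
			return pid;
		}
	}
	return -1;
}
```

With last_pid sitting at TOY_PID_MAX - 1 and nothing else in use, a floor of
TOY_RESERVED hands out 300 next, while a floor of 1 hands out 1. Before the
first wrap both variants behave the same, which is why the boot-time kernel
threads still end up first in line either way.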

> And while I don't think this can break something, I bet humans will notice
> this change ;)
>
> And in fact, from time to time I thought that perhaps it makes sense to change
> alloc_pidmap() to check PF_KTHREAD and allocate the new pid from RESERVED_PIDS
> interval if it is set.
>
> So I am not sure this change is really good but I won't argue.