Re: [PATCH REPOST 0/8] kernel/fork: Move thread stack free out of the scheduler path.

From: Sebastian Andrzej Siewior
Date: Tue Feb 08 2022 - 12:10:27 EST


On 2022-01-25 16:26:44 [+0100], To linux-kernel@xxxxxxxxxxxxxxx wrote:
> [ This is a repost of https://lkml.kernel.org/r/20211118143452.136421-1-bigeasy@xxxxxxxxxxxxx ]
>
> This is a follow-up on the patch
> sched: Delay task stack freeing on RT
> https://lkml.kernel.org/r/20210928122411.593486363@xxxxxxxxxxxxx
>
> It addresses the review feedback:
> - Decouple stack accounting from its free invocation. The accounting
> happens in do_exit(), the final free call happens later.
>
> - Add put_task_stack_sched() to finish_task_switch(). Here the VMAP
> stack is cached only. If it fails, or in the !VMAP case then the final
> free happens in delayed_put_task_struct(). This is also an opportunity
> to cache the stack.
>
> From testing I observe the following:
>
> | bash-1715 [006] ..... 124.901510: copy_process: allocC ffffc90007e70000
> | sh-cmds.sh-1746 [007] ..... 124.907389: copy_process: allocC ffffc90007dc4000
> | <idle>-0 [019] ...1. 124.918126: free_thread_stack: cache ffffc90007dc4000
> | sh-cmds.sh-1746 [007] ..... 124.918279: copy_process: allocC ffffc90007de8000
> | <idle>-0 [004] ...1. 124.920121: free_thread_stack: delay ffffc90007de8001
> | <idle>-0 [007] ...1. 124.920299: free_thread_stack: cache ffffc90007e70000
> | <idle>-0 [007] ..s1. 124.945433: free_thread_stack: cache ffffc90007de8000
>
> TS 124.901510, bash started sh-cmds.sh, obtained stack from cache.
> TS 124.907389, script invokes its first command, obtained stack from
> cache. As you can see bash was running on CPU6 but its child was moved
> to CPU7.
> TS 124.918126, the first command is done, stack is cached on CPU19.
> TS 124.918279, script's second command, stack from cache.
> TS 124.920121, the command is done. The stack cache on CPU4 is full.
> TS 124.920299, the script is done, caches stack on CPU7.
> TS 124.945433, the RCU callback of the last command is now happening on
> CPU7, which is where the command was invoked (but not where it ran).
> Instead of freeing the stack, it was cached since CPU7 had an empty slot.
>
> If I pin the script to CPU5 and run it with multiple commands then it
> works as expected:
>
> | bash-1799 [005] ..... 993.608131: copy_process: allocC ffffc90007fa0000
> | sh-cmds.sh-1827 [005] ..... 993.608888: copy_process: allocC ffffc90007fa8000
> | sh-cmds.sh-1827 [005] ..... 993.610734: copy_process: allocV ffffc90007ff4000
> | sh-cmds.sh-1829 [005] ...1. 993.610757: free_thread_stack: cache ffffc90007fa8000
> | sh-cmds.sh-1827 [005] ..... 993.612401: copy_process: allocC ffffc90007fa8000
> | <...>-1830 [005] ...1. 993.612416: free_thread_stack: cache ffffc90007ff4000
> | sh-cmds.sh-1827 [005] ..... 993.613707: copy_process: allocC ffffc90007ff4000
> | sh-cmds.sh-1831 [005] ...1. 993.613723: free_thread_stack: cache ffffc90007fa8000
> | sh-cmds.sh-1827 [005] ..... 993.615024: copy_process: allocC ffffc90007fa8000
> | <...>-1832 [005] ...1. 993.615040: free_thread_stack: cache ffffc90007ff4000
> | sh-cmds.sh-1827 [005] ..... 993.616380: copy_process: allocC ffffc90007ff4000
> | <...>-1833 [005] ...1. 993.616397: free_thread_stack: cache ffffc90007fa8000
> | bash-1799 [005] ...1. 993.617759: free_thread_stack: cache ffffc90007fa0000
> | <idle>-0 [005] ...1. 993.617871: free_thread_stack: delay ffffc90007ff4001
> | <idle>-0 [005] ..s1. 993.638311: free_thread_stack: free ffffc90007ff4000
>
> and no new stack is allocated during its runtime; a cached stack is used.
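
For anyone reading along, the two-stage behaviour visible in the traces
above (cache, delay, then cache or free) can be sketched as a small
userspace model. This is not the code from the series. The names
model_put_stack_sched(), model_put_stack_rcu(), model_alloc_stack() and
NR_CPUS_MODEL are hypothetical, and the two slots per CPU are an
assumption that merely matches what the traces show. The model only
reports what would happen to a stack; it never frees memory itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL    8   /* hypothetical CPU count, model only */
#define NR_CACHED_STACKS 2   /* assumed number of cache slots per CPU */

enum stack_event { STACK_CACHED, STACK_DELAYED, STACK_FREED };

/* One small array of cached stacks per (modelled) CPU. */
static void *cached_stacks[NR_CPUS_MODEL][NR_CACHED_STACKS];

/* Store the stack in the first empty slot of this CPU, if there is one. */
static bool try_cache_stack(int cpu, void *stack)
{
	for (int i = 0; i < NR_CACHED_STACKS; i++) {
		if (cached_stacks[cpu][i] == NULL) {
			cached_stacks[cpu][i] = stack;
			return true;
		}
	}
	return false;
}

/* Scheduler-path stage: only try to cache, never free here. */
static enum stack_event model_put_stack_sched(int cpu, void *stack)
{
	return try_cache_stack(cpu, stack) ? STACK_CACHED : STACK_DELAYED;
}

/* RCU-callback stage: second chance to cache, otherwise report a free. */
static enum stack_event model_put_stack_rcu(int cpu, void *stack)
{
	return try_cache_stack(cpu, stack) ? STACK_CACHED : STACK_FREED;
}

/* Allocation: hand out a cached stack if this CPU has one, else NULL. */
static void *model_alloc_stack(int cpu)
{
	for (int i = 0; i < NR_CACHED_STACKS; i++) {
		void *stack = cached_stacks[cpu][i];

		if (stack != NULL) {
			cached_stacks[cpu][i] = NULL;
			return stack;
		}
	}
	return NULL;
}
```

In this model a third stack put on a CPU whose slots are both occupied is
reported as delayed. If a slot has been emptied by the time the later
stage runs, the stack is cached after all, otherwise it is reported as
freed.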

ping

Sebastian