Re: [LSF/MM/BPF TOPIC] Machine Learning (ML) library in Linux kernel

From: Barry Song

Date: Mon Feb 09 2026 - 05:26:14 EST


On Sat, Feb 7, 2026 at 3:40 AM Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:
>
> Hello,
>
[...]
>
> The continuous learning model can be adopted during the training phase.
> It implies that a kernel subsystem can receive ML model recommendations
> even during the training phase. The ML model proxy on the kernel side can
> estimate the current kernel subsystem state, try to apply the ML model
> recommendations, and estimate the efficiency of the applied recommendations.
> Generally speaking, ML model proxy on kernel side can consider several
> modes of interaction with ML model recommendations: (1) emergency mode,
> (2) learning mode, (3) collaboration mode, (4) recommendation mode.
> The emergency mode is the mode when the kernel subsystem is in a critical
> state and is required to work as efficiently as possible without
> involving the ML model recommendations (for example, ML model
> recommendations are completely inadequate or load is very high).
> The learning mode implies that the kernel subsystem can try to apply
> the ML model recommendations for some operations with the goal of
> estimating the maturity of the ML model. Also, the ML model proxy can
> degrade the mode to learning state if ML model recommendations become
> inefficient.
> The collaboration mode has the goal of using ML recommendations in
> 50% of operations with the goal of achieving mature state of ML model.
> And, finally, the ML model proxy can switch the kernel subsystem to
> recommendation mode if the ML model is mature enough and the efficiency of
> applying the ML recommendations is higher than using human-made algorithms.

Hi Slava,

Do we have any concrete examples where an ML-based proxy,
together with its userspace ML agent, has demonstrated
measurable performance improvements over well-designed,
human-crafted kernel algorithms?

Such examples could be in scheduling, filesystem I/O, or memory
reclamation and readahead. I think having a real, data-backed
example would be much more helpful for this discussion than
reasoning about an abstract framework without a concrete use
case.
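
For reference, the four modes quoted above reduce to a small state
machine. The sketch below is only an illustration under stated
assumptions: every name in it (ml_proxy_mode, ml_proxy_stats,
ml_proxy_next_mode, ml_proxy_use_model) and every threshold is
hypothetical, nothing here is an existing kernel API, and the
"gain" metric stands in for whatever efficiency measure a real
subsystem would have to define.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names and thresholds; not an existing kernel interface. */
enum ml_proxy_mode {
	ML_MODE_EMERGENCY,      /* ignore the model entirely */
	ML_MODE_LEARNING,       /* try the model on a few operations */
	ML_MODE_COLLABORATION,  /* use the model on about half of operations */
	ML_MODE_RECOMMENDATION, /* prefer the model over the built-in algorithm */
};

struct ml_proxy_stats {
	int subsystem_critical; /* nonzero: critical state or very high load */
	int ml_gain_pct;        /* measured gain of ML-driven operations over the
				 * built-in algorithm, in percent; <= 0 is no gain */
	int samples;            /* number of operations measured so far */
};

#define ML_MIN_SAMPLES      1000 /* assumed: below this, keep learning */
#define ML_MATURE_GAIN_PCT  5    /* assumed: gain needed to call it mature */
#define ML_LEARNING_STRIDE  10   /* assumed: 1 in 10 operations while learning */

/* Pick the mode from the current measurements. */
static enum ml_proxy_mode ml_proxy_next_mode(const struct ml_proxy_stats *s)
{
	assert(s != NULL);

	if (s->subsystem_critical)
		return ML_MODE_EMERGENCY;
	/* Too little data, or the model is not helping: (back) to learning. */
	if (s->samples < ML_MIN_SAMPLES || s->ml_gain_pct <= 0)
		return ML_MODE_LEARNING;
	if (s->ml_gain_pct >= ML_MATURE_GAIN_PCT)
		return ML_MODE_RECOMMENDATION;
	return ML_MODE_COLLABORATION;
}

/* Decide whether operation number op_seq should follow the model. */
static int ml_proxy_use_model(enum ml_proxy_mode mode, unsigned long op_seq)
{
	switch (mode) {
	case ML_MODE_LEARNING:
		return op_seq % ML_LEARNING_STRIDE == 0;
	case ML_MODE_COLLABORATION:
		return op_seq % 2 == 0; /* the 50% split from the quoted text */
	case ML_MODE_RECOMMENDATION:
		return 1;
	case ML_MODE_EMERGENCY:
	default:
		return 0;
	}
}
```

Note that every transition in this sketch depends on ml_gain_pct, a
measured comparison against the built-in algorithm, which is the kind
of data a concrete use case would need to supply.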

Thanks,
Barry