Re: deprecated.rst: deprecated strcpy ? (was: [PATCH] checkpatch: add a new check for strcpy/strlcpy uses)

From: Joe Perches
Date: Thu Jan 07 2021 - 19:52:32 EST


On Thu, 2021-01-07 at 13:16 -0800, Kees Cook wrote:
> On Tue, Jan 05, 2021 at 01:28:18AM -0800, Joe Perches wrote:
> > On Tue, 2021-01-05 at 14:29 +0530, Dwaipayan Ray wrote:
> > > On Tue, Jan 5, 2021 at 2:14 PM Joe Perches <joe@xxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, 2021-01-05 at 13:53 +0530, Dwaipayan Ray wrote:
> > > > > strcpy() performs no bounds checking on the destination buffer.
> > > > > This could result in linear overflows beyond the end of the buffer.
> > > > >
> > > > > strlcpy() reads the entire source buffer first. This read
> > > > > may exceed the destination size limit. This can be both inefficient
> > > > > and lead to linear read overflows.
> > > > >
> > > > > The safe replacement to both of these is to use strscpy() instead.
> > > > > Add a new checkpatch warning which alerts the user on finding usage of
> > > > > strcpy() or strlcpy().
> > > >
> > > > I do not believe that strscpy is preferred over strcpy.
> > > >
> > > > When the size of the output buffer is known to be larger
> > > > than the input, strcpy is faster.
> > > >
> > > > There are about 2k uses of strcpy.
> > > > Is there a use where strcpy use actually matters?
> > > > I don't know offhand...
> > > >
> > > > But I believe compilers do not optimize away the uses of strscpy
> > > > to a simple memcpy like they do for strcpy with a const from
> > > >
> > > >         strcpy(foo, "bar");
> > > >
> > >
> > > Yes the optimization here definitely helps. So in case the programmer
> > > knows that the destination buffer is always larger, then strcpy() should be
> > > preferred? I think the documentation might have been too strict about
> > > strcpy() uses here:
> > >
> > > Documentation/process/deprecated.rst:
> > > "strcpy() performs no bounds checking on the destination buffer. This
> > > could result in linear overflows beyond the end of the buffer, leading to
> > > all kinds of misbehaviors. While `CONFIG_FORTIFY_SOURCE=y` and various
> > > compiler flags help reduce the risk of using this function, there is
> > > no good reason to add new uses of this function. The safe replacement
> > > is strscpy(),..."
> >
> > Kees/Jonathan:
> >
> > Perhaps this text is overly restrictive.
> >
> > There are ~2k uses of strcpy in the kernel.
> >
> > About half of these are where the buffer length of foo is known and the
> > use is 'strcpy(foo, "bar")' so the compiler converts/optimizes away the
> > strcpy to memcpy and may not even put "bar" into the string table.
> >
> > I believe strscpy uses do not have this optimization.
> >
> > Is there a case where the runtime cost actually matters?
> > I expect so.
>
> The original goal was to use another helper that worked on static
> strings like this. Linus rejected that idea, so we're in a weird place.
> I think we could perhaps build a strcpy() replacement that requires
> compile-time validated arguments, and to break the build if not.
>
> i.e.
>
> given:
> char array[8];
> char *ptr;
>
> allow:
>
> strcpy(array, "1234567");
>
> disallow:
>
> strcpy(array, "12345678"); /* too long */
> strcpy(array, src); /* not optimized, so use strscpy? */
> strcpy(ptr, "1234567"); /* unknown destination size */
> strcpy(ptr, src); /* unknown destination size */

I think that's not a good idea as it's not a generic equivalent of the
string.h code.
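
For reference, a compile-time-validated copy like the one quoted above could
look roughly like the sketch below. It is a minimal userspace illustration
that assumes GNU C extensions (__typeof__, __builtin_types_compatible_p);
strcpy_literal() and IS_CHAR_ARRAY() are hypothetical names made up for this
example, not existing kernel helpers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: true only when x is a real char array, not a pointer. */
#define IS_CHAR_ARRAY(x) \
	__builtin_types_compatible_p(__typeof__(x), char[sizeof(x)])

/*
 * Hypothetical strcpy_literal(): builds only when dst is a char array and
 * lit is a string literal that fits, terminating NUL included.
 * The "" lit "" concatenation fails to compile unless lit is a literal.
 */
#define strcpy_literal(dst, lit)					\
do {									\
	_Static_assert(IS_CHAR_ARRAY(dst),				\
		       "destination must be a char array");		\
	_Static_assert(sizeof("" lit "") <= sizeof(dst),		\
		       "string literal too long for destination");	\
	memcpy(dst, lit, sizeof(lit));					\
} while (0)

void strcpy_literal_example(void)
{
	char array[8];

	strcpy_literal(array, "1234567");	/* 7 chars + NUL fit exactly */
	assert(strcmp(array, "1234567") == 0);

	/*
	 * Each of these is rejected at build time:
	 *   strcpy_literal(array, "12345678");	 too long
	 *   strcpy_literal(array, src);	 not a string literal
	 *   strcpy_literal(ptr, "1234567");	 unknown destination size
	 */
}
```

Because both sizes are compile-time constants, the copy is a fixed-size
memcpy, which is the same optimization compilers apply to
strcpy(foo, "bar") today.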

I still like the stracpy variant I proposed:

https://lore.kernel.org/lkml/24bb53c57767c1c2a8f266c305a670f7@xxxxxxx/T/#m0627aa770a076af1937cb5c610ed71dab3f1da72
https://lore.kernel.org/lkml/CAHk-=wgqQKoAnhmhGE-2PBFt7oQs9LLAATKbYa573UO=DPBE0Q@xxxxxxxxxxxxxx/
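
As a rough illustration of the idea only (see the links above for the actual
proposal), a two-argument stracpy-style macro can be sketched in userspace as
below. It assumes the destination must be a char array whose size is taken
from the array itself, and it assumes strscpy-like semantics (always
NUL-terminate, return the copied length or -E2BIG on truncation).
sketch_strscpy() and sketch_stracpy() are hypothetical names, and the macro
relies on GNU C statement expressions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true only when x is a real char array, not a pointer. */
#define IS_CHAR_ARRAY(x) \
	__builtin_types_compatible_p(__typeof__(x), char[sizeof(x)])

/*
 * Userspace stand-in with strscpy-like behavior (an assumption of this
 * sketch): reads at most size bytes of src, always NUL-terminates dst
 * when size > 0, returns the string length or -E2BIG on truncation.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;
	}

	dst[size - 1] = '\0';	/* source did not fit: truncate */
	return -E2BIG;
}

/* Hypothetical stracpy-style wrapper: the size comes from the array type. */
#define sketch_stracpy(dst, src)					\
({									\
	_Static_assert(IS_CHAR_ARRAY(dst),				\
		       "destination must be a char array");		\
	sketch_strscpy(dst, src, sizeof(dst));				\
})

void sketch_stracpy_example(void)
{
	char buf[4];

	assert(sketch_stracpy(buf, "abc") == 3);	/* fits exactly */
	assert(sketch_stracpy(buf, "abcdef") == -E2BIG);	/* truncated */
	assert(strcmp(buf, "abc") == 0);
}
```

The caller never passes a size, so a mismatched length argument cannot be
written, and a pointer destination breaks the build instead of silently
using sizeof(pointer).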

Linus liked a variant he called copy_string:

https://lore.kernel.org/lkml/CAHk-=wg8vLmmwTGhXM51NpSWJW8RFEAKoXxG0Hu_Q9Uwbjj8kw@xxxxxxxxxxxxxx/

I think the cocci scripts that convert:

strlcpy -> strscpy (only when return value unused)
str<sln>cpy(array, "string") -> stracpy(array, "string")
s[cn]printf -> sysfs_emit

would leave relatively few uses of strcpy and sprintf variants and would
make it much easier to analyze the remaining uses for potential overflows.