Re: [PATCH] checkpatch: Check for quoted strings broken across lines

From: Josh Triplett
Date: Thu Feb 02 2012 - 20:31:17 EST


On Thu, Feb 02, 2012 at 05:08:53PM -0500, Nick Bowler wrote:
> On 2012-02-02 13:16 -0800, Josh Triplett wrote:
> > On Thu, Feb 02, 2012 at 03:22:07PM -0500, Nick Bowler wrote:
> > > On 2012-02-02 12:06 -0800, Josh Triplett wrote:
> > > > Documentation/CodingStyle recommends not splitting quoted strings across
> > > > lines, because it breaks the ability to grep for the string. checkpatch
> > > > already makes an exception to the 80-column rule for quoted strings to
> > > > allow this. Rather than just allowing it, actively warn about quoted
> > > > strings split across lines.
> > > [...]
> > > > +# Check for strings broken across lines (breaks greppability). Make an
> > > > +# exception when the previous string ends in a newline (multiple lines in one
> > > > +# string constant) or \n\t (common in inline assembly to indent the instruction
> > > > +# on the following line).
> > >
> > > There are tons of strings in the kernel that this makes checkpatch warn
> > > about where it probably shouldn't. For example, this one (from
> > > kernel/auditsc.c:1476):
> > >
> > > audit_log_format(ab,
> > > "oflag=0x%x mode=%#ho mq_flags=0x%lx mq_maxmsg=%ld "
> > > "mq_msgsize=%ld mq_curmsgs=%ld",
> > >
> > > WARNING: quoted string split across lines
> > > #1478: FILE: auditsc.c:1478:
> > > + "mq_msgsize=%ld mq_curmsgs=%ld",
> > >
> > > Breaking "greppability" of this string is a non-issue, because this sort
> > > of string is not really greppable to begin with (and would certainly not
> > > be any easier to grep for if it were all on one line).
> >
> > While I agree that in that particular case (heavy on the %formats and
> > light on the text) you can't easily grep to begin with, the guideline
> > from CodingStyle still applies,
>
> The guideline from CodingStyle talks about "user-visible strings". I
> don't think this string counts, because it contains code that is not
> displayed to the user (hence, the string is not user-visible). That is
> the reason why it's not greppable in the first place.
>
> There are hundreds, if not thousands of similar strings in the kernel.

OK, with a combination of arcane grep/sed commands and hand checking, I
just scanned the *entire* kernel for quoted strings split across lines,
and filtered out all user-visible strings, leaving only the split
non-user-visible strings.

An excerpt from that pile of ugly, suggesting just how many ways kernel
code has to spell "user-visible":

\(print[a-z_]*\|pr_[a-z]*\|pr_[a-z]*_ratelimited\|pr_debug[0-9]\|pr_info_once\|dev_[a-z]*\|msg\|warn\|warn_once\|dbg\|debug\|debugp\|err\|error\|errdev\|errx\|warning\|DBG[A-Z0-9_]*\|DEBUG[A-Z_]*\|trace[a-z0-9_]*\|LOG_KMS\|exception\|FAIL\|fatal\|assert\|panic_vm\|trace\|panic\|puts\|MODULE_[A-Z_]*\|asm\|__asm__\|volatile\|__volatile__\|message\|log\|info\|DAC960_[A-Za-z_]*\|notify\|ufsd\|crit\|debugf[0-9]\|DP\|die\|fyi\|notice\|dout\|snoop\|throtl_log_tg\|ERROR_INODE\|RFALSE\|DMESGE\|SAY\|JOM\|SAM\|JOT\|DBF_EVENT\|DE_ACT\|s390_handle_damage\|CIO_MSG_EVENT\|CIO_CRW_EVENT\|DBF_EVENT_DEVID\|fat_fs_error_ratelimit\|mark_tsc_unstable\|ata_ehi_push_desc\|DEB[0-9]\|IUCV_DBF_TEXT_\|LOG_PARSE\|esp_log_[a-z]*\|check_warn_return\|D_ISR\|D_CALIB\|D_RATE\|D_TX_REPLY\|D_TXPOWER\|D_EEPROM\|D_POWER\|D_SCAN\|D_ASSOC\|D_HC\|D_HC_DUMP\|IP_VS_ERR_BUF\|DMWARN_LIMIT\|ide_dump_status\|ppfinit\|mca_recovered\|die_if_kernel\|die_if_no_fixup\|gdbstub_proto\|ite_pr\|nvt_pr\|deb_i2c\|deb_irq\|deb_ts\|deb_rc\|mxl_i2c\|MLX4_EN_PARM_INT\|DI\|d_fnstart\|d_fnend\|lbs_deb_[a-z_]*\|LPFC_[A-Z_]*\|stop\|mconsole_reply_v0\|CH_ALERT\|D1\|ADD\|lbtf_deb_usbd\|IWL_DEBUG_MAC80211\|WL_CONN\|CX18_INFO_DEV\|CX18_ERR_DEV\|OPT_BOOLEAN\|asc_prt_line\|gdbstub_msg_write\|DCCP_BUG\|fappend\)

I found that a single change to my checkpatch test (only checking
strings passed as function arguments) limits the false positives in the
entire kernel to a grand total of 12:

arch/powerpc/platforms/iseries/mf.c: memcpy(src, "\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00"
arch/um/os-Linux/process.c: if (sscanf(buf, "%*d " COMM_SCANF " %*c %*d %*d %*d %*d %*d %*d %*d "
arch/x86/platform/olpc/olpc_dt.c: olpc_dt_interpret("\" /battery@0\" find-device"
arch/x86/platform/olpc/olpc_dt.c: olpc_dt_interpret("\" /pci/display@1\" find-device"
arch/x86/platform/olpc/olpc_dt.c: olpc_dt_interpret("\" /pci/display@1,1\" find-device"
drivers/block/rbd.c: "%" __stringify(RBD_MAX_OBJ_NAME_LEN) "s"
drivers/pcmcia/ds.c: if (add_uevent_var(env, "MODALIAS=pcmcia:m%04Xc%04Xf%02Xfn%02Xpfn%02X"
drivers/tty/tty_audit.c: audit_log_format(ab, "%s pid=%u uid=%u auid=%u ses=%u "
fs/ecryptfs/crypto.c:static unsigned char *portable_filename_chars = ("-.0123456789ABCD"
kernel/auditsc.c: audit_log_format(ab, " inode=%lu"
kernel/auditsc.c: audit_log_format(ab, "login pid=%d uid=%u "
security/selinux/ss/services.c: audit_log_format(ab, "op=security_compute_av reason=%s "

Not "hundreds, if not thousands", but 12, including 4 calls to
audit_log_format. :)

Meanwhile, this checkpatch test correctly flags several thousand
instances of user-visible strings that shouldn't get split. That seems
like a pretty reasonable ratio to me; what do you think?

I'll send PATCHv2, which limits the check to function parameters,
momentarily.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/