Re: [PATCH v3] printk: Have printk() never buffer its data

From: Joe Perches
Date: Mon Jul 09 2012 - 19:32:19 EST


On Tue, 2012-07-10 at 00:40 +0200, Kay Sievers wrote:
> On Tue, Jul 10, 2012 at 12:29 AM, Joe Perches <joe@xxxxxxxxxxx> wrote:
> > On Tue, 2012-07-10 at 00:10 +0200, Kay Sievers wrote:
> >> On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches <joe@xxxxxxxxxxx> wrote:
> >> > On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
> >> >
> >> >> At the same time the CPU#2 prints the same warning with a continuation
> >> >> line, but the buffer from CPU#1 can not be flushed to the console, nor
> >> >> can the continuation line printk()s from CPU#2 be merged at this point.
> >> >> The consoles are still locked and busy with replaying the old log
> >> >> messages, so the new continuation data is just stored away in the record
> >> >> buffer as it is coming in.
> >> >> If the console would be registered a bit earlier, or the warning would
> >> >> happen a bit later, we would probably not see any of this.
> >> >>
> >> >> I can fake something like this just by holding the console semaphore
> >> >> over a longer time and printing continuation lines with different CPUs
> >> >> in a row.
> >> >>
> >> >> The patch below seems to work for me. It is also here:
> >> >> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
> >> >>
> >> >> It only applies cleanly on top of this patch:
> >> >> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
> >> >>
> >> >
> >> > Hi Kay.
> >> >
> >> > I just ran a test with what's in Greg's driver-core -for-linus branch.
> >> >
> >> > One of the differences in dmesg is timestamping of consecutive
> >> > pr_<level>("foo...)
> >> > followed directly by
> >> > pr_cont("bar...")
> >> >
> >> > For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
> >> >
> >> > # grep MAP /var/log/dm* -A1
> >> > dmesg:[ 0.781687] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
> >> > dmesg-[ 0.781707] ata2: port disabled--ignoring
> >> > --
> >> > dmesg.0:[ 0.948881] ata_piix 0000:00:1f.2: MAP [
> >> > dmesg.0-[ 0.948883] P0 P2 P1 P3 ]
> >> >
> >> > These messages originate starting at
> >> > drivers/ata/ata_piix.c:1354
> >> >
> >> > All the continuations are emitted with pr_cont.
> >> >
> >> > I think this output should still be coalesced without
> >> > timestamp deltas. Perhaps the timestamping code can
> >> > still be reworked to avoid too small a delta producing
> >> > a new timestamp and another dmesg line.
> >>
> >> Hmm, I don't see that.
> >>
> >> If I do:
> >> pr_info("[");
> >> for (i = 0; i < 4; i++)
> >> pr_cont("%i ", i);
> >> pr_cont("]\n");
> >>
> >> I get:
> >> 6,173,0;[0 1 2 3 ]
> >>
> >> And if I fill the cont buffer and forcefully hold the console sem
> >> during all that, and we can't merge anymore, I get:
> >> 6,167,0;[
> >> 4,168,0;0
> >> 4,169,0;1
> >> 4,170,0;2
> >> 4,171,0;3
> >> 4,172,0;]
> >>
> >> But the output is still all fine for both lines:
> >> [ 0.000000] [0 1 2 3 ]
> >> [ 0.000000] [0 1 2 3 ]
> >>
> >> What do I miss?
> >
> > In this case the initial line is dev_info not pr_info
> > so there are the additional dict descriptors output to
> > /dev/kmsg as well.
> >
> > Maybe that interferes with continuations. Dunno.
>
> Yes, it does. Annotated records dev_printk() must be reliable in the
> data storage and for the consumers. We can not expose them to the racy
> continuation printk()s. We need to be able to trust the data they
> print and not possibly merge unrelated things into it.
>
> If it's needed, we can try to set the flags accordingly, that they
> *look* like a line in the classic byte-stream output, but the
> interface in /dev/kmsg must not export them that way, because
> continuation lines can never be trusted to be correct.

Then you've changed semantics and I think you need to
fix it.

A dev_<level> call is not guaranteed to be a complete
message.

There are dev_<level> and netdev_<level> calls
followed by pr_cont.

Maybe these could be fixed up and then they could be
always integral. There don't look to be too many.

This may be most (all?) of them:

drivers/ata/ata_piix.c | 16 +++++++++-----
drivers/media/rc/redrat3.c | 36 ++++++++++++++++-----------------
drivers/net/ethernet/broadcom/bnx2.c | 26 ++++++++++++++----------
3 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 3c809bf..f51962f 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -1346,38 +1346,42 @@ static const int *__devinit piix_init_sata_map(struct pci_dev *pdev,
const int *map;
int i, invalid_map = 0;
u8 map_value;
+ char maps[50] = {0};
+ int len = 0;

pci_read_config_byte(pdev, ICH5_PMR, &map_value);

map = map_db->map[map_value & map_db->mask];

- dev_info(&pdev->dev, "MAP [");
for (i = 0; i < 4; i++) {
switch (map[i]) {
case RV:
invalid_map = 1;
- pr_cont(" XX");
+ len += snprintf(maps + len, sizeof(maps) - len, " XX");
break;

case NA:
- pr_cont(" --");
+ len += snprintf(maps + len, sizeof(maps) - len, " --");
break;

case IDE:
WARN_ON((i & 1) || map[i + 1] != IDE);
pinfo[i / 2] = piix_port_info[ich_pata_100];
i++;
- pr_cont(" IDE IDE");
+ len += snprintf(maps + len, sizeof(maps) - len,
+ " IDE IDE");
break;

default:
- pr_cont(" P%d", map[i]);
+ len += snprintf(maps + len, sizeof(maps) - len,
+ " P%d", map[i]);
if (i & 1)
pinfo[i / 2].flags |= ATA_FLAG_SLAVE_POSS;
break;
}
}
- pr_cont(" ]\n");
+
+ dev_info(&pdev->dev, "MAP [%s ]\n", maps);

if (invalid_map)
dev_err(&pdev->dev, "invalid MAP value %u\n", map_value);
diff --git a/drivers/media/rc/redrat3.c b/drivers/media/rc/redrat3.c
index 2878b0e..636a245 100644
--- a/drivers/media/rc/redrat3.c
+++ b/drivers/media/rc/redrat3.c
@@ -296,41 +296,36 @@ static void redrat3_issue_async(struct redrat3_dev *rr3)

static void redrat3_dump_fw_error(struct redrat3_dev *rr3, int code)
{
- if (!rr3->transmitting && (code != 0x40))
- dev_info(rr3->dev, "fw error code 0x%02x: ", code);
+ const char *msg = "";

switch (code) {
case 0x00:
- pr_cont("No Error\n");
+ msg = "No Error";
break;

/* Codes 0x20 through 0x2f are IR Firmware Errors */
case 0x20:
- pr_cont("Initial signal pulse not long enough "
- "to measure carrier frequency\n");
+ msg = "Initial signal pulse not long enough to measure carrier frequency";
break;
case 0x21:
- pr_cont("Not enough length values allocated for signal\n");
+ msg = "Not enough length values allocated for signal";
break;
case 0x22:
- pr_cont("Not enough memory allocated for signal data\n");
+ msg = "Not enough memory allocated for signal data";
break;
case 0x23:
- pr_cont("Too many signal repeats\n");
+ msg = "Too many signal repeats";
break;
case 0x28:
- pr_cont("Insufficient memory available for IR signal "
- "data memory allocation\n");
+ msg = "Insufficient memory available for IR signal data memory allocation";
break;
case 0x29:
- pr_cont("Insufficient memory available "
- "for IrDa signal data memory allocation\n");
+ msg = "Insufficient memory available for IrDa signal data memory allocation";
break;

/* Codes 0x30 through 0x3f are USB Firmware Errors */
case 0x30:
- pr_cont("Insufficient memory available for bulk "
- "transfer structure\n");
+ msg = "Insufficient memory available for bulk transfer structure";
break;

/*
@@ -339,20 +334,23 @@ static void redrat3_dump_fw_error(struct redrat3_dev *rr3, int code)
*/
case 0x40:
if (!rr3->transmitting)
- pr_cont("Signal capture has been terminated\n");
+ msg = "Signal capture has been terminated";
break;
case 0x41:
- pr_cont("Attempt to set/get and unknown signal I/O "
- "algorithm parameter\n");
+ msg = "Attempt to set/get and unknown signal I/O algorithm parameter";
break;
case 0x42:
- pr_cont("Signal capture already started\n");
+ msg = "Signal capture already started";
break;

default:
- pr_cont("Unknown Error\n");
+ msg = "Unknown Error";
break;
}
+
+ if (!rr3->transmitting && (code != 0x40))
+ dev_info(rr3->dev, "fw error code 0x%02x: %s\n", code, msg);
+
}

static u32 redrat3_val_to_mod_freq(struct redrat3_signal_header *ph)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 1901da1..f5dd19f 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -987,24 +987,28 @@ static void
bnx2_report_link(struct bnx2 *bp)
{
if (bp->link_up) {
+ const char *fc_type = "";
+ const char *fc_on = "";
netif_carrier_on(bp->dev);
- netdev_info(bp->dev, "NIC %s Link is Up, %d Mbps %s duplex",
- bnx2_xceiver_str(bp),
- bp->line_speed,
- bp->duplex == DUPLEX_FULL ? "full" : "half");

if (bp->flow_ctrl) {
if (bp->flow_ctrl & FLOW_CTRL_RX) {
- pr_cont(", receive ");
if (bp->flow_ctrl & FLOW_CTRL_TX)
- pr_cont("& transmit ");
- }
- else {
- pr_cont(", transmit ");
+ fc_type = ", receive & transmit";
+ else
+ fc_type = ", receive";
+
+ } else {
+ fc_type = ", transmit";
}
- pr_cont("flow control ON");
+ fc_on = " flow control ON";
}
- pr_cont("\n");
+ netdev_info(bp->dev, "NIC %s Link is Up, %d Mbps %s duplex%s%s\n",
+ bnx2_xceiver_str(bp),
+ bp->line_speed,
+ bp->duplex == DUPLEX_FULL ? "full" : "half",
+ fc_type, fc_on);
+
} else {
netif_carrier_off(bp->dev);
netdev_err(bp->dev, "NIC %s Link is Down\n",


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/