Re: [PATCH v2 01/11] pstore/blk: new support logger for block devices

From: WeiXiong Liao
Date: Thu Mar 19 2020 - 21:50:39 EST


Hi Kees Cook,

On 2020/3/19 AM 1:23, Kees Cook wrote:
> On Thu, Feb 27, 2020 at 04:21:51PM +0800, liaoweixiong wrote:
>> On 2020/2/26 AM 8:52, Kees Cook wrote:
>>> On Fri, Feb 07, 2020 at 08:25:45PM +0800, WeiXiong Liao wrote:
>>>> +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o
>>>> +pstore_blk-y += blkzone.o
>>>
>>> Why this dance with files? I would just expect:
>>>
>>> obj-$(CONFIG_PSTORE_BLK) += blkzone.o
>>>
>>
>> This makes the built module named blkzone.ko rather than
>> pstore_blk.ko.
>
> You can just do a regular build rule:
>
> obj-$(CONFIG_PSTORE_BLK) += blkzone.o
>

I don't get it. If make it as your words, the built module will be
blkzone.ko.
The module is named pstore/blk, however it built out blkzone.ko. I think
it's
confusing.

>>>> +#define BLK_SIG (0x43474244) /* DBGC */
>>>
>>> I was going to suggest extracting PERSISTENT_RAM_SIG, renaming it and
>>> using it in here and in ram_core.c, but then I realize they're not
>>> marking the same structure. How about choosing a new magic sig for the
>>> blkzone data header?
>>>
>>
>> That's OK to me. I don't know if there is a rule to get a new magic?
>> In addition, all members of this structure are the same as
>> struct persistent_ram_buffer after patch 2. Maybe it's a good idea to
>> extract it
>> if you want to merge ramoops and pstore/blk.
>
> Okay, let's leave it as-is for now.
>
>>>> + uint32_t sig;
>>>> + atomic_t datalen;
>>>> + uint8_t data[];
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct blkz_dmesg_header: dmesg information
>>>
>>> This is the on-disk structure also?
>>>
>> Yes. The structure blkz_buffer is a generic header for all recorder
>> zone, and the
>> structure blkz_dmesg_header is a header for dmesg, saved in
>> blkz_buffer->data.
>> The dmesg recorder use it to save it's specific attributes.
>
> Okay, can you add comments to distinguish the on-disk structures from
> the in-memory, etc?
>

Sure. I will do it.

>>>> +#define DMESG_HEADER_MAGIC 0x4dfc3ae5
>>>
>>> How was this magic chosen?
>>
>> It's a random number. Maybe should I chose a meaningful magic?
>
> That's fine; just add a comment to say so.
>

OK.

>>>> + * @dirty:
>>>> + * mark whether the data in @buffer are dirty (not flush to storage yet)
>>>> + */
>>>
>>> Thank you for the kerndoc! :) Is it linked to from any .rst files?
>>>
>>
>> I don't get your words. There is a document on the 6th patch. I don't know
>> whether it is what you want?
>
> Patch 6 is excellent; I think you might want to add references back to
> these kern-doc structures using the ".. kernel-doc::
> fs/pstore/blkzone.c" syntax:
> https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#including-kernel-doc-comments
>

Wow! I marvel at kernel-doc. Your link has helped me a lot.

I will optimize all my comment and document later.

>>>> +static int blkz_zone_write(struct blkz_zone *zone,
>>>> + enum blkz_flush_mode flush_mode, const char *buf,
>>>> + size_t len, unsigned long off)
>>>> +{
>>>> + struct blkz_info *info = blkz_cxt.bzinfo;
>>>> + ssize_t wcnt = 0;
>>>> + ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos);
>>>> + size_t wlen;
>>>> +
>>>> + if (off > zone->buffer_size)
>>>> + return -EINVAL;
>>>> + wlen = min_t(size_t, len, zone->buffer_size - off);
>>>> + if (buf && wlen) {
>>>> + memcpy(zone->buffer->data + off, buf, wlen);
>>>> + atomic_set(&zone->buffer->datalen, wlen + off);
>>>> + }
>>>
>>> If you're expecting concurrent writers (use of atomic_set(), I would
>>> expect the whole write to be locked instead. (i.e. what happens if
>>> multiple callers call blkz_zone_write()?)
>>>
>>
>> I don't agree with it. The datalen will be updated everywhere. It's useless
>> to lock here.
>
> But there could be multiple writers; locking should be needed.
>

All the recorders such as dmesg, pmsg, console and ftrace have been
locked on
pstore and upper layers. So, a recorder will not write in parallel and
different
recorders operate privately zone. They don't have any influence on each
other.

The only parallel case I think is that recorder writes while dirty-flush
thread is
working. And the dirty-flusher will flush the whole zone rather than
part of it, so,
it is OK to call in parallel.

Based on these reasons, I don't think locking should be needed.

>> One more things. During the analysis, I found another problem.
>> Removing old files will cause new logs to be lost. Take console recorder as
>> am example. After new rebooting, new logs are saved to buf while old
>> logs are
>> saved to old_buf. If we remove old file at that time, not only old_buf
>> is freed, but
>> also length of buf for new data is reset to zero. The ramoops may also
>> has this
>> problem.
>
> Hmm. I'll need to double-check this. It's possible the call to
> persistent_ram_zap() in ramoops_pstore_erase() is not needed.
>
>>>> +static int blkz_recover_dmesg_data(struct blkz_context *cxt)
>>>
>>> What does "recover" mean in this context? Is this "read from storage"?
>>
>> Yes. "recover" means reading data back from storage.
>
> Okay. Please add some comments here. I would think of it more as "read"
> or "load". When I think of "recover" I think of "finding something that
> was lost". But the name isn't important as long as there is a comment
> somewhere about what it's doing.
>

OK. I will add some comments on entry function blkz_recovery()ã

> -Kees
>

--
WeiXiong Liao