Re: [RFC] persistent store

From: Tony Luck
Date: Sun Nov 21 2010 - 16:47:31 EST


On Sun, Nov 21, 2010 at 1:07 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> this is actually very cool.

Thanks for looking at it (and more thanks for this endorsement!)

>> 1) "Why do you only allow one platform driver to register?"
>>   I only have one such driver.  Adding more is easy from the "read" side
>>   (just collect all the records from all devices and remember where they
>>   came from so you can call the correct "eraser" function).  But the "write"
>>   side opens up questions that I don't have good answers for:
>>   - Which device(s) should error records be written to?
>>     All of them? Start with one and move on when it is
>>     full?  Write some types of records to one device?
>
> Maybe based on the error type? We definitely need one device which
> should contain all the records, something like main pstore device.

But who decides which records go where? And which device gets to be
the "main" one? I don't think that we can usefully do this in the
registration mechanism (how does a driver know that other drivers even
exist?). I continue to want to defer this until someone with two or more
devices on one machine steps forward.

>> 3) "/sys/firmware/pstore is the wrong pathname for this".
>>   You are probably right. I put it under "firmware" because that's where
>>   the "efivars" driver put its top level directory. In my case the ERST
>>   back end is firmware, so there is some vague logic to it ... but better
>>   suggestions are welcome. Perhaps /sys/devices/platform/pstore?
>>
>> 4) "/sys is the wrong place for this."
>>   Perhaps.  I definitely want to use some sort of filesystem interface (so
>>   each record shows up as a file to the user). This seems a lot cleaner
>>   than trying to map the semantics of actual persistent storage devices
>>   onto a character device.  The "sysfs_create_bin_file()" API seems very
>>   well designed for this usage.  If not /sys, then where?  "debugfs"
>>   would work - but not everyone mounts debugfs. Creating a whole new
>>   filesystem for this seems like overkill.
>
> No, I think /sys is the right place for it being always present and
> all. Besides, for example, all the ACPI tables are there anyway
> (/sys/firmware/acpi/tables/) so pstore won't be the first blob there.

Heh! The acpi tables code is where I found out how easy it was to handle
blobs bigger than PAGE_SIZE using memory_read_from_buffer().
>
> /sys/firmware might not be all that sensible if someone comes up with
> persistent storage type which is the network but we'll change that then.

I'd like to get the right place first time - change means having to update
any applications that coded in the pathname.

>> +int
>> +pstore_register(struct pstore_info *psi)
>> +{
>> +     struct  pstore_entry    *new_pstore;
>> +     int     rc = 0, type;
>> +     unsigned long size;
>> +     u64     id;
>> +     unsigned long ps_maxsize;
>
> Alignment here? Maybe something like this:
>
>        struct pstore_entry     *new_pstore;
>        unsigned long           ps_maxsize;
>        int                     rc = 0, type;

Are you talking about textual alignment of the declarations? Yours
does look prettier.

>> +
>> +     spin_lock(&pstore_lock);
>> +     if (psinfo) {
>> +             spin_unlock(&pstore_lock);
>> +             return -EBUSY;
>> +     }
>> +     psinfo = psi;
>> +     spin_unlock(&pstore_lock);
>
> Maybe make this lockless with cmpxchg? OTOH, the spinlock would be
> easier when you have multiple persistent storage devices...

cmpxchg would make the code shorter - I'll try recoding and see if I like
the way it looks.

>> +     ps_maxsize = psi->header_size + psi->data_size + psi->footer_size;
>> +     pstore_buf = kzalloc(ps_maxsize, GFP_KERNEL);
>> +     if (!pstore_buf)
>> +             return -ENOMEM;
>
> newline

Yup. Will fix.

>> +int
>> +pstore_write(int type, char *buf, unsigned long size)
>> +{
>> +     if (!psinfo->writer)
>> +             return -ENODEV;
>
> I think you should move this check to the pstore_register() above. We
> don't want persistent storage backends which do not implement ->write
> anyway since the whole point of them becomes moot, no?

Doh! Yes, of course. Will fix.

>> +     list_del(&search_pstore->list);
>> +
>> +     spin_unlock(&pstore_lock);
>> +
>> +     sysfs_remove_bin_file(&pstore_kset->kobj, &search_pstore->attr);
>
> AFAICT, you might want to remove the sysfs file _before_ you remove it
> from the pstore list/erase its contents, otherwise concurrent accesses
> to it from userspace readers might make us go boom.

I'll think about the ordering. I have three things to do here:
1) Remove from the pstore_list
2) Call platform driver to erase from store
3) Call sysfs to remove the binfile.

Potentially the erase could fail ... and I should probably notice
that and do something.

>> +/* types */
>> +#define      PSTORE_DMESG    0
>> +#define      PSTORE_MCE      1
>
> maybe PSTORE_TYPE_DMESG/PSTORE_TYPE_MCE ?

Much better (I suck at naming things). Will fix.

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/