Re: [RFC] UBIFS authentication

From: David Gstir
Date: Mon Apr 09 2018 - 11:23:15 EST


Hi Sascha,

> On 09.04.2018, at 11:59, Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
>
> Hi David,
>
> On Wed, Jan 17, 2018 at 04:19:14PM +0100, David Gstir wrote:
>> Hi everybody!
>>
>> ### Index Authentication
>>
>> Through UBIFS' concept of a wandering tree, it already takes care of only
>> updating and persisting changed parts from leaf node up to the root node
>> of the full B+ tree. This enables us to augment the index nodes of the tree
>> with a hash over each node's child nodes. As a result, the index basically also
>> a Merkle tree. Since the leaf nodes of the index contain the actual filesystem
>> data, the hashes of their parent index nodes thus cover all the file contents
>> and file metadata. When a file changes, the UBIFS index is updated accordingly
>> from the leaf nodes up to the root node including the master node. This process
>> can be hooked to recompute the hash only for each changed node at the same time.
>> Whenever a file is read, UBIFS can verify the hashes from each leaf node up to
>> the root node to ensure the node's integrity.
>>
>> To ensure the authenticity of the whole index, the UBIFS master node stores a
>> keyed hash (HMAC) over its own contents (which includes the location of the root
>> node) and the root node of the index tree. As mentioned above, the master node
>> is always written to the flash whenever the index is persisted (ie. on index
>> commit).
>>
>> Using this approach only UBIFS index nodes and the master node are changed to
>> include a hash. All other types of nodes will remain unchanged. This reduces
>> the storage overhead which is precious for users of UBIFS (ie. embedded
>> devices).
>>
>>
>> HMAC Master Node
>> |
>> ' ' ' ' ' ' ' ' ' ' ' ' ' '|' '
>> ' +---------------+ o '
>> ' | Master Node | '
>> ' +---------------+ ' Hash Index Node #1
>> ' | ' |
>> . . . . . . . .'. . . . . . v . . . . . . . . . . . . . . . .|. . . . .
>> . ' +---------------+ ' o .
>> . ' | Index Node #1 | ' .
>> . ' +---------------+ ' .
>> . ' | ... | (fanout: 8) .
>> . ' ' ' ' | ' ' ' ' | ' ' ' ' ' .
>> . +-------+ +------+ .
>> . | | .
>> . ' ' ' ' ' v ' ' ' ' ' ' ' ' ' ' ' v ' ' ' ' ' ' ' ' ' ' ' .
>> . ' +---------------+ ' ' +---------------+ ' .
>> . ' | Index Node #2 | ' ' | Index Node #3 | ' .
>> . ' +---------------+ ' ' +---------------+ ' .
>> . ' | ... ' ' | ... | ' .
>> . . . ' . v . . . . . . . '. ' . . . v . . . . v . . . . . . . .' . . .
>> ' +-----------+ ' '+----------+ +-----------+ '
>> ' | Data Node | ' '| INO Node | | DENT Node | '
>> ' +-----------+ ' '+----------+ +-----------+ '
>> ' o ' ' o '
>> ' '|' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '|' '
>> | |
>> Hash Index Node #2 Hash Index Node #3
>
> When a hash covers an index node and also its children then of course
> this is really space efficient, but this also means that in order to
> read a node we always have to read all of its children. Also adding a
> new leaf node means rehashing all siblings. Is it really worth paying
> this price to save a few bytes for more hashes?
> I would rather suggest to add a hash per child to each index node.

What you propose is a simple tradeoff between the required on-flash storage
space and the amount of read operations needed to verify the index node integrity.

To get some numbers here: In case of SHA256 and a fanout of 8, we'd store an
additional 224 bytes per index node and safe 7 index node reads per tree level.

I hope it is clear that having a single hash does _not_ mean we have to read
the whole tree! :)

Considering that UBIFS is mostly used in embedded where storage space is rather
precious, we opted for storing a single hash. But sure, storing a hash per
branch/child is a possibility and we have to discuss if that is acceptable
in most use cases.

As for adding new leaf nodes: If we add a leaf node, we'll always have to recompute
the hash of every index node on the path to the root node. Even with a hash per
branch/child. In case of TNC split/merge operations likely even more.
But I don't get why you think that would mean we have to recompute the hash
on all siblings of that nodes? Or do you simply mean that we have to re-read all
those siblings to compute the hash on the parent?

Thanks,
David