[RFC][PATCH v2 13/13] docs: Add documentation of the integrity digest cache

From: Roberto Sassu
Date: Sat Aug 12 2023 - 07:07:15 EST


From: Roberto Sassu <roberto.sassu@xxxxxxxxxx>

Add the documentation of the integrity digest cache in
Documentation/security.

Signed-off-by: Roberto Sassu <roberto.sassu@xxxxxxxxxx>
---
Documentation/security/index.rst | 1 +
.../security/integrity-digest-cache.rst | 484 ++++++++++++++++++
MAINTAINERS | 1 +
3 files changed, 486 insertions(+)
create mode 100644 Documentation/security/integrity-digest-cache.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 6ed8d2fa6f9..3316d50c839 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -18,3 +18,4 @@ Security Documentation
digsig
landlock
secrets/index
+ integrity-digest-cache
diff --git a/Documentation/security/integrity-digest-cache.rst b/Documentation/security/integrity-digest-cache.rst
new file mode 100644
index 00000000000..371b3f84780
--- /dev/null
+++ b/Documentation/security/integrity-digest-cache.rst
@@ -0,0 +1,484 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Integrity Digest Cache
+======================
+
+Introduction
+============
+
+The main goal of Integrity Measurement Architecture (IMA) is to perform a
+measurement of the file content and use it for remote attestation, to
+report a possibly compromised system, using the TPM as a root of trust. It
+can also prevent a system compromise from happening by checking the
+calculated file digest against a known-good reference value and by denying
+the current operation if there is a mismatch.
+
+
+Motivation
+==========
+
+This patch set aims to address two important shortcomings: predictability
+of the Platform Configuration Registers (PCRs), and the provisioning of
+reference values to compare the calculated file digest against.
+
+Remote attestation, according to Trusted Computing Group (TCG)
+specifications, is done by replicating the PCR extend operation in
+software with the digests in the event log (in this case the IMA
+measurement list), and by comparing the obtained value with the PCR value
+signed by the TPM with the quote operation.
+
+Due to how the extend operation is performed, if measurements are done in
+a different order, the final PCR value will be different. That means that
+if measurements are done in parallel, there is no way to predict what the
+final PCR value will be, making it impossible to seal data to a PCR value.
+If the PCR value were predictable, a system could, for example, prove its
+integrity by unsealing and using its private key, without sending the full
+list of measurements every time.
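+
+The order dependence can be illustrated with a minimal user-space sketch.
+It assumes a SHA-256 PCR bank starting from all zeroes and uses made-up
+file contents; ``extend()`` and ``replay()`` are hypothetical helper names,
+not kernel or TPM APIs:
+
+.. code-block:: python
+
+ import hashlib
+
+ PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 bank
+
+
+ def extend(pcr: bytes, digest: bytes) -> bytes:
+     """TPM-style extend: new PCR = H(old PCR || digest)."""
+     return hashlib.sha256(pcr + digest).digest()
+
+
+ def replay(digests) -> bytes:
+     """Replay a list of measurements starting from an all-zero PCR."""
+     pcr = bytes(PCR_SIZE)
+     for digest in digests:
+         pcr = extend(pcr, digest)
+     return pcr
+
+
+ # Two made-up file measurements (illustrative only).
+ a = hashlib.sha256(b"contents of file A").digest()
+ b = hashlib.sha256(b"contents of file B").digest()
+
+ # Same measurements, different order: the final PCR values differ.
+ print(replay([a, b]) != replay([b, a]))
+
+A verifier replaying the event log in the recorded order still obtains the
+quoted PCR value; what is lost is the ability to know that value in
+advance.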
+
+Provisioning reference values for file digests is also a difficult task.
+The solution so far was to add file signatures to RPM packages, and
+possibly to DEB packages, so that IMA can verify them. While this
+undoubtedly works, it also requires Linux distribution vendors to support
+the feature by rebuilding all their packages, and possibly extending their
+PKI to perform the additional signatures. It could also require extra work
+from developers to deal with the additional data.
+
+On the other hand, since packages often carry the file digests themselves,
+adding file signatures is not actually needed. If the kernel were able to
+extract the file digests by itself, none of the tasks mentioned above
+would be required of Linux distribution vendors. All current and past
+Linux distributions could then be easily retrofitted to enable IMA
+appraisal with the file digests from the packages.
+
+Narrowing down the scope of a package parser to only extract specific
+information makes it small enough to accurately verify that it cannot harm
+the kernel. An additional mitigation consists of verifying the signature of
+the package first, before attempting to extract the file digests.
+
+
+Solution
+========
+
+To avoid extending a PCR in a non-deterministic way, the proposed
+solution is to replace individual file measurements with the measurement of
+a file (the digest list) containing a set of file digests. If the
+calculated digest of a file being measured/appraised matches one digest in
+the set, its measurement is skipped. Otherwise, if there is no match, the
+file digest is added to the measurement list.
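+
+A minimal sketch of this decision, assuming the digest list is simply a set
+of hex digests. The names ``measure()``, ``digest_list`` and
+``measurement_list`` are hypothetical and do not correspond to kernel
+symbols:
+
+.. code-block:: python
+
+ import hashlib
+
+
+ def measure(content: bytes, digest_list: set, measurement_list: list) -> bool:
+     """Append the file digest to the measurement list unless it is known.
+
+     Returns True if a new measurement entry was added.
+     """
+     digest = hashlib.sha256(content).hexdigest()
+     if digest in digest_list:
+         # Known digest: covered by the measurement of the digest list.
+         return False
+     measurement_list.append(digest)
+     return True
+
+
+ known = b"packaged binary"
+ digest_list = {hashlib.sha256(known).hexdigest()}
+ measurement_list = []
+
+ measure(known, digest_list, measurement_list)              # skipped
+ measure(b"unknown binary", digest_list, measurement_list)  # recorded
+ print(len(measurement_list))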
+
+The resulting measurement list, which must not use the default IMA PCR to
+avoid ambiguity with default-style measurements, has the following
+meaning: none/some/all of the files represented by the measurement of the
+digest lists COULD have been accessed, without knowing IF and WHEN. Any
+other measurement (other than boot_aggregate) is of a file whose digest was
+not included in a digest list.
+
+File signatures have a coarser granularity: it is per signing key and not
+per package. A measurement list containing just the measurement of the
+signing keys and of the files with a missing or invalid signature (those
+with a valid signature would be skipped) would be even less accurate.
+
+To ensure a rapid and smooth deployment of IMA appraisal, the kernel has
+been provided with the ability to extract file digests from the RPM
+package headers, and add them to the kernel memory on demand (only when a
+file from a given package is accessed). This ensures that the memory
+consumption for this new feature is directly proportional to the usage of
+the system.
+
+
+Scope
+=====
+
+The integrity digest cache enables IMA to extend a PCR (not the default
+one) in a deterministic fashion, and to appraise immutable files with file
+digests from the packages, when no other appraisal method is available. It
+does not yet support metadata verification with Extended Verification
+Module (EVM), for which a separate patch set will be provided.
+
+
+Design
+======
+
+The digest cache is a hash table of file digests, attached to the inode of
+the digest list from which file digests are extracted. It is accessible,
+when a given file is being measured/appraised, from the new xattr
+security.digest_list, containing the path of the digest list itself.
+
+If the calculated file digest is found in the digest cache, its measurement
+is avoided, or read-only access is granted if appraisal is in enforcing
+mode. Read-write access is prevented to avoid updating an unverified HMAC
+of file metadata.
+
+The digest cache can be used only if the following conditions are met:
+
+- The ``digest_cache=content`` keyword is added to the desired IMA policy
+ rules;
+- If the rule action is ``measure``, a PCR different from the default one
+ is specified;
+- If the rule action is ``appraise``, ``digest_cache=content`` and
+ ``appraise_type`` don't appear at the same time;
+- The same action for which the digest cache is used was done also on the
+ digest list;
+- The digest cache (currently) is not used for measurement/appraisal of
+ other digest lists.
+
+For performance reasons, the digest cache is attached to every inode using
+it, since multiple hooks can be invoked on it before the
+measurement/appraisal result is cached. A reference count indicates how
+many inodes use it, and the digest cache can be freed only when the count
+reaches zero (for example when inodes are evicted from memory).
+
+Two digest cache pointers have been added to the iint to distinguish the
+purpose for which they should be used: dig_owner points to the digest cache
+created from the same inode the iint refers to, and should be used for
+measurement/appraisal of other inodes; dig_user points to the digest
+cache created from a different inode, and requested for
+measurement/appraisal. A single digest cache pointer would be ambiguous:
+for a digest list, it would point to the digest cache created from that
+digest list, but IMA would then try to use that digest cache for
+measurement/appraisal of the digest list itself.
+
+Finally, at the first digest list measurement, an iterator is executed to
+sequentially read (not parse) all the digest lists in the same directory,
+so that the PCR is extended in a deterministic fashion. The other
+concurrent users of the digest cache have to wait until the iterator
+finishes.
+
+
+API
+===
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+.. kernel-doc:: security/integrity/digest_cache.h
+
+
+Functions
+~~~~~~~~~
+
+.. kernel-doc:: security/integrity/digest_cache.c
+
+``digest_cache_alloc()``, ``digest_cache_parse_digest_list()`` and
+``digest_cache_new()`` are internal functions used during the creation and
+initialization of the digest cache.
+
+``digest_cache_get()`` and ``digest_cache_free()`` are called by the user
+of the digest cache (e.g. IMA), to obtain and free a digest cache.
+
+``digest_cache_init_htable()``, ``digest_cache_add()`` and
+``digest_cache_lookup()`` are called by the digest list parsers to populate
+and search in a digest cache.
+
+
+Digest List Formats
+===================
+
+tlv
+~~~
+
+The Type-Length-Value (TLV) format was chosen for its extensibility.
+Additional fields can be added without breaking compatibility with old
+versions of the parser.
+
+The layout of a tlv digest list is the following::
+
+ [header: DIGEST_LIST_FILE, num fields, total len]
+ [field: DIGEST_LIST_ALGO, length, value]
+ [field: DIGEST_LIST_ENTRY#1, length, value (below)]
+ |- [header: DIGEST_LIST_FILE, num fields, total len]
+ |- [ENTRY#1_DIGEST, length, file digest]
+ |- [ENTRY#1_PATH, length, file path]
+ [field: DIGEST_LIST_ENTRY#N, length, value (below)]
+ |- [header: DIGEST_LIST_FILE, num fields, total len]
+ |- [ENTRY#N_DIGEST, length, file digest]
+ |- [ENTRY#N_PATH, length, file path]
+
+DIGEST_LIST_ALGO is a field to specify the algorithm of the file digest.
+DIGEST_LIST_ENTRY is a nested TLV structure with the following fields:
+ENTRY_DIGEST contains the file digest; ENTRY_PATH contains the file path.
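+
+The nesting can be illustrated with a small round-trip sketch. The binary
+encoding used here is an assumption made only for illustration (every
+integer is an unsigned 64-bit big-endian value, and the numeric IDs are
+made up); the text above does not specify field widths, byte order or ID
+values, and ``pack()``/``parse()`` are hypothetical helpers:
+
+.. code-block:: python
+
+ import struct
+
+ # Hypothetical encoding, NOT taken from the patch set: all integers are
+ # unsigned 64-bit big-endian, and the numeric IDs below are made up.
+ HDR = struct.Struct(">QQQ")   # data type, number of fields, total length
+ FIELD = struct.Struct(">QQ")  # field ID, value length
+
+ DIGEST_LIST_FILE = 0
+ DIGEST_LIST_ALGO, DIGEST_LIST_ENTRY = 0, 1
+ ENTRY_DIGEST, ENTRY_PATH = 0, 1
+
+
+ def pack(fields):
+     """Serialize [(field_id, value), ...] as header plus TLV fields."""
+     body = b"".join(FIELD.pack(f, len(v)) + v for f, v in fields)
+     return HDR.pack(DIGEST_LIST_FILE, len(fields), len(body)) + body
+
+
+ def parse(buf):
+     """Return [(field_id, value), ...] after checking declared lengths."""
+     _data_type, num_fields, total_len = HDR.unpack_from(buf, 0)
+     if total_len != len(buf) - HDR.size:
+         raise ValueError("total length mismatch")
+     fields, off = [], HDR.size
+     for _ in range(num_fields):
+         field_id, length = FIELD.unpack_from(buf, off)
+         off += FIELD.size
+         if off + length > len(buf):
+             raise ValueError("field exceeds buffer")
+         fields.append((field_id, buf[off:off + length]))
+         off += length
+     return fields
+
+
+ # Build a list with one entry; the entry value is itself a TLV blob.
+ entry = pack([(ENTRY_DIGEST, bytes(32)), (ENTRY_PATH, b"/usr/bin/cat")])
+ blob = pack([(DIGEST_LIST_ALGO, b"sha256"), (DIGEST_LIST_ENTRY, entry)])
+
+ for field_id, value in parse(blob):
+     if field_id == DIGEST_LIST_ENTRY:
+         print(dict(parse(value))[ENTRY_PATH])
+
+Because each field carries its own length, a parser written this way can
+skip field IDs it does not recognize, which is the extensibility property
+mentioned above.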
+
+
+rpm
+~~~
+
+The rpm digest list is basically a subset of the RPM package header.
+Its format is::
+
+ [RPM magic number]
+ [RPMTAG_IMMUTABLE]
+
+RPMTAG_IMMUTABLE is a section of the full RPM header containing the part
+of the header that was signed, and whose signature is stored in the
+RPMTAG_RSAHEADER section.
+
+
+Appended Signature
+~~~~~~~~~~~~~~~~~~
+
+Digest lists can have a module-style appended signature, which can be used
+for appraisal with IMA. The signature type can be PKCS#7, as for kernel
+modules, or the new user asymmetric key signature.
+
+
+History
+=======
+
+The original name of this work was IMA Digest Lists, which was considered
+somewhat too invasive. The code was moved to a separate component named
+DIGLIM (DIGest Lists Integrity Module), with the purpose of moving the
+complexity out of IMA, and also adding the possibility of using it with
+other kernel components (e.g. Integrity Policy Enforcement, or IPE).
+
+Since it was originally proposed, in 2017, this work has grown considerably
+thanks to various comments/suggestions. It has been an integral part of the
+openEuler distribution since the end of 2020.
+
+There are significant differences between this and the previous versions.
+The most important one is moving from a centralized repository of file
+digests to a per-package repository. This significantly reduces the memory
+pressure, since digest lists are loaded into kernel memory only when they
+are actually needed. Also, file digests are automatically unloaded from
+kernel memory at the same time inodes are evicted from memory during
+reclamation.
+
+
+Performance
+===========
+
+The tests have been performed on a Fedora 38 virtual machine, with 8 cores
+(AMD EPYC-Rome), 4GB of RAM, TPM passthrough. The signing key is an ECDSA
+NIST P-384.
+
+IMA measurement policy: no cache
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+ dont_measure fsmagic=0x01021994
+ measure func=BPRM_CHECK
+ measure func=MMAP_CHECK
+
+
+IMA measurement policy: cache
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+ dont_measure fsmagic=0x01021994
+ measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
+ measure func=BPRM_CHECK digest_cache=content pcr=11
+ measure func=MMAP_CHECK digest_cache=content pcr=11
+
+
+IMA Measurement Results
+~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+                               +-----------+-----------+-----------+
+                               | # measur. | boot time |   slab    |
+ +-----------------------------+-----------+-----------+-----------+
+ | measure (no cache)          |       389 |   12.682s | 231453 KB |
+ +-----------------------------+-----------+-----------+-----------+
+ | measure (cache, no iter)    |       175 |   12.283s | 234224 KB |
+ +-----------------------------+-----------+-----------+-----------+
+ | measure (cache, iter)       |       853 |   16.430s | 238491 KB |
+ +-----------------------------+-----------+-----------+-----------+
+
+With the iterator enabled, all 852 packages are measured. Consequently, the
+boot time is longer. One possible optimization would be to exclude the
+packages that don't include measured files. With the iterator disabled, it
+can be seen that only 174 packages are actually used (one measurement is
+for boot_aggregate).
+
+
+IMA appraisal policy: no cache
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+ dont_appraise fsmagic=0x01021994
+ appraise func=BPRM_CHECK appraise_type=imasig
+ appraise func=MMAP_CHECK appraise_type=imasig
+
+
+IMA appraisal policy: cache
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+ dont_appraise fsmagic=0x01021994
+ appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
+ appraise func=BPRM_CHECK digest_cache=content
+ appraise func=MMAP_CHECK digest_cache=content
+
+
+IMA Appraisal Results
+~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+                               +-----------+-----------+
+                               | boot time |   slab    |
+ +-----------------------------+-----------+-----------+
+ | appraise (no cache)         |   11.995s | 231145 KB |
+ +-----------------------------+-----------+-----------+
+ | appraise (cache)            |   11.879s | 233091 KB |
+ +-----------------------------+-----------+-----------+
+
+In this test, it can be seen that the performance of the two solutions is
+comparable, with the digest cache slightly ahead. The difference could be
+more substantial with more files appraised.
+
+
+How to Test
+===========
+
+First, it is necessary to copy the new kernel headers (tlv_parser.h,
+uasym_parser.h, tlv_digest_list.h) from usr/include/linux in the kernel
+source directory to /usr/include/linux.
+
+Then, gpg must be rebuilt with the additional patches to convert the PGP
+keys of the Linux distribution to the new user asymmetric key format:
+
+.. code-block:: bash
+
+ $ gpg --conv-kernel <path of PGP key> >> certs/uasym_keys.bin
+
+This embeds the converted keys in the kernel image. Then, the following
+kernel options must be enabled:
+
+.. code-block:: bash
+
+ CONFIG_INTEGRITY_DIGEST_CACHE=y
+ CONFIG_UASYM_KEYS_SIGS=y
+ CONFIG_UASYM_PRELOAD_PUBLIC_KEYS=y
+
+and the kernel must be rebuilt with the patches applied. After boot, it is
+necessary to build and install the digest list tool in tools/digest-lists,
+and to execute (as root):
+
+.. code-block:: bash
+
+ # manage_digest_lists -o gen -d /etc/digest_lists -i rpmdb -f rpm
+
+The new gpg must also be installed in the system, as it will be used to
+convert the PGP signatures of the RPM headers to the user asymmetric key
+format.
+
+It is recommended to create an additional digest list with the following
+files, by creating a file named ``list`` with the content:
+
+.. code-block:: bash
+
+ /usr/bin/manage_digest_lists
+ /usr/lib64/libgen-tlv-list.so
+ /usr/lib64/libgen-rpm-list.so
+ /usr/lib64/libparse-rpm-list.so
+ /usr/lib64/libparse-tlv-list.so
+
+Then, to create the digest list, it is sufficient to execute:
+
+.. code-block:: bash
+
+ # manage_digest_lists -i list -L -d /etc/digest_lists -o gen -f tlv
+
+If appraisal is enabled and in enforcing mode, it is necessary to sign the
+new digest list, with the sign-file tool in the scripts/ directory of the
+kernel sources:
+
+.. code-block:: bash
+
+ # scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.pem /etc/digest_lists/tlv-list
+
+The final step is to add security.digest_list to each file with:
+
+.. code-block:: bash
+
+ # manage_digest_lists -i /etc/digest_lists -o add-xattr
+
+After that, it is possible to test the integrity digest cache with the
+following policy written to /etc/ima/ima-policy:
+
+.. code-block:: bash
+
+ dont_measure fsmagic=0x01021994
+ measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
+ measure func=BPRM_CHECK digest_cache=content pcr=11
+ measure func=MMAP_CHECK digest_cache=content pcr=11
+ dont_appraise fsmagic=0x01021994
+ appraise func=BPRM_CHECK digest_cache=content
+ appraise func=MMAP_CHECK digest_cache=content
+ appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
+
+Tmpfs is excluded for now, until memfd is properly handled.
+
+Before loading the policy, it is possible to enable dynamic debug to see
+which operations are done by the integrity digest cache:
+
+.. code-block:: bash
+
+ # echo "file tlv* +p" > /sys/kernel/debug/dynamic_debug/control
+ # echo "file rpm* +p" > /sys/kernel/debug/dynamic_debug/control
+ # echo "file digest* +p" > /sys/kernel/debug/dynamic_debug/control
+
+Alternatively, the same strings can be set as the value of the dyndbg=
+option in the kernel command line.
+
+A preliminary test, before booting the system with the new policy, is to
+supply the policy to IMA in the current system with:
+
+.. code-block:: bash
+
+ # cat /etc/ima/ima-policy > /sys/kernel/security/ima/policy
+
+If that worked, the system can be rebooted. Systemd will take care of
+loading the IMA policy at boot. The instructions have been tested on a
+Fedora 38 OS.
+
+After boot, it is possible to check the content of the measurement list:
+
+.. code-block:: bash
+
+ # cat /sys/kernel/security/ima/ascii_runtime_measurements
+
+If only the files shipped with Fedora 38 have been executed, the
+measurement list will contain only the digest lists, and not the individual
+files.
+
+Another test is to ensure that IMA prevents the execution of unknown files:
+
+.. code-block:: bash
+
+ # cp -a /bin/cat .
+ # ./cat
+
+That will work, but executing the modified binary will not:
+
+.. code-block:: bash
+
+ # echo 1 >> cat
+ # ./cat
+ -bash: ./cat: Permission denied
+
+Execution will be denied, and a new entry in the measurement list will
+appear (it would probably be acceptable not to add that entry, as access
+to the file was denied):
+
+.. code-block:: bash
+
+ 11 50b5a68bea0776a84eef6725f17ce474756e51c0 ima-ng sha256:15e1efee080fe54f5d7404af7e913de01671e745ce55215d89f3d6521d3884f0 /root/cat
+
+Finally, it is possible to test the shrinking of the digest cache, by
+forcing the kernel to evict inodes from memory:
+
+.. code-block:: bash
+
+ # echo 3 > /proc/sys/vm/drop_caches
+
+The kernel log should have messages like:
+
+.. code-block:: bash
+
+ [ 313.032536] DIGEST CACHE: Remove digest sha256:102900208eef27b766380135906d431dba87edaa7ec6aa72e6ebd3dd67f3a97b from digest list /etc/digest_lists/rpm-libseccomp-2.5.3-4.fc38.x86_64
diff --git a/MAINTAINERS b/MAINTAINERS
index 2bd85dcfd23..af33db344ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10293,6 +10293,7 @@ M: Dmitry Kasatkin <dmitry.kasatkin@xxxxxxxxx>
L: linux-integrity@xxxxxxxxxxxxxxx
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
+F: Documentation/security/integrity-digest-cache.rst
F: security/integrity/
F: security/integrity/ima/
F: tools/digest-lists/
--
2.34.1