[BUG] Race between policy reload sidtab conversion and live conversion
From: Tyler Hicks
Date: Tue Feb 23 2021 - 16:44:50 EST
I'm seeing a race during policy load while the "regular" sidtab
conversion is happening and a live conversion starts to take place in
sidtab_context_to_sid().
We have an initial policy that's loaded by systemd ~0.6s into boot and
then another policy gets loaded ~2-3s into boot. That second policy load
is what hits the race condition situation because the sidtab is only
partially populated and there's a decent amount of filesystem operations
happening, at the same time, which are triggering live conversions.
[ 3.091910] Unable to handle kernel paging request at virtual address 001303e1aa140408
[ 3.100083] Mem abort info:
[ 3.102963] ESR = 0x96000004
[ 3.102965] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.102967] SET = 0, FnV = 0
[ 3.102968] EA = 0, S1PTW = 0
[ 3.102969] Data abort info:
[ 3.102970] ISV = 0, ISS = 0x00000004
[ 3.102971] CM = 0, WnR = 0
[ 3.102973] [001303e1aa140408] address between user and kernel address ranges
[ 3.102977] Internal error: Oops: 96000004 [#1] SMP
[ 3.102981] Modules linked in:
[ 3.111250] bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
[ 3.111259] CPU: 0 PID: 529 Comm: (tservice) Tainted: G O 5.10.17.1 #1
[ 3.119881] Hardware name: redacted (DT)
[ 3.119884] pstate: 20400085 (nzCv daIf +PAN -UAO -TCO BTYPE=--)
[ 3.119898] pc : sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202)
[ 3.119902] lr : sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312)
[ 3.126105] sp : ffff800011ceb810
[ 3.126106] x29: ffff800011ceb810 x28: 0000000000000000
[ 3.126108] x27: 0000000000000005 x26: ffffda109f3f2000
[ 3.126110] x25: 00000000ffffffff x24: 0000000000000000
[ 3.126113] x23: 0000000000000001
[ 3.133124] x22: 0000000000000005
[ 3.133125] x21: aa1303e1aa140408 x20: 0000000000000001
[ 3.133127] x19: 00000000000000cc x18: 0000000000000003
[ 3.133128] x17: 000000000000003e x16: 000000000000003f
[ 3.145519] x15: 0000000000000039 x14: 000000000000002e
[ 3.145521] x13: 0000000058294db1 x12: 00000000158294db
[ 3.145523] x11: 000000007f0b3af2 x10: 0000000000004e00
[ 3.145525] x9 : 00000000000000cd x8 : 0000000000000005
[ 3.281289] x7 : feff735e62647764 x6 : 00000000000080cc
[ 3.286769] x5 : 0000000000000005 x4 : ffff3f47c5b20000
[ 3.292249] x3 : ffff800011ceb900 x2 : 0000000000000001
[ 3.297729] x1 : 00000000000000cc x0 : aa1303e1aa1403e0
[ 3.303210] Call trace:
[ 3.305733] sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202)
[ 3.309867] sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312)
[ 3.314451] security_context_to_sid_core (/usr/src/kernel/security/selinux/ss/services.c:1557)
[ 3.319661] security_context_to_sid_default (/usr/src/kernel/security/selinux/ss/services.c:1616)
[ 3.324961] inode_doinit_use_xattr (/usr/src/kernel/security/selinux/hooks.c:1366)
[ 3.329634] inode_doinit_with_dentry (/usr/src/kernel/security/selinux/hooks.c:1457)
[ 3.334486] selinux_d_instantiate (/usr/src/kernel/security/selinux/hooks.c:6278)
[ 3.338889] security_d_instantiate (/usr/src/kernel/security/security.c:2004)
[ 3.343385] d_splice_alias (/usr/src/kernel/fs/dcache.c:3030)
[ 3.347251] squashfs_lookup (/usr/src/kernel/fs/squashfs/namei.c:220)
[ 3.385561] el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
[ 3.389517] el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
[ 3.392939] Code: 51002718 340001d7 1ad82768 8b284c15 (f94002a0)
All code
========
0: 18 27 sbb %ah,(%rdi)
2: 00 51 d7 add %dl,-0x29(%rcx)
5: 01 00 add %eax,(%rax)
7: 34 68 xor $0x68,%al
9: 27 (bad)
a: d8 1a fcomps (%rdx)
c:* 15 4c 28 8b a0 adc $0xa08b284c,%eax <-- trapping instruction
11: 02 40 f9 add -0x7(%rax),%al
Code starting with the faulting instruction
===========================================
0: a0 .byte 0xa0
1: 02 40 f9 add -0x7(%rax),%al
[ 3.399230] ---[ end trace cc1840b3ff2c7506 ]---
The corresponding source from sidtab.c:
179 static struct sidtab_entry *sidtab_do_lookup(struct sidtab *s, u32 index,
180 int alloc)
181 {
...
193 /* lookup inside the subtree */
194 entry = &s->roots[level];
195 while (level != 0) {
196 capacity_shift -= SIDTAB_INNER_SHIFT;
197 --level;
198
199 entry = &entry->ptr_inner->entries[leaf_index >> capacity_shift];
200 leaf_index &= ((u32)1 << capacity_shift) - 1;
201
202 if (!entry->ptr_inner) {
203 if (alloc)
204 entry->ptr_inner = kzalloc(SIDTAB_NODE_ALLOC_SIZE,
205 GFP_ATOMIC);
206 if (!entry->ptr_inner)
207 return NULL;
208 }
209 }
210 if (!entry->ptr_leaf) {
211 if (alloc)
212 entry->ptr_leaf = kzalloc(SIDTAB_NODE_ALLOC_SIZE,
213 GFP_ATOMIC);
214 if (!entry->ptr_leaf)
215 return NULL;
216 }
217 return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES];
218 }
...
263 int sidtab_context_to_sid(struct sidtab *s, struct context *context,
264 u32 *sid)
265 {
...
305 /*
306 * if we are building a new sidtab, we need to convert the context
307 * and insert it there as well
308 */
309 if (convert) {
310 rc = -ENOMEM;
311 dst_convert = sidtab_do_lookup(convert->target, count, 1);
312 if (!dst_convert) {
313 context_destroy(&dst->context);
314 goto out_unlock;
315 }
...
What I'm having trouble understanding is how the above call to
sidtab_do_lookup(), on the target sidtab that's undergoing a conversion
in sidtab_convert(), can be expected to work. sidtab_convert_tree() is
allocating and initializing ptr_inner sidtab nodes at the same time
sidtab_do_lookup() is trying to use them with no locking being performed
on the target sidtab.
Ondrej specifically mentions, in commit ee1a84fdfeed ("selinux: overhaul
sidtab to fix bug and improve performance"), that there's no need to
freeze the sidtab during policy reloads so I know that there's thought
given to these code paths running in parallel.
Can someone more knowledgeable on how the sidtab locking is expected to
work suggest a fix for this crash?
Tyler