Re: [PATCH ghak90 V8 07/16] audit: add contid support for signalling the audit daemon

From: Richard Guy Briggs
Date: Wed Mar 25 2020 - 08:29:26 EST


On 2020-03-20 17:56, Paul Moore wrote:
> On Thu, Mar 19, 2020 at 5:48 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> > On 2020-03-18 17:47, Paul Moore wrote:
> > > On Wed, Mar 18, 2020 at 5:42 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> > > > On 2020-03-18 17:01, Paul Moore wrote:
> > > > > On Fri, Mar 13, 2020 at 3:23 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> > > > > > On 2020-03-13 12:42, Paul Moore wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > > The thread has had a lot of starts/stops, so I may be repeating a
> > > > > > > previous suggestion, but one idea would be to still emit a "death
> > > > > > > record" when the final task in the audit container ID does die, but
> > > > > > > > block the particular audit container ID from reuse until the
> > > > > > > SIGNAL2 info has been reported. This gives us the timely ACID death
> > > > > > > notification while still preventing confusion and ambiguity caused by
> > > > > > > potentially reusing the ACID before the SIGNAL2 record has been sent;
> > > > > > > there is a small nit about the ACID being present in the SIGNAL2
> > > > > > > *after* its death, but I think that can be easily explained and
> > > > > > > understood by admins.
> > > > > >
> > > > > > Thinking quickly about possible technical solutions to this, maybe it
> > > > > > makes sense to have two counters on a contobj so that we know when the
> > > > > > last process in that container exits and can issue the death
> > > > > > certificate, but we still block reuse of it until all further references
> > > > > > to it have been resolved. This will likely also make it possible to
> > > > > > report the full contid chain in SIGNAL2 records. This will eliminate
> > > > > > some of the issues we are discussing with regards to passing a contobj
> > > > > > vs a contid to the audit_log_contid function, but won't eliminate them
> > > > > > all because there are still some contids that won't have an object
> > > > > > > associated with them, making it impossible to look them up in the
> > > > > > contobj lists.
> > > > >
> > > > > I'm not sure you need a full second counter, I imagine a simple flag
> > > > > > would be okay. I think you just need something to indicate that this
> > > > > > ACID object is marked as "dead" but is still being held for sanity reasons
> > > > > and should not be reused.
> > > >
> > > > Ok, I see your point. This refcount can be changed to a flag easily
> > > > enough without change to the api if we can be sure that more than one
> > > > signal can't be delivered to the audit daemon *and* collected by sig2.
> > > > I'll have a more careful look at the audit daemon code to see if I can
> > > > determine this.
> > >
> > > Maybe I'm not understanding your concern, but this isn't really
> > > different than any of the other things we track for the auditd signal
> > > sender, right? If we are worried about multiple signals being sent
> > > then it applies to everything, not just the audit container ID.
> >
> > Yes, you are right. In all other cases the information is simply
> > overwritten. In the case of the audit container identifier any
> > previous value is put before a new one is referenced, so only the last
> > signal is kept. So, we only need a flag. Does a flag implemented with
> > an RCU-protected refcount sound reasonable to you?
>
> Well, if I recall correctly you still need to fix the locking in this
> patchset so until we see what that looks like it is hard to say for
> certain. Just make sure that the flag is somehow protected from
> races; it is probably a lot like the "valid" flags you sometimes see
> with RCU protected lists.

This is like looking for a needle in a haystack. Can you point me to
some code that does "valid" flags with RCU protected lists?
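
To make sure we mean the same thing, here is a minimal userspace
sketch of how I read the "dead but still held" idea. It is not kernel
code and not what the patchset does today: C11 atomics stand in for
whatever refcount/RCU/locking the real thing ends up with, and every
name in it (struct contobj_sketch, contobj_sig_hold(), and so on) is
made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one audit container object. */
struct contobj_sketch {
	uint64_t id;          /* audit container ID */
	atomic_int tasks;     /* live tasks in this container */
	atomic_bool dead;     /* last task gone, death record emitted */
	atomic_bool sig_held; /* pending SIGNAL2 info still names this ID */
};

static void contobj_init(struct contobj_sketch *c, uint64_t id)
{
	c->id = id;
	atomic_init(&c->tasks, 0);
	atomic_init(&c->dead, false);
	atomic_init(&c->sig_held, false);
}

/*
 * A task joins the container. Refused once the object is dead; this is
 * the "valid"-style check a lookup would do after finding the object.
 * Assumption: a real version needs a lock or an inc-not-zero style
 * operation to close the window between the check and the increment.
 */
static bool contobj_task_join(struct contobj_sketch *c)
{
	if (atomic_load(&c->dead))
		return false;
	atomic_fetch_add(&c->tasks, 1);
	return true;
}

/* A task exits. Returns true when the caller should emit the death record. */
static bool contobj_task_exit(struct contobj_sketch *c)
{
	assert(atomic_load(&c->tasks) > 0);
	if (atomic_fetch_sub(&c->tasks, 1) != 1)
		return false;
	atomic_store(&c->dead, true);
	return true;
}

/* Signal sender info recorded: pin the ID until sig2 is collected. */
static void contobj_sig_hold(struct contobj_sketch *c)
{
	atomic_store(&c->sig_held, true);
}

/* SIGNAL2 info reported, or the holder went away: drop the pin. */
static void contobj_sig_put(struct contobj_sketch *c)
{
	atomic_store(&c->sig_held, false);
}

/* The ID may be handed out again only when dead and no longer pinned. */
static bool contobj_id_reusable(struct contobj_sketch *c)
{
	return atomic_load(&c->dead) && !atomic_load(&c->sig_held);
}
```

The death record goes out as soon as contobj_task_exit() returns true,
but the ID stays unavailable until contobj_sig_put() has run, which is
the ordering discussed above.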

> > > > Another question that occurs to me: what if the audit daemon is sent a
> > > > signal and it cannot or will not collect the sig2 information from the
> > > > kernel (SIGKILL?)? Does that audit container identifier remain dead
> > > > until reboot, or do we institute some other form of reaping, possibly
> > > > time-based?
> > >
> > > In order to preserve the integrity of the audit log that ACID value
> > > would need to remain unavailable until the ACID which contains the
> > > associated auditd is "dead" (no one can request the signal sender's
> > > info if that container is dead).
> >
> > I don't understand why it would be associated with the contid of the
> > audit daemon process rather than with the audit daemon process itself.
> > How does the signal collection somehow get transferred or delegated to
> > another member of that audit daemon's container?
>
> Presumably once we support multiple audit daemons we will need a
> struct to contain the associated connection state, with at most one
> struct (and one auditd) allowed for a given ACID. I would expect that
> the signal sender info would be part of that state included in that
> struct. If a task sent a signal to its associated auditd, and no one
> ever queried the signal information stored in the per-ACID state
> struct, I would expect that the refcount/flag/whatever would remain
> held for the signal sender's ACID until the auditd state's ACID died
> (the struct would be reaped as part of the ACID death). In cases
> where the container orchestrator blocks sending signals across ACID
> boundaries this really isn't an issue as it will all be the same ACID,
> but since we don't want to impose any restrictions on what a container
> *could* be it is important to make sure we handle the case where the
> signal sender's ACID may be different from the associated auditd's
> ACID.
>
> > Thinking aloud here, the audit daemon's exit when it calls audit_free()
> > needs to ..._put_sig and cancel that audit_sig_cid (which in the future
> > will be allocated per auditd rather than the global it is now since
> > there is only one audit daemon).
> >
> > > paul moore
> >
> > - RGB
>
> paul moore

- RGB

--
Richard Guy Briggs <rgb@xxxxxxxxxx>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635