Re: [REGRESSION] RLIMIT_DATA crashes named

From: Linus Torvalds
Date: Fri Sep 16 2016 - 16:32:50 EST


On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
>
> As far as I can tell this isn't Fedora specific.

Some googling does seem to say that "datalimit 20M" and "named.conf"
ends up being some really old default that just gets endlessly copied.

So no, it's not Fedora-specific per se.

But I suspect most people with a named.conf did either

(a) get it from their distro and didn't change it and so if the
distro just updates theirs, things will automatically "just work"

(b) actually did write their own (or at least edited it), and knows
what they are doing, and have absolutely no problem removing or
updating that datalimit thing.

> I would like to see RLIMIT_DATA actually do something useful so worse
> case I'll figure out something to carry in Fedora and this thread
> can be an FYI for people googling.

Yeah, even if we only get a good hit for "named segmentation fault", I
guess that will help people a lot.

The really annoying thing seems to be that the kernel message has been
hidden too much. IOW, Sam in his bugzilla report clearly found the
system messages with

Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
user 25 dumped core.

but for some reason never noticed the kernel saying (quoting Jason):

mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data

at the same time.

Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
didn't notice the initial failure (which would have had the kernel
message too), and he then looked at the logs for when he tried to
re-start.

Or maybe the system logs don't have those kernel messages, which would
be a disaster.

So maybe we should just change the "pr_warn_once()" into
"pr_warn_ratelimited()", except the default rate limits for that are
wrong (we'd perhaps want something like "at most once every minute" or
similar, while the default rate limits are along the lines of "max 10
lines every 5 _seconds_").

Sam, do you end up seeing the kernel warning in your logs if you just
go back earlier in the boot?

Linus