neighbour entry incorrectly moved to NUD_REACHABLE

From: ash . millar
Date: Sun Feb 17 2019 - 21:43:21 EST


We have encountered an issue resulting from commit 2724680bceee ("neigh: Keep neighbour cache entries if number of them is small enough."), which allows stale entries to remain in the neigh table indefinitely if the total number of entries is less than gc_thresh1.

This issue arises if:
- a stale entry has existed for a long time, so it has a sufficiently old neigh->confirmed value
- the neighbour itself has since changed its MAC address
- we then try to ping the neighbour

When we ping the neighbour, the entry moves into NUD_DELAY as expected. But then, within neigh_timer_handler(), a jiffies comparison goes wrong: neigh->confirmed is old enough that the signed difference wraps, so time_before_eq(now, neigh->confirmed + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME)) returns true and the entry is erroneously moved to NUD_REACHABLE. The entry becomes stuck in this state, even though the neighbour is not actually reachable at the cached address because it has since changed MAC address.

The necessary age of neigh->confirmed for this to occur depends on the platform. It occurs after approximately 100 days on a 32-bit platform with HZ=250.
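
For illustration, here is a minimal userspace sketch of the comparison and of where the 100-day figure comes from. It is not the kernel code: it assumes 32-bit jiffies, HZ=250 and a DELAY_PROBE_TIME of 5 seconds, and time_before_eq32(), stale_entry_looks_confirmed() and wrap_threshold_days() are names made up for this sketch.

```c
#include <stdint.h>

/* Assumptions for this sketch: 32-bit jiffies, HZ=250, 5 s probe delay. */
#define HZ               250u
#define DELAY_PROBE_TIME (5u * HZ)

/*
 * Userspace stand-in for the kernel's time_before_eq(a, b) on a 32-bit
 * platform: true if a is at or before b, judged by the sign of the
 * wrapped difference. The cast relies on the usual two's-complement
 * conversion.
 */
static int time_before_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) >= 0;
}

/* The check that moves an entry from NUD_DELAY to NUD_REACHABLE. */
static int stale_entry_looks_confirmed(uint32_t now, uint32_t confirmed)
{
    return time_before_eq32(now, confirmed + DELAY_PROBE_TIME);
}

/*
 * Age of neigh->confirmed, in days, beyond which the signed difference
 * wraps and the stale entry starts to look recently confirmed:
 * (2^31 + DELAY_PROBE_TIME) jiffies.
 */
static double wrap_threshold_days(void)
{
    return (2147483648.0 + DELAY_PROBE_TIME) / HZ / 86400.0;
}
```

With these assumptions, an entry confirmed one hour ago fails the check as it should, while an entry whose confirmed value is more than 2^31 + DELAY_PROBE_TIME jiffies old passes it. wrap_threshold_days() comes out at roughly 99.4 days.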

We have resolved this by setting gc_thresh1 = 0, which effectively undoes commit 2724680bceee.

I would like to know if anyone else has observed this or has an alternative solution.

Kind regards,
Ash