I have a RUT-240 acting as a simple 4G router. I recently recommissioned it and updated the firmware to the latest version, including the new UI. Nothing out of the ordinary in terms of config: Wi-Fi clients, DHCP server, NAT for the 4G WAN and a single WireGuard tunnel.
Everything works fine but I found network connectivity would drop sometimes. Usually when changing some UI config or refreshing the UI overview.
At first I thought it was attempting failover due to loss of 4G signal or something, but I found the router’s SSH service was unreachable and the router IP on the LAN couldn’t be pinged. This took several minutes to recover. Eventually everything returned to normal. Looking at the uptime on the router, it seems to have reset.
I seem to have narrowed it down to load coming from the router handling normal operations and the UI. If I make a config change, say modify the DHCP server options, and then load the “overview” status page, the router’s load goes up to 8-10.
Is there any logging I can turn on to see what’s happening?
Is anyone else seeing this load-related problem?
What causes the reboot? I don’t have any ping-timeout watchdogs set but presumably some other watchdog is triggering.
For me it’s just annoying: I wait a few minutes and go back to work. But given that the only interface for modifying router configs is the WebUI, I could see this causing problems for more critical routers. It makes it nerve-racking to change configuration options when you’re not sure if the router’s going to hang on you…
When navigating to the overview page, the router tries to pull a lot of information and thus, the CPU load can be high. However, the device should function normally.
Could you please clarify if the router actually reboots when this happens? Also, how long does it take for it to recover?
It is also possible that there was an issue when you updated the firmware. You can try restoring the device to factory defaults in System → Backup, or better, try to reflash the firmware via bootloader as described here.
You’re right about the reboot. Seems suspect and I want to try to reproduce the problem in a more controlled way.
Load does increase, though not super high, but the 4G modem definitely loses connection. This seems to happen when making a config change to something unrelated to that interface (adding a DHCP option or configuring a WireGuard endpoint). Neither should cause connectivity to drop.
If there are any logs I can gather while I’m reproducing this, let me know.
The install is effectively new. I updated the firmware and reset to defaults at that point so we should be good there.
Many of the changes that are done on the device require a service restart to take effect. For instance, if you modify certain settings like DNS, DHCP, or network configurations, the relevant services will restart, resulting in a brief drop of the connection. However, the connection should be restored within a few seconds. Could you please clarify how long these connection drops last?
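One way to put a number on the length of a drop is to probe the router from a LAN client once a second and compute outage durations afterwards. The following is a minimal sketch, not an official tool: the function names `probe` and `outages` are made up for illustration, `192.168.1.1` is only an assumed LAN address, and `ping -W 1` is assumed to take a timeout in seconds (as on Linux and BusyBox).

```shell
#!/bin/sh
# Sketch only: function names and the example address are assumptions.

# Print "<epoch> up" or "<epoch> down" for a single probe of host $1.
probe() {
    if ping -c 1 -W 1 "$1" >/dev/null 2>&1; then state=up; else state=down; fi
    echo "$(date +%s) $state"
}

# Read "<epoch> <state>" lines on stdin; print one line per completed outage.
outages() {
    awk '
        $2 == "down" && start == "" { start = $1 }
        $2 == "up"   && start != "" {
            print "down for " ($1 - start) "s starting at " start
            start = ""
        }
    '
}

# Example (not run here), on a LAN client:
#   while :; do probe 192.168.1.1; sleep 1; done >> lan.log
#   outages < lan.log
```

Running a second loop against a host on the internet side would help separate "router unreachable" from "only the mobile link dropped".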
I have seen the load spike to 15-20 without loss of network or apparent reboot.
Changing the DHCP client options on the LAN interface does seem to cause the wwan interface to restart, losing connection.
I have seen reboots both under load and apparently “spontaneous”. Not sure if these are related or if there’s some other problem causing them.
Is there any debugging I can turn on to see what happens around the time of a reboot? logread shows nothing of interest that I can see (nothing around the event, in fact).
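Independent of what the system log contains, a small sampler that records uptime and load makes a reboot visible as the uptime value going backwards. This is a rough sketch under assumptions: the function names and the log path are hypothetical, the log has to live somewhere that survives a reboot (`/tmp` is RAM-backed on OpenWrt-based firmware and would be lost), and the loop would need to be started again at boot to capture samples after the restart.

```shell
#!/bin/sh
# Sketch only: function names, log path, and interval are assumptions.

# Print one sample: "<epoch> <uptime in whole seconds> <1-minute load>".
sample_line() {
    read -r up rest < /proc/uptime
    read -r load1 rest < /proc/loadavg
    echo "$(date +%s) ${up%%.*} $load1"
}

# Append a sample to file $1 every $2 seconds (default 10).
watch_load() {
    while :; do
        sample_line >> "$1"
        sleep "${2:-10}"
    done
}

# Scan a sample log ($1) and report every point where uptime went backwards.
find_reboots() {
    awk 'NR > 1 && $2 + 0 < prev {
             print "reboot before sample at " $1 " (uptime " $2 "s)"
         }
         { prev = $2 + 0 }' "$1"
}

# Example (not run here): watch_load /path/on/flash/load.log 10 &
```

The last samples before each reported reboot show whether load was climbing beforehand. Frequent writes do wear flash, so a longer interval or a short test window is the safer choice.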
As mentioned, not all configuration changes restart the services needed for connectivity.
The logs are stored in RAM, which means that they are erased every time the device reboots. To ensure the logs are retained even after a reboot, you have the option to switch the storage location to flash memory. This can be done in System → Administration → Troubleshoot. However, please keep in mind that continuously writing logs to flash memory will reduce the lifetime of your flash memory.
Maybe you have some services or scripts running on the device that fill up the RAM? For example, maybe there is a tcpdump capture running on the device (if you downloaded a package and set it up)?
I would also suggest cleaning logs from the command line (instructions here, username is ‘root’):
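A quick check over SSH could look like the sketch below. The helper names are hypothetical, and it assumes the kernel exposes a `MemAvailable` line in `/proc/meminfo` and that `ps` is the BusyBox variant, which lists all processes when run without arguments.

```shell
#!/bin/sh
# Sketch only: function names are assumptions; run on the router over SSH.

# Print MemAvailable in kB from a meminfo-format file (default /proc/meminfo).
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print one line per running process whose command line contains "tcpdump".
find_captures() {
    ps | grep '[t]cpdump'
}

# Example (not run here):
#   mem_available_kb      # estimate of RAM still available, in kB
#   df -h /tmp            # /tmp is RAM-backed, so large files there consume RAM
#   find_captures || echo "no tcpdump running"
```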