I created the cron job to inspect /proc/net/nf_conntrack and restart rms_mqtt if it detects active connections traversing the non-primary WAN interface.
However, it has not stopped this rogue data usage, and my data cap was just breached again because of it. Please help: how can I log this traffic to see what it is, or why it completely ignores the failover rules I have configured?
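One low-tech way to log what is actually using the backup link is to snapshot /proc/net/nf_conntrack periodically and keep only the entries that involve the mobile WAN’s address. The sketch below is not the cron job described above. It assumes the standard nf_conntrack text format (original tuple first, reply tuple second), that you supply the mobile interface’s current IP address yourself, and that a Python 3 interpreter is available on the device, which may not be true on stock firmware.

```python
import re

# Matches key=value tokens such as "src=192.168.1.10" or "dport=5060".
_KV = re.compile(r"(\w+)=(\S+)")


def flows_via_wan(lines, wan_ip):
    """Return the nf_conntrack entries that involve wan_ip.

    Each entry lists the original-direction tuple first and the reply-direction
    tuple second, so "src=" and "dst=" each appear twice. NATed LAN traffic
    leaving via a WAN has that WAN's address as the reply destination, and
    traffic originated by the router itself has it as the original source, so
    checking every src/dst field catches both cases.
    """
    hits = []
    for line in lines:
        pairs = _KV.findall(line)
        addresses = {value for key, value in pairs if key in ("src", "dst")}
        if wan_ip in addresses:  # exact match, so 10.64.1.2 != 10.64.1.20
            hits.append(line.rstrip("\n"))
    return hits
```

For example, passing an open handle on /proc/net/nf_conntrack together with the mobile WAN address, and appending the result with a timestamp to a log file from cron, gives a history of which LAN hosts and ports were active on that link.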
Whatever it is, it appears to consume ~445 MB of data each day.
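To put ~445 MB/day in perspective over a billing cycle, a quick projection helps. This assumes a 30-day cycle and decimal units (1 GB = 1000 MB); your carrier may count differently.

```python
def projected_usage_gb(mb_per_day, days=30):
    """Project a daily usage figure (MB) over a billing period, in decimal GB."""
    return mb_per_day * days / 1000


print(projected_usage_gb(445))  # 13.35
```

So the unexplained traffic alone would amount to roughly 13 GB per month.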
@luckman212
Maybe you could disable mob1s1a1 for a while and check what stops working. I know that might be a crude approach. I am struggling with the same unwanted consumption of mobile data even though Wi-Fi is available and connected, but I have not yet found the cause.
I am also very reluctant to share a troubleshoot file that contains all kinds of personal data.
A bit off-topic, but still very relevant in this context:
It would be much better if Teltonika automatically obfuscated/redacted such information and documented in detail which personal and other confidential information the file contains at all, e.g.:
User/admin names, passwords, Wi-Fi SSIDs/BSSIDs and credentials, phone numbers, email addresses, IMEI, IMSI, mobile cell information, GPS and other coordinates, serial numbers of the router and of the modem, and any other information by which an individual could be identified or located.
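As an illustration of the kind of automatic redaction suggested above: a few of these identifiers have recognizable formats and can be masked with regular expressions. This is a hypothetical sketch, not anything Teltonika provides. Free-form values such as passwords, SSIDs or names cannot be caught this way and would have to be redacted by configuration key instead.

```python
import re

# Hypothetical patterns for a few of the identifiers listed above. The 15-digit
# rule deliberately over-matches (any 15-digit number is masked).
_PATTERNS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<MAC>", re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")),  # e.g. BSSIDs
    ("<IMEI/IMSI>", re.compile(r"\b\d{15}\b")),
]


def redact(text):
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```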
@7wells I have been working on this all day. I’m cautiously optimistic that I’ve tracked down the source of the rogue traffic and come up with a workaround. I don’t want to sound the victory horn yet, but I will keep monitoring, and if it holds up I will share my fix here.
Well, for me the most important tool ended up being tcpdump.
Recently I was lucky enough to catch it on one of my misbehaving devices while the condition was still occurring. Running tcpdump on the qmimux0 interface revealed that (in my case) it was SIP traffic from some very chatty VoIP phones that were stuck communicating via the failover interface.
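When a capture like that is long, tallying packets per port pair shows quickly what dominates it (SIP commonly uses port 5060). A minimal sketch, assuming the capture was saved as text with numeric output, e.g. `tcpdump -n -i qmimux0 -c 2000 > capture.txt`, and that the lines of interest are IPv4 lines of the form `IP a.b.c.d.PORT > e.f.g.h.PORT: ...`. It counts packets, not bytes, and ignores everything else (ARP, IPv6).

```python
import re
from collections import Counter

# IPv4 "addr.port > addr.port:" as printed by tcpdump -n.
_FLOW = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):")


def top_ports(lines, n=5):
    """Count packets per (lower port, higher port) pair, busiest first."""
    counts = Counter()
    for line in lines:
        m = _FLOW.search(line)
        if m:
            sport, dport = int(m.group(2)), int(m.group(4))
            # Sort the pair so both directions of a flow land in one bucket.
            counts[tuple(sorted((sport, dport)))] += 1
    return counts.most_common(n)
```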
The exact trigger is not known yet, but I suspect a race condition: the wired WAN goes up/down several times in a short time frame while devices are actively communicating over stateful connections such as SIP. The ALG (application-layer gateway) may also play a part; since finding this edge case I have disabled ALG completely.
I still have a ticket open with Teltonika, so I was waiting a bit longer in case they offer another solution. For now, my own workaround is a script that cron runs every 15 minutes to check for this condition and selectively kill any flows/states bound to the non-primary WAN. Killing states that precisely requires installing the conntrack package with opkg, but the script takes care of that automatically. It also does some rather nasty AWK parsing of the nf_conntrack table to get the connection details, which works but certainly isn’t optimal.
I’m happy to share the script on GitHub if you want to try it, or we can wait a bit longer for Teltonika to get back to me.
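The selective kill step maps onto the conntrack(8) CLI from that package: `-D` deletes entries matching the given original-direction tuple. The sketch below only shows how one tuple could be turned into such a command. It is not luckman212’s script (which uses shell and AWK), the `Flow` type is a hypothetical helper, and it defaults to a dry run that just prints the command.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    proto: str                   # "tcp", "udp", "icmp", ...
    src: str                     # original-direction source address
    dst: str                     # original-direction destination address
    sport: Optional[int] = None  # ports only apply to tcp/udp
    dport: Optional[int] = None


def delete_cmd(flow):
    """Build a conntrack argv that deletes entries matching this tuple."""
    cmd = ["conntrack", "-D", "-p", flow.proto, "-s", flow.src, "-d", flow.dst]
    # Port filters are protocol-specific options, so they must follow -p.
    if flow.proto in ("tcp", "udp") and flow.sport is not None and flow.dport is not None:
        cmd += ["--sport", str(flow.sport), "--dport", str(flow.dport)]
    return cmd


def kill_flow(flow, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = delete_cmd(flow)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode
```

Matching on the full 5-tuple keeps healthy flows on the primary WAN untouched, which is the point of killing states "with precision" rather than flushing the whole table.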
I don’t know whether you can disable just the SIP ALG, but I disabled all conntrack helpers via Network → Firewall → General → Automatic helper assignment (off).
@Marija
Can you or somebody else from Teltonika please confirm whether this is the solution to the problem described above?
EDIT: I do not have the menu that @luckman212 shows in his firewall screenshot.
I am still struggling with this. Although my RUTX11 is connected to my home Wi-Fi and this wifi1 network has the highest priority, I still see mobile data (mob1s1a1) being consumed:
As a side note, why is the data usage for mob1s1a1 shown in MB/h on both the daily and the weekly graph? Shouldn’t the weekly graph use MB/d, or is the identical scale intentional?
Below you can see that wifi1 is working in parallel. It shows upload and download traffic, but, as shown above, there is also traffic via mob1s1a1. That should not be the case, right?
When I logged into my RUTX11 via the Android app, I noticed that the failover settings were shown as disabled, so I enabled them (again). I am 100% sure that I always have failover enabled, with wifi1 in first place and mob1s1a1 in second place (nothing else).
Are you aware of situations in which failover might get disabled, and for what reasons?
In my opinion, mobile data consumption despite Wi-Fi having first priority for failover is still a severe problem, as it unnecessarily eats into expensive data plans.
I have two scripts to help with this problem: one I created myself, which is the only one I have tested “long term”, and another that Teltonika shared with me via the support ticket.
@Marija Am I allowed to post both scripts here in case they help others with this issue?
@Marija
Are you, or is somebody else from Teltonika, still working on this problem?
I just noticed that my RUTX11 is once again on mobile data instead of its Wi-Fi connection (wifi1).
The reason: Network > Failover > Multiwan shows the status of “wifi1” as “offline”. Consequently, “mob1s1a1” has the status “online” (otherwise it would be “idle”):
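The status logic described above (the backup shows “online” only while it is carrying traffic and “idle” while it is up but on standby) can be modelled as a simple priority scan. This is a simplified illustration inferred from the statuses in this post, not RutOS’s actual failover code; real health checks (pings, tracking thresholds) are reduced to a single boolean per interface.

```python
def failover_states(priority, healthy):
    """Assign a status to each interface, given interfaces in priority order.

    The first healthy interface carries traffic ("online"), any further healthy
    ones wait on standby ("idle"), and unhealthy ones are "offline".
    """
    states = {}
    active_chosen = False
    for iface in priority:
        if not healthy.get(iface, False):
            states[iface] = "offline"
        elif not active_chosen:
            states[iface] = "online"
            active_chosen = True
        else:
            states[iface] = "idle"
    return states
```

With wifi1 marked unhealthy, the model reproduces the statuses reported above: mob1s1a1 becomes “online”.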
Neither my RUTX11 nor my other router (an AVM FRITZ!Box 7490) has been moved. They see each other’s Wi-Fi networks and are connected to each other via Wi-Fi. As they are on different subnets, I connected them via WireGuard, and here is the problem:
The WireGuard status on my FB7490 shows that they have not been connected via WG since this morning. It seems that at some point the WG connection drops, which triggers Wi-Fi failover and a switch to mobile data.
I have the cron job mentioned elsewhere in this forum running, which should re-establish a dropped WG connection, but it does not seem to work.
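A watchdog like that usually keys off the peer’s last handshake time. The sketch below parses the output of `wg show <interface> latest-handshakes` (one public key and Unix timestamp per line, with 0 meaning no handshake yet) and flags peers whose handshake is older than a threshold. It is not the cron job referred to above; the 180-second threshold and the interface name `wg0` are assumptions, and it needs Python 3 on the router.

```python
import subprocess
import time

STALE_AFTER = 180  # seconds; an assumed threshold, tune to your keepalive interval


def read_handshakes(iface="wg0"):
    """Run `wg show <iface> latest-handshakes`; "wg0" is a placeholder name."""
    result = subprocess.run(
        ["wg", "show", iface, "latest-handshakes"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def stale_peers(wg_output, now=None, stale_after=STALE_AFTER):
    """Return public keys whose last handshake is missing (0) or too old."""
    now = time.time() if now is None else now
    stale = []
    for line in wg_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, timestamp = parts[0], int(parts[1])
        if timestamp == 0 or now - timestamp > stale_after:
            stale.append(key)
    return stale
```

This check only makes sense if PersistentKeepalive is configured on the peer; otherwise an idle but healthy tunnel can legitimately go minutes without a handshake. Also note that because the tunnel here runs over the Wi-Fi link, restarting WireGuard cannot help while Wi-Fi itself is disconnected.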
I am really lost.
PS:
After I restarted my RUTX11, its failover mechanism switched back to Wi-Fi (via WireGuard) without further intervention or problems. I wish it would reboot automatically in case of WG problems; this would probably also solve the undesired use of mobile data.
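An automatic reboot is a blunt instrument, so a common pattern is to escalate: restart the tunnel first and reboot only if the failure persists across several checks. The sketch below is a hypothetical decision step for a cron-driven watchdog; the thresholds, the state-file path and the action names are all assumptions, and it only decides what to do rather than restarting or rebooting anything itself.

```python
import json
import os

STATE_FILE = "/tmp/wg_watchdog.json"  # hypothetical path; /tmp is cleared on reboot


def decide(failures, link_ok, restart_after=1, reboot_after=4):
    """Pure decision step: return (new_failure_count, action)."""
    failures = 0 if link_ok else failures + 1
    if failures >= reboot_after:
        return failures, "reboot"
    if failures >= restart_after:
        return failures, "restart"
    return failures, "none"


def next_action(link_ok, state_file=STATE_FILE):
    """Load the counter, apply decide(), persist the new count, return the action."""
    failures = 0
    if os.path.exists(state_file):
        with open(state_file) as f:
            failures = json.load(f).get("failures", 0)
    failures, action = decide(failures, link_ok)
    with open(state_file, "w") as f:
        json.dump({"failures": failures}, f)
    return action
```

Keeping the counter in /tmp means it resets after a reboot. If the link never comes back, this would still reboot every few runs, so you may want to cap the number of reboots.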
To assist in resolving the issue you are experiencing, we kindly ask you to provide a troubleshoot file. This will help us analyze the potential causes of the problem.
Please let us know if you can share the troubleshoot file so we can proceed further.
Apart from that, I found this warning under Network > Wireless > SSIDs:
Trying to find the access point
An access point was found but connection or authentication failed. The client will try to establish the connection again. If this occurs during authentication first check the password.
Last disconnection reason
Reason code 1: UNSPECIFIED
and then:
Currently not searching for the access point
An access point was found but connection or authentication failed. The client will try to establish the connection again. If this occurs during authentication first check the password.
Last disconnection reason
Reason code 1: UNSPECIFIED
Then the first warning appears again.
I have not changed the WiFi password.
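For reference, “Reason code 1” is the generic IEEE 802.11 reason code “Unspecified reason”, so on its own it does not point to a wrong password; with WPA2-PSK a key mismatch more commonly shows up as code 15 (4-way handshake timeout), although what gets reported depends on the driver. A small lookup table for a few common codes (abbreviated, not the full list from the standard):

```python
# A few IEEE 802.11 deauthentication/disassociation reason codes.
REASON_CODES = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Deauthenticated because sending STA is leaving (or has left) IBSS or ESS",
    4: "Disassociated due to inactivity",
    5: "Disassociated because AP is unable to handle all currently associated STAs",
    15: "4-way handshake timeout",
    23: "IEEE 802.1X authentication failed",
}


def describe(code):
    """Return a description, or a note that the code is not in this short table."""
    return REASON_CODES.get(code, f"Reason code {code} (not in this abbreviated table)")
```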
@Mike
Thanks for your suggestion. I will look into this, too.