RMS API / Remote access outage

Hi,

There seems to be an RMS outage (API, Remote WebUI, and Device CLI) that appears to have started around 11:28 EST. None of our field devices are receiving the commands we send them through the RMS API, and remote WebUI and Device CLI requests time out for all of our devices.
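For reference, this is roughly how we send those commands (a minimal sketch; the endpoint path, device ID, and token below are placeholders, not our real integration):

```python
import requests

RMS_API = "https://rms.teltonika-networks.com/api"  # RMS API base URL
TOKEN = "<personal-access-token>"                   # placeholder credential

def send_device_action(device_id: str) -> None:
    """Send one command to a device through the RMS API."""
    resp = requests.post(
        f"{RMS_API}/devices/{device_id}/actions/reboot",  # illustrative endpoint
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,  # since ~11:28 EST every call hits this timeout
    )
    resp.raise_for_status()
```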

Could you advise whether there is a current outage and give an ETA for resolution?

thanks,


Same issue here.

I already tried to call Teltonika but nobody answered.

An ETA would be good; I should also mention that all our devices are currently not responding despite showing as online.

It's not the first time; after the major outage during their maintenance, I'm very nervous about seeing this issue again…

Same issue here. WebUI times out for all connected devices.

Anyone remember how long it took to get functionality back after “last time”?

@GPGC24
The last big outage was about 20 hours long, which I really hope will NOT happen again; hopefully this is a different situation…

Same for me - can't connect to any of our RUT200s - I get a timeout message.

Do these outages happen often?

Update:
It seems the last one happened on Nov 24th, so not long ago…

## Summary of the Incident
Unexpected complications during planned maintenance led to extended downtime. Server updates caused connection issues with devices, and despite efforts, these could not be resolved within a reasonable timeframe. The system was reverted, though connection problems persisted, resulting in approximately 24 hours of downtime, with devices gradually reconnecting over the next 72 hours.

Back online now for me - yay

Yup, seems to be working for me again too. Hopefully this isn’t a regular thing for them…

Hello Everyone,

Current Status
We are pleased to inform you that the RMS system and its services have been fully restored. Our team is actively monitoring the systems to ensure continued stability and performance.


Incident Summary

Issue: On 2024-12-18, between 16:00 and 18:00 UTC, users experienced “Timeout” errors while performing actions within the RMS platform or using the RMS API. Affected operations included firmware upgrades, backup uploads, retrieving monitoring data, and generating new remote access links.

Root Cause: The incident was caused by a memory leak in one of the system applications. This led to an unexpected overload of a key virtual machine, which serves as the foundation for all dependent system components. As a result, services and operational applications froze, preventing the system from processing new actions.


Actions Taken and Next Steps

  • The main virtual machine and associated services were promptly restarted, restoring RMS functionality.
  • We contacted AWS, as such VM RAM overload should not have been possible.
  • Our team is actively monitoring memory usage across the infrastructure to identify and address the root cause of the memory leak.
  • We have added a temporary measure that restarts the relevant services if the server starts running out of memory.
    Impact in such a case: an action may not succeed on the first try but will on the second (see the sketch below).
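If you script against the RMS API, a small retry wrapper along these lines will absorb that one-off failure (an illustrative sketch, not official client code; `request_fn` stands for whatever call your integration already makes):

```python
import time
import requests

def call_with_retry(request_fn, retries: int = 2, delay_s: float = 5.0):
    """Run request_fn() and retry on timeout or HTTP error, matching the
    'may succeed on the second try' behaviour described above."""
    for attempt in range(1, retries + 1):
        try:
            resp = request_fn()
            resp.raise_for_status()  # turn 5xx responses into exceptions
            return resp
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
            if attempt == retries:
                raise  # still failing after the retry budget: surface the error
            time.sleep(delay_s)  # give the restarted service time to come back

# Usage: call_with_retry(lambda: requests.get(url, headers=headers, timeout=30))
```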

We sincerely apologize for any inconvenience caused and appreciate your patience and understanding as we work diligently to enhance the resilience of our systems and minimize future disruptions.