A few days ago I ran some IPsec throughput tests with iperf3 between two RUTX50s, which were connected to a switch via their ETH-WAN interfaces.
iperf3 was run either directly on the RUTX50s or on Linux LAN clients.
The RUTX50s had firmware 7.11.3.
My goal was to identify a sweet spot between security and maximum throughput, as this will be our use case.
Depending on the phase 2 settings, IPsec throughput varied considerably, which is understandable since different crypto algorithms need different amounts of computational resources.
What confused me was that I always got the best results with AES128-CBC, not with -GCM.
E.g. I could get ~100 Mbit/s in both directions during an hour-long test, without any drop in speed, with AES128-CBC/SHA256/ECP384.
But whenever I chose any AES-GCM mode, the results were always much worse (~45-50 Mbit/s).
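For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how such numbers can be read out of iperf3's JSON reports (`iperf3 -J`). It assumes a TCP test, where the end-of-test summary sits under `end.sum_received.bits_per_second`. The function names and the sample values are only illustrative, not my actual logs.

```python
def throughput_mbits(report: dict) -> float:
    """Receiver-side throughput in Mbit/s from a parsed `iperf3 -J` report (TCP test assumed)."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def relative_drop(baseline_mbits: float, other_mbits: float) -> float:
    """Fraction by which `other` is slower than `baseline` (0.5 means half the speed is lost)."""
    return (baseline_mbits - other_mbits) / baseline_mbits


# Illustrative reports only, shaped like the relevant part of iperf3's JSON output.
cbc_report = {"end": {"sum_received": {"bits_per_second": 100e6}}}
gcm_report = {"end": {"sum_received": {"bits_per_second": 50e6}}}

cbc = throughput_mbits(cbc_report)
gcm = throughput_mbits(gcm_report)
print(f"CBC {cbc:.0f} Mbit/s, GCM {gcm:.0f} Mbit/s, drop {relative_drop(cbc, gcm):.0%}")
```

With real data, the two dictionaries would come from `json.load()` on the saved report of each run.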
That’s odd, because I seem to remember that some time ago, in the old community forum, Teltonika claimed AES-GCM was the best choice (I can’t find the posts any more).
Second, I couldn’t find any differences in my tests between the different offload settings (software offload, hardware offload, IPsec offload).
I repeated my tests with an OPNsense gateway on one side instead (same strongSwan version), with pretty much the same results.
Could a Teltonika technician please give some hints as to which settings should, in theory, be “best”?
Might there be some problems in the Teltonika firmware?
There might be some confusion about the offload feature when comparing speeds with iperf3 running on the device itself, which is why you don’t see any difference. IPsec software flow offload, like other flow offloading features, only works in forwarding scenarios, for example, when the iperf3 client is running on a LAN device instead of directly on the router.
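As a rough illustration of a forwarding test, here is a minimal Python sketch meant to be run on a LAN client behind one router, with `iperf3 -s` running on a LAN host behind the other. The helper names and the address `192.168.2.10` are placeholders. Only the standard iperf3 options `-c`, `-t`, `-R` and `-J` are used, and a TCP test is assumed.

```python
import json
import subprocess


def iperf3_command(server: str, seconds: int = 60, reverse: bool = False) -> list:
    """Build an iperf3 client command that prints a JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # reverse mode: the server sends, the client receives
    return cmd


def run_test(server: str, seconds: int = 60, reverse: bool = False) -> float:
    """Run iperf3 and return receiver-side throughput in Mbit/s (TCP test assumed)."""
    result = subprocess.run(
        iperf3_command(server, seconds, reverse),
        capture_output=True,
        text=True,
        check=True,
    )
    report = json.loads(result.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


# Example usage (placeholder address of the LAN host behind the remote router):
# for reverse in (False, True):
#     print("reverse" if reverse else "forward", run_test("192.168.2.10", reverse=reverse))
```

Running the same script once per offload setting, with the traffic passing through the routers rather than terminating on them, is the kind of comparison where a difference should become visible.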
Additionally, there are a few partially supported options, like “UDP encapsulation” and “Route-based IPsec”, which might result in a smaller performance difference.
I also want to mention that RUTX GCM speeds will improve with the RutOS 7.13 release, and they should be similar to AES-CBC speeds. Additionally, 7.13 will introduce support for the ChaCha20-Poly1305 algorithm, which should provide even better performance.
I hope this clarifies things! Let me know if you have any questions.