RUTX50 07.04.5 netifd coredump

Hello,

This is a follow-up to the same issue reported on the old forum. Same backtrace:

root@lgr5g:/etc/config# gdb netifd /tmp/netifd.1691111243.1984.11.core 
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-openwrt-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netifd...
(No debugging symbols found in netifd)
[New LWP 1984]
Core was generated by `/sbin/netifd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00021680 in ?? ()
(gdb) bt
#0  0x00021680 in ?? ()
#1  0x0002934c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

I’ll compile netifd with debug information to see what is going on.

Regards,

I have compiled netifd with debug info:

make package/network/config/netifd/compile V=sc TARGET_OPTIMIZATION="-ggdb3 -O0" STRIP="/bin/true"

Now the backtrace is a little more helpful:

Core was generated by `/sbin/netifd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00021680 in wireless_device_hotplug_event (add=65, name=0x34322e31 <error: Cannot access memory at address 0x34322e31>)
    at /home/fl/rutos-ipq40xx-rutx-gpl/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/netifd-2022-01-12-5ca5e0b4/wireless.c:1571
1571	/home/fl/rutos-ipq40xx-rutx-gpl/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/netifd-2022-01-12-5ca5e0b4/wireless.c: No such file or directory.

Line 1571 of wireless.c contains:

        vlist_for_each_element(&wireless_devices, wdev, node) {

and wireless_device_hotplug_event(const char *name, …) is called with an invalid argument: name = 0x34322e31, an address gdb cannot read.
The same is true for its caller, device_hotplug_event(const char *name, …).
As the stack appears to be corrupt, it isn’t easy to go further, other than by trying to infer a plausible caller.
netifd_handle_dev_hotplug() seems to be a good candidate: it has a local variable named tb, which is an array, and manipulating it without enough precautions can corrupt the memory around it.
Next step: add traces there.
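A minimal sketch of the kind of precaution and tracing meant here, assuming tb holds string pointers indexed by an attribute ID. The enum, tb_set() and TRACE() are hypothetical names for illustration, not netifd code: every write into the array is range-checked, and each write (or rejected write) leaves a trace line on stderr.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical attribute indices standing in for whatever
 * netifd_handle_dev_hotplug() stores in its local tb[] array. */
enum {
	HOTPLUG_ATTR_NAME,
	HOTPLUG_ATTR_ACTION,
	HOTPLUG_ATTR_COUNT
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical trace helper: prefixes each message with function and line. */
#define TRACE(fmt, ...) \
	fprintf(stderr, "%s:%d: " fmt "\n", __func__, __LINE__, __VA_ARGS__)

/* Store value in tb[id] only when id is a valid index. An out-of-range id
 * is traced and rejected instead of being written past the end of tb,
 * which for a local array would overwrite neighbouring stack memory. */
static int tb_set(const char **tb, size_t tb_len, size_t id, const char *value)
{
	const char *shown = value ? value : "(null)";

	if (id >= tb_len) {
		TRACE("id %zu out of range (tb has %zu slots), value \"%s\" dropped",
		      id, tb_len, shown);
		return -1;
	}
	tb[id] = value;
	TRACE("tb[%zu] = \"%s\"", id, shown);
	return 0;
}
```

For instance, with `const char *tb[HOTPLUG_ATTR_COUNT] = { NULL };`, calling `tb_set(tb, ARRAY_SIZE(tb), HOTPLUG_ATTR_NAME, "wlan0")` stores the pointer and returns 0, while `tb_set(tb, ARRAY_SIZE(tb), 7, "junk")` prints a trace line and returns -1 instead of writing past the array.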

Next coredump for netifd compiled with debug information:

(gdb) bt full
#0  device_release (dep=0xa21c78)
    at /home/fl/rutos-ipq40xx-rutx-gpl/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/netifd-2022-01-12-5ca5e0b4/device.c:560
        dev = 0x34322e31
        __func__ = <error reading variable>
#1  0x00029498 in interface_flush_state (iface=0xa21bb0)
    at /home/fl/rutos-ipq40xx-rutx-gpl/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/netifd-2022-01-12-5ca5e0b4/interface.c:281
No locals.
#2  0x0002d394 in interface_main_dev_cb (dep=0xa21c30, ev=<optimized out>)
    at /home/fl/rutos-ipq40xx-rutx-gpl/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/netifd-2022-01-12-5ca5e0b4/interface.c:437
        iface = 0xa21bb0
#3  0x0001f8b8 in device_broadcast_cb (ctx=<optimized out>, list=<optimized out>)
    at /home/fl/rutos-ipq40xx-rutx-gpl/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/netifd-2022-01-12-5ca5e0b4/device.c:485
        dep = <optimized out>
        ev = <optimized out>
        __mptr = <optimized out>
        __mptr = <optimized out>
#4  0xb6eb33fc in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) print *dep
$1 = {list = {list = {next = 0x63616672, prev = 0x67772065}, i = 0xa736c74}, claimed = false, hotplug = 97, alias = 109, 
  ev_idx = <error reading variable>, dev = 0x34322e31, cb = 0xa312e38}
(gdb) 
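The garbage values in *dep are worth a second look: on this little-endian ARM target, each 32-bit word sits in memory as four bytes, lowest byte first. The helper below (word_to_ascii() is a hypothetical name, not part of netifd) turns a word back into those four bytes, replacing non-printable ones with '.':

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit word the way it is laid out in memory on a
 * little-endian CPU: byte 0 is the least significant byte.
 * Printable bytes are copied as-is, others become '.'.
 * out must have room for 4 characters plus the terminating NUL. */
static void word_to_ascii(uint32_t word, char out[5])
{
	for (size_t i = 0; i < 4; i++) {
		unsigned char c = (unsigned char)(word >> (8 * i));
		out[i] = isprint(c) ? (char)c : '.';
	}
	out[4] = '\0';
}
```

Fed the values above, it gives "rfac", "e wg" and "tls." for the list fields, "1.24" for dev (the same 0x34322e31 that appeared as name in the earlier backtrace) and "8.1." for cb, where the trailing '.' stands for a 0x0a newline byte; hotplug = 97 and alias = 109 are likewise 'a' and 'm'. That the whole structure decodes to printable fragments suggests it was overwritten by a text string, which fits an overflow hypothesis, although the core dump alone does not prove where the string came from.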

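So the root cause is a list corruption: the list pointers, dev and cb in *dep all hold garbage, and dev = 0x34322e31 is the same bogus value that name had in the wireless_device_hotplug_event() backtrace.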
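Since the crash comes from following corrupted list links, one option while hunting the culprit is to validate a list more eagerly, in the spirit of the Linux kernel's CONFIG_DEBUG_LIST checks. The sketch below uses its own standalone copy of a kernel-style list_head (modelled on the one in libubox, but an assumption, not netifd or libubox code) and a hypothetical list_find_corruption() helper that reports the first node whose back pointer does not match:

```c
#include <stddef.h>

/* Minimal kernel-style circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Walk from head and check that each forward link is matched by the
 * reverse link (pos->next->prev == pos). Returns the first node whose
 * back pointer is wrong, or NULL if the list is consistent. The walk
 * stops at a damaged node, so its possibly garbage next pointer is
 * never followed. */
static struct list_head *list_find_corruption(struct list_head *head)
{
	struct list_head *pos = head;

	do {
		struct list_head *next = pos->next;

		if (next->prev != pos)
			return next;
		pos = next;
	} while (pos != head);

	return NULL;
}
```

Running such a check on, for example, the device user list that dep belongs to, at a few points between hotplug events, would turn the segfault into a report naming the first damaged node, closer in time to whatever overwrote it. This is a debugging aid under the assumptions above, not a fix.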
