Every few hours I get "ERROR:network_service_instance_impl.cc(916)] Network service crashed, restarting service"

TvE

@sdetweil Yep (but I get exposed to that type of behavior daily :-) )

I have also found the DEBUG setting in the conf file:

logLevel: ["INFO", "LOG", "WARN", "ERROR", "DEBUG"], // Add "DEBUG" for even more logging

And am tcpdumping the traffic as well

sudo tcpdump -i wlan0 -w tcpdump-wlan.pcap & sudo tcpdump -i eth0 -w tcpdump-eth.pcap &

So let’s see what a bisect approach to disabling the “services” leads to

sdetweil

@tve yeh, sorry on debug. lots of calendar stuff

TvE

@sdetweil Don’t worry about “debug”; I just had to read the config instead of skipping over that part.

I have now reverted to my original config and will enable the different services bisect-style.

/*
TvE test log (on legacy OS)
Re-configured services resulting in crash:
  1 calendar
  2 openweather_current
  3 openweather_forecast
  4 news

A: Sample config
    => 11h and still OK

B: Enable DEBUG log level

C: Disable 2+3 (openweather) Enable 1+4
    => 14 h and still OK

D: Enable 2+3 (openweather) + disable 1+4 (news & calendar)
    => ongoing
*/

So in a few days I should know what’s triggering the issue…
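
The bisect idea from the test log above can be sketched in a few lines of Python. This is only an illustration: `bisect_culprit` and the `crashes` callback are hypothetical names, and the sketch assumes that exactly one config change triggers the crash and that the crash shows up reliably within the observation window (which is exactly what is hard to guarantee with an intermittent failure).

```python
from typing import Callable, List, Sequence


def bisect_culprit(suspects: Sequence[str],
                   crashes: Callable[[List[str]], bool]) -> str:
    """Narrow a list of suspects down to one by enabling half at a time.

    Assumes exactly one suspect triggers the crash and that `crashes`
    gives a reliable answer for the subset it is asked about.
    """
    candidates = list(suspects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Enable only `half`: if it crashes, the culprit is in there,
        # otherwise it must be among the remaining candidates.
        if crashes(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    changes = ["calendar", "openweather_current",
               "openweather_forecast", "news"]
    # Hypothetical oracle: pretend the calendar change is the trigger.
    print(bisect_culprit(changes, lambda enabled: "calendar" in enabled))
```

In practice the `crashes` step is a human running the mirror with that config for a day or two, so four suspects need two rounds rather than four.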

TvE

Hmm so my latest config ran for more than 2 days.

I’m now reverting to the initial config (enabling all four config-changes) and then time will tell

TvE

@sdetweil This is really getting interesting - I have now re-enabled all the config changes so I’m back to my initial config and have been running for four days without a problem.

I even reproduced the problem on two different physical Pi’s (to eliminate HW) before starting to dissect the config…

So for now I can only conclude that something external was triggering these crashes!

I can now go back to the latest OS and start adding more stuff to my config (and keep my fingers crossed that I no longer encounter whatever it was…)

Thank you for your help!

sdetweil

@tve interesting. keep us informed…

thgmirror

@TvE Do you use some wires at the GPIO? For example a PIR sensor? Then maybe interference causes the issue when an electronic device is near your Pi.

Or it is the temperature? Check it with
vcgencmd measure_temp

TvE

@sdetweil Will do!

@thgmirror Nope no PIR sensor (as in “not yet”)
My setup is as basic “Hello World!” as it can be (IMHO)
(The full config can be seen - as a diff to the sample - earlier in this thread)

The Pi has been placed horizontally on my desk during all this testing.
It is in a cabinet though (the one from the “starter kit”).

I have measured the temperature (with my hand) and not been alarmed
Now I can validate my assumption based on your tip, thanx!

pi@mirror:~ $ vcgencmd measure_temp
temp=63.4'C

I have read that it’s possible to add a cooling fin to the CPU, which is something I’ll look into later (as I am also working on making a replacement for this guy, an image grabber [1]).
You can no longer purchase it, and I can make more captures per second on my Pi for half the price; I only need to add this thingy [2].
I have a HelloWorld! for this, I “just” need to add:

Capture to RAM disk instead of USB
Auto upload to an ftp share (or maybe change this to scp…)
Maybe get the Pi’s to netboot as well (as I need 10+ of these systems)

But that’s completely irrelevant to my MM problem ;-)

[1] https://www.epiphan.com/wp-content/uploads/2014/07/epiphan-vga2ethernet-brochure.pdf
[2] https://www.epiphan.com/products/avio-4k/

thgmirror

@tve 63.4 C? Could be less!
You can generate a simple log by running
crontab -e and entering

@reboot echo "---------- start ----------" >> proto.txt
*/5 * * * * date >> proto.txt
1-56/5 * * * * vcgencmd measure_temp >> proto.txt

which will write a timestamp and a temperature in Celsius every 5 minutes… and an eyecatcher at every boot. Have a look at the file after a while.
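
Once such a file has collected some readings, a few lines of Python can summarise it. This is a minimal sketch: `parse_temps` is a hypothetical helper, and it assumes the readings keep the `temp=63.4'C` shape that `vcgencmd measure_temp` prints in this thread. The date lines and the boot marker are simply ignored.

```python
import re
from typing import List

# Matches the output format of `vcgencmd measure_temp`, e.g. temp=63.4'C
TEMP_RE = re.compile(r"temp=(\d+(?:\.\d+)?)'C")


def parse_temps(text: str) -> List[float]:
    """Return every temperature reading (degrees Celsius) found in text."""
    return [float(m.group(1)) for m in TEMP_RE.finditer(text)]


if __name__ == "__main__":
    # Sample content shaped like the cron output; the two readings are
    # the values quoted in this thread, the rest is made up for the demo.
    sample = "---------- start ----------\ntemp=63.4'C\ntemp=59.6'C\n"
    temps = parse_temps(sample)
    print(f"readings={len(temps)} min={min(temps)} max={max(temps)}")
```

To run it on the real log, read the file (for example `open("proto.txt").read()`, assuming cron wrote it to the home directory) and pass the text to `parse_temps`.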

TvE

@thgmirror Cron-based measuring is a good approach; I’ll keep that in mind if I start to suspect temperature (or when I perform a load test on my grabber Pi).

For now I do not think it can be temperature related as it reliably stays around 60 degrees centigrade

pi@mirror:~ $ vcgencmd measure_temp
temp=59.6'C

sdetweil

@tve but see his post

https://forum.magicmirror.builders/topic/16485/temperature-of-a-rpi3

60 is bad if u can do 30!

TvE

@sdetweil said in Every few hours I get "ERROR:network_service_instance_impl.cc(916)] Network service crashed, restarting service":

@tve but see his post

https://forum.magicmirror.builders/topic/16485/temperature-of-a-rpi3

60 is bad if u can do 30!

Great post(!)

(And here I thought that 60 was the normal/expected working temperature)

I think I’ll add a passive solution (or maybe even an active - but then there is a fan that will die)
(I have some other - rather expensive - equipment that died due to a dead fan, and unfortunately there was no way to measure the temperature - bad design…)

I see these two obvious options:

The active solution might provide enough cooling even when the fan dies (as it seems that there is a passive “path for the heat” from the CPU)
I should try both :-)

What is to gain?:
I am aware that the warmer the chip, the shorter its life
(I have complained to a vendor that one of their products gets so warm that I burn myself if I touch the enclosure, but they insist that it’s expected - even though I have had to get at least three of my nine devices replaced!
I think it’s due to a bad design - hence my heat complaint…)

The rule of thumb is something like this:

By increasing the device temperature by just 10°C, we have reduced the lifetime by over 2x

Sources:
https://jetcool.com/post/semiconductor-lifetime-how-temperature-affects-mean-time-to-failure-device-reliability/

https://www.electronics-cooling.com/2017/08/10c-increase-temperature-really-reduce-life-electronics-half/
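
As a rough illustration of that rule of thumb, here is a small Python sketch. It assumes the lifetime changes by exactly a factor of 2 per 10 °C, which is a simplification of the quoted "over 2x" and not a real reliability model; `lifetime_factor` is a hypothetical helper name.

```python
def lifetime_factor(delta_t_celsius: float, halving_step: float = 10.0) -> float:
    """Relative lifetime after a temperature change, per the 10 degC rule of thumb.

    A negative delta (cooler) gives a factor above 1, a positive delta
    (hotter) a factor below 1. Assumes exactly 2x per `halving_step`.
    """
    return 2.0 ** (-delta_t_celsius / halving_step)


if __name__ == "__main__":
    for delta in (-10, -5, 10):
        print(f"{delta:+d} degC -> x{lifetime_factor(delta):.2f} lifetime")
```

Under this assumption even a few degrees of cooling gives a noticeable factor, which is the whole argument for adding a heat sink.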

BTW: I have not seen any more network-service-instance-crashes, it’s still going strong after 8 days, really weird (but good!)…

TvE

@thgmirror

63.4 C? Could be less!

I have now added one of these [1] to my RPI 3 and returned it to the enclosure (the “starter kit cabinet”).

The temperature is now 48-49 degrees centigrade after 30 min usage of MM.
So it reduces the temperature by ~14 K - that’s nice (if you don’t mind listening to a spinning fan, which is actually fairly loud even from at least 2 meters away)…!

I’ll try passive cooling later…

Oh! - and I had a 14 day uptime before powering off (the original reason for this thread :-)

[1] https://uk.rs-online.com/web/p/raspberry-pi-hats-add-ons/2020449

sdetweil

@TvE said in Every few hours I get "ERROR:network_service_instance_impl.cc(916)] Network service crashed, restarting service":

Oh! - and I had a 14 day uptime before powering off (the original reason for this thread

nice!!

TvE

@sdetweil Yeah - that’s perfect (and expected)
I really have been wondering (and have now stopped) what happened initially :-)

@thgmirror
I made a quick test where I have removed the power to the fan to see how much that changes the temperature (aka a passive cooling test).

After ca. one hour the temperature rises to a max of 57-58 degrees centigrade
For the next hour the temperature does not rise any more
Thus a reduction of ~6 K

NB.: I have now removed the lid to see how big (small) a difference that makes

TvE

So - without the lid I get another 2 K as - after 15 min - the temperature stabilized around 54-55 degrees centigrade

* Passive cooling lid  on : 63 -> 58 =  -5
* Passive cooling lid off : 63 -> 55 =  -8
* Active  cooling lid  on : 63 -> 49 = -14
* Active  cooling lid off : 63 -> 42 = -21

Ivanov_d

@TvE I am in the same spot as you.

Raspberry Pi 3b+
Fresh installed Raspbian Bullseye 11
Fresh install of MM with vanilla config and modules - crashes every minute and restarts itself. Are you saying that the config might be the problem?

0|MagicMirror  | Launching application.
0|MagicMirror  | [18725:0322/102613.376542:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
0|MagicMirror  | [18766:0322/102613.700219:ERROR:sandbox_linux.cc(376)] InitializeSandbox() called with multiple threads in process gpu-process.
0|MagicMirror  | [22.03.2022 10:26.16.088] [LOG]   
0|MagicMirror  | Create new calendarfetcher for url: http://www.calendarlabs.com/ical-calendar/ics/76/US_Holidays.ics - Interval: 300000
0|MagicMirror  | [22.03.2022 10:26.16.175] [LOG]   
0|MagicMirror  | Create new newsfetcher for url: https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml - Interval: 300000
0|MagicMirror  | [22.03.2022 10:26.16.248] [INFO]  
0|MagicMirror  | Checking git for module: default
0|MagicMirror  | [22.03.2022 10:26.17.014] [INFO]  
0|MagicMirror  | Newsfeed-Fetcher: Broadcasting 61 items.
0|MagicMirror  | [22.03.2022 10:26.17.522] [INFO]  
0|MagicMirror  | Calendar-Fetcher: Broadcasting 10 events.
0|MagicMirror  | [22.03.2022 10:27.33.823] [LOG]   Shutting down server...
0|MagicMirror  | [22.03.2022 10:27.33.833] [LOG]   Stopping module helper: updatenotification
0|MagicMirror  | [22.03.2022 10:27.33.835] [LOG]   Stopping module helper: calendar
0|MagicMirror  | [22.03.2022 10:27.33.848] [LOG]   Stopping module helper: newsfeed
0|MagicMirror  | [18692:0322/102733.909781:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote
0|MagicMirror  | [18692:0322/102733.942607:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote
0|MagicMirror  | [18692:0322/102733.958362:ERROR:network_service_instance_impl.cc(916)] Network service crashed, restarting service.
0|MagicMirror  | [18692:0322/102733.974458:ERROR:gpu_process_host.cc(962)] GPU process launch failed: error_code=1002
0|MagicMirror  | [18692:0322/102733.982862:ERROR:gpu_process_host.cc(962)] GPU process launch failed: error_code=1002
0|MagicMirror  | [18692:0322/102733.989821:ERROR:gpu_process_host.cc(962)] GPU process launch failed: error_code=1002
0|MagicMirror  | [18692:0322/102733.998527:ERROR:gpu_process_host.cc(962)] GPU process launch failed: error_code=1002
0|MagicMirror  | > magicmirror@2.18.0 start
0|MagicMirror  | > DISPLAY="${DISPLAY:=:0}" ./node_modules/.bin/electron js/electron.js
0|MagicMirror  | [22.03.2022 10:27.42.385] [LOG]   
0|MagicMirror  | Starting MagicMirror: v2.18.0
0|MagicMirror  | [22.03.2022 10:27.42.407] [LOG]   
0|MagicMirror  | Loading config ...
0|MagicMirror  | [22.03.2022 10:27.42.421] [LOG]   
0|MagicMirror  | Loading module helpers ...
0|MagicMirror  | [22.03.2022 10:27.42.427] [LOG]   
0|MagicMirror  | No helper found for module: alert.
0|MagicMirror  | [22.03.2022 10:27.42.463] [LOG]

thgmirror

@TvE That is my experimental setup:

TvE

@Ivanov_d
Maybe we’re observing the same - currently it’s hard for me to tell.

You surely have no heat issue (no additional cooling is needed - only to prolong the life of the RPI)
OS
As you hopefully have read in this thread, I started with the same OS as you, then “downgraded” to the so-called “legacy” version and still saw the issues. Then they suddenly stopped.
Thus I need to switch back to the latest OS and retest

I have no logical reason to suspect the OS but I need to do the test to be 100% certain…

You can follow my test steps (using bisection - remove the handful of external calls and see which one triggers the problem)
My guess
So far I can only conclude that something external provoked the network stack to crash.
At the end of my testing I had returned to my original config and no longer saw problems
That something must relate to one of the two services I was using (part of the vanilla config)

UPDATE!!! - I just looked at my test and wow, this morning I had an error again (my test runs with the monitor turned off, thus I did not see this; I’m working on other tasks at the moment…).

Looking in the logs I see calendar related issues a few hours before the service stopped

My error occurred at [21.03.2022 22:39.02.219] in Denmark
Your error occurred at [22.03.2022 10:27.33.823] in US? (guess deduced from your config)
That’s way too close to just be a coincidence…

I don’t know for how long you have been running your system, but I will now start to suspect that the root cause lies in the calendar service and we both are affected by the same root cause

I wonder if we can find another source for those data

(I now regret that I turned off tcpdump for my latest testing…)

stdout

[21.03.2022 22:34.06.084] [INFO]  Newsfeed-Fetcher: Broadcasting 20 items.
[21.03.2022 22:39.02.219] [LOG]   Shutting down server...
[21.03.2022 22:39.02.225] [LOG]   Stopping module helper: updatenotification
[21.03.2022 22:39.02.226] [LOG]   Stopping module helper: calendar
[21.03.2022 22:39.02.227] [LOG]   Stopping module helper: newsfeed

err

[21.03.2022 16:43.52.620] [WARN]  ^[[33mYou're using a full whitelist configuration to allow for all IPs^[[39m
[1002:0321/164356.019064:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
[1040:0321/164356.860589:ERROR:sandbox_linux.cc(376)] InitializeSandbox() called with multiple threads in process gpu-process.
[21.03.2022 17:37.30.487] [ERROR] Calendar Error. Could not fetch calendar:  http://calendars.icloud.com/holidays/dk_da.ics FetchError: request to https://calendars.icloud.com/holidays/dk_da.ics failed, reason: connect ETIMEDOUT 17.248.150.146:443
    at ClientRequest.<anonymous> (/home/pi/MagicMirror/node_modules/node-fetch/lib/index.js:1483:11)
    at ClientRequest.emit (node:events:394:28)
    at TLSSocket.socketErrorListener (node:_http_client:447:9)
    at TLSSocket.emit (node:events:394:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  type: 'system',
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT'
}
[21.03.2022 17:44.40.539] [ERROR] Calendar Error. Could not fetch calendar:  http://calendars.icloud.com/holidays/dk_da.ics FetchError: request to https://calendars.icloud.com/holidays/dk_da.ics failed, reason: connect ETIMEDOUT 17.248.150.10:443
    at ClientRequest.<anonymous> (/home/pi/MagicMirror/node_modules/node-fetch/lib/index.js:1483:11)
    at ClientRequest.emit (node:events:394:28)
    at TLSSocket.socketErrorListener (node:_http_client:447:9)
    at TLSSocket.emit (node:events:394:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  type: 'system',
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT'
}
[958:0321/223902.258167:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote
[958:0321/223902.279012:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote

Ivanov_d

@TvE it’s been running for 25 hours so far:

pm2 status MagicMirror
┌───────────────┬────┬─────────┬──────┬──────┬────────┬─────────┬────────┬─────┬──────────┬──────┬──────────┐
│ App name      │ id │ version │ mode │ pid  │ status │ restart │ uptime │ cpu │ mem      │ user │ watching │
├───────────────┼────┼─────────┼──────┼──────┼────────┼─────────┼────────┼─────┼──────────┼──────┼──────────┤
│   MagicMirror │ 0  │ 2.18.0  │ fork │ 2911 │ online │ 15      │ 25h    │ 0%  │ 2.7 MB   │ pi   │ disabled │
└───────────────┴────┴─────────┴──────┴──────┴────────┴─────────┴────────┴─────┴──────────┴──────┴──────────┘

I am still tweaking my config, however, my approach was the following:

I updated from Raspbian Buster to Bullseye
That broke my Raspberry Pi 3b+'s WiFi connection (seems like a known bug https://forums.raspberrypi.com/viewtopic.php?t=325484), so I removed the connman package, installed network-manager and connected to the WiFi. If you install network-manager, you will also want to disable the MAC randomization.
To disable the WiFi MAC randomization, create the following file using this command:

sudo nano /etc/NetworkManager/conf.d/100-disable-wifi-mac-randomization.conf

and paste the following content inside:

[connection]
wifi.mac-address-randomization=1
[device]
wifi.scan-rand-mac-address=no

Save the changes and continue.

I backed up my MM config and modules folder and started with a fresh install of MM 2.18
I started MM with the default config - no luck, it was crashing every minute
I started removing the default modules and installing the ones that I had before
I was observing the same behavior as you did - the default modules that needed a network connection were somehow breaking the network stack when fetching data … I edited the config to enter my calendar and newsfeed URLs and that seemed to fix the issue. It might be something in the response of the default calendar and/or newsfeed … but that is just my speculation.

I also have pulled all the latest updates for every module that I had installed previously just to make sure that I am all up to date.

Ever since, as you can see, my MM has been running steadily. Definitely not an SoC temp issue, since I never had anything other than the stock aluminum radiators on the Pi and it has been running 24/7 for 3-4 years already.

I wouldn’t have decided to upgrade to bullseye if it was not a requirement for the new version of MMM-GoogleAssistant.

Anyways, I hope that this helps.

P.S. I am in Sofia, so the date/time in my log is GMT +02:00
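
Since the two crash timestamps in this thread come from different time zones, it can help to put them on a common clock before comparing them. A minimal Python sketch, assuming UTC+1 for the Danish log (CET, as summer time had not started yet on 21 March 2022) and UTC+2 for the Sofia log as stated above; `to_utc` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# MagicMirror log timestamps look like "21.03.2022 22:39.02.219"
LOG_FORMAT = "%d.%m.%Y %H:%M.%S.%f"


def to_utc(stamp: str, utc_offset_hours: int) -> datetime:
    """Parse a MagicMirror log timestamp and convert it to UTC."""
    local = datetime.strptime(stamp, LOG_FORMAT)
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    # Offsets are assumptions: +1 for Denmark (CET), +2 for Sofia.
    tve = to_utc("21.03.2022 22:39.02.219", 1)
    ivanov = to_utc("22.03.2022 10:27.33.823", 2)
    print("TvE      :", tve.isoformat())
    print("Ivanov_d :", ivanov.isoformat())
    print("gap      :", ivanov - tve)
```

The printed gap shows how far apart the two shutdowns really were, which is worth knowing before blaming a single external event.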