System Crash - Strange Results

Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1862190 - Posted: 18 Apr 2017, 15:15:49 UTC

Of course, I am on another business trip and one of my systems crashed while I am away. Typically this happens due to power loss or Windows updates, which are hopefully under better control now. But this time it is only my Penta-Nano system that is down. https://setiathome.berkeley.edu/results.php?hostid=7953787

The strange thing is that yesterday (17-Apr), its credit earned was more than double the normal daily earnings. This was after several days of normal earnings. Here is the chart: https://boincstats.com/en/stats/0/host/detail/7953787/charts
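
As a rough illustration of the "more than double the normal daily earnings" check, here is a minimal Python sketch. The function name `is_spike`, the 7-day window, the factor of 2, and the sample numbers are all assumptions made up for illustration; they are not this host's real figures and not anything BOINCstats provides.

```python
from statistics import mean


def is_spike(daily_credit, factor=2.0, window=7):
    """Return True if the most recent day's credit exceeds `factor` times
    the mean of up to `window` days before it (hypothetical helper)."""
    if len(daily_credit) < 2:
        return False  # no history to compare against
    # Take up to `window` days immediately before the most recent day.
    history = daily_credit[-(window + 1):-1]
    baseline = mean(history)
    return baseline > 0 and daily_credit[-1] > factor * baseline


# Made-up numbers for illustration only, oldest day first.
example = [100_000, 98_000, 102_000, 101_000, 99_000, 210_000]
print(is_spike(example))
```

A trailing mean is a crude baseline; a host whose output varies a lot from day to day would probably need a longer window or a median instead.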

Today it returned a few computation errors before it stopped responding to SETI. This system has been stable for many months with no reboots or tweaks (with the exception of a Windows update a few weeks ago).

Has there been a change in WU types or a new type of WU that could have stressed the system in a way not seen before? I am still away from the system, so I can only speculate about what happened...
GitHub: Ricks-Lab
Instagram: ricks_labs
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1862200 - Posted: 18 Apr 2017, 15:44:27 UTC

Nothing has changed on the SETI side, Rick.


With each crime and every kindness we birth our future.
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1862203 - Posted: 18 Apr 2017, 15:48:16 UTC - in response to Message 1862200.  

Thanks for the confirmation. I will know more when I get back to the system on Friday. It definitely looks strange from here.
GitHub: Ricks-Lab
Instagram: ricks_labs
Profile Shaggie76
Joined: 9 Oct 09
Posts: 282
Credit: 271,858,118
RAC: 196
Canada
Message 1862221 - Posted: 18 Apr 2017, 16:32:37 UTC

I've been struggling with a system that locks up (Nvidia GPUs, though). When it happens, I find the machine still running at maybe half of the usual crunching power load, but there's no bluescreen, the mouse and keyboard don't wake it up, and it doesn't respond to pings. Only a few times have I found anything in the Event Viewer after a reboot. I've recently re-enabled TDR detection to see if I can get more diagnostics, but generally this has been unhelpful.

What are the symptoms of your crash?
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1862338 - Posted: 19 Apr 2017, 13:24:48 UTC - in response to Message 1862221.  

I am actually away from my systems now, so I can only speculate based on what I see from my account here and on BOINCstats. It looks like the system has not actually crashed. Every task is ending in a computation error after about 8,000 seconds of GPU time with no response. This has been happening for 2 days. I won't know for sure what has happened until I can get back to the system on Friday.
GitHub: Ricks-Lab
Instagram: ricks_labs
Profile Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1862414 - Posted: 19 Apr 2017, 22:46:54 UTC - in response to Message 1862338.  

You can stop it from getting work via the web prefs.

Move it to its own venue, if it's not already in one, then disable GPU downloads.
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1862679 - Posted: 21 Apr 2017, 11:20:18 UTC

I finally got back to the system. It was still producing computation errors but was unresponsive to VNC, so I had to reboot to gain access. It worked fine after the reboot. I confirmed that no automatic updates caused the problem. I don't plan to investigate further, since I will rebuild this system on Ryzen this weekend. My host Server01 will be retired and the Penta-Nano block will move to Nexon, which will be Ryzen 1700-based.
GitHub: Ricks-Lab
Instagram: ricks_labs
