My main system is down!

Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908145 - Posted: 20 Dec 2017, 12:50:30 UTC

My Threadripper/triple ProDuo system is down. It had been unstable since I had a mishap while installing the latest 17.50 GPU drivers. I forgot to uninstall 17.40 and ended up with unmet dependencies. I did a forced removal and couldn't recover. I decided to get a second M.2 SSD, installed the OS on it, and used the original M.2 SSD as /home. I was able to recover my original home directory, so everything was cool. But when I got the system back up it was unstable, and it has become more unstable in the last few days. It is an Asrock X399 Professional Gaming MB. I was running a mild OC, 3.675GHz@1.1625V with flat LLC (water cooled), so I really don't think I degraded anything. Now it even freezes after a few minutes of being idle. I removed half the memory and then tried the other half, with no improvement. I have tried old GPU drivers and the previous MB BIOS version, with no improvement. I get an occasional error 55 or 94 when booting. It seems like the only change was adding the second M.2. I am considering replacing the MB with the ROG Zenith Extreme. My main desktop runs on that board fully loaded 24/7 with no issues. Any recommendations on other things to try before replacing the MB?
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908145
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1908152 - Posted: 20 Dec 2017, 14:10:56 UTC - in response to Message 1908145.  

That really sucks!
ID: 1908152
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1908157 - Posted: 20 Dec 2017, 15:08:36 UTC - in response to Message 1908145.  

Have you tried running it for a while with a basic system to see if it still crashes? You can make a USB Boot system with a different OS, say 17.04, using this image http://releases.ubuntu.com/zesty/, and see if it still gives errors. I think you can even run the benchmark App while running the basic system to add a little strain. Make a few different copies of the benchmark folder by changing the name and you can load a few cores.
ID: 1908157
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1908160 - Posted: 20 Dec 2017, 15:44:34 UTC

Overclocking with an Asrock MB isn't the best idea.
I always had trouble with them.
I tried an Extreme 3 and an Extreme 4 with my FX back then but was never happy until I bought my Sabertooth.
I never had such a good mobo, and now with my Crosshair VI Hero it's the same.
Rock stable.


With each crime and every kindness we birth our future.
ID: 1908160
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1908161 - Posted: 20 Dec 2017, 15:46:54 UTC

Have you looked over at https://forums.overclockers.co.uk/? There was some mention of problems with M.2 SSDs on Threadripper systems.
Kevin


ID: 1908161
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908272 - Posted: 21 Dec 2017, 4:00:30 UTC - in response to Message 1908145.  

That sucks, Rick. Another good place for TR support is the threads at Overclock.net. I read the overclocking TR thread every day. First I would review the syslog and dmesg logs in /var/log and see what is happening in the timestamps right before it crashes. Then fall back to basic troubleshooting: strip the hardware down to the bare minimum. One hard drive, one stick of memory, one video card, and see if it is stable, then start adding things back in one at a time.
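A minimal sketch of that log review in Python, assuming the kernel's "Linux version" boot banner appears in /var/log/syslog (or kern.log) at the start of each boot. The function name `lines_before_each_boot` and the `context` parameter are made up for illustration. It collects the last few lines logged before each reboot, which is where a crash is most likely to leave a trace (a hard freeze may leave nothing at all):

```python
import re

# Assumption: every boot writes a kernel banner containing "Linux version"
# into the log, e.g. "... kernel: [    0.000000] Linux version 4.13.0 ...".
BOOT_MARKER = re.compile(r"kernel: .*Linux version ")


def lines_before_each_boot(lines, context=20):
    """Return one list per reboot holding the `context` lines logged just before it.

    A boot banner on the very first line is skipped, since nothing precedes it.
    """
    chunks = []
    for i, line in enumerate(lines):
        if i > 0 and BOOT_MARKER.search(line):
            # Slice out the tail of the previous session.
            chunks.append(lines[max(0, i - context):i])
    return chunks
```

To try it, read the log with `open("/var/log/syslog", errors="replace").read().splitlines()`, pass the result in, and print each returned chunk. Reading the file may need root or membership in the `adm` group, depending on the distro.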
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908272
whill44

Joined: 3 Apr 99
Posts: 6
Credit: 27,227,985
RAC: 40
United States
Message 1908276 - Posted: 21 Dec 2017, 4:20:29 UTC

Sorry to hear about the problems Rick. I can't think of any good suggestions to help, your equipment choices are beyond my abilities ;^). I hope you get it figured out soon. I would also like to say as a subscriber I miss your videos, any chance you might be posting them again?
ID: 1908276
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908283 - Posted: 21 Dec 2017, 5:06:31 UTC - in response to Message 1908160.  

Overclocking with an Asrock MB isn't the best idea.
I always had trouble with them.
I tried an Extreme 3 and an Extreme 4 with my FX back then but was never happy until I bought my Sabertooth.
I never had such a good mobo, and now with my Crosshair VI Hero it's the same.
Rock stable.


Hi Mike, I also prefer Asus. My Crosshair VI and Zenith Extreme are rock solid. Even my Crosshair V is still going. The BIOS is also much more flexible for OC, but I do like that Asrock has a flat LLC option. I'm not sure how to interpret the LLC options for Asus. Also, just to be clear, all of my post-issue stability tests are at default settings. I just mentioned the OC in case it was a factor in the degradation.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908283
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908284 - Posted: 21 Dec 2017, 5:08:41 UTC - in response to Message 1908272.  

That sucks, Rick. Another good place for TR support is the threads at Overclock.net. I read the overclocking TR thread every day. First I would review the syslog and dmesg logs in /var/log and see what is happening in the timestamps right before it crashes. Then fall back to basic troubleshooting: strip the hardware down to the bare minimum. One hard drive, one stick of memory, one video card, and see if it is stable, then start adding things back in one at a time.


Hi Keith, thanks for the recommendation on the logs. I will copy them off the machine before I dismantle it. I have already decided to get the Zenith Extreme. Hope it arrives tomorrow. We are getting a cold snap and need the system up to help warm my home!
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908284
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908285 - Posted: 21 Dec 2017, 5:14:19 UTC - in response to Message 1908157.  

Have you tried running it for a while with a basic system to see if it still crashes? You can make a USB Boot system with a different OS, say 17.04, using this image http://releases.ubuntu.com/zesty/, and see if it still gives errors. I think you can even run the benchmark App while running the basic system to add a little strain. Make a few different copies of the benchmark folder by changing the name and you can load a few cores.


Thanks for the recommendations, TBar! Now it is crashing within a few minutes of boot-up, even when not loaded. I am hoping that I can power it up and copy the logs off of it. I have already ordered the Zenith Extreme MB, so I am not going to put a lot more work into this. When I bring the system up on the new MB, do you think it is safe to just boot off the M.2 SSD from the original build, or is it best to do a clean install? Also, I noticed you are running boincmgr 7.8.4. Should I consider upgrading to it when I get the system back up?
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908285
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908286 - Posted: 21 Dec 2017, 5:16:13 UTC - in response to Message 1908161.  

Have you looked over at https://forums.overclockers.co.uk/? There was some mention of problems with M.2 SSDs on Threadripper systems.


Thanks for the recommendation. I will check it out. I have 2 m.2 ssd's on my main desktop based on Zenith Extreme with no issues. Maybe the issues are limited to specific MBs.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908286
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908289 - Posted: 21 Dec 2017, 5:18:14 UTC - in response to Message 1908276.  

Sorry to hear about the problems Rick. I can't think of any good suggestions to help, your equipment choices are beyond my abilities ;^). I hope you get it figured out soon. I would also like to say as a subscriber I miss your videos, any chance you might be posting them again?


I hope to get back to the channel in January. My daughter is back from university in the US for the holidays, so I am not going to think about it until she leaves. Enjoying family time for the holidays.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908289
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908294 - Posted: 21 Dec 2017, 6:33:26 UTC - in response to Message 1908284.  

Rick, I have two ASUS Prime X370 mobos. They are rock solid. I also think the ASUS BIOS layout is top notch. For zero droop, ASUS uses LLC4 or LLC5 for the Prime. If I remember correctly from the Zenith TR thread, LLC8 is the no-droop setting.
This is the thread I read, and it has a ton of information in it: ASUS ROG Zenith Extreme X399 ThreadRipper Overclocking / Support
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908294
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908295 - Posted: 21 Dec 2017, 6:35:06 UTC - in response to Message 1908285.  



Thanks for the recommendations, TBar! Now it is crashing within a few minutes of boot-up, even when not loaded. I am hoping that I can power it up and copy the logs off of it. I have already ordered the Zenith Extreme MB, so I am not going to put a lot more work into this. When I bring the system up on the new MB, do you think it is safe to just boot off the M.2 SSD from the original build, or is it best to do a clean install? Also, I noticed you are running boincmgr 7.8.4. Should I consider upgrading to it when I get the system back up?

I think BOINC 7.8.4 was specifically to fix issues with Macs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908295
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908296 - Posted: 21 Dec 2017, 6:40:12 UTC - in response to Message 1908286.  



Thanks for the recommendation. I will check it out. I have 2 m.2 ssd's on my main desktop based on Zenith Extreme with no issues. Maybe the issues are limited to specific MBs.

There was one interim test BIOS that messed with dual M.2 SSDs, but I believe it was fixed quickly in the next one. The biggest issue is that AMD is slow in releasing BIOS code to the mobo vendors. On the Zenith TR, they are only up to AGESA 1.0.4.1 right now. We Ryzen users are at least up to 1.0.6, and 1.0.7 is imminent.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908296
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1908318 - Posted: 21 Dec 2017, 13:32:27 UTC - in response to Message 1908285.  

Have you tried running it for a while with a basic system to see if it still crashes? You can make a USB Boot system with a different OS, say 17.04, using this image http://releases.ubuntu.com/zesty/, and see if it still gives errors. I think you can even run the benchmark App while running the basic system to add a little strain. Make a few different copies of the benchmark folder by changing the name and you can load a few cores.


Thanks for the recommendations, TBar! Now it is crashing within a few minutes of boot-up, even when not loaded. I am hoping that I can power it up and copy the logs off of it. I have already ordered the Zenith Extreme MB, so I am not going to put a lot more work into this. When I bring the system up on the new MB, do you think it is safe to just boot off the M.2 SSD from the original build, or is it best to do a clean install? Also, I noticed you are running boincmgr 7.8.4. Should I consider upgrading to it when I get the system back up?

If the problem is being caused by an incompatibility with the storage devices, I wouldn't trust any data on them. I would revert to the configuration that worked previously and install a new system. In most cases it would be helpful if your home folder were on a different partition or device, but in this particular case that wouldn't help. I would make the change in the next system though: make at least three partitions before installing and use separate partitions for swap, home, and system. I have a few system partitions that share the same home and swap partitions. The 7.8.4 version is labeled as a Mac-only release: https://github.com/BOINC/boinc/releases/tag/client_release%2F7.8%2F7.8.4
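For anyone wanting to check an existing install against that layout, here is a rough Python sketch. It assumes the usual /etc/fstab column order (device, mount point, filesystem type, ...). The helper names `mount_layout` and `has_separate_partitions` are hypothetical, not part of any tool. It only confirms that /, /home, and swap each have their own fstab entry on distinct devices:

```python
def mount_layout(fstab_text):
    """Map each mount point (or 'swap') to its device field from fstab-style text."""
    layout = {}
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        # Swap entries have no real mount point, so key them by type.
        key = "swap" if fs_type == "swap" else mount_point
        layout[key] = device
    return layout


def has_separate_partitions(fstab_text, wanted=("/", "/home", "swap")):
    """True when every wanted entry exists and no two share a device."""
    layout = mount_layout(fstab_text)
    devices = [layout.get(name) for name in wanted]
    return None not in devices and len(set(devices)) == len(wanted)
```

Feeding it the contents of /etc/fstab, for example `has_separate_partitions(open("/etc/fstab").read())`, gives a quick yes/no. It does not look at how the disks are actually partitioned, only at what fstab declares.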
ID: 1908318
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908539 - Posted: 23 Dec 2017, 2:09:13 UTC

I spent the day yesterday upgrading from the Asrock X399 Professional to the Asus Zenith Extreme (photos at rpc_labs@IG). It is up and running with a stable OC of 3.75GHz@1.2V at 66C. At 3.8GHz temps got to 70C, so I backed it off. It has run overnight, so the stability issues are gone. It is using the exact same CPU/memory/NVMe/GPUs, with one exception. I had trouble booting the system with everything installed. It wouldn't boot with all 8 memory DIMMs and all 3 ProDuos. If I disabled 2 of the GPUs, the system would boot and run fine. If I removed 4 DIMMs and had all GPUs installed, it booted and ran fine. My overnight results were with half the memory. Last night I discovered I can set the boot-up DRAM voltage, which I plan to try today. Not sure why the default is 1.15V for 1.2V memory. I had changed the memory voltage in BIOS early on, but I had not tried the boot-up voltage. I also wanted to try disabling IOMMU, but I could not find the option. There is a BIOS setting to disable using both CPUs for this, but I could not find the option to completely disable it. The common boot problem is looping on memory and CPU in the BIOS, or ending with error code 92 concerning NVRAM.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908539
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908548 - Posted: 23 Dec 2017, 3:06:12 UTC - in response to Message 1908539.  

Rick, from what I have been reading ..... everybody has had difficulty with all DIMMs populated. And no fix in sight until AMD releases newer AGESA code to the mobo vendors.

I believe the IOMMU toggle is in Zen/Common Options or something like that.

Definitely use the boot up voltage for the RAM and also you can increase the number of retries for memory training.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908548
Profile RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1908581 - Posted: 23 Dec 2017, 7:29:39 UTC - in response to Message 1908548.  

Rick, from what I have been reading ..... everybody has had difficulty with all DIMMs populated. And no fix in sight until AMD releases newer AGESA code to the mobo vendors.

I believe the IOMMU toggle is in Zen/Common Options or something like that.

Definitely use the boot up voltage for the RAM and also you can increase the number of retries for memory training.


My experience has been different, but I am not trying to OC memory. I have two TR4 builds. The first is a Zenith Extreme with 8 x 16G DIMMs of ECC memory. I never had a problem with it. The second build, on the Asrock system, has nearly identical memory (same Hynix chips) but 8G single rank instead of 16G dual rank. No issues at all with that (except the recent stability issues). Now I am moving it to a new Zenith MB. I have updated the OLED/Aura firmware and updated to the latest beta BIOS 0901. I am still having problems with 4 or 8 DIMMs, but I found a way to bypass them and get it to boot. The key is to disable 2 of the 3 graphics cards, remove MB power and battery, and clear CMOS. It will boot to BIOS setup; then I load my profile, save, and reboot. When it gets to the Linux login, I shut down, enable all 3 GPU cards, and reboot. The error I get when not doing this is Q Code 92, NVRAM Error. I suspect it is a BIOS issue.

I found the option to disable IOMMU, but the description indicates disable means it will only use 1 of the 2 CPUs for IOMMU.
GitHub: Ricks-Lab
Instagram: ricks_labs
ID: 1908581
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1908646 - Posted: 23 Dec 2017, 19:23:30 UTC - in response to Message 1908581.  

Sorry, I misunderstood you. Lots of issues with the OLED/Aura software from all the posts. One new version breaks something and next version fixes it only to break something else. I am at my limit with suggestions as I don't have an actual TR system. You will have to post questions to the Zenith Extreme forum threads for specific help.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1908646


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.