My main system is down!
Author | Message |
---|---|
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
My Threadripper/triple ProDuo system is down. It had been unstable since a mishap while installing the latest 17.50 GPU drivers: I forgot to uninstall 17.40 and ended up with unmet dependencies. I did a forced removal and couldn't recover. I decided to get a second M.2 SSD, installed the OS on it, and used the original M.2 SSD as /home. I was able to recover my original home directory, so everything was cool. But when I got the system back up, it was unstable, and it has become more unstable in the last few days. It is an Asrock X399 Professional Gaming MB. I was running a mild OC, 3.675GHz@1.1625V with flat LLC (water cooled), so I really don't think I degraded anything. Now it even freezes after a few minutes of being idle. I removed half the memory and then tried the other half, with no improvement. I have tried old GPU drivers and the previous MB BIOS version, with no improvement. I get an occasional error 55 or 94 when booting. It seems like the only change was adding the second M.2. I am considering replacing the MB with the ROG Zenith Extreme. My main desktop runs on that board fully loaded 24/7 with no issues. Any recommendations on other things to try before replacing the MB? GitHub: Ricks-Lab Instagram: ricks_labs |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
That really sucks! |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Have you tried running it for a while with a basic system to see if it still crashes? You can make a USB Boot system with a different OS, say 17.04, using this image http://releases.ubuntu.com/zesty/, and see if it still gives errors. I think you can even run the benchmark App while running the basic system to add a little strain. Make a few different copies of the benchmark folder by changing the name and you can load a few cores. |
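For anyone wanting to script the "make a few copies of the benchmark folder" step, here is a minimal Python sketch. The function name `clone_folder` and the `_1`, `_2`, ... suffix scheme are made up for illustration; the thread does not say how the copies should be named, only that each copy needs a different name.

```python
import shutil
from pathlib import Path


def clone_folder(src, copies):
    """Copy the folder `src` `copies` times next to itself.

    Copies are named <src>_1, <src>_2, ... (a hypothetical naming scheme).
    Returns the list of newly created paths.
    """
    src = Path(src)
    made = []
    for i in range(1, copies + 1):
        dest = src.with_name(f"{src.name}_{i}")
        # copytree raises FileExistsError if dest already exists,
        # so an earlier set of copies is never silently overwritten.
        shutil.copytree(src, dest)
        made.append(dest)
    return made
```

Each copy can then be started separately so that several cores are loaded at once, which is the point of the suggestion above.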
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Overclocking with an Asrock MB isn't the best idea. I always had trouble with them. I tried an Extreme 3 and an Extreme 4 with my FX back then, but I was never happy until I bought my Sabertooth. I never had such a good mobo, and now with my Crosshair VI Hero it's the same. Rock stable. With each crime and every kindness we birth our future. |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
Have you looked over at https://forums.overclockers.co.uk/? There was some mention of problems with M.2 SSDs on Threadripper systems. Kevin |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
That sucks, Rick. Another good place for TR support is the threads at Overclock.net. I read the overclocking TR thread every day. First I would review the syslog and dmesg logs in /var/log and see what is happening in the timestamps right before it crashes. Then fall back to basic troubleshooting: strip all unnecessary hardware down to the bare minimum. One hard drive, one stick of memory, one video card, and see if it is stable, then start adding things back in one at a time. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
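As a rough illustration of the log review suggested above, the Python sketch below pulls out the last few lines logged before each reboot. It assumes the kernel's "Linux version" banner marks the start of every boot in the log, and the function name `lines_before_boots` is invented for this example. After a hard freeze the final messages may never have been written to disk, so an empty or uninteresting result does not rule anything out.

```python
def lines_before_boots(lines, context=5, marker="Linux version"):
    """Return the `context` lines that precede each boot marker.

    `lines` is a list of log lines, e.g. the contents of a syslog file.
    A marker on the very first line is skipped because nothing precedes it.
    `marker` is an assumption: the kernel banner printed early in each boot.
    """
    results = []
    for i, line in enumerate(lines):
        if marker in line and i > 0:
            # Slice out up to `context` lines immediately before the boot.
            results.append(lines[max(0, i - context):i])
    return results
```

To try it on a log copied off the machine, read the file with `open(path, errors="replace").read().splitlines()` and pass the resulting list in.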
whill44 Joined: 3 Apr 99 Posts: 6 Credit: 27,227,985 RAC: 40 |
Sorry to hear about the problems, Rick. I can't think of any good suggestions to help; your equipment choices are beyond my abilities ;^). I hope you get it figured out soon. I would also like to say that, as a subscriber, I miss your videos. Any chance you might be posting them again? |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Overclocking with an Asrock MB isn't the best idea. Hi Mike, I also prefer Asus. My Crosshair VI and Zenith Extreme are rock solid. Even my Crosshair V is still going. The BIOS is also much more flexible for OC, but I do like that Asrock has a flat LLC option. I'm not sure how to interpret the LLC options for Asus. Also, just to be clear, all of my post-issue stability tests are at default settings. I only mentioned the OC in case it was a factor in the degradation. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
That sucks, Rick. Another good place for TR support is the threads at Overclock.net. I read the overclocking TR thread every day. First I would review the syslog and dmesg logs in /var/log and see what is happening in the timestamps right before it crashes. Then fall back to basic troubleshooting: strip all unnecessary hardware down to the bare minimum. One hard drive, one stick of memory, one video card, and see if it is stable, then start adding things back in one at a time. Hi Keith, thanks for the recommendation on the logs. I will copy them off the machine before I dismantle it. I have already decided to get the Zenith Extreme. I hope it arrives tomorrow. We are getting a cold snap and I need the system up to help warm my home! GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Have you tried running it for a while with a basic system to see if it still crashes? You can make a USB Boot system with a different OS, say 17.04, using this image http://releases.ubuntu.com/zesty/, and see if it still gives errors. I think you can even run the benchmark App while running the basic system to add a little strain. Make a few different copies of the benchmark folder by changing the name and you can load a few cores. Thanks for the recommendations, TBar! Now it is crashing within a few minutes of boot-up, even when not loaded. I am hoping that I can power it up and copy the logs off of it. I have already ordered the Zenith Extreme MB, so I am not going to put a lot more work into this. When I bring the system up on the new MB, do you think it is safe to just boot off the M.2 SSD from the original build, or is it better to do a clean install? Also, I noticed you are running boincmgr 7.8.4. Should I consider upgrading to it when I get the system back up? GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Have you looked over at https://forums.overclockers.co.uk/? There was some mention of problems with M.2 SSDs on Threadripper systems. Thanks for the recommendation. I will check it out. I have two M.2 SSDs on my main desktop, which is based on the Zenith Extreme, with no issues. Maybe the issues are limited to specific MBs. GitHub: Ricks-Lab Instagram: ricks_labs |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Sorry to hear about the problems Rick. I can't think of any good suggestions to help, your equipment choices are beyond my abilities ;^). I hope you get it figured out soon. I would also like to say as a subscriber I miss your videos, any chance you might be posting them again? I hope to get back to the channel in January. My daughter is back from university in the US for the holidays, so I am not going to think about it until she leaves. Enjoying family time for the holidays. GitHub: Ricks-Lab Instagram: ricks_labs |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Rick, I have two ASUS Prime X370 mobos. They are rock solid. I also think the ASUS BIOS layout is top notch. For zero droop, ASUS uses LLC4 or LLC5 on the Prime. If I remember correctly from the Zenith TR thread, LLC8 is the no-droop setting. This is the thread I read, and it has a ton of information in it: ASUS ROG Zenith Extreme X399 ThreadRipper Overclocking / Support Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I think the BOINC 7.8.4 was specifically to fix issues with Macs. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
There was one interim test BIOS that messed with dual M.2 SSDs, but I believe it was fixed quickly in the next release. The biggest issue is that AMD is slow in releasing BIOS code to the mobo vendors. On the Zenith TR, they are only up to AGESA 1.0.4.1 right now. We Ryzen users are at least up to 1.0.6, and 1.0.7 is imminent. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Have you tried running it for a while with a basic system to see if it still crashes? You can make a USB Boot system with a different OS, say 17.04, using this image http://releases.ubuntu.com/zesty/, and see if it still gives errors. I think you can even run the benchmark App while running the basic system to add a little strain. Make a few different copies of the benchmark folder by changing the name and you can load a few cores. If the problem is being caused by an incompatibility with the storage devices, I wouldn't trust any data on them. I would revert to the configuration that worked previously and install a new system. In most cases it would be helpful if your Home folder were on a different partition or device, but in this particular case that wouldn't help. I would make the change in the next system though: make at least three partitions before installing, and use separate partitions for swap, home, and system. I have a few system partitions that share the same home and swap partitions. The 7.8.4 version is labeled as a Mac-only release: https://github.com/BOINC/boinc/releases/tag/client_release%2F7.8%2F7.8.4 |
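One way to confirm that a finished install really ended up with the separate swap, home, and system partitions recommended above is to look at /etc/fstab. The Python sketch below is only an illustration: the function names are invented, and it assumes the standard fstab column order (device, mount point, filesystem type, ...). A swap file such as /swapfile would be counted as its own "device" here, which may or may not be what you want.

```python
def mount_sources(fstab_text):
    """Map each mount point (or 'swap') to the device field of its fstab line."""
    table = {}
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        # Swap entries use "none" as the mount point, so key them by type.
        key = "swap" if fs_type == "swap" else mount_point
        table[key] = device
    return table


def has_separate_partitions(fstab_text, wanted=("/", "/home", "swap")):
    """True if every wanted entry exists and each one uses a different device."""
    table = mount_sources(fstab_text)
    if any(w not in table for w in wanted):
        return False
    return len({table[w] for w in wanted}) == len(wanted)
```

Calling `has_separate_partitions(open("/etc/fstab").read())` on the installed system gives a quick yes/no answer.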
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
I spent the day yesterday upgrading from the Asrock X399 Professional to the Asus Zenith Extreme (photos at rpc_labs@IG). It is up and running with a stable OC of 3.75GHz@1.2V&66C. At 3.8GHz temps got to 70C, so I backed it off. It has run overnight, so the stability issues are gone. It is using the exact same CPU/memory/NVMe/GPUs, with one exception: I had trouble booting the system with everything installed. It wouldn't boot with all 8 memory DIMMs and all 3 ProDuos. If I disabled 2 of the GPUs, the system would boot and run fine. If I removed 4 DIMMs and had all GPUs installed, it would boot and run fine. My overnight results were with half the memory. Last night I discovered I can set the boot-up DRAM voltage, which I plan to try today. I'm not sure why the default is 1.15V for 1.2V memory. I had changed the memory voltage in the BIOS early on, but I had not tried the boot-up voltage. I also wanted to try disabling IOMMU, but I could not find the option. There is a BIOS setting to disable using both CPUs for this, but I could not find the option to disable it completely. The common boot problem is looping on memory and CPU in the BIOS, or ending with error code 92 concerning NVRAM. GitHub: Ricks-Lab Instagram: ricks_labs |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Rick, from what I have been reading ..... everybody has had difficulty with all DIMMs populated. And no fix in sight until AMD releases newer AGESA code to the mobo vendors. I believe the IOMMU toggle is in Zen/Common Options or something like that. Definitely use the boot up voltage for the RAM and also you can increase the number of retries for memory training. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
RueiKe Joined: 14 Feb 16 Posts: 492 Credit: 378,512,430 RAC: 785 |
Rick, from what I have been reading ..... everybody has had difficulty with all DIMMs populated. And no fix in sight until AMD releases newer AGESA code to the mobo vendors. My experience has been different, but I am not trying to OC memory. I have two TR4 builds. The first is a Zenith Extreme with 8 x 16G DIMMs of ECC memory. I never had a problem with it. The second build, on the Asrock system, has nearly identical memory (same Hynix chips) but 8G single rank instead of 16G dual rank. No issues at all with that (except the recent stability issues). Now I am moving it to a new Zenith MB. I have updated the OLED/Aura firmware and updated to the latest beta BIOS, 0901. I am still having problems with 4 or 8 DIMMs, but I found a way to bypass them and get it to boot. The key is to disable 2 of the 3 graphics cards, remove MB power and battery, and clear CMOS. It will then boot to BIOS setup, where I load my profile, save, and reboot. When it gets to the Linux login, I shut down, enable all 3 GPU cards, and reboot. The error I get when not doing this is Q Code 92, NVRAM Error. I suspect it is a BIOS issue. I found the option to disable IOMMU, but the description indicates that disable means it will only use 1 of the 2 CPUs for IOMMU. GitHub: Ricks-Lab Instagram: ricks_labs |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Sorry, I misunderstood you. Lots of issues with the OLED/Aura software from all the posts. One new version breaks something and next version fixes it only to break something else. I am at my limit with suggestions as I don't have an actual TR system. You will have to post questions to the Zenith Extreme forum threads for specific help. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |