PC Build for my Dad

Message boards : Number crunching : PC Build for my Dad

Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1912945 - Posted: 14 Jan 2018, 8:45:45 UTC - in response to Message 1912944.  

Some answers ...
- It has been up for 21h now since the PSU replacement with 6 new errors.
- 99% of the time it is just a freeze-up: normal screen, nothing logged, no sign of what happened. I think it only rebooted on its own once.
- LOL, I changed to the Gigabyte board when you said you didn't like the B350, and from the video you sent me I picked the GB K7, which was said to be a good OC board.
- I haven't noticed any difference in stability between 3200 and 3333. I wish I had a Windows machine with DDR4 to read the RAM, but I don't. I would have to clone my Win8.1 and run it offline, but my backup drive is kind of tied up with my Aunt's Mac backups at the moment.
- I have seen the SIGSEGV errors on Ubuntu 16, Ubuntu 14 and Mint 17. Mint 17 would be the only one that had the newest drivers released this week. The Mint 17 install is new, just 4 days old, on an SSD ... I killed the kernel files on my Ubuntu 16/14 USB installs whenever I tried to push 3.9 GHz. They can probably be repaired ... but oh well, no biggie. I was thinking it might be something to do with the USB sticks, but no.
- I don't think it is the NVidia drivers, since it carried on through 3 different OSes, 2 without the latest updates. My guess would be hardware.

Yes, sorry about the misdirect. I had a bias against all B350 boards because all the ones I had seen reviewed were given D grades for their inadequate VRM designs and heat sinks. "Not a board for any R7" was the early opinion. Later in the threads, I did in fact learn that there were a couple of decent B350 boards, and the forum members gave their blessings as long as you didn't go crazy on the clocks or power requirements. Lately I've been reading posts in the memory-clocking threads from members who are having trouble getting any good clocks on the Gigabyte boards. There are a lot of variations in the naming, and I honestly can't keep them straight, so I don't know where your K7 stands. I believe there have been lots of new BIOS releases in the past month.

nvidia-settings is shipped separately and is not part of the Nvidia drivers. It is a standalone product with release numbers and release schedules not tied to the base video drivers. I never got the error until I picked up an update of nvidia-settings through the graphics-drivers PPA and got the 391.22 nvidia-settings while I was still on the 367 Nvidia graphics drivers from the main repository. I backed out of that release and fell back to the version closest to the main 384 driver, and the errors seemed to disappear. The version I'm on now showed up the day after the security update to the kernel that moved me from 4.10 to 4.13. I don't think it is a coincidence that I am getting the errors again now that nvidia-settings has moved back into the 390 release branch.
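To make the branch-mismatch idea concrete, here is a minimal Python sketch that compares the major release branch of two version strings. The helper names are hypothetical (not part of any Nvidia tool), and the assumption that a branch mismatch matters is only the hunch stated above; you would read the real version strings from your own system, for example the Driver Version shown by nvidia-smi or your package manager.

```python
def release_branch(version: str) -> int:
    """Major release branch of an Nvidia version string, e.g. "384.111" -> 384."""
    return int(version.strip().split(".")[0])


def same_branch(settings_version: str, driver_version: str) -> bool:
    """True when nvidia-settings and the driver come from the same release branch."""
    return release_branch(settings_version) == release_branch(driver_version)


if __name__ == "__main__":
    # First pair is the combination described in this thread; the second is an
    # illustrative matching pair. Hypothetical check, not an official tool.
    for settings, driver in [("391.22", "367"), ("390.12", "390.12")]:
        status = "same branch" if same_branch(settings, driver) else "MISMATCH"
        print(f"nvidia-settings {settings} vs driver {driver}: {status}")
```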
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1912947 - Posted: 14 Jan 2018, 8:51:11 UTC - in response to Message 1912938.  


...just to throw in some ideas...
On Linux, you basically run BOINC as one user under one account (the client app), and the Manager is a different program that gives directions to the client. I recall trying out Windows BOINC a few years ago, and it basically ran as a screensaver. If Windows BOINC is a screensaver app and you have multiple user accounts, could that create ghost accounts?
The other possibility worth checking is whether the computers have the same name or different host names. The SETI server might be confusing similar computers for the same machine (since all of Jeyl's computers are probably seen at the same router IP address).

Definitely something to ask Theo the next time we talk: whether the computers have the same name. Also what they have for a router, since Grant mentioned that all the hosts on that account have ghosts, even the Macs. A router or internet connection that can't handle the traffic generated by all the hosts might be the cause, so that is worth looking into.
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1912948 - Posted: 14 Jan 2018, 9:05:52 UTC - in response to Message 1912945.  

Ahh gotcha, never really paid attention to the SMI version number there ...
NVIDIA-SMI 384.111 Driver Version: 384.111
These are the latest for Mint 17.3
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1912952 - Posted: 14 Jan 2018, 9:14:39 UTC - in response to Message 1912947.  

I think the default is 8 for file transfers ...
<max_file_xfers>4</max_file_xfers>
<max_file_xfers_per_project>4</max_file_xfers_per_project>
I use 4 for my little 5 Mb/s connection, which does handle 3 or 4 computers at once, but it's slooowwww if they all hit at once.
I try to offset their timing, and then it's fine. I never get errors.
The router might not be able to handle 10 or so concurrent transfers from multiple computers.
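For reference, a minimal Python sketch that writes those two settings the way they would sit in BOINC's cc_config.xml (inside the options section), plus the worst-case arithmetic behind the router concern. The function names are hypothetical, and the worst case assumes every host hits its transfer limit at the same moment. Restart the client after editing the file so it picks up the change.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers: int, max_xfers_per_project: int) -> str:
    """Return cc_config.xml text that caps BOINC's concurrent file transfers."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(max_xfers_per_project)
    return ET.tostring(root, encoding="unicode")


def total_concurrent(hosts: int, per_host: int) -> int:
    """Worst case: every host behind the router transfers at its limit at once."""
    return hosts * per_host


if __name__ == "__main__":
    print(build_cc_config(4, 4))
    # e.g. 3 hosts at 4 transfers each
    print(total_concurrent(3, 4))
```

Three hosts at 4 transfers each gives 12 simultaneous transfers, which is in the same range as the router limit suspected above; staggering the hosts' timing keeps the real number well below that.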
Keith Myers (Special Project $250 donor)
Volunteer tester
Send message
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1912953 - Posted: 14 Jan 2018, 9:16:09 UTC - in response to Message 1912948.  

Yes, that is the version (384.111) that was installed along with the security update. When I rebooted the machine, I had pages of errors at startup (I don't use --quiet splash) complaining about the new drivers. So I decided to see what was current in the repository and upgraded the Nvidia drivers to the 390.12 release. That fixed the startup errors, but I may have reacquired the SIGSEGV errors.
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1913055 - Posted: 14 Jan 2018, 19:50:05 UTC - in response to Message 1912944.  
Last modified: 14 Jan 2018, 19:50:27 UTC

I don't know if I can be of any help, but I have had some stability problems on my own Ryzen system, which I now seem to have resolved (fingers crossed).

I run a Ryzen 1700 on an ASUS TUF B350M-PLUS GAMING with 16 GB of DDR4-3200 RAM (G.Skill F4-3200C14-8GFX). The memory is guaranteed for 3200 with Ryzen and has been completely unproblematic. My graphics card is an RX 480 (plus an HD 7770).

I currently run the Ryzen at 3800 MHz on a watercooled loop, where it sits at 60-66 degrees C. I could probably push it further, but it really starts to want a lot of core voltage, and I don't want to push the motherboard too hard.
But the problems below all happened at stock settings too, with much lower temperatures.

I had three different problems, more or less.

At first I had a black-screen freeze, where the system turned off the monitor and was unresponsive to my input. I could see some HDD activity, though. It happened completely at random: sometimes the system ran a few hours, other times more than a day.
In a forum thread, I found a guy who said that turning off all C-states for the CPU in the BIOS had solved the crashes for him. It seemed to help for me too, but I can't be 100% certain, since I installed a new BIOS a few days later, and that could be the fix too. But I haven't seen these crashes since turning off C-states for my CPU.

After this I had some system resets. While I was working at it, the system would suddenly reboot, with no warning of any kind.
I went ahead and turned up the load-line calibration for the core voltage, and the system has not reset itself since then.

The last crash I had was harder to fix. While I was working at the system, or sometimes while I was away from it, the screen would freeze on a static picture. I couldn't move the mouse; the system was completely unresponsive. Reset or a power toggle was the only way to restart.
This one really bugged me. But while working with my graphics card, I noticed that the core voltage to the GPU was lower than what the driver requested (I used HWiNFO64 to see these things), consistently about 0.03 V lower, in fact.
That doesn't seem like much.
But I could see 1.144 V being requested (with voltage control in the driver set to automatic), and the measured voltage was 1.110 V. I changed the setting to manual voltage. That specified a 1.163 V GPU core voltage instead, and now the core voltage is measured at 1.138 V. This has completely removed the crashes.
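The arithmetic behind those readings, as a small Python sketch (the helper name is hypothetical; the numbers are the HWiNFO64 readings quoted above). Droop here is simply requested minus measured, and the last line shows how much more voltage actually reached the GPU core after switching to manual.

```python
def droop(requested_v: float, measured_v: float) -> float:
    """Voltage lost between what the driver requests and what the sensor reports."""
    return requested_v - measured_v


# GPU core readings from the post: (requested, measured) in volts.
readings = {
    "automatic": (1.144, 1.110),
    "manual": (1.163, 1.138),
}

for mode, (requested, measured) in readings.items():
    print(f"{mode:>9}: requested {requested:.3f} V, measured {measured:.3f} V, "
          f"droop {droop(requested, measured):.3f} V")

# How much more voltage actually reached the GPU core after the change.
gain = readings["manual"][1] - readings["automatic"][1]
print(f"measured gain: {gain:.3f} V")
```

So the droop stayed in the 0.025 to 0.034 V range either way; the fix worked by raising the request enough that the delivered voltage ended up about 0.028 V higher.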

I can't claim 100% stability yet, since I made the last changes Friday evening, but the system hasn't crashed since then.
Keith Myers (Special Project $250 donor)
Volunteer tester
Send message
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913066 - Posted: 14 Jan 2018, 20:43:09 UTC

Thanks for the feedback, Karsten. I just had another automatic update to the 390.12 drivers, so maybe the crash reports on nvidia-settings made them rework the drivers or something. I've seen the advice about running with C-states disabled before, and I think I tried it with no difference seen. I'd have to reboot and look in the BIOS to see what is set. I don't remember.

In my experience with Ryzen, black-screen crashes always point to not enough Vcore.
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1913143 - Posted: 15 Jan 2018, 7:49:55 UTC - in response to Message 1913066.  
Last modified: 15 Jan 2018, 7:50:22 UTC

Your comment about black-screen crashes is very much in line with what I've read elsewhere.
I just wanted to mention the C-state thing in case it could help. It can't hurt for him to try; just don't get your hopes up too much.

Speaking of which, my system crashed with the black screen this morning...

So stabilizing the GFX card seems to have brought back the other bug. Or rather, it was probably always there; the GFX card just crashed first :)

I have raised the core voltage for the 1700 by 0.05 V; time will tell if that's enough. I'm not worried about temperatures; there's plenty of headroom in the cooling setup.
I would like to keep the computer somewhat quiet, which is of course made harder by me wanting to overclock everything :)
Keith Myers (Special Project $250 donor)
Volunteer tester
Send message
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913152 - Posted: 15 Jan 2018, 9:37:39 UTC - in response to Message 1913143.  

I have no qualms about using the available load-line calibration to correct for the voltage droop on Vcore and Vsoc. It lets me set reasonable voltages instead of setting very high idle voltages just so the voltage is still adequate under full BOINC load after droop. I only use one tick (0.00625 V) of positive offset on the default 1.35 V to get the 1800X to run at 3.95 GHz, with LLC4 or 5, I can't remember which. On Vsoc it's LLC3 or LLC4, again with one tick of positive offset on the default 1.10 V. Power phase delivery is set to Extreme and 130% for both voltages.
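A tiny Python sketch of the set-point arithmetic for that one-tick offset (the helper name is hypothetical; the tick size and defaults are the ones quoted above). It only computes the requested voltage; it does not model how much a given LLC level compensates for droop under load, which depends on the board.

```python
TICK_V = 0.00625  # one positive-offset step, as quoted in the post


def offset_setpoint(default_v: float, ticks: int, tick_v: float = TICK_V) -> float:
    """Requested voltage after adding `ticks` offset steps to the default."""
    return default_v + ticks * tick_v


vcore = offset_setpoint(1.35, 1)  # 1800X Vcore: default plus one tick
vsoc = offset_setpoint(1.10, 1)   # Vsoc: default plus one tick
print(f"Vcore set-point: {vcore:.5f} V")
print(f"Vsoc set-point:  {vsoc:.5f} V")
```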

I have the Ryzens in a spare bedroom and don't hear their noise unless I go into that room. I have the fans cranked to max to keep them cool.
Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 239
Credit: 25,201,931
RAC: 11
Denmark
Message 1913154 - Posted: 15 Jan 2018, 9:57:10 UTC - in response to Message 1913152.  

I don't have any qualms either.

I don't have exactly the same settings as you, but I am running at the next-highest setting for the core, and the standard setting for the SOC.

And my core voltage is now at 1.25 V, so there's room for more there. Going beyond 3.8 GHz starts to demand much more voltage...

Perhaps I'll try to raise the SOC a little (it's at 1.1 V). Although I'm running the memory within spec, it is an overclocked setting when you look at the system data.

My system is in a room where I do lots of other stuff, and is also my main computer for doing "real" work, so I want it to be quiet.
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1913471 - Posted: 17 Jan 2018, 11:53:36 UTC

Well my Ryzen has been up for 4d-1h since the PSU swap out, possibly that cured that problem. But only long term will tell for sure.

I wonder about that C-state mentioned. I occasionally (rarely) see my CPU use drop to 0 for a few seconds, then pick right back up. This might be the cause of the errors I'm getting. Maybe ...
pavlos
Joined: 5 Apr 03
Posts: 29
Credit: 90,415,610
RAC: 249
United States
Message 1913510 - Posted: 17 Jan 2018, 15:21:54 UTC - in response to Message 1894345.  

He could install Linux Mint ... AMD FX-8350, matching m/b, 8 GB RAM, and a video card, GTX 960.
Keith Myers (Special Project $250 donor)
Volunteer tester
Send message
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913893 - Posted: 19 Jan 2018, 2:28:00 UTC - in response to Message 1913471.  

I checked both of my Ryzens. Apparently I have turned off C-states in the BIOS. I guess I figured they were not necessary since the PCs are always fully loaded at 100%.

I'm still getting the nvidia-settings error once every day with the SIGSEGV fault. I send off the error report each time. I haven't seen any announcement of work being done in that area by the developers yet. I wonder if my error reports are ever getting read.
Keith Myers (Special Project $250 donor)
Volunteer tester
Send message
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1913924 - Posted: 19 Jan 2018, 4:29:48 UTC

I finally found the crash log in /var/log/crash. Duh! I was trying to find out just what was being sent in the error report when I acknowledged the error. There hasn't been any noticeable issue with the system other than the error message on the desktop every morning; the system keeps running without problems. The very end of the crash report said I had dependencies in some packages that were out of sync. The problem with nvidia-settings is being caused by GPU fan control, and I do fan control via Jeff's app, which is Python-based. Guess what? The Python modules needed updating. Done and done. We'll see if the error message disappears tomorrow morning.
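For finding those reports quickly, a minimal Python sketch that lists the newest crash files, newest first. It assumes the reports are `*.crash` files in the directory named above; the directory is a parameter because the location can differ between systems (apport on Ubuntu, for example, normally writes to /var/crash), and the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def newest_crash_reports(directory: str = "/var/log/crash", limit: int = 5) -> List[Path]:
    """Return up to `limit` *.crash files from `directory`, newest first."""
    crash_dir = Path(directory)
    if not crash_dir.is_dir():
        return []  # nothing to show if the directory does not exist
    reports = sorted(
        crash_dir.glob("*.crash"),
        key=lambda p: p.stat().st_mtime,  # sort by last-modified time
        reverse=True,
    )
    return reports[:limit]


if __name__ == "__main__":
    for report in newest_crash_reports():
        print(report.name)
```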
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1914709 - Posted: 23 Jan 2018, 9:48:12 UTC

Just an update of ongoing runs, I'm really going to have to clone my Win8.1 drive to use some tools on it.
I made a quick script to log the uptime every minute, so I can tell how long it has been running.
- BIOS moved to defaults to test mem speeds.
- Mem rate not recorded ...
Thu Jan 11 13:59:55 CST 2018
3400 19m
Thu Jan 11 14:23:18 CST 2018
3400 4h 16
Thu Jan 11 21:23:16 CST 2018
3400 8h 42
Fri Jan 12 06:28:21 CST 2018
3400 21h 41 - Shutdown
- Back to 3333MHz
- NEW PSU
Sat Jan 13 04:51:05 CST 2018
3700 4d 1h 21 *** best so far
Wed Jan 17 12:21:06 CST 2018
3800 5h 54
Thu Jan 18 00:44:51 CST 2018
3750 9h 05
Thu Jan 18 09:41:30 CST 2018
3700 18h 31
Fri Jan 19 04:32:06 CST 2018
3700 9h 44
Fri Jan 19 16:35:29 CST 2018
3700 22h 37
Sat Jan 20 18:18:16 CST 2018
3700 2d 9h ... and going
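The uptime-logging idea can be sketched in Python like this. It is not Brent's script (which isn't shown); it assumes Linux, where /proc/uptime holds the seconds since boot as its first field, and it appends one timestamped line per minute so the last line written before a freeze shows roughly how long that run lasted. The log file name and function names are hypothetical.

```python
import time
from datetime import datetime


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Linux exposes seconds since boot as the first field of /proc/uptime."""
    with open(path) as f:
        return float(f.read().split()[0])


def format_uptime(seconds: float) -> str:
    """Render seconds as e.g. '4d 1h 21m', dropping leading zero units."""
    total_minutes = int(seconds // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    if days:
        return f"{days}d {hours}h {minutes}m"
    if hours:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"


def append_uptime_line(logfile: str, uptime_path: str = "/proc/uptime") -> str:
    """Append one 'timestamp  up Xd Yh Zm' line to `logfile` and return it."""
    uptime = format_uptime(read_uptime_seconds(uptime_path))
    line = f"{datetime.now():%a %b %d %H:%M:%S %Y}  up {uptime}\n"
    with open(logfile, "a") as f:
        f.write(line)
    return line


def log_forever(logfile: str, interval_s: int = 60) -> None:
    """Log the uptime once per interval until the machine stops."""
    while True:
        append_uptime_line(logfile)
        time.sleep(interval_s)


if __name__ == "__main__":
    log_forever("uptime.log")  # hypothetical log file name
```

Appending rather than overwriting means a freeze in the middle of a write can at worst damage the last line, and the earlier entries still show how long that run lasted.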
Keith Myers (Special Project $250 donor)
Volunteer tester
Send message
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1914738 - Posted: 24 Jan 2018, 7:44:13 UTC - in response to Message 1914709.  

I have been sending in the crash reports dutifully, but I discovered today that none of them are being looked at, and none of the information in the report is useful. I found out that the nvidia-settings I am using is from the PPA, and only the version from the official Ubuntu archive can have a bug report logged against it.

I still had the SIGSEGV fault in the nvidia-settings program. So I'll just live with the error message, or ignore it and set it to not report. I can't see any effect the crash has on the system other than the message.

My troubles today are coming from a new BIOS update that I installed on both Ryzens. Been trying to find stable settings again as my previous settings are not stable anymore.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.