PC Build for my Dad

Message boards : Number crunching : PC Build for my Dad

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912719 - Posted: 13 Jan 2018, 3:46:26 UTC - in response to Message 1912718.  

Yeah, that is what I am thinking I need to do. I asked in a PM whether there might have been an earlier BOINC install in a custom location or something. One issue is that Theo (the son), whom I am dealing with, is not the owner. The owner is Ted (the dad), and he hasn't been present. I would need Ted to be available so he could log in to his account for re-attaching to the project.

I think I need to start completely from scratch.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912719 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10316
Credit: 138,590,657
RAC: 82,755
Australia
Message 1912732 - Posted: 13 Jan 2018, 5:26:40 UTC

Has the mechanism for Ghost creation ever been figured out?
Earlier, when the systems were seriously overcommitted and underperforming, each of them was producing ghosts, although from memory not at the same time. I notice that another of their systems also has a bunch of Ghosts (the one with the Vega64).
Possibly a modem/router issue? It's now struggling with the load of those 2 ThreadRipper systems?
Grant
Darwin NT
ID: 1912732 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912738 - Posted: 13 Jan 2018, 5:42:49 UTC - in response to Message 1912732.  
Last modified: 13 Jan 2018, 5:50:46 UTC

No, the 1950X only had the expected 400 tasks after my Tuesday session. I looked at it yesterday and it still only had 400 assigned tasks, no ghosts. I didn't look at it today, though, before my session with the 1900X.

[Edit] I just looked at the 1900X and it looks like the app_config tunings I applied are being used now. Maybe Theo rebooted the computer and the app_config file finally got read.
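
For anyone following along, here is a rough sketch of the kind of app_config.xml being talked about, built with a few lines of Python. The element names follow the BOINC app_config.xml layout as commonly documented; the app name and every number below are placeholders chosen for illustration, not the tunings actually applied to the 1900X.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent, gpu_usage, cpu_usage):
    """Return app_config.xml text for a single app.

    All arguments are illustrative placeholders, not the real tunings.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    # Cap on how many tasks of this app may run at once.
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    # Fraction of a GPU, and number of CPU cores, reserved per GPU task.
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: app name and limits are assumptions.
xml_text = build_app_config("setiathome_v8", 8, 1.0, 1.0)
print(xml_text)
```

The text would be saved as app_config.xml in the project's folder under the BOINC data directory. The client normally only picks the file up at startup or when told to re-read its config files, which would fit with the settings taking effect after a reboot.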

I also see that the number of Tasks in Progress, at 9936, is almost exactly 400 greater than the 9376 or whatever I started the TeamViewer session with.
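
A quick way to put a number on the ghosts is to subtract what the host actually holds from what the server lists as in progress. This is a minimal sketch, assuming the 1900X holds the usual 400 tasks locally; that local count is an assumption for illustration, and only the 9936 comes from the task list.

```python
def estimate_ghosts(server_in_progress: int, tasks_on_host: int) -> int:
    """Tasks the server thinks are out but the host does not actually have."""
    return max(server_in_progress - tasks_on_host, 0)


# 9936 is the server-side count quoted above; 400 on the host is assumed.
ghosts = estimate_ghosts(9936, 400)
print(ghosts)
```

On a healthy host like the 1950X, where both counts are 400, the same function returns 0.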
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912738 · Report as offensive
Profile Brent Norman Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2390
Credit: 270,715,931
RAC: 670,869
Canada
Message 1912755 - Posted: 13 Jan 2018, 7:22:46 UTC - in response to Message 1912738.  
Last modified: 13 Jan 2018, 7:26:21 UTC

A thought ... it's being run with different user logins, which causes a reload/ignore of client_state, etc. Could be it's being run from the wrong user account and not 'All Users'.
EDIT: Also might explain why Grant(?) mentioned it was on Lunatics, nope it's not, yes it is ... every time they switched logins ???
ID: 1912755 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912758 - Posted: 13 Jan 2018, 7:50:42 UTC - in response to Message 1912755.  

I never thought of that. I wonder if Theo is logging in under his account when I am TeamView'ing with him on the phone. And when he leaves, his Dad logs in under his account.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912758 · Report as offensive
Profile Stargate (S.A.) Project Donor
Volunteer tester
Joined: 4 Mar 10
Posts: 1641
Credit: 558,352
RAC: 1,271
Australia
Message 1912759 - Posted: 13 Jan 2018, 7:55:31 UTC

Possible
ID: 1912759 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10316
Credit: 138,590,657
RAC: 82,755
Australia
Message 1912760 - Posted: 13 Jan 2018, 8:00:38 UTC

And the Ghosts on their other system?
Grant
Darwin NT
ID: 1912760 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912763 - Posted: 13 Jan 2018, 8:35:34 UTC - in response to Message 1912760.  

Up until today, I have never seen ghosts on the 1950X Host 8371071

Every time I looked at it, it had the correct 400 Tasks in Progress. So I never worried about ghosts on that host. I only worried about the wrong applications being used and the overloaded cpu with the lousy task completion times. I fixed the apps and the overloaded condition in Tuesday's session.

Today's session on the 1900X Host 8389828 was mainly to get rid of the ghosts and choose the correct apps and eliminate the overloaded cpu condition. I wasn't successful on the ghosts obviously.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912763 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10316
Credit: 138,590,657
RAC: 82,755
Australia
Message 1912765 - Posted: 13 Jan 2018, 8:46:46 UTC - in response to Message 1912763.  

Keith Myers wrote:
Up until today, I have never seen ghosts on the 1950X Host 8371071

Every time I looked at it, it had the correct 400 Tasks in Progress. So I never worried about ghosts on that host. I only worried about the wrong applications being used and the overloaded cpu with the lousy task completion times. I fixed the apps and the overloaded condition in Tuesday's session.

Today's session on the 1900X Host 8389828 was mainly to get rid of the ghosts and choose the correct apps and eliminate the overloaded cpu condition. I wasn't successful on the ghosts obviously.

I was mentioning the other, other system. The Apple one with the Vega 64 video card that has ghosts as well.
Whatever is happening is affecting more than just the ThreadRipper systems, although they seem to be the most affected when it occurs.
Hence my suspicion of their Modem/router as none of the systems in question at this stage are over committed (which the ThreadRippers were before at various times).
I just find it odd that there is another system with Ghosts, and there are Ghosts again on systems that had the usual Ghost producing performance issues resolved.
Grant
Darwin NT
ID: 1912765 · Report as offensive
Profile Brent Norman Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2390
Credit: 270,715,931
RAC: 670,869
Canada
Message 1912766 - Posted: 13 Jan 2018, 8:49:12 UTC

I just wish I could keep my Ryzen running!!! The damn thing runs from 15m to a few days then crashes. Right now it's on stock 3400/3200MHz RAM and 20h up. Last boot it was 15 min at the same :((( And it always throws a handful of errors whenever it gets in the mood to.
I just wish it was reliably unreliable with different settings so I could figure it out. Getting darn tempted to pull the Gigabyte K7 and put in the cheaper ASUS B350 ROG ... I want to put my 1080s in it, but not too productive if it won't frigging run. The other computer is rock solid stable.
ID: 1912766 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10316
Credit: 138,590,657
RAC: 82,755
Australia
Message 1912769 - Posted: 13 Jan 2018, 8:52:41 UTC - in response to Message 1912766.  
Last modified: 13 Jan 2018, 8:52:57 UTC

Brent Norman wrote:
I just wish I could keep my Ryzen running!!! The damn thing runs from 15m to a few days then crashes. Right now it's on stock 3400/3200MHz RAM and 20h up. Last boot it was 15 min at the same :((( And it always throws a handful of errors whenever it gets in the mood to.

Mem test, different PSU? CPU temps? Any BIOS updates addressing current issues?
Grant
Darwin NT
ID: 1912769 · Report as offensive
Profile Brent Norman Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2390
Credit: 270,715,931
RAC: 670,869
Canada
Message 1912771 - Posted: 13 Jan 2018, 9:21:47 UTC - in response to Message 1912769.  

Don't think it's temp, 60C (I think) w/120 AIO which is only lukewarm o/p. 140mm front and back on board just to make sure, backplate is cold.
RAM has been fine for any 2h tests I've done, was going to try it for a day or more and see.
Best I got was 6d with original K4 BIOS @ 3.7/3333 (I think) I need to recheck BIOS1 setting w/K4. I'm on BIOS2 now with the updated K10 BIOS.
It's a new 1200W PSU with ~225W load, but yeah, I guess it's never been tested yet as known good ...
ID: 1912771 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 10316
Credit: 138,590,657
RAC: 82,755
Australia
Message 1912772 - Posted: 13 Jan 2018, 9:41:54 UTC - in response to Message 1912771.  
Last modified: 13 Jan 2018, 9:45:41 UTC

Brent Norman wrote:
It's a new 1200W PSU with ~225W load, but yeah, I guess it's never been tested yet as known good ...

Known good would be a good start, or try it with a couple of video cards loaded up in there.
A PSU of that caliber might be OK with such a low load, but switchmode PSUs, if not carefully designed for such low loads (i.e. less than 25%), can have stability issues.
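
To put the figures in this exchange side by side, here is a small sketch that works out the load as a fraction of the PSU rating. It assumes the ~225 quoted is watts drawn from the PSU and uses the 25% mark mentioned above as a rough rule of thumb, not a hard spec.

```python
LOW_LOAD_THRESHOLD = 0.25  # rough rule of thumb from the discussion, not a spec


def load_fraction(load_watts: float, rated_watts: float) -> float:
    """Fraction of the PSU's rated output that is actually being drawn."""
    return load_watts / rated_watts


frac = load_fraction(225, 1200)
print(f"{frac:.2%} of rating, low-load region: {frac < LOW_LOAD_THRESHOLD}")
```

At about 19% the 1200W unit sits under the 25% mark. The same 225W on a 750W unit would be 30%, which is above it.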


EDIT- found this reference,
Most switched mode power supplies are regulated. So as the load is reduced the controller will reduce the pulse width and hence the duty cycle in an attempt to maintain the output voltage.

However as the load is further reduced the pulse width reaches the minimum that the controller can achieve. What happens with very small or zero loads depends on the design of the controller.
1. The controller may maintain the minimum pulse width and duty cycle and allow the output voltage to increase until something goes up in smoke.
2. The controller may maintain the minimum pulse width and duty cycle and allow the output voltage to increase until an overvoltage protection circuit is triggered and shuts down the supply until reset.
3. The controller may maintain the minimum pulse width and duty cycle until a self-resetting overvoltage protection is triggered, causing wild swings in output voltage as the supply repeatedly shuts down and starts back up.
4. The controller may increase the time between pulses. This allows overall voltage regulation to be maintained down to zero load, but it means that the frequency of the output ripple depends on load. This can lead to noise problems, both electrical and audible.

My experience is that most modern power supplies fall into category 4 but older designs (which are sometimes still sold) often fell into categories 2 or 3.

Another alternative is for the power supply vendor to build in a "dummy load" to avoid ever reaching the point where the power supply can't reduce the duty cycle any more but I expect that would only be done in specialist applications where output quality is more important than efficiency.

https://electronics.stackexchange.com/questions/80547/operating-a-switched-mode-power-supply-without-a-load
Grant
Darwin NT
ID: 1912772 · Report as offensive
Profile Brent Norman Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2390
Credit: 270,715,931
RAC: 670,869
Canada
Message 1912785 - Posted: 13 Jan 2018, 12:33:26 UTC - in response to Message 1912772.  
Last modified: 13 Jan 2018, 12:37:27 UTC

It's worth a try ... a known good 750W PSU, UPS not just surge, 3.7/3333MHz, 65C w/2x120mm fans on AIO now.
1.75h and still up, will have to see ...
Shaved 4m off CPU tasks going from 3.4 to 3.7GHz, now @50m
Will return this thread back to its regular programming ...
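
For the curious, the clock and run time figures above work out as follows. This is a small sketch, assuming the 4 minutes came off a previous run time of 54 minutes (50 + 4) and that the tasks before and after are comparable.

```python
old_clock_ghz = 3.4
new_clock_ghz = 3.7
new_minutes = 50
saved_minutes = 4
old_minutes = new_minutes + saved_minutes  # assumed previous run time

clock_gain = new_clock_ghz / old_clock_ghz - 1  # relative clock increase
time_saved = saved_minutes / old_minutes        # relative run time reduction
speedup = old_minutes / new_minutes             # throughput ratio

print(f"clock up {clock_gain:.1%}, run time down {time_saved:.1%}, "
      f"speedup {speedup:.2f}x")
```

That is roughly 9% more clock for about 7% less run time, so the tasks scale close to, but not quite in line with, the core clock.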
ID: 1912785 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912827 - Posted: 13 Jan 2018, 16:51:15 UTC - in response to Message 1912766.  

Brent Norman wrote:
I just wish I could keep my Ryzen running!!! The damn thing runs from 15m to a few days then crashes. Right now it's on stock 3400/3200MHz RAM and 20h up. Last boot it was 15 min at the same :((( And it always throws a handful of errors whenever it gets in the mood to.
I just wish it was reliably unreliable with different settings so I could figure it out. Getting darn tempted to pull the Gigabyte K7 and put in the cheaper ASUS B350 ROG ... I want to put my 1080s in it, but not too productive if it won't frigging run. The other computer is rock solid stable.

What kind of crashes? Do you get memory errors that freeze the system or green-screen it, and that can be looked at in Event Viewer or BlueScreenView? Or do you get black screens where the display is blank and unresponsive, leaves no traces or residues, and the system has to be rebooted with a reset or power down?

Anything that logs memory errors can be fixed with better memory or higher voltages on the memory. Black screens are ALWAYS a sign of inadequate Vcore which requires more cpu core voltage, a bump in Vsoc, higher LLC, better power delivery or all of the above.

That said, Gigabyte boards are high on the list of mentioned boards that have been uncooperative in the forums. The ASUS B350 ROG has pretty good reviews in the forums.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912827 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912830 - Posted: 13 Jan 2018, 16:55:16 UTC - in response to Message 1912771.  
Last modified: 13 Jan 2018, 16:57:21 UTC

Brent Norman wrote:
Don't think it's temp, 60C (I think) w/120 AIO which is only lukewarm o/p. 140mm front and back on board just to make sure, backplate is cold.
RAM has been fine for any 2h tests I've done, was going to try it for a day or more and see.
Best I got was 6d with original K4 BIOS @ 3.7/3333 (I think) I need to recheck BIOS1 setting w/K4. I'm on BIOS2 now with the updated K10 BIOS.
It's a new 1200W PSU with ~225W load, but yeah, I guess it's never been tested yet as known good ...

Back the memory off to 3200. Have you used any of the Ryzen memory tools? Only the top 25% of forum users are able to get stable memory clocks above 3200 MHz. The middle 50% of forum users can use 3200 with the latest BIOSes across all board manufacturers. The bottom 25% can only run 2400 or 2666 memory, mainly because of poor memory choice.

Ryzen DRAM Calculator 0.9.9 v11
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912830 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912867 - Posted: 13 Jan 2018, 20:44:04 UTC - in response to Message 1912769.  

Brent Norman wrote:
I just wish I could keep my Ryzen running!!! The damn thing runs from 15m to a few days then crashes. Right now it's on stock 3400/3200MHz RAM and 20h up. Last boot it was 15 min at the same :((( And it always throws a handful of errors whenever it gets in the mood to.

Grant (SSSF) wrote:
Mem test, different PSU? CPU temps? Any BIOS updates addressing current issues?

I just noticed the link in the post to the errored tasks: SIGILL and SIGSEGV violations. The minimum Linux kernel that is mostly compatible with Ryzen is supposed to be 4.10. The only kernels certified 100% compatible with Ryzen are supposed to be > 4.14.

I am still getting occasional SIGSEGV errors too on my Linux Ryzen. I had fewer on the 4.10 kernel. I seem to be getting more now on the new security-patched 4.13.
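
If it helps, here is a rough sketch for checking a kernel release string against those two thresholds. The 4.10 and 4.14 cut-offs are just the figures quoted above, taken literally ("> 4.14" is treated as strictly newer than 4.14), and the release strings are made-up examples in the usual Ubuntu style.

```python
import re

MIN_MOSTLY_COMPATIBLE = (4, 10)  # "mostly compatible" floor quoted above
CERTIFIED_ABOVE = (4, 14)        # "> 4.14" read literally as strictly newer


def parse_kernel(release: str) -> tuple:
    """Extract (major, minor) from a release string like '4.13.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def classify(release: str) -> str:
    version = parse_kernel(release)
    if version > CERTIFIED_ABOVE:
        return "newer than 4.14"
    if version >= MIN_MOSTLY_COMPATIBLE:
        return "4.10 or newer, not above 4.14"
    return "older than 4.10"


# Hypothetical release strings for illustration only.
for example in ("4.4.0-112-generic", "4.13.0-26-generic", "4.15.0-20-generic"):
    print(example, "->", classify(example))
```

On a Linux box the real string can be read with `uname -r`, or with `platform.release()` from Python.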
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912867 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5525
Credit: 382,125,443
RAC: 1,017,086
United States
Message 1912936 - Posted: 14 Jan 2018, 4:36:24 UTC

I believe the SIGSEGV errors on my Linux box are caused by nvidia-settings. I have had the "Ubuntu has experienced an error" message window a few times, and looking at the details of the report, nvidia-settings is the culprit. That program was just updated to 390.12 after the security flaw fix. I didn't have the error when it was back at 384.98.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1912936 · Report as offensive
Profile I C U Special Project $75 donor
Joined: 4 Apr 07
Posts: 4
Credit: 202,278
RAC: 137
Canada
Message 1912938 - Posted: 14 Jan 2018, 7:29:48 UTC - in response to Message 1912765.  

Keith wrote:
Up until today, I have never seen ghosts on the 1950X Host 8371071

Every time I looked at it, it had the correct 400 Tasks in Progress. So I never worried about ghosts on that host. I only worried about the wrong applications being used and the overloaded cpu with the lousy task completion times. I fixed the apps and the overloaded condition in Tuesday's session.

Today's session on the 1900X Host 8389828 was mainly to get rid of the ghosts and choose the correct apps and eliminate the overloaded cpu condition. I wasn't successful on the ghosts obviously.

Grant (SSSF) wrote:
I was mentioning the other, other system. The Apple one with the Vega 64 video card that has ghosts as well.
Whatever is happening is affecting more than just the ThreadRipper systems, although they seem to be the most affected when it occurs.
Hence my suspicion of their Modem/router as none of the systems in question at this stage are over committed (which the ThreadRippers were before at various times).
I just find it odd that there is another system with Ghosts, and there are Ghosts again on systems that had the usual Ghost producing performance issues resolved.


...just to throw in some ideas...
On Linux, you basically run BOINC as one user under one account (the client app), and the manager is a different program giving direction to the client. I recall trying out Windows BOINC a few years ago, and it basically ran as a screensaver. If the Windows BOINC is a screensaver app, and you have multiple user accounts, could it be possible to create ghost accounts?
The other possibility that may be worth checking is whether the computers are named the same or have different host names. The SETI server might be confusing similar computers as the same machine (since all Jeyl's computers are probably seen as being on the same router IP address).
ID: 1912938 · Report as offensive
Profile Brent Norman Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2390
Credit: 270,715,931
RAC: 670,869
Canada
Message 1912944 - Posted: 14 Jan 2018, 8:16:41 UTC

Some answers ...
- It has been up for 21h now since the PSU replacement with 6 new errors.
- 99% of the time it is just a freeze up, normal screen, nothing logged, no sign of what happened. I think it only rebooted once on its own.
- LOL, I changed to the Gigabyte board when you said you didn't like the B350, and going by the video you sent me I picked the GB K7, which was said to be a good OC board.
- I haven't really noted any difference in stability between 3200/3333. I wish I had a Windows machine with DDR4 to read the RAM, but I don't. I would have to clone my Win8.1 and run it offline, but my backup drive is kind of tied up with my Aunt's Mac backups at the moment.
- I have seen the SIGSEGV errors on Ubuntu 16, 14 and Mint 17. Mint 17 would be the only one that had the newest drivers released this week. The Mint 17 is a new install, just 4 days old, on SSD ... I kill my USB Ubuntu 16/14 kernel files whenever I try to push 3.9G. They can probably be repaired ... but oh well, no biggie. I was thinking it might be something to do with USB sticks, but no.
- I don't think it is NVidia drivers, since it carried on thru 3 different OS's, 2 without the latest updates. My guess would be hardware.
ID: 1912944 · Report as offensive


 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.