Don't know where it should go? Stick it here!

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1972653 - Posted: 30 Dec 2018, 17:57:54 UTC - in response to Message 1972539.  

[quote]Has anyone ever used anything like this MB https://www.newegg.com/Product/Product.aspx?Item=N82E16813145042 with something like these risers https://www.newegg.com/Product/Product.aspx?Item=N82E16812183021&Description=x16%20riser%20cable&ignorebbr=1&cm_re=x16_riser_cable-_-12-183-021-_-Product?[/quote]


Here is a "state of the art" Server/Workstation. The article says its biggest problem is that it might be too loud. It should be quieter if you use actively cooled GPUs. It's supposed to run up to 4 GPUs with an EPYC CPU.

https://www.servethehome.com/gigabyte-w291-z00-amd-epyc-gpu-tower-launched/

Tom
A proud member of the OFA (Old Farts Association).
ID: 1972653
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13753
Credit: 208,696,464
RAC: 304
Australia
Message 1973525 - Posted: 5 Jan 2019, 2:09:58 UTC

Well that sucks.
4hrs 30min worth of work down the drain, and an incorrect result in the database because 2 AMD GPU systems validated against each other.
ap_30dc18ab_B2_P0_00200_20181231_15740.wu
Grant
Darwin NT
ID: 1973525
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973529 - Posted: 5 Jan 2019, 2:31:22 UTC

My bet is that the cpu app got it correct and not the ATI gpus.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973529
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13753
Credit: 208,696,464
RAC: 304
Australia
Message 1973577 - Posted: 5 Jan 2019, 9:04:06 UTC - in response to Message 1973525.  

[quote]Well that sucks.
4hrs 30min worth of work down the drain, and an incorrect result in the database because 2 AMD GPU systems validated against each other.
ap_30dc18ab_B2_P0_00200_20181231_15740.wu[/quote]

And then to top it all, I got mugged by a couple of Special Application machines.
02ja19ab.14289.2112.11.38.0
Pulse v Triplet classification.
Grant
Darwin NT
ID: 1973577
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973637 - Posted: 5 Jan 2019, 16:23:01 UTC - in response to Message 1973577.  

[quote][quote]Well that sucks.
4hrs 30min worth of work down the drain, and an incorrect result in the database because 2 AMD GPU systems validated against each other.
ap_30dc18ab_B2_P0_00200_20181231_15740.wu[/quote]

And then to top it all, I got mugged by a couple of Special Application machines.
02ja19ab.14289.2112.11.38.0
Pulse v Triplet classification.[/quote]

Meaningless because the task was an early overflow and noisebomb and is irrelevant to science.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973637
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973673 - Posted: 5 Jan 2019, 19:21:31 UTC - in response to Message 1973648.  
Last modified: 5 Jan 2019, 19:38:10 UTC

[quote]Well, not exactly correct. Even early overflows, are put into the science database.

https://setiathome.berkeley.edu/forum_thread.php?id=83328&postid=1954272[/quote]

Well, yes, true, they do get put into the database . . . . but again irrelevant to science, as the task is an "early overflow". They will be thrown out first as outliers in analysis.

[Edit] From Richard's quote in your link.
[quote]It is true that the very noisiest WUs, with the shortest run-time before overflow, are unlikely to contain much significant science, but the concept that *every* overflow is a worthless noise bomb has come from volunteers, not from the project scientists.[/quote]

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973673
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973682 - Posted: 5 Jan 2019, 21:12:04 UTC - in response to Message 1973679.  

I'm sure I get a lot more invalid results from two SoG apps validating against my CUDA app simply by the sheer numbers of stock app hosts compared to the special CUDA app hosts on early overflows. I don't worry about it and it doesn't bother me at all if I lost 2 seconds of compute time. Again, the early overflow result is not going to contain any significant science. What counts is that I don't get invalids on normal tasks that contain actual real pulses, triplets and gaussians.

Every benchmark test run shows the special apps produce the proper exact result as the stock cpu app and have the same or better Strongly similar, Q quotient of 99.95%, than the SoG app.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973682
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13753
Credit: 208,696,464
RAC: 304
Australia
Message 1973702 - Posted: 5 Jan 2019, 23:30:54 UTC - in response to Message 1973637.  

[quote]Meaningless because the task was an early overflow and noisebomb and is irrelevant to science.[/quote]

Given this is a science project, and the reference is the CPU application, all other applications should produce the same results.
Grant
Darwin NT
ID: 1973702
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13753
Credit: 208,696,464
RAC: 304
Australia
Message 1973707 - Posted: 5 Jan 2019, 23:51:10 UTC - in response to Message 1973682.  

[quote]Again, the early overflow result is not going to contain any significant science.[/quote]

And yet again- What is important is getting the right result- since this result isn't correct according to the reference, it is an issue- regardless of whether the result is considered of scientific use or not.

[quote]Every benchmark test run shows the special apps produce the proper exact result as the stock cpu app and have the same or better Strongly similar, Q quotient of 99.95%, than the SoG app.[/quote]

Except for this instance.
The only time I get an Invalid on my GPUs is when the validation is between two other applications that are the same- the worst offenders were from Apple systems, then Intel iGPU, and occasionally the Special Application where it doesn't categorise the result the same as the SoG or CPU application.

And given that Invalids & Errors are used to judge the quality of output of a system it should be a goal of any application to produce a valid result (ie one that matches the reference application) regardless of the (presently perceived) scientific importance of the result.
I'm sure you would feel differently if we were to get a batch of noisy work through, and your systems were limited in how much work they could get because of all the Invalids that resulted from other systems providing the Canonical result?
Accuracy- definition. The degree to which the result of a measurement, calculation, or specification conforms to the correct value or a standard.

Accuracy is what is important- for all results, not just some of them.
Grant
Darwin NT
ID: 1973707
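
As a rough illustration of the validation problem argued over in the posts above, here is a minimal Python sketch of majority (quorum) validation. It is not the actual BOINC or SETI@home validator: the `Result` class, the `validate` function, the host labels, and the strict exact-match comparison are all assumptions made for illustration, and a real validator would likely compare signals with some tolerance. The sketch only shows how two agreeing results can become the canonical result and leave a third result marked invalid, even when the third came from the reference CPU app.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    host: str       # hypothetical host label
    app: str        # application that produced the result
    signals: tuple  # simplified stand-in for the reported signal list


def validate(results, quorum=2):
    """Pick the most common signal list as canonical if it reaches the quorum.

    Returns (canonical_signals, verdicts) where verdicts maps each host to
    "valid" or "invalid", or (None, {}) when no quorum is reached.
    """
    if not results:
        return None, {}
    counts = Counter(r.signals for r in results)
    signals, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None, {}
    verdicts = {
        r.host: "valid" if r.signals == signals else "invalid"
        for r in results
    }
    return signals, verdicts


# Two GPU hosts agree with each other; the CPU host reports something else.
results = [
    Result("host_a", "amd_gpu", ("pulse", "pulse")),
    Result("host_b", "amd_gpu", ("pulse", "pulse")),
    Result("host_c", "cpu", ("pulse", "triplet")),
]
canonical, verdicts = validate(results)
print(canonical, verdicts)
```

With a quorum of 2, the differing result from `host_c` is marked invalid no matter which application is considered the reference, which is the situation described in the posts above.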
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13753
Credit: 208,696,464
RAC: 304
Australia
Message 1973711 - Posted: 6 Jan 2019, 0:05:40 UTC

Interesting article from Techspot
Software fix can double Threadripper 2990WX's performance in certain workloads.
After the 16-core Threadripper 2950X launched with consistently great performance, AMD fans couldn’t have been more excited for the 32-core 2990WX… only to be disappointed when it finally released with far worse value and versatility. As suspected all along, if a new report can be fully confirmed, a bug in the Windows scheduler is halving 2990WX's performance and this could be fixable via software.

Grant
Darwin NT
ID: 1973711
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973717 - Posted: 6 Jan 2019, 0:53:45 UTC

No system is perfect. I would agree with your assessment about which OS/platform/apps produce the most inconsistent results against the reference cpu app. The worst are the Apple/MAC/Darwin hosts. Do I begrudge their faults to the point that I think they should be banned? Of course not. Just a nuisance in an imperfect system that I have learned to accept.

I have more concerns about the hosts that are never looked at and simply chew up bandwidth and database storage for never completing a valid task. Reference the "Invalid Host Messaging" thread. I can show you benchmark runs of early overflow tasks processed on the cpu reference app and the same task processed on the special app and producing the same exact result. If paired as wingmen the result would be validated and entered into the database even though one result was produced by a "flawed" app as Sten proclaims all the special CUDA apps to be. I have done my own benchmark testing and many more have done so along with the developer and we are satisfied we produce valid science. I'm positive I monitor my hosts more diligently than 99% of all project volunteers to make sure I always produce valid results.

The 2990WX article just covers one aspect of the findings by Wendell at Level1Techs who produced a very well researched and written analysis of why the 4 die Threadrippers produced such poor results on certain benchmarks. All the media published was that the unique memory architecture of the 4 die Threadrippers was to blame. Wendell proved it is not. The problem only occurs on Windows systems and the likely culprit is a poorly written and performing Windows kernel. If you want to read the original article referenced or the YouTube video he produced, I would recommend it greatly.

https://level1techs.com/video/2990wx-threadripper-performance-regression-fixed-windows-threadripper

I also really liked that the software fix was produced by Jeremy Collake, the developer of the ProcessLasso software, who Wendell worked with. I used it for years on Windows and think it is the best software for managing cpu affinity on the market. Highly recommended for any Windows user.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973717
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1973765 - Posted: 6 Jan 2019, 6:58:10 UTC - in response to Message 1973758.  

[quote]From what I can see, Beta is not being used for any testing these days; it just seems to be there in case it might be needed in the future.

Petri's special apps were only for Linux rigs as I understand it, there isn't a Windows version. I use the latest Lunatics software on all my rigs which hasn't been updated for quite some time now.

The Seti management have to realise that if they don't continue development in-house then others will do it for them.[/quote]

Well, the kitties would luv a Windows port of Petri's special sauce. If indeed that is possible.
There are a lot of people that would benefit, as the majority of users on Seti are running Windows. An enhanced stock app would benefit many many users.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1973765
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973769 - Posted: 6 Jan 2019, 7:47:24 UTC - in response to Message 1973747.  

I never haunted beta like some of the posters here. But as far as I can tell, no beta testing of Linux apps ever occurred on beta, only at Lunatics.

From my reading of posts by Richard, and between the lines, the reason that Petri released to Main is that there were no Linux testers at beta. So his laboratory had to be at Main. There also appears to be a severe lack of Linux experience among the experts like Richard and Jord.

And I seem to remember a post by Richard somewhere that it was never mandatory that apps HAD to be released at beta before they would be considered for Main.

Show me some codicil that stipulates that. I don't think there is.

And if the special apps return results consistent with the cpu apps, then the science is sound. Case closed.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973769
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1973770 - Posted: 6 Jan 2019, 8:03:43 UTC
Last modified: 6 Jan 2019, 8:19:04 UTC

There are a few reasons why the app wouldn't make it to main as-is, but I don't believe validation is one of them.
No screensaver
Checkpoint problem on restarts
Doesn't read from command line file
So it's not really an app for novice users to get without knowing the issues.

Ultimately Eric and the team have the say as to whether to accept the validity of the results. All they have to do is search for Petri's name in the results file and flag it as invalid if they want the app to go away.

EDIT: From memory ... The CUDA6 app (I believe) was run quite a bit at beta for a couple of weeks while testing validation against SoG apps. Which I believe resulted in the latest SoG updates. That's from my memory, which at times is also corrupt :P

EDIT2: Petri does post the source code when he makes a release, unlike the recompiled versions. If you need a copy of his latest posted release give me a shout.
ID: 1973770
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1973773 - Posted: 6 Jan 2019, 9:03:25 UTC - in response to Message 1973717.  


[quote]The 2990WX article just covers one aspect of the findings by Wendell at Level1Techs who produced a very well researched and written analysis of why the 4 die Threadrippers produced such poor results on certain benchmarks. All the media published was that the unique memory architecture of the 4 die Threadrippers was to blame. Wendell proved it is not. The problem only occurs on Windows systems and the likely culprit is a poorly written and performing Windows kernel. If you want to read the original article referenced or the YouTube video he produced, I would recommend it greatly.

https://level1techs.com/video/2990wx-threadripper-performance-regression-fixed-windows-threadripper

I also really liked that the software fix was produced by Jeremy Collake, the developer of the ProcessLasso software, who Wendell worked with. I used it for years on Windows and think it is the best software for managing cpu affinity on the market. Highly recommended for any Windows user.[/quote]


What I wondered about after I read that article is if that would make a difference in the Linux-based cpu production at more than 26 threads? I would find it hard to believe that Linux suffers from the exact same "threads thrashing" scaling issue as Windows once the # of NUMA cores exceed two.

If I still had my 2990wx cpu I would feel severely tempted to do some benchmarking in Windows with the software fix.

Since more than one scientific application under Linux has shown issues with crunching "full speed" with core counts much over 26 I doubt it is the Linux apps that have the issue. So the only question is, does Linux have the same scheduling bug as Windows?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1973773
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1973774 - Posted: 6 Jan 2019, 9:12:08 UTC - in response to Message 1973773.  

Also remember that the seti apps are not multithread apps, they are 26 individual apps running that each require lots of memory access.
That CPU would have been happy to run a bunch of other apps at the same time not requiring much RAM access.
ID: 1973774
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973844 - Posted: 6 Jan 2019, 18:33:55 UTC - in response to Message 1973773.  

[quote]Since more than one scientific application under Linux has shown issues with crunching "full speed" with core counts much over 26 I doubt it is the Linux apps that have the issue. So the only question is, does Linux have the same scheduling bug as Windows?[/quote]

If you watched the video, read the article or comments you will have read that the Linux scheduler does not have the issue that the Windows scheduler does. Quite a few comments say the coreprio "fix" works on 2 socket Intel servers also, even though the "fix" was developed for a perceived issue only with the 4 die Threadripper cpus.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973844
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1973847 - Posted: 6 Jan 2019, 18:41:20 UTC - in response to Message 1973844.  

[quote][quote]Since more than one scientific application under Linux has shown issues with crunching "full speed" with core counts much over 26 I doubt it is the Linux apps that have the issue. So the only question is, does Linux have the same scheduling bug as Windows?[/quote]

If you watched the video, read the article or comments you will have read that the Linux scheduler does not have the issue that the Windows scheduler does. Quite a few comments say the coreprio "fix" works on 2 socket Intel servers also, even though the "fix" was developed for a perceived issue only with the 4 die Threadripper cpus.[/quote]


It sounds like I must have gotten ahold of a summary of the article or just didn't read it closely enough :(

Oh, well. Thank you for posting this information.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1973847
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1973848 - Posted: 6 Jan 2019, 18:43:37 UTC - in response to Message 1973774.  

[quote]Also remember that the seti apps are not multithread apps, they are 26 individual apps running that each require lots of memory access.
That CPU would have been happy to run a bunch of other apps at the same time not requiring much RAM access.[/quote]


Very good point. Because I was/am pretty much in the BOINC world I guess the only question would have been if I could have found any projects that didn't require heavy memory access.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1973848
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13753
Credit: 208,696,464
RAC: 304
Australia
Message 1973989 - Posted: 7 Jan 2019, 6:13:49 UTC

Yet another reason RTX cards are so damned expensive.
According to a listing from Hong Kong wholesale retailer Components-Mart, the GDDR6 memory modules Nvidia buys from Micron are a whopping 70% more expensive than their GDDR5 counterparts.

GDDR6 prices are through the roof.
Grant
Darwin NT
ID: 1973989


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.