LotzaCores and a GTX 1080 FTW

Message boards : Number crunching : LotzaCores and a GTX 1080 FTW

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1791565 - Posted: 29 May 2016, 3:40:16 UTC - in response to Message 1791523.  

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????

And a 16.5sec WU that pays out 90.
Grant
Darwin NT
ID: 1791565

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1791571 - Posted: 29 May 2016, 3:47:33 UTC - in response to Message 1791550.  
Last modified: 29 May 2016, 3:48:59 UTC

I think I'll do that. I'll give it until around 10 tomorrow morning, so it gets 18 straight hours of processing them (though I had better set NNT bright and early in the morning to clear the cache), and then install the other version and see how things go with that one.

ID: 1791571

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1791572 - Posted: 29 May 2016, 3:48:26 UTC - in response to Message 1791565.  

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????

And a 16.5sec WU that pays out 90.


Karma? :-D

ID: 1791572

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791573 - Posted: 29 May 2016, 3:49:21 UTC - in response to Message 1791550.  
Last modified: 29 May 2016, 3:55:17 UTC

Running all 48 at once. I'll head downstairs now and take a look at it to see if the temps are still in the same range.

I would have expected to see a much larger difference between the Run Time & CPU Time with 48 running. Looks like normal AR tasks are pretty solid between 2hr59min & 3hr6min. You could switch to the AVX app tomorrow to see how it runs. Unless you want to get more data on how VLARs run, but they normally track the same between the CPU apps.


[Custom/3rd-party app behaviour being different:] The stock CPU code's optimised paths have a lot of cache- and paging-aware things going on. The dispatch mechanism is driven by a quick bench at startup. Setting the -verbose command-line option (iirc) would display more information on which code paths get selected. In a sense, that makes the applications adaptive to system contention (not quite dynamically, but close enough for government work). It's quite possible some functions would be chosen as the fast implementations you'd expect, while others use other paths just because they fit better in the resources remaining during the bench; then they all run more or less equivalently, stacked in like different-shaped Tetris blocks.

For third-party/fixed builds, there's still runtime dispatch in FFTW, though it's less visible.
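As a rough illustration of "a quick bench at startup" choosing a code path, here is a minimal Python sketch. It is not the SETI@home or FFTW dispatcher; the two candidate functions and the `pick_fastest` helper are hypothetical stand-ins for differently optimised kernels (SSE3, AVX and so on), timed once under whatever load the machine has at that moment.

```python
import time
from typing import Callable, Dict, List

# Hypothetical candidate implementations of the same operation (sum of squares).
# In a real cruncher these would be differently optimised kernels (e.g. SSE3 vs AVX paths).
def sum_sq_loop(xs: List[float]) -> float:
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_sq_builtin(xs: List[float]) -> float:
    return sum(x * x for x in xs)

def pick_fastest(candidates: Dict[str, Callable[[List[float]], float]],
                 sample: List[float], repeats: int = 5) -> str:
    """Time each candidate on a sample workload and return the fastest one's name.

    The timing happens under the machine's current load, so the choice reflects
    contention at startup, not just raw CPU capability.
    """
    best_name, best_time = "", float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(sample)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

CANDIDATES = {"loop": sum_sq_loop, "builtin": sum_sq_builtin}
sample = [float(i) for i in range(10_000)]
chosen = pick_fastest(CANDIDATES, sample)  # decided once, at "startup"
kernel = CANDIDATES[chosen]                # every later call goes through this path
print(f"selected {chosen!r} path; result = {kernel(sample)}")
```

Because the bench runs under current load, the same program can pick a different path on an idle box than on one already running 47 other tasks, which is the adaptive behaviour described above.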
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791573

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1791582 - Posted: 29 May 2016, 4:12:24 UTC

Looking at my one inconclusive:
Task       | Computer | Sent                      | Time reported            | Status                             | Run time (s) | CPU time (s) | Credit  | Application
4956504166 | 8012837  | 28 May 2016, 21:59:10 UTC | 29 May 2016, 3:34:45 UTC | Completed, validation inconclusive | 10,076.96    | 10,056.92    | pending | SETI@home v8 Anonymous platform (CPU)
4956504167 | 7187175  | 28 May 2016, 21:59:11 UTC | 29 May 2016, 0:29:12 UTC | Completed, validation inconclusive | 2,013.66     | 164.42       | pending | SETI@home v8 v8.00 (opencl_nvidia_mac) x86_64-apple-darwin


Tells me pretty much what I needed to know about the relative speed of CPU vs. GPU processing. My wingman's computer:

Computer information
Owner			jmenard 
Created			11 Jan 2014, 2:13:10 UTC
Total credit		5,204,286
Average credit		9,235.45
Cross project credit	BOINCstats.com Free-DC
CPU type		Genuine Intel Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz [x86 Family 6 Model 60 Stepping 3]
Number of processors	8
Coprocessors		NVIDIA GeForce GTX 780M (4095MB) driver: 4600.58 
OpenCL: 		1.2
Operating System	Darwin 15.5.0
BOINC version		7.6.22
Memory			16384 MB
Cache			976.56 KB
Measured floating point 4629.22 million ops/sec
Measured integer speed	12818.84 million ops/sec
Average upload rate	48.72 KB/sec
Average download rate	842.72 KB/sec
Average turnaround time	0.18 days
Tasks			423
Number of times client has contacted server	63124
Last contact		29 May 2016


Those time differences are nothing short of amazing... 5x faster.
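The "5x" can be checked directly against the two task rows above. A small sketch using only those figures; note it compares one workunit on two different hosts (Al's CPU and the wingman's GTX 780M), so it is a single data point, not a benchmark.

```python
# Figures copied from the two task rows above (seconds).
cpu_task_run_time = 10_076.96  # CPU app, anonymous platform (host 8012837)
gpu_task_run_time = 2_013.66   # opencl_nvidia_mac app on the wingman's host (7187175)
gpu_task_cpu_time = 164.42     # CPU time the GPU task consumed while feeding the card

speedup = cpu_task_run_time / gpu_task_run_time
cpu_share = gpu_task_cpu_time / gpu_task_run_time
print(f"GPU run-time speedup: {speedup:.2f}x")              # prints: GPU run-time speedup: 5.00x
print(f"CPU time as share of GPU run time: {cpu_share:.1%}")  # prints: ... 8.2%
```

The second figure is why GPU tasks leave most of a CPU core free: the GPU task kept a core busy for only about 8% of its run time.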

Hopefully I will have similar results once I get the GPU installed.

ID: 1791582

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1791584 - Posted: 29 May 2016, 4:14:56 UTC - in response to Message 1791573.  
Last modified: 29 May 2016, 4:16:37 UTC

Running all 48 at once. I'll head downstairs now and take a look at it to see if the temps are still in the same range.

I would have expected to see a much larger difference between the Run Time & CPU Time with 48 running. Looks like normal AR tasks are pretty solid between 2hr59min & 3hr6min. You could switch to the AVX app tomorrow to see how it runs. Unless you want to get more data on how VLARs run, but they normally track the same between the CPU apps.


[Custom/3rd-party app behaviour being different:] The stock CPU code's optimised paths have a lot of cache- and paging-aware things going on. The dispatch mechanism is driven by a quick bench at startup. Setting the -verbose command-line option (iirc) would display more information on which code paths get selected. In a sense, that makes the applications adaptive to system contention (not quite dynamically, but close enough for government work). It's quite possible some functions would be chosen as the fast implementations you'd expect, while others use other paths just because they fit better in the resources remaining during the bench; then they all run more or less equivalently, stacked in like different-shaped Tetris blocks.

For third-party/fixed builds, there's still runtime dispatch in FFTW, though it's less visible.


Jason, thank you.


And I have absolutely _no_ idea what you just said. lol

But if there is anything you'd like me to do or configure to help you see more of what is going on with this machine, just let me know; I'm glad to help!

ID: 1791584

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791590 - Posted: 29 May 2016, 4:21:50 UTC - in response to Message 1791584.  
Last modified: 29 May 2016, 4:28:21 UTC

Jason, thank you.


And I have absolutely _no_ idea what you just said. lol


lol, yeah distilling things down is hard :)

Worth an attempt in this case: think of the system as a beanbag, and the applications you stuff into it as the polystyrene beans. With only a few in there, the beans can keep their natural shape. Stuff it to the brim and sit on it (external pressure), and the apps (beans) can change shape a bit (up to a point), so they leave less air.
[That's adaptive behaviour]

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.
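Temperature and wall power are proxies; what you ultimately want to maximise is completed work per hour (or per kWh). A minimal sketch, assuming you log the average run time and the wall power at each concurrency level. The numbers below are made up for illustration and are not measurements from this thread.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LoadSample:
    concurrent_tasks: int  # tasks running at once
    avg_run_hours: float   # measured average run time per task at that load
    wall_watts: float      # measured power draw at the wall

    @property
    def tasks_per_hour(self) -> float:
        return self.concurrent_tasks / self.avg_run_hours

    @property
    def tasks_per_kwh(self) -> float:
        return self.tasks_per_hour / (self.wall_watts / 1000.0)

def best_load(samples: List[LoadSample], by_energy: bool = False) -> LoadSample:
    """Pick the concurrency level with the highest throughput (or throughput per kWh)."""
    if by_energy:
        return max(samples, key=lambda s: s.tasks_per_kwh)
    return max(samples, key=lambda s: s.tasks_per_hour)

# HYPOTHETICAL measurements for a 4-core box; substitute your own.
samples = [
    LoadSample(concurrent_tasks=3, avg_run_hours=1.5, wall_watts=90.0),
    LoadSample(concurrent_tasks=4, avg_run_hours=2.2, wall_watts=105.0),
]
print(best_load(samples).concurrent_tasks)
```

With these invented figures, adding the fourth task lengthens every run enough that total throughput drops, so 3 wins; real data may go either way.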
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791590

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1791601 - Posted: 29 May 2016, 4:53:31 UTC - in response to Message 1791571.  

(but I had better NNT it bright and early in the morning to clear the cache)

As long as you run the Lunatics Installer, and don't play with the app_info.xml file, that isn't necessary. The installer takes care of all the references to the old application.

Since the Lunatics Installer came out, the only time I've trashed work is when editing the app_info.xml file by hand when not fully awake. Using the installer I've gone from one application to another & back again later on with no loss of work.
AFAIK the installer shuts down BOINC before doing its thing, but by habit I always shut it down before even starting the installer.
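For anyone who does hand-edit app_info.xml, a quick consistency check before restarting BOINC might catch the kind of slip described above. This sketch assumes the standard BOINC anonymous-platform layout (`<app>`, `<file_info>`, `<app_version>` with `<file_ref>` entries); the app name and file names in the sample are hypothetical, and the check only covers internal references, not which apps your queued tasks belong to.

```python
import xml.etree.ElementTree as ET
from typing import List

def check_app_info(xml_text: str) -> List[str]:
    """Return a list of dangling references found in an anonymous-platform app_info.xml."""
    root = ET.fromstring(xml_text.strip())
    apps = {a.findtext("name", "").strip() for a in root.findall("app")}
    files = {f.findtext("name", "").strip() for f in root.findall("file_info")}
    problems = []
    for av in root.findall("app_version"):
        app_name = av.findtext("app_name", "").strip()
        if app_name not in apps:
            problems.append(f"app_version refers to undeclared app {app_name!r}")
        for ref in av.findall("file_ref"):
            fname = ref.findtext("file_name", "").strip()
            if fname not in files:
                problems.append(f"file_ref {fname!r} has no matching <file_info>")
    return problems

# HYPOTHETICAL app_info.xml with a typo in the file_ref name.
SAMPLE = """
<app_info>
  <app><name>setiathome_v8</name></app>
  <file_info><name>MB8_win_x64_AVX.exe</name><executable/></file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <file_ref><file_name>MB8_win_x64_AVX2.exe</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""
for problem in check_app_info(SAMPLE):
    print(problem)
```

Against the sample it reports the misspelled file_ref; once the name matches the `<file_info>` entry, it returns an empty list.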
Grant
Darwin NT
ID: 1791601

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1791602 - Posted: 29 May 2016, 4:56:15 UTC - in response to Message 1791590.  

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler; it worked the CPU that much harder. And the crunching times came down, a lot.
Grant
Darwin NT
ID: 1791602

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791604 - Posted: 29 May 2016, 5:04:49 UTC - in response to Message 1791602.  

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler; it worked the CPU that much harder. And the crunching times came down, a lot.


Yeah, don't burst the beanbag. AVX-shaped beans are larger and denser, lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791604

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1791637 - Posted: 29 May 2016, 6:28:37 UTC - in response to Message 1791604.  
Last modified: 29 May 2016, 6:29:44 UTC

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler; it worked the CPU that much harder. And the crunching times came down, a lot.


Yeah, don't burst the beanbag. AVX-shaped beans are larger and denser, lol.

I suspect that AVX may prove to be slower on that system. With 48 tasks running at once, that is a lot to stuff down the memory pipeline.
It may not be the most correct way to say it, but I think higher-level SIMD instructions tend to be more memory-intensive.
I was already very surprised by the performance of the E5 v2 CPUs versus the previous-generation E5s. So I'm split 50/50 on how AVX will compare to SSE3, and I'll have to find out whether the system is using DDR3-1600 or DDR3-1866 memory.

AVX apps proved to be the most efficient on my i5-4670K systems with DDR3 1600 memory.
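The "lot to stuff down the memory pipeline" point can be put in rough numbers. A back-of-envelope sketch, assuming 64-bit (8-byte) channels, all four memory channels per socket populated (the E5 v2 maximum), and the 48 tasks split 24 per socket. These are theoretical peaks only; sustained bandwidth in practice is lower.

```python
def peak_bandwidth_gb_s(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth: transfers/s x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000.0  # MB/s -> GB/s (decimal units)

TASKS_PER_SOCKET = 24  # ASSUMPTION: 48 tasks spread evenly over two sockets

for speed in (1600, 1866):
    per_socket = peak_bandwidth_gb_s(speed, channels=4)  # ASSUMPTION: 4 channels populated
    print(f"DDR3-{speed}: {per_socket:.1f} GB/s per socket, "
          f"{per_socket / TASKS_PER_SOCKET:.2f} GB/s per task")
```

Even at the theoretical peak, each task gets only about 2 to 2.5 GB/s, so kernels that lean harder on memory could end up waiting on RAM rather than on compute.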
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1791637

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791643 - Posted: 29 May 2016, 7:14:14 UTC - in response to Message 1791637.  

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler; it worked the CPU that much harder. And the crunching times came down, a lot.


Yeah, don't burst the beanbag. AVX-shaped beans are larger and denser, lol.

I suspect that AVX may prove to be slower on that system. With 48 tasks running at once, that is a lot to stuff down the memory pipeline.
It may not be the most correct way to say it, but I think higher-level SIMD instructions tend to be more memory-intensive.
I was already very surprised by the performance of the E5 v2 CPUs versus the previous-generation E5s. So I'm split 50/50 on how AVX will compare to SSE3, and I'll have to find out whether the system is using DDR3-1600 or DDR3-1866 memory.

AVX apps proved to be the most efficient on my i5-4670K systems with DDR3 1600 memory.


Totally agreed. In particular, I recall chatting with Joe Segur while he was handcrafting some of those kernels. He had correct code for firing up the prefetchers etc., which will probably mean a fairly peaked-out pipeline and caches (leaving not much left over).

It will certainly be interesting to see whether a host like this works better with many smaller, fluffier beans or Old Joe's cannonballs, lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791643

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1791649 - Posted: 29 May 2016, 7:55:54 UTC - in response to Message 1791509.  

Probably they really were real overflows. It's now coming in WU's run full time.

Yeah, I just noticed a whole pile that completed came through.
If that's the case, it looks like in the next few hours I'll be spitting out quite a few noisy WUs as well.

Yeah. So far, no GBT VLARs being crunched though.

Edit: I find this interesting to follow. I just wonder if it's a sign that I really don't have a life to live :-)



. . Join the club! :)
ID: 1791649

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1791650 - Posted: 29 May 2016, 7:57:34 UTC - in response to Message 1791523.  

It's finally got some Credit!
3 WUs validated.

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????



. . Just lucky I guess ...
ID: 1791650

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1791651 - Posted: 29 May 2016, 8:04:00 UTC - in response to Message 1791543.  


Yeah. So far, no GBT VLARs being crunched though.

Edit: I find this interesting to follow. I just wonder if it's a sign that I really don't have a life to live :-)

Nope, it's that you're kind of like me, and just find this a fun pastime. :-) Plus, it's something a little out of the ordinary, and something brand new, at least once that video card arrives and is installed. Some ppl like to gamble, some ppl like to read books, others like to dive into video games. It's all about what you find interesting in life, I guess. :-)



. . I would much prefer wasting my meagre funds buying better hardware to make things crunch faster than feeding some Poker Machine (Slot Machine for those unfamiliar with Aussie parlance).

. . I am also very interested in how much success he has with the 1080; I have been reading their specs today. The ASUS GTX 1080 Strix sounded interesting until I read the power bottom line... 300W... eeeekk!
ID: 1791651

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1791652 - Posted: 29 May 2016, 8:06:26 UTC - in response to Message 1791565.  

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????

And a 16.5sec WU that pays out 90.



. . Now don't be jealous! :)
ID: 1791652

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1791653 - Posted: 29 May 2016, 8:15:36 UTC - in response to Message 1791584.  
Last modified: 29 May 2016, 8:16:27 UTC



[Custom/3rd-party app behaviour being different:] The stock CPU code's optimised paths have a lot of cache- and paging-aware things going on. The dispatch mechanism is driven by a quick bench at startup. Setting the -verbose command-line option (iirc) would display more information on which code paths get selected. In a sense, that makes the applications adaptive to system contention (not quite dynamically, but close enough for government work). It's quite possible some functions would be chosen as the fast implementations you'd expect, while others use other paths just because they fit better in the resources remaining during the bench; then they all run more or less equivalently, stacked in like different-shaped Tetris blocks.

For third-party/fixed builds, there's still runtime dispatch in FFTW, though it's less visible.


Jason, thank you.


And I have absolutely _no_ idea what you just said. lol



. . So it's not just me then? I was wondering what language he was speaking.
ID: 1791653

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1791654 - Posted: 29 May 2016, 8:24:59 UTC - in response to Message 1791637.  
Last modified: 29 May 2016, 8:29:46 UTC


I suspect that AVX may prove to be slower on that system. With 48 tasks running at once, that is a lot to stuff down the memory pipeline.
It may not be the most correct way to say it, but I think higher-level SIMD instructions tend to be more memory-intensive.
I was already very surprised by the performance of the E5 v2 CPUs versus the previous-generation E5s. So I'm split 50/50 on how AVX will compare to SSE3, and I'll have to find out whether the system is using DDR3-1600 or DDR3-1866 memory.

AVX apps proved to be the most efficient on my i5-4670K systems with DDR3 1600 memory.



. . FWIW, on my i5 6400 with DDR4 2333 RAM, AVX works a treat, almost halving the runtimes without running that hot (it stays mainly in the 50s). But efficiency drops off sharply if I crunch on all 4 cores (all four cores flatline at 100% and runtimes increase), so I just run 3 and live with a happy PC.
ID: 1791654

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1791663 - Posted: 29 May 2016, 11:15:05 UTC - in response to Message 1791637.  
Last modified: 29 May 2016, 11:25:55 UTC

Well, I just got up, so I went down, paused and then exited BOINC, uninstalled and reinstalled Lunatics, and it seemed to start right where it left off, with no drama. The only slightly unusual thing was that Windows Security asked whether it was alright to allow BOINC through the firewall; I've never seen that one before.

I am running Hynix 1866 memory, one stick per bank, to allow the system to run it at full speed, as I read that more sticks = slower speeds. And 32 GB is more than enough for what I am running on this.

So far, looking at temps, it appears that they may have crept up a few degrees, maybe 3-5 on average, but they still look to be mostly at 50 or below, except on 3-4 of the 24 cores on each CPU. But I suppose that will vary depending on the type of WU being processed.

ID: 1791663

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1791664 - Posted: 29 May 2016, 11:21:45 UTC

One question that occurred to me while watching them process (on the Tasks tab): of all the tasks in the list, about 50% or so are actually crunching right now, and the rest are waiting. So even though my preferences are set for 8 days' worth of work, in reality this is about 4 hours of work best case, and a little over 2 hours worst case. Something seems wrong about this; isn't the program calculating things properly to allow a correctly sized cache?
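The "about 4 hours" estimate can be roughly reproduced with simple arithmetic. A back-of-envelope sketch, assuming 48 tasks running and about as many waiting (reading "about 50%" as roughly 96 tasks in the list), roughly 3 hours per task (the 2hr59min to 3hr6min range mentioned earlier in the thread), and running tasks on average half done. It does not model how the BOINC client or the project server actually decide how much work to send.

```python
import math

def buffered_hours(running: int, waiting: int, run_hours: float,
                   avg_fraction_done: float) -> float:
    """Rough hours of CPU work on hand, assuming one task per slot and equal run times."""
    slots = running                                        # each running task occupies one slot
    left_on_running = run_hours * (1.0 - avg_fraction_done)
    waiting_per_slot = waiting / slots                     # queued tasks shared evenly across slots
    return left_on_running + waiting_per_slot * run_hours

def tasks_for_buffer(days: float, slots: int, run_hours: float) -> int:
    """Tasks needed to keep every slot busy for `days` at `run_hours` per task."""
    return math.ceil(days * 24 / run_hours * slots)

print(buffered_hours(48, 48, 3.0, 0.5))  # 4.5
print(tasks_for_buffer(8, 48, 3.0))      # 3072
```

Under these assumptions, a full 8-day buffer on 48 slots would be on the order of 3,000 tasks, versus the roughly 96 assumed above; the sketch only shows the size of the gap, not why the cache ends up that small.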

ID: 1791664
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.