LotzaCores and a GTX 1080 FTW

Message boards : Number crunching : LotzaCores and a GTX 1080 FTW
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791538 - Posted: 29 May 2016, 2:05:07 UTC - in response to Message 1791500.  

Well, so far your times are about 3.5 times slower than my i5; lucky you have 32 cores.

Yippee for me... :-/

Oh well, it is 2 gens back (or is it 3? I thought I read that v5 is now out), plus it is their slowest model in the 12-core lineup. And I do have 48 cores (somewhat) hard at work, so we'll see how it goes over the next week or so till the vid card shows up.

ID: 1791538 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791543 - Posted: 29 May 2016, 2:16:45 UTC - in response to Message 1791509.  

Probably they really were real overflows. The WUs coming in now run full time.

Yeah, I just noticed a whole pile that completed came through.
If that's the case, it looks like in the next few hours I'll be spitting out quite a few noisy WUs as well.

Yeah. So far, no GBT VLARs being crunched though.

Edit: I find this interesting to follow. I just wonder if it's a sign that I really don't have a life to live :-)

Nope, it's that you're kind of like me, and just find this a fun pastime. :-) Plus, it's something a little out of the ordinary, and something brand new, at least once that video card arrives and is installed. Some ppl like to gamble, some ppl like to read books, others like to dive into video games. It's all about what you find interesting in life, I guess. :-)

ID: 1791543 · Report as offensive
kittyman Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 49845
Credit: 914,316,158
RAC: 160,401
United States
Message 1791544 - Posted: 29 May 2016, 2:18:44 UTC - in response to Message 1791543.  
Last modified: 29 May 2016, 2:19:32 UTC

Probably they really were real overflows. The WUs coming in now run full time.

Yeah, I just noticed a whole pile that completed came through.
If that's the case, it looks like in the next few hours I'll be spitting out quite a few noisy WUs as well.

Yeah. So far, no GBT VLARs being crunched though.

Edit: I find this interesting to follow. I just wonder if it's a sign that I really don't have a life to live :-)

Nope, it's that you're kind of like me, and just find this a fun pastime. :-) Plus, it's something a little out of the ordinary, and something brand new, at least once that video card arrives and is installed. Some ppl like to gamble, some ppl like to read books, others like to dive into video games. It's all about what you find interesting in life, I guess. :-)

Kinda like when I did the frozen penny thingy.

It was wonderful to watch it run. At minus 30c.

Meow.
What meowing lurks in the hearts of man? The kittyman knows....MEOWhahahahahahha!

Have made friends here.
Most were cats.
ID: 1791544 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791545 - Posted: 29 May 2016, 2:24:49 UTC

Oh, and the temps all look fine. Looking at the pile processing now, some are at 75% with a little under 2 hours of processing time; others are around 48-52%, mostly at 1 hour 20-some minutes, with a couple at 1 hour 40-some minutes. And looking at my list, it appears that I just started my first Guppi task on here:

5/28/2016 9:01:27 PM | SETI@home | Starting task blc5_2bit_guppi_57451_23161_HIP63121_OFF_0014.19967.831.18.27.61.vlar_2

It is 22 minutes in with 8.7% completed, so about 4 hours looks to be a good estimate if it progresses in a linear fashion.
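
For anyone who wants to redo that estimate with their own numbers, here is a minimal sketch of the linear extrapolation. The function name is just something I made up for illustration, and it assumes progress really is linear, which may not hold for every task.

```python
def estimate_total_minutes(elapsed_minutes: float, fraction_done: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done


# Figures from the Guppi task above: 22 minutes in, 8.7% complete.
total = estimate_total_minutes(22, 0.087)
remaining = total - 22
print(f"estimated total: {total / 60:.1f} h, remaining: {remaining / 60:.1f} h")
# prints "estimated total: 4.2 h, remaining: 3.8 h"
```

So "about 4 hours" holds up, give or take.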

ID: 1791545 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791546 - Posted: 29 May 2016, 2:25:28 UTC - in response to Message 1791544.  

Probably they really were real overflows. The WUs coming in now run full time.

Yeah, I just noticed a whole pile that completed came through.
If that's the case, it looks like in the next few hours I'll be spitting out quite a few noisy WUs as well.

Yeah. So far, no GBT VLARs being crunched though.

Edit: I find this interesting to follow. I just wonder if it's a sign that I really don't have a life to live :-)

Nope, it's that you're kind of like me, and just find this a fun pastime. :-) Plus, it's something a little out of the ordinary, and something brand new, at least once that video card arrives and is installed. Some ppl like to gamble, some ppl like to read books, others like to dive into video games. It's all about what you find interesting in life, I guess. :-)

Kinda like when I did the frozen penny thingy.

It was wonderful to watch it run. At minus 30c.

Meow.



Exactly!

ID: 1791546 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6530
Credit: 185,562,960
RAC: 44,725
United States
Message 1791550 - Posted: 29 May 2016, 2:51:43 UTC - in response to Message 1791536.  

Running all 48 at once. I'll head downstairs now and take a look at it to see if the temps are still in the same range.

I would have expected to see a much larger difference between the Run Time & CPU Time with 48 running. Looks like normal AR tasks are pretty solid between 2hr59min & 3hr6min. You could switch to the AVX app tomorrow to see how it runs. Unless you want to get more data on how VLARs run, but they normally track the same between the CPU apps.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1791550 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9894
Credit: 128,292,359
RAC: 78,832
Australia
Message 1791565 - Posted: 29 May 2016, 3:40:16 UTC - in response to Message 1791523.  

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????

And a 16.5sec WU that pays out 90.
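
To put a number on the mismatch, a quick sketch of credit per second of run time. The long run times are the rounded "around 11,000 secs" figure from the quote, not exact values, so the ratio is only approximate.

```python
# (credit granted, run time in seconds) as quoted above.
# The 11,000 s entries are the rounded figure from the post, not exact values.
results = [
    (89.94, 15.51),
    (96.0, 11_000.0),
    (97.0, 11_000.0),
    (113.0, 11_000.0),
]


def credit_per_second(credit: float, run_time_s: float) -> float:
    """Credit granted divided by wall-clock run time."""
    return credit / run_time_s


rates = [credit_per_second(c, t) for c, t in results]
for (credit, run_time), rate in zip(results, rates):
    print(f"{credit:7.2f} credit / {run_time:9.2f} s = {rate:.4f} credit/s")

# How much better the short WU paid per second than the first long one.
print(f"ratio: roughly {rates[0] / rates[1]:.0f}x")
```

On those figures the short WU paid several hundred times more per second than the long ones.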
Grant
Darwin NT
ID: 1791565 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791571 - Posted: 29 May 2016, 3:47:33 UTC - in response to Message 1791550.  
Last modified: 29 May 2016, 3:48:59 UTC

I think I'll do that. I'll give it till around 10 tomorrow morning, which makes 18 straight hours of processing them (but I had better NNT it bright and early in the morning to clear the cache), and then install the other version and see how things go with that one.

ID: 1791571 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791572 - Posted: 29 May 2016, 3:48:26 UTC - in response to Message 1791565.  

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????

And a 16.5sec WU that pays out 90.


Karma? :-D

ID: 1791572 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791573 - Posted: 29 May 2016, 3:49:21 UTC - in response to Message 1791550.  
Last modified: 29 May 2016, 3:55:17 UTC

Running all 48 at once. I'll head downstairs now and take a look at it to see if the temps are still in the same range.

I would have expected to see a much larger difference between the Run Time & CPU Time with 48 running. Looks like normal AR tasks are pretty solid between 2hr59min & 3hr6min. You could switch to the AVX app tomorrow to see how it runs. Unless you want to get more data on how VLARs run, but they normally track the same between the CPU apps.


[Custom/3rd-party app behaviour being different:] The stock CPU code's optimised paths have a lot of cache- and paging-aware things going on. The dispatch mechanism is driven by a quick bench at startup. Setting the -verbose command line option (iirc) would display more information on which codepaths get selected. In a sense that makes the applications adaptive to system contention (not quite dynamically, but close enough for government work). It's quite possible that some functions would be chosen as the fast implementations, as expected, while others use other paths just because they fit better in the remaining resources during the bench; then they all run more or less equivalently, stacked in like differently shaped Tetris blocks.

For third party/fixed-builds, there's still runtime dispatch in fftw, though less visible.
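
For the curious, the general idea of bench-driven dispatch can be sketched in a few lines. This is only an illustration of the technique, not the actual SETI@home code: the three sum-of-squares functions are hypothetical stand-ins for hand-tuned code paths, and all the names here are made up.

```python
import time


# Three interchangeable implementations of one routine (sum of squares).
# Hypothetical stand-ins for differently tuned code paths.
def sum_squares_loop(data):
    total = 0.0
    for x in data:
        total += x * x
    return total


def sum_squares_genexpr(data):
    return sum(x * x for x in data)


def sum_squares_map(data):
    return sum(map(lambda x: x * x, data))


CANDIDATES = {
    "loop": sum_squares_loop,
    "genexpr": sum_squares_genexpr,
    "map": sum_squares_map,
}


def quick_bench(fn, sample, repeats=5):
    """Best-of-N wall-clock time for one candidate on the sample data."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(sample)
        best = min(best, time.perf_counter() - start)
    return best


def select_codepath(candidates, sample):
    """Return the name of whichever candidate benched fastest right now."""
    timings = {name: quick_bench(fn, sample) for name, fn in candidates.items()}
    return min(timings, key=timings.get)


# The bench runs once at startup, under whatever load the system has then.
sample = [float(i) for i in range(10_000)]
chosen = select_codepath(CANDIDATES, sample)
print("selected codepath:", chosen)

# Every later call goes through the selected implementation.
result = CANDIDATES[chosen]([1.0, 2.0, 3.0])
```

Because the timing happens on the loaded machine, the winner can differ from run to run, which is the "adaptive" part.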
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791573 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791582 - Posted: 29 May 2016, 4:12:24 UTC

Looking at my one inconclusive:
Task 4956504166 (computer 8012837)
Sent			28 May 2016, 21:59:10 UTC
Time reported		29 May 2016, 3:34:45 UTC
Status			Completed, validation inconclusive
Run time (sec)		10,076.96
CPU time (sec)		10,056.92
Credit			pending
Application		SETI@home v8 Anonymous platform (CPU)

Task 4956504167 (computer 7187175)
Sent			28 May 2016, 21:59:11 UTC
Time reported		29 May 2016, 0:29:12 UTC
Status			Completed, validation inconclusive
Run time (sec)		2,013.66
CPU time (sec)		164.42
Credit			pending
Application		SETI@home v8 v8.00 (opencl_nvidia_mac) x86_64-apple-darwin


Tells me pretty much what I needed to know about the relative speed of CPU vs. GPU processing. My wingman's computer:

Computer information
Owner			jmenard 
Created			11 Jan 2014, 2:13:10 UTC
Total credit		5,204,286
Average credit		9,235.45
Cross project credit	BOINCstats.com Free-DC
CPU type		Genuine Intel Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz [x86 Family 6 Model 60 Stepping 3]
Number of processors	8
Coprocessors		NVIDIA GeForce GTX 780M (4095MB) driver: 4600.58 
OpenCL: 		1.2
Operating System	Darwin 15.5.0
BOINC version		7.6.22
Memory			16384 MB
Cache			976.56 KB
Measured floating point speed	4629.22 million ops/sec
Measured integer speed	12818.84 million ops/sec
Average upload rate	48.72 KB/sec
Average download rate	842.72 KB/sec
Average turnaround time	0.18 days
Tasks			423
Number of times client has contacted server	63124
Last contact		29 May 2016


Those time differences are nothing short of amazing... 5x faster.

Hopefully I will have similar results once I get the GPU installed.
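
Here is the arithmetic behind that 5x, as a small sketch using the run and CPU times from the two results above. It is a single pair of tasks with validation still inconclusive, so treat it as a rough indication rather than a benchmark.

```python
# Times in seconds, taken from the two results listed above.
cpu_host = {"run": 10_076.96, "cpu": 10_056.92}  # task 4956504166, my CPU app
gpu_host = {"run": 2_013.66, "cpu": 164.42}      # task 4956504167, wingman's GPU app

# Wall-clock speedup of the GPU result over the CPU result.
speedup = cpu_host["run"] / gpu_host["run"]

# Fraction of the wall-clock time each task spent on the CPU.
cpu_share_cpu_app = cpu_host["cpu"] / cpu_host["run"]
cpu_share_gpu_app = gpu_host["cpu"] / gpu_host["run"]

print(f"speedup: {speedup:.2f}x")
print(f"CPU share, CPU app: {cpu_share_cpu_app:.1%}")
print(f"CPU share, GPU app: {cpu_share_gpu_app:.1%}")
```

The CPU app kept its core busy for nearly the whole run, while the GPU task needed the CPU for well under a tenth of its run time.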

ID: 1791582 · Report as offensive
Al Special Project $250 donor
Joined: 3 Apr 99
Posts: 1618
Credit: 342,772,766
RAC: 270,867
United States
Message 1791584 - Posted: 29 May 2016, 4:14:56 UTC - in response to Message 1791573.  
Last modified: 29 May 2016, 4:16:37 UTC

Running all 48 at once. I'll head downstairs now and take a look at it to see if the temps are still in the same range.

I would have expected to see a much larger difference between the Run Time & CPU Time with 48 running. Looks like normal AR tasks are pretty solid between 2hr59min & 3hr6min. You could switch to the AVX app tomorrow to see how it runs. Unless you want to get more data on how VLARs run, but they normally track the same between the CPU apps.


[Custom/3rd-party app behaviour being different:] The stock CPU code's optimised paths have a lot of cache- and paging-aware things going on. The dispatch mechanism is driven by a quick bench at startup. Setting the -verbose command line option (iirc) would display more information on which codepaths get selected. In a sense that makes the applications adaptive to system contention (not quite dynamically, but close enough for government work). It's quite possible that some functions would be chosen as the fast implementations, as expected, while others use other paths just because they fit better in the remaining resources during the bench; then they all run more or less equivalently, stacked in like differently shaped Tetris blocks.

For third party/fixed-builds, there's still runtime dispatch in fftw, though less visible.


Jason, thank you.


And I have absolutely _no_ idea what you just said. lol

But, if there is anything you'd like me to do or to configure to help you see more of what is going on on this machine, just let me know, I'm glad to help!

ID: 1791584 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791590 - Posted: 29 May 2016, 4:21:50 UTC - in response to Message 1791584.  
Last modified: 29 May 2016, 4:28:21 UTC

Jason, thank you.


And I have absolutely _no_ idea what you just said. lol


lol, yeah distilling things down is hard :)

Worth an attempt in this case: Think of the system as a beanbag, and the applications you stuff in there as the polystyrene beans. With few in there, the beans can assume their natural shape. Stuff it to the brim and sit on the beanbag (external pressure), and the apps (beans) can change shape a bit (to a point), leaving less air.
[That's adaptive behaviour]

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791590 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9894
Credit: 128,292,359
RAC: 78,832
Australia
Message 1791601 - Posted: 29 May 2016, 4:53:31 UTC - in response to Message 1791571.  

(but I had better NNT it bright and early in the morning to clear the cache)

As long as you run the Lunatics Installer, and don't play with the app_info.xml file, that isn't necessary. The installer takes care of all the references to the old application.

Since the Lunatics Installer came out, the only time I've trashed work is when editing the app_info.xml file by hand when not fully awake. Using the installer I've gone from one application to another & back again later on with no loss of work.
AFAIK the installer shuts down BOINC before doing its thing, but by habit I always shut it down before even starting the installer.
Grant
Darwin NT
ID: 1791601 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9894
Credit: 128,292,359
RAC: 78,832
Australia
Message 1791602 - Posted: 29 May 2016, 4:56:15 UTC - in response to Message 1791590.  

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler- it worked the CPU that much harder. And the crunching times came down, a lot.
Grant
Darwin NT
ID: 1791602 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791604 - Posted: 29 May 2016, 5:04:49 UTC - in response to Message 1791602.  

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler- it worked the CPU that much harder. And the crunching times came down, a lot.


Yeah, don't burst the beanbag. AVX shaped beans are larger and denser, lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791604 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6530
Credit: 185,562,960
RAC: 44,725
United States
Message 1791637 - Posted: 29 May 2016, 6:28:37 UTC - in response to Message 1791604.  
Last modified: 29 May 2016, 6:29:44 UTC

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler- it worked the CPU that much harder. And the crunching times came down, a lot.


Yeah, don't burst the beanbag. AVX shaped beans are larger and denser, lol.

I suspect that AVX may prove to be slower on that system. With 48 tasks at once that is a lot to stuff down the memory pipeline all at once.
It may not be the most correct way to say it, but I think higher level SIMD instructions tend to be more memory intensive.
I was already very surprised by the performance of the E5 v2 CPUs versus the E5 previous generation. So I'm split 50/50 on how AVX will compare to SSE3 & will have to find out if they are using DDR3 1600 or 1866 memory.

AVX apps proved to be the most efficient on my i5-4670K systems with DDR3 1600 memory.
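
For a rough feel of what the memory speed question means per task, here is a back-of-the-envelope sketch of theoretical peak bandwidth. The socket and channel counts are my assumptions (a dual-socket board with all four channels per socket populated); I don't know the actual configuration of that machine, and real sustained bandwidth will be lower than these peak numbers.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for 64-bit (8-byte) DDR3 channels."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000.0


# Assumptions, not confirmed for this host:
SOCKETS = 2
CHANNELS_PER_SOCKET = 4
TASKS = 48  # one task per logical core, as reported in the thread

for speed in (1600, 1866):
    total = peak_bandwidth_gbs(speed, SOCKETS * CHANNELS_PER_SOCKET)
    print(f"DDR3-{speed}: {total:.1f} GB/s total, {total / TASKS:.2f} GB/s per task")
```

Under those assumptions each of the 48 tasks gets only a couple of GB/s at best, which is why I think the memory speed could matter here.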
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1791637 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1791643 - Posted: 29 May 2016, 7:14:14 UTC - in response to Message 1791637.  

Good metric to gauge optimal loading might be temperature, or power from the wall. With the beanbag analogy it'd be firmness.

Keep an eye on temperatures when running the AVX application.
On my i7 system it was (barely) able to run the SSE3 application without getting too hot. With the AVX application I had to replace the stock cooler- it worked the CPU that much harder. And the crunching times came down, a lot.


Yeah, don't burst the beanbag. AVX shaped beans are larger and denser, lol.

I suspect that AVX may prove to be slower on that system. With 48 tasks at once that is a lot to stuff down the memory pipeline all at once.
It may not be the most correct way to say it, but I think higher level SIMD instructions tend to be more memory intensive.
I was already very surprised by the performance of the E5 v2 CPUs versus the E5 previous generation. So I'm split 50/50 on how AVX will compare to SSE3 & will have to find out if they are using DDR3 1600 or 1866 memory.

AVX apps proved to be the most efficient on my i5-4670K systems with DDR3 1600 memory.


Totally agreed. In particular, I recall chatting with Joe Segur while he was handcrafting some of those kernels. He had correct code for firing up the prefetchers etc., which will probably mean a fairly peaked-out pipeline and caches (leaving not much left over).

Will certainly be interesting to see if a host like this works better with many smaller fluffier beans, or Old Joe's cannonballs, lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1791643 · Report as offensive
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 3347
Credit: 69,545,367
RAC: 91,178
Australia
Message 1791649 - Posted: 29 May 2016, 7:55:54 UTC - in response to Message 1791509.  

Probably they really were real overflows. The WUs coming in now run full time.

Yeah, I just noticed a whole pile that completed came through.
If that's the case, it looks like in the next few hours I'll be spitting out quite a few noisy WUs as well.

Yeah. So far, no GBT VLARs being crunched though.

Edit: I find this interesting to follow. I just wonder if it's a sign that I really don't have a life to live :-)



. . Join the club! :)
ID: 1791649 · Report as offensive
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 3347
Credit: 69,545,367
RAC: 91,178
Australia
Message 1791650 - Posted: 29 May 2016, 7:57:34 UTC - in response to Message 1791523.  

It's finally got some Credit!
3 WUs validated.

Now how the hell did he get 89.94 for a WU that ran for 15.51 seconds when a couple of WUs that ran for around 11,000 secs got 96 & 97 and then 113 for another 11,000sec runtime????



. . Just lucky I guess ...
ID: 1791650 · Report as offensive
 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.