Help making my 1070 rig up its RAC to above my 1060 one

Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1845493 - Posted: 30 Jan 2017, 19:25:35 UTC
Last modified: 30 Jan 2017, 19:27:39 UTC

I built a couple crunchers over the last 4-5 months, and they are sitting in the back bedroom happily crunching away. Well, I thought happily, till I took the time to pay attention to their individual performances. The machines are

ID: 8064025 X58-DualGTX1060 and ID: 8170251 DualGTX1070

They are pretty similar machines: the 1070 rig has 8 procs at 3.5GHz, the 1060 rig 12 procs at 3.33GHz. The 1070s are 4 GB cards, the 1060s the 6 GB versions. Both are running Lunatics and have been running 24x7 for at least a month, so they have stabilized pretty well by this point.

And... the 1070 RAC is about 36k, while the 1060 RAC is about 44k. Could someone take a look at them both and see if there is anything obvious that I have missed? I sure would think that two 1070s should handily outperform two 1060s. Shouldn't they? Thanks for any ideas, guys!

ID: 1845493
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1845498 - Posted: 30 Jan 2017, 20:15:48 UTC

That's easy: you're running the outdated CUDA app on those 1070s, while you're running SoG on the 1060s (that one is outdated too, BTW).

Get the latest Lunatics beta installer and try again (on both, please). ;-)

Cheers.
ID: 1845498
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1845504 - Posted: 30 Jan 2017, 20:35:20 UTC - in response to Message 1845498.  

Agree with Wiggo...

You should look at the new beta installer.
ID: 1845504
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1845526 - Posted: 30 Jan 2017, 22:25:08 UTC

Thanks guys, I was hoping it was something relatively simple like that. I've been up to my arse in alligators for the last 2-3 months and haven't been paying nearly as close attention to my SETI hobby as I had been for the last year. Is there anything special I need to do to uninstall the apps to prepare the machines for reinstallation? And I presume I should run SoG on both? Thanks!

ID: 1845526
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1845527 - Posted: 30 Jan 2017, 22:26:08 UTC

The most recent Lunatics Beta 6 installer has versions that are older than the current stock GPU apps. The stock Nvidia/Radeon 8.22/8.23 apps are r3584. Those apps can be downloaded from the SETI@home servers or from Raistmer's cloud storage.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 1845527
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1845539 - Posted: 30 Jan 2017, 22:47:04 UTC - in response to Message 1845493.  

The first thing that caught my attention is your statement that the 1070s have only 4 GB. That made me scramble to Google to see who produces a 4 GB card. No one does. All the 1070s came with 8 GB. The 4 GB you see on SETI is just because it can report a maximum of only 4 GB of memory for graphics cards. Your 1070s actually have 8 GB, which can be verified with GPU-Z, for example.

You don't have to do anything special to update your apps. Just run the Lunatics Beta-06 installer and choose the SoG application for MB, since you are already running Anonymous platform. You can also download the latest SoG app, r3584, from Mike's World. I find that easier than trying to find Raistmer's download site in a message in the huge support thread.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1845539
Bruce
Volunteer tester
Joined: 15 Mar 02
Posts: 123
Credit: 124,955,234
RAC: 11
United States
Message 1845559 - Posted: 31 Jan 2017, 0:46:26 UTC

Here is Raistmer's Download Page. It has most of the new apps.
Like Keith says, you can get them at Mike's World also.

Good Luck.
Bruce
ID: 1845559
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1845577 - Posted: 31 Jan 2017, 2:39:50 UTC - in response to Message 1845539.  

It's been a long time, but my memory is telling me that BOINC only displays up to 4 GB for Nvidia GPUs because of a limitation in the CUDA detection.
Perhaps in the future BOINC can be made to use OpenCL GPU detection for Nvidia GPUs, like it does for Intel and Radeon GPUs, so that limitation will no longer be an issue.
ID: 1845577
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1845609 - Posted: 31 Jan 2017, 5:38:41 UTC - in response to Message 1845577.  
Last modified: 31 Jan 2017, 5:44:42 UTC

I would also suggest some very aggressive command line settings to get as much out of the SOG application and GTX 1070s as possible.
And probably worth doing the same for your GTX 1080/980Ti system.
Grant
Darwin NT
ID: 1845609
Al
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1847085 - Posted: 6 Feb 2017, 19:04:31 UTC

Thanks for the thoughts guys. I've been in and out of town last week and this week for training and a trade show, so it's probably going to have to wait for the weekend, but I will be updating them. Are the programs from Mike's World and Raistmer's Download Page the same? And any suggestions on aggressive command line settings for both?

I do use the 1080/980 system for occasional work (mostly web browsing stuff), but I suppose I can always suspend BOINC when I use this computer, because I believe it is set up to resume after 10-15 minutes if I don't do it manually. I do have Keith's rescheduling program on the 1080 system, but not on the others, so if I am running SoG, is his program not necessary, or even desirable? As for the 1060 and 1070 systems, I don't care about lagging response, as they are just sitting and crunching, so aggressive command lines are fine for those two systems. I just want to maximize their output and see how they compare.

ID: 1847085
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1847100 - Posted: 6 Feb 2017, 20:14:49 UTC - in response to Message 1847085.  

It's not my rescheduling program; rather, it's Jimbocous who has polished the front end to Mr. Kevvy's rescheduler.

I think I get good results from this MB command line for my 1070s. I don't really have too noticeable system lag when I run Guppies. Only when two Guppies exit and reload on the same card do I notice some keyboard input lag. You can de-tune the command line a bit by removing the -high_perf argument, which will reduce the input lag until it is unnoticeable.

<cmdline>-sbs 2048 -period_iterations_num 2 -tt 1500 -high_perf -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -high_prec_timer</cmdline>
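
For anyone scripting the de-tuning step Keith describes, here is a minimal Python sketch. The helper name `drop_flag` is hypothetical and not part of any SETI@home or Lunatics tool; it only edits the argument string. Matching whole tokens keeps it from touching the similarly named -high_prec_timer switch.

```python
import shlex

# Keith's MB command line, copied from the post above.
MB_CMDLINE = (
    "-sbs 2048 -period_iterations_num 2 -tt 1500 -high_perf "
    "-spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 "
    "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 "
    "-oclfft_tune_bn 64 -oclfft_tune_cw 64 -high_prec_timer"
)


def drop_flag(cmdline: str, flag: str) -> str:
    """Remove a value-less switch by exact token match (hypothetical helper)."""
    return " ".join(tok for tok in shlex.split(cmdline) if tok != flag)


detuned = drop_flag(MB_CMDLINE, "-high_perf")
print(detuned)
```

The result is the same line with only -high_perf removed; paste it back between the <cmdline> tags.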


I also run this command line for AP work. Of course, AP work has been rare recently.

<cmdline>-unroll 24 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1</cmdline>


These command lines are in my app_config.xml files under the appropriate app sections. I just use the default app_info.xml written by Lunatics underneath this. I control CPU and GPU usage in app_config.xml. It's simpler, and it doesn't need to change when new apps are installed.
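
As a rough illustration of the layout Keith describes, the Python sketch below writes out an app_config.xml skeleton with a usage section and a command line section. Several values are assumptions that do not come from this thread: the app name `setiathome_v8`, the plan class `opencl_nvidia_SoG`, and the usage figures (0.5 GPU per task, 1.0 CPU per task). Check them against your own app_info.xml before using anything like this.

```python
import xml.etree.ElementTree as ET

APP_NAME = "setiathome_v8"        # assumption: check the name in your app_info.xml
PLAN_CLASS = "opencl_nvidia_SoG"  # assumption: check the plan class in app_info.xml
CMDLINE = (
    "-sbs 2048 -period_iterations_num 2 -tt 1500 -high_perf "
    "-spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 "
    "-oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 "
    "-oclfft_tune_bn 64 -oclfft_tune_cw 64 -high_prec_timer"
)


def build_app_config(app_name, plan_class, cmdline, gpu_usage, cpu_usage):
    """Return app_config.xml text with one <app> and one <app_version> section."""
    root = ET.Element("app_config")

    # CPU/GPU usage per task lives in the <app> section.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)

    # The tuning command line lives in the <app_version> section.
    ver = ET.SubElement(root, "app_version")
    ET.SubElement(ver, "app_name").text = app_name
    ET.SubElement(ver, "plan_class").text = plan_class
    ET.SubElement(ver, "cmdline").text = cmdline

    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


xml_text = build_app_config(APP_NAME, PLAN_CLASS, CMDLINE, gpu_usage=0.5, cpu_usage=1.0)
print(xml_text)
```

Because the usage numbers and the command line sit in app_config.xml, a new app_info.xml written by the installer leaves them alone, which is the convenience Keith mentions.
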
ID: 1847100
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1847141 - Posted: 6 Feb 2017, 23:58:22 UTC - in response to Message 1847100.  

Keith, how many work units per GPU with this command line? 2048 seems a bit overaggressive. I thought Raistmer limited the max size to 1024.
ID: 1847141
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1847148 - Posted: 7 Feb 2017, 0:48:32 UTC - in response to Message 1847141.  

Nope, no limit as far as I can tell. I'm running two tasks per card, which uses up about 4.5 GB of video memory on average. The 1070s have 8 GB available. I looked at the tune parameter readout in stderr.txt on completed work units and saw that, with my command line arguments, the optimum memory buffer for BLC tasks is on average 2048. Sometimes a 4096 buffer would be best with two tasks running, but that wouldn't fit. It's definitely overkill for the shorty Arecibo tasks though. The FFT tune parameters for Arecibo shorties need only about 512-1024 kB for optimization.

I run the 970s at a 1024 buffer because they only have 4 GB on board.
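
The "would it fit" reasoning above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not anything the app itself computes: it assumes each task needs its -sbs buffer plus a fixed overhead, and the 256 MB overhead is an assumption backed out of the "about 4.5 GB for two tasks at 2048" figure (4608 MB / 2 - 2048 MB).

```python
ASSUMED_OVERHEAD_MB = 256  # assumption: (4.5 GB / 2 tasks) - 2048 MB buffer


def fits(buffer_mb: int, tasks: int, card_mb: int,
         overhead_mb: int = ASSUMED_OVERHEAD_MB) -> bool:
    """True if `tasks` concurrent tasks with this buffer fit in card memory."""
    return tasks * (buffer_mb + overhead_mb) <= card_mb


# GTX 1070 (8192 MB): two tasks fit at 2048, but not at 4096.
print(fits(2048, tasks=2, card_mb=8192))
print(fits(4096, tasks=2, card_mb=8192))
# Dropping to one task would make 4096 fit.
print(fits(4096, tasks=1, card_mb=8192))
# GTX 970 (4096 MB): two tasks fit at 1024, but not at 2048.
print(fits(1024, tasks=2, card_mb=4096))
print(fits(2048, tasks=2, card_mb=4096))
```

Under those assumptions the model agrees with the choices described here: 2048 on the 1070s and 1024 on the 970s when running two tasks per card.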

Here's an example of a BLC task FFT tune readout.

Fftlength=512,pass=3:Tune: sum=10832.1(ms); min=8.97(ms); max=182.8(ms); mean=10.78(ms); s_mean=17.19; sleep=15(ms); delta=1; N=1005; usual
Fftlength=1024,pass=3:Tune: sum=12192.7(ms); min=4.68(ms); max=254.1(ms); mean=6.069(ms); s_mean=10.52; sleep=0(ms); delta=1; N=2009; usual
Fftlength=2048,pass=3:Tune: sum=4807.21(ms); min=1.072(ms); max=1.536(ms); mean=1.197(ms); s_mean=1.225; sleep=0(ms); delta=1; N=4017; usual
Fftlength=4096,pass=3:Tune: sum=3123.5(ms); min=0.3614(ms); max=0.4547(ms); mean=0.3888(ms); s_mean=0.3833; sleep=0(ms); delta=1; N=8033; usual
Fftlength=8192,pass=3:Tune: sum=1764.68(ms); min=0.1025(ms); max=0.1239(ms); mean=0.1098(ms); s_mean=0.112; sleep=0(ms); delta=1; N=16065; usual

Here's an example of an Arecibo shorty FFT tune readout.

Fftlength=128,pass=3:Tune: sum=325.808(ms); min=1.776(ms); max=6.626(ms); mean=5.716(ms); s_mean=5.755; sleep=0(ms); delta=1; N=57; usual
Fftlength=256,pass=3:Tune: sum=155.991(ms); min=1.237(ms); max=1.536(ms); mean=1.356(ms); s_mean=1.363; sleep=0(ms); delta=1; N=115; usual
Fftlength=512,pass=3:Tune: sum=87.9097(ms); min=0.3574(ms); max=0.4557(ms); mean=0.3839(ms); s_mean=0.3846; sleep=0(ms); delta=1; N=229; usual
Fftlength=1024,pass=3:Tune: sum=66.9434(ms); min=0.1228(ms); max=6.683(ms); mean=0.1465(ms); s_mean=0.1323; sleep=0(ms); delta=1; N=457; usual


The tuning goes all the way up to an 8 GB buffer.
ID: 1847148
Zalster
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1847699 - Posted: 10 Feb 2017, 4:57:06 UTC

Here's an example of a BLC task FFT tune readout.

Fftlength=512,pass=3:Tune: sum=10832.1(ms); min=8.97(ms); max=182.8(ms); mean=10.78(ms); s_mean=17.19; sleep=15(ms); delta=1; N=1005; usual
Fftlength=1024,pass=3:Tune: sum=12192.7(ms); min=4.68(ms); max=254.1(ms); mean=6.069(ms); s_mean=10.52; sleep=0(ms); delta=1; N=2009; usual
Fftlength=2048,pass=3:Tune: sum=4807.21(ms); min=1.072(ms); max=1.536(ms); mean=1.197(ms); s_mean=1.225; sleep=0(ms); delta=1; N=4017; usual
Fftlength=4096,pass=3:Tune: sum=3123.5(ms); min=0.3614(ms); max=0.4547(ms); mean=0.3888(ms); s_mean=0.3833; sleep=0(ms); delta=1; N=8033; usual
Fftlength=8192,pass=3:Tune: sum=1764.68(ms); min=0.1025(ms); max=0.1239(ms); mean=0.1098(ms); s_mean=0.112; sleep=0(ms); delta=1; N=16065; usual


Keith,

How did you figure that 2048 is the best setting? I can't make heads or tails of this.
ID: 1847699
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1847722 - Posted: 10 Feb 2017, 6:13:42 UTC - in response to Message 1847699.  

As I said, I can't run the 4096 buffer with two tasks concurrently. The 1070 only has 8192 MB on board, so there is not enough memory unless I drop to one running task. The stderr.txt shows that with my 2048 setting, it will typically use about 2136 MB of buffer. You just look at the task timings in each tune line and look for the runs with the minimum ms, a delta of 1, and the highest N value. In the example above, 8192 has the fastest completion times, but there is not enough memory on the card to use it. The same goes for the 4096 runs: again, there is not enough memory to run two up on the card. So I settle for a 2048 MB buffer.
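
For those who would rather not eyeball the readout, here is a small Python sketch of the selection rule described above: keep the lines with a delta of 1 and pick the one with the lowest mean time. The parsing is an assumption based only on the five BLC lines posted in this thread; the function names are hypothetical and the stderr.txt format may differ between app versions.

```python
import re

# The BLC readout posted earlier in this thread.
BLC_READOUT = """\
Fftlength=512,pass=3:Tune: sum=10832.1(ms); min=8.97(ms); max=182.8(ms); mean=10.78(ms); s_mean=17.19; sleep=15(ms); delta=1; N=1005; usual
Fftlength=1024,pass=3:Tune: sum=12192.7(ms); min=4.68(ms); max=254.1(ms); mean=6.069(ms); s_mean=10.52; sleep=0(ms); delta=1; N=2009; usual
Fftlength=2048,pass=3:Tune: sum=4807.21(ms); min=1.072(ms); max=1.536(ms); mean=1.197(ms); s_mean=1.225; sleep=0(ms); delta=1; N=4017; usual
Fftlength=4096,pass=3:Tune: sum=3123.5(ms); min=0.3614(ms); max=0.4547(ms); mean=0.3888(ms); s_mean=0.3833; sleep=0(ms); delta=1; N=8033; usual
Fftlength=8192,pass=3:Tune: sum=1764.68(ms); min=0.1025(ms); max=0.1239(ms); mean=0.1098(ms); s_mean=0.112; sleep=0(ms); delta=1; N=16065; usual
"""

# Every field in a tune line has the form name=number.
FIELD = re.compile(r"(\w+)=([0-9.]+)")


def parse_tune_line(line: str) -> dict:
    """Turn one tune line into a {field name: float} dictionary."""
    return {name: float(value) for name, value in FIELD.findall(line)}


def fastest(readout: str) -> dict:
    """Return the delta=1 tune line with the lowest mean time."""
    rows = [parse_tune_line(l) for l in readout.splitlines() if "Tune:" in l]
    rows = [r for r in rows if r.get("delta") == 1]
    return min(rows, key=lambda r: r["mean"])


best = fastest(BLC_READOUT)
print(int(best["Fftlength"]), best["mean"], int(best["N"]))
```

On the posted data this picks the 8192 line, which also has the highest N. Whether that setting is usable is a separate question of card memory, as explained above.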

Raistmer explained the tuning runs at Lunatics with that post of his detailing the parameter choices and what they accomplish. Go have a read.
ID: 1847722

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.