GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1809549 - Posted: 16 Aug 2016, 4:45:43 UTC - in response to Message 1809529.  


. . That would make for an interesting comparison. Using SoG I am running 4 simultaneous WUs on my GTX 970 from one CPU core (on a 3GHz Pentium D) and the runtimes are approx 28 mins, which is not too bad. My problem is that the upgrade to Win10 on that machine was a mistake. Win10 does not support the Nvidia nForce4 SLI chipset on this ASUS motherboard, and neither ASUS nor Nvidia offers Win10 drivers for it. This lack of proper drivers is causing me endless headaches with lockups and other probs, all related to machine functions rather than BOINC itself. For that matter I am running 3 concurrent SoG WUs on my GTX950 as well, also using only one CPU core, and they are running in approx 36 to 38 mins. So SoG is not the problem some ppl think.

I'd like to know how you are accomplishing 4 tasks per GPU on a 970 and still have a system that will respond to a mouse click in less than a week. Are you running with sleep? I only run 2 tasks per card and I am at 97-99% utilization all the time. I don't use sleep anymore as that incurred at least a 15 minute extension in task completion time. Throughput is greater with 2 tasks no sleep than my previous 3 tasks with sleep.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1809549

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809568 - Posted: 16 Aug 2016, 8:13:50 UTC - in response to Message 1809399.  

Does DA have much of an interest in SETI any longer, or are we now sort of the ugly redheaded stepchild of BOINC?

As one can see from SETI's SVN commit log, DA is actively working on Nebula, so he's definitely still involved in SETI.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1809568

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1809581 - Posted: 16 Aug 2016, 9:06:03 UTC - in response to Message 1809532.  

@Stephen, is that the total time for 3 work units, or the time after you divide the total by 3?


. .It is the actual total runtime. Meaning it produces 3 WUs every 38 mins on the GTX950 and 4 WUs every 28 mins on the GTX970.

. . The average runtimes are slightly better than that, but that is about the max for most WUs; often the runtime is 34 to 36 mins on the 950 and 24 to 27 mins on the 970. Not as good as CUDA75 on Mr Kevvy's rigs but not terrible either.
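
For anyone who wants the per-WU figures rather than the batch times, here is a quick sketch of the arithmetic. It uses only the numbers quoted above (3 WUs per 38 mins on the GTX950, 4 WUs per 28 mins on the GTX970); the function name is made up for illustration.

```python
def throughput(tasks_in_parallel: int, batch_minutes: float) -> tuple[float, float]:
    """Return (effective minutes per WU, WUs per hour) for one GPU.

    tasks_in_parallel: how many WUs run at the same time on the card.
    batch_minutes: wall-clock runtime of each of those WUs.
    """
    minutes_per_wu = batch_minutes / tasks_in_parallel
    wus_per_hour = 60.0 * tasks_in_parallel / batch_minutes
    return minutes_per_wu, wus_per_hour


# Figures taken from the post above (maximum runtimes, not averages).
for card, tasks, minutes in [("GTX950", 3, 38), ("GTX970", 4, 28)]:
    per_wu, per_hour = throughput(tasks, minutes)
    print(f"{card}: {per_wu:.1f} min per WU, {per_hour:.2f} WUs per hour")
```

The same function can be used to compare, say, 2 tasks without sleep against 3 or 4 tasks with sleep: whichever gives more WUs per hour wins, regardless of how long an individual task takes.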
ID: 1809581

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1809582 - Posted: 16 Aug 2016, 9:09:18 UTC - in response to Message 1809543.  


In any event, I would think the programming changes to parse that info would be fairly trivial. Convincing someone to make such a policy change, however, would likely not be. Good luck with that one!


. . So very true.
ID: 1809582

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1809585 - Posted: 16 Aug 2016, 9:15:39 UTC - in response to Message 1809549.  


. . That would make for an interesting comparison. Using SoG I am running 4 simultaneous WUs on my GTX 970 from one CPU core (on a 3GHz Pentium D) and the runtimes are approx 28 mins, which is not too bad. My problem is that the upgrade to Win10 on that machine was a mistake. Win10 does not support the Nvidia nForce4 SLI chipset on this ASUS motherboard, and neither ASUS nor Nvidia offers Win10 drivers for it. This lack of proper drivers is causing me endless headaches with lockups and other probs, all related to machine functions rather than BOINC itself. For that matter I am running 3 concurrent SoG WUs on my GTX950 as well, also using only one CPU core, and they are running in approx 36 to 38 mins. So SoG is not the problem some ppl think.

I'd like to know how you are accomplishing 4 tasks per GPU on a 970 and still have a system that will respond to a mouse click in less than a week. Are you running with sleep? I only run 2 tasks per card and I am at 97-99% utilization all the time. I don't use sleep anymore as that incurred at least a 15 minute extension in task completion time. Throughput is greater with 2 tasks no sleep than my previous 3 tasks with sleep.



. . Funny you should ask that. When I started using sleep the runtimes did blow out because the GPU utilisation dropped off, which I countered by increasing the number of tasks running. I found that with 4 running the GPU utilisation is pretty full with minimal further increase in runtimes. I am running settings that are not widely recommended, but they work for me:

-use_sleep_ex 5 -sbs 384 -period_iterations_number 5

. . Since it is a dedicated cruncher I do not use it for much else and the lag is OK for me. It's really not that bad. I did try using iterations of 10 but found it was worse for whatever reason so I went back to 5.
ID: 1809585

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809594 - Posted: 16 Aug 2016, 10:22:04 UTC - in response to Message 1809585.  
Last modified: 16 Aug 2016, 10:24:26 UTC

5ms of sleep is quite long for modern GPUs. One could consider using -high_prec_timer together with plain -use_sleep instead. This allows better GPU utilization with fewer simultaneous tasks in flight.

Also, new builds do not follow period_iterations_num exactly. They use it as an initial guess and then tune this number until the desired kernel execution time is reached. That target time (default is 15ms) can be changed with the -tt F option.

I requested experimentation with -tt F to tune a better default for release, but no feedback so far.
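
To illustrate the idea of "initial guess, then tune until the target kernel time is reached", here is a minimal sketch. It is not the code from the app; the function names, the tolerance and the toy timing model are all assumptions, and it assumes that splitting the work into more iterations makes each kernel call proportionally shorter.

```python
def tune_iterations(initial_guess, target_ms, measure_kernel_ms,
                    max_rounds=20, tolerance=0.1):
    """Adjust the iteration count until the kernel time is near target_ms.

    measure_kernel_ms is a callable (hypothetical here) that returns the
    kernel execution time in milliseconds for a given iteration count.
    """
    iterations = max(1, initial_guess)
    for _ in range(max_rounds):
        kernel_ms = measure_kernel_ms(iterations)
        if abs(kernel_ms - target_ms) <= tolerance * target_ms:
            break  # close enough to the target, stop tuning
        # Kernel too long: split into more iterations. Too short: use fewer.
        iterations = max(1, round(iterations * kernel_ms / target_ms))
    return iterations


def toy_kernel_ms(iterations):
    """Toy model: a fixed 600 ms of work split evenly across iterations."""
    return 600.0 / iterations


# Start from a guess of 5 and aim for the 15 ms default mentioned above.
print(tune_iterations(5, 15.0, toy_kernel_ms))
```

In this toy model the starting value only affects how many rounds the tuning takes, which matches the description that the parameter serves as an initial guess rather than a fixed setting.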
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1809594

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22998
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1809596 - Posted: 16 Aug 2016, 10:48:17 UTC

Raistmer - have you seen Tbar or Petri's work on "nanosleep"?

It looks interesting in that, with care, it reduces lag and CPU usage and improves performance. They are talking about nano/microsecond sleeps/delays.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1809596

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809600 - Posted: 16 Aug 2016, 10:51:29 UTC - in response to Message 1809596.  
Last modified: 16 Aug 2016, 10:59:06 UTC

Raistmer - have you seen Tbar or Petri's work on "nanosleep"?

It looks interesting in that, with care, it reduces lag and CPU usage and improves performance. They are talking about nano/microsecond sleeps/delays.

It's Linux-only stuff.
(http://stackoverflow.com/questions/7827062/is-there-a-windows-equivalent-of-nanosleep )

P.S. and interesting research that supports my own findings in this area:
http://www.geisswerks.com/ryan/FAQS/timing.html

Currently I am implementing the -high_prec_timer option (previously it was undocumented and only for AP), which allows a 1ms sleep instead of a ~15ms one.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1809600

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22998
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1809604 - Posted: 16 Aug 2016, 11:05:17 UTC

I was talking about the concept of very short sleeps rather than the detail of their o/s specific solution. It looks to me as if you are heading along the same path, albeit only having access to slower timers than they are using?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1809604

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809605 - Posted: 16 Aug 2016, 11:14:03 UTC - in response to Message 1809604.  
Last modified: 16 Aug 2016, 11:17:16 UTC

I was talking about the concept of very short sleeps rather than the detail of their o/s specific solution. It looks to me as if you are heading along the same path, albeit only having access to slower timers than they are using?

Talking about microsecond delays without OS support for such delays makes no sense.
Windows can either yield the thread (Sleep(0)) or put the thread to sleep for a minimum duration of 1ms.
What I do is work around these OS limitations as well as possible.

P.S. Though there is indeed an approach worth trying: https://gist.github.com/Youka/4153f12cf2e17a77314c
Thanks for drawing attention to this.
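
For anyone curious how coarse the sleep granularity really is on their own machine, here is a small measurement sketch. It uses Python's time.sleep rather than the native Sleep() call, so it only approximates what the app sees: the results depend on the OS, the current timer resolution and the Python version (newer Python versions on Windows may use a higher-resolution timer internally). The function name is made up for illustration.

```python
import statistics
import time


def measure_sleep(requested_s, repeats=20):
    """Return the actual durations (in seconds) of `repeats` sleep calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(requested_s)
        samples.append(time.perf_counter() - start)
    return samples


# Compare a yield-like sleep (0 ms) with 1 ms and 5 ms requests.
for requested_ms in (0, 1, 5):
    samples = measure_sleep(requested_ms / 1000.0)
    print(f"requested {requested_ms} ms: "
          f"mean {statistics.mean(samples) * 1000:.3f} ms, "
          f"max {max(samples) * 1000:.3f} ms")
```

If the 1 ms request comes back as roughly 15 ms on a Windows host, that is the default timer granularity discussed above.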
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1809605

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1809645 - Posted: 16 Aug 2016, 14:14:56 UTC - in response to Message 1809594.  

5ms of sleep is quite long for modern GPUs. One could consider using -high_prec_timer together with plain -use_sleep instead. This allows better GPU utilization with fewer simultaneous tasks in flight.

Also, new builds do not follow period_iterations_num exactly. They use it as an initial guess and then tune this number until the desired kernel execution time is reached. That target time (default is 15ms) can be changed with the -tt F option.

I requested experimentation with -tt F to tune a better default for release, but no feedback so far.


. . I was worried about trying to manually install r3486 or r3500 as I have not done this before. But now that Lunatics Beta4 is out with r3450 I will be keen to try it out, but maybe not in the middle of the WOW event :)
ID: 1809645

Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809646 - Posted: 16 Aug 2016, 14:23:38 UTC - in response to Message 1809594.  

I requested experimentation with -tt F to tune better default for release but no feedback so far.

After the first few days of WoW have passed, I think you'll start seeing your volunteer testers provide the feedback you requested... especially from those who had the best start but are slowly falling in the ranks (like me! lol)
ID: 1809646

Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809650 - Posted: 16 Aug 2016, 14:32:19 UTC - in response to Message 1809645.  

. . I was worried about trying to manually install r3486 or r3500 as I have not done this before.

BilBg posted instructions for this in late June.
I think it's this one:
https://setiathome.berkeley.edu/forum_thread.php?id=79765&postid=1798985#1798985

Stephen, since you're getting the hang of using the command lines, would you be interested in writing a 1-pager for anyone trying to help Raistmer for the 1st time? I am available to review the doc and learn from your new expertise.
(Keep in mind I have too many other ideas for other small projects to help the project in the medium to long term...so I'll only be able to help a few hours here and there...just an idea)

Cheers,
RobG
ID: 1809650

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22998
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1809651 - Posted: 16 Aug 2016, 14:33:34 UTC - in response to Message 1809605.  

Raistmer,
I hope some of the ideas they have been floating around help you in your work.
(It is a shame Windows doesn't have the same high frequency timers that some other o/s have)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1809651

AMDave
Volunteer tester

Joined: 9 Mar 01
Posts: 234
Credit: 11,671,730
RAC: 0
United States
Message 1809657 - Posted: 16 Aug 2016, 14:45:43 UTC - in response to Message 1809645.  

. . I was worried about trying to manually install r3486 or r3500 as I have not done this before. But now that Lunatics Beta4 is out with r3450 I will be keen to try it out, but maybe not in the middle of the WOW event :)

It's been updated.  See Message 1809279.
ID: 1809657

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1809682 - Posted: 16 Aug 2016, 20:58:31 UTC - in response to Message 1809645.  


. . I was worried about trying to manually install r3486 or r3500 as I have not done this before. But now that Lunatics Beta4 is out with r3450 I will be keen to try it out, but maybe not in the middle of the WOW event :)

I updated to Lunatics Beta4 yesterday, in the middle of the WOW contest. No problems seen, EXCEPT, you have to manually copy your mb_cmdline_win_x86_SSE3_OpenCL_NV.txt parameter settings into the newly named mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt file. Raistmer changed the text file name in the R3500 aistub file which builds the new app_info. You just have to go back in and update your <count> value as usual and change over to the new command line file. Pretty painless. No ghosts produced across all three machines.
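
If you have several machines to update, the copy step can be scripted. This is only a sketch: the two file names come from the description above, while the function name and the idea of refusing to overwrite a non-empty new file are my own assumptions. Point it at your SETI@home project directory, whose location varies by install.

```python
from pathlib import Path

OLD_NAME = "mb_cmdline_win_x86_SSE3_OpenCL_NV.txt"
NEW_NAME = "mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt"


def carry_over_cmdline(project_dir):
    """Copy settings from the old cmdline file into the newly named one.

    Returns True if the new file was written, False if there was nothing
    to copy or the new file already contains settings.
    """
    project_dir = Path(project_dir)
    old_file = project_dir / OLD_NAME
    new_file = project_dir / NEW_NAME

    if not old_file.is_file():
        return False  # no old settings file present
    settings = old_file.read_text().strip()
    if not settings:
        return False  # old file is empty, nothing to carry over
    if new_file.is_file() and new_file.read_text().strip():
        return False  # do not clobber settings already in the new file

    new_file.write_text(settings + "\n")
    return True


# Example call (the path is an assumption, adjust it to your install):
# carry_over_cmdline(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")
```

The <count> value in app_info still has to be edited by hand afterwards, as described above.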
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1809682

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1809684 - Posted: 16 Aug 2016, 21:02:00 UTC - in response to Message 1809594.  

5ms of sleep is quite long for modern GPUs. One could consider using -high_prec_timer together with plain -use_sleep instead. This allows better GPU utilization with fewer simultaneous tasks in flight.

Also, new builds do not follow period_iterations_num exactly. They use it as an initial guess and then tune this number until the desired kernel execution time is reached. That target time (default is 15ms) can be changed with the -tt F option.

I requested experimentation with -tt F to tune a better default for release, but no feedback so far.

Raistmer, which thread contains the info on the new tuning parameter -tt F?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1809684

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1809691 - Posted: 16 Aug 2016, 21:17:18 UTC - in response to Message 1809684.  
Last modified: 16 Aug 2016, 21:19:31 UTC

5ms of sleep is quite long for modern GPUs. One could consider using -high_prec_timer together with plain -use_sleep instead. This allows better GPU utilization with fewer simultaneous tasks in flight.

Also, new builds do not follow period_iterations_num exactly. They use it as an initial guess and then tune this number until the desired kernel execution time is reached. That target time (default is 15ms) can be changed with the -tt F option.

I requested experimentation with -tt F to tune a better default for release, but no feedback so far.

Raistmer, which thread contains the info on the new tuning parameter -tt F?

ReadMe_MultiBeam_OpenCL.txt
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1809691

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1809704 - Posted: 16 Aug 2016, 21:37:12 UTC - in response to Message 1809691.  

5ms of sleep is quite long for modern GPUs. One could consider using -high_prec_timer together with plain -use_sleep instead. This allows better GPU utilization with fewer simultaneous tasks in flight.

Also, new builds do not follow period_iterations_num exactly. They use it as an initial guess and then tune this number until the desired kernel execution time is reached. That target time (default is 15ms) can be changed with the -tt F option.

I requested experimentation with -tt F to tune a better default for release, but no feedback so far.

Raistmer, which thread contains the info on the new tuning parameter -tt F?

ReadMe_MultiBeam_OpenCL.txt

Or, if deployed via the Beta4 installer,

ReadMe_MultiBeam_OpenCL_NV_SoG.txt

(as noted in the release thread. We can't have different files with the same name)
ID: 1809704

Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1809705 - Posted: 16 Aug 2016, 21:39:10 UTC - in response to Message 1809704.  

I've taken to just placing the command line into my app_config.xml, as Richard mentioned some time back.

I use it on all my machines and haven't had any problems.
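
For those who have not done this before, here is a rough sketch of what such a file could look like, generated with a few lines of Python. The element names follow the BOINC app_config.xml layout as I understand it; the app name, plan class, ngpus value and the command line itself are assumptions for illustration, so check them against your own app_info.xml and the BOINC documentation before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, cmdline, ngpus=0.5, avg_ncpus=1.0):
    """Build app_config.xml text with one <app_version> carrying a cmdline."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.SubElement(version, "ngpus").text = str(ngpus)  # 0.5 = 2 tasks per GPU
    ET.SubElement(version, "cmdline").text = cmdline
    return ET.tostring(root, encoding="unicode")


xml_text = build_app_config(
    app_name="setiathome_v8",        # assumed, check your app_info.xml
    plan_class="opencl_nvidia_SoG",  # assumed, check your app_info.xml
    cmdline="-use_sleep -high_prec_timer",  # example switches from this thread
)
print(xml_text)
```

The appeal of this approach is that the settings live in one file that an installer update does not rename, so there is nothing to copy across after an upgrade.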
ID: 1809705


 