Modified SETI MB CUDA + opt AP package for full GPU utilization




Message boards : Number crunching : Modified SETI MB CUDA + opt AP package for full GPU utilization

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 842760 - Posted: 20 Dec 2008, 22:20:16 UTC
Last modified: 20 Dec 2008, 23:09:10 UTC

1) This package (Raistmer's_opt_package.rar) can be downloaded from http://files.mail.ru/CIXVXO
It can also be downloaded from this post on the Lunatics forums:
http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.msg12177.html;topicseen#msg12177
Target hosts: Windows x86 with SSE3 support (for AP) and CUDA support (for MB).

2) It consists of the modified SETI MB CUDA binary and the current SSE3 opt SETI AP binary, with the corresponding app_info.xml file.
3) My modification increases the priority of the CUDA worker thread in SETI MB CUDA, which lets the GPU be used more fully while all CPU cores are kept busy too. That is, this build can increase the total BOINC throughput of your host.
4) The MB binary is based on the CUDA MB sources received from Eric (with a small modification); the opt AP is just a repackaging of the current Lunatics opt AP release (SSE3 build).
5) This is not an "official" Lunatics release, so you can blame only me (or yourself, or BOINC bugs, and so on and so forth) for any issues you encounter.
6) I still can't check AP+MB operation (no AP tasks here), but it works just fine with the CUDA MB + Einstein@home combination.
7) For best CPU and GPU usage I recommend setting the number of processors available to BOINC to real_number_of_cores+1. This works around the current BOINC bug with CPU+CUDA scheduling and allows both the CPU and the GPU to be fully loaded.
8) Installation is the same as for any opt app: stop BOINC, extract all files from the archive into the SETI project directory, and restart BOINC.
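A minimal sketch of the item 7 setting, assuming the client reads an ncpus option from cc_config.xml in the BOINC data directory (later in this thread it is called "the cc_config change"). The helper name build_cc_config is hypothetical, and os.cpu_count() reports logical CPUs, which can exceed the real core count on Hyper-Threading hosts, so the core count can be passed explicitly:

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional


def build_cc_config(extra_cpus: int = 1, physical_cores: Optional[int] = None) -> str:
    """Return cc_config.xml text asking BOINC to act as if it had cores + extra_cpus CPUs."""
    # os.cpu_count() counts logical CPUs; pass physical_cores for the real core count.
    cores = physical_cores if physical_cores is not None else (os.cpu_count() or 1)
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(cores + extra_cpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # A quad-core host in "4+1" mode, as discussed in this thread.
    print(build_cc_config(physical_cores=4))
```

Save the output as cc_config.xml and restart BOINC. Removing the ncpus line again returns the host to normal scheduling, which is what MarkJ does later in the thread.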

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 938
Credit: 23,774,871
RAC: 72,611
Australia
Message 842844 - Posted: 21 Dec 2008, 2:36:36 UTC - in response to Message 842760.
Last modified: 21 Dec 2008, 2:37:57 UTC



Hi Raistmer,

I gave it a run this morning, and while it works, the wall-clock (elapsed) time seems to be longer than with the stock 6.05 CUDA app. I tried both 3+1 and 4+1 on my quaddie. I have reverted to stock for the time being.
____________
BOINC blog

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,109,705
RAC: 20,558
United States
Message 842905 - Posted: 21 Dec 2008, 5:49:34 UTC
Last modified: 21 Dec 2008, 5:50:01 UTC

To both MarkJ and Raistmer.......

What version of NVIDIA drivers are you using?

Raistmer, what version do you recommend? I have both 180.48 and 180.84 available here.
____________
Boinc....Boinc....Boinc....Boinc....

popandbob
Volunteer tester
Joined: 19 Mar 05
Posts: 535
Credit: 1,896,421
RAC: 0
Canada
Message 842928 - Posted: 21 Dec 2008, 7:21:38 UTC

I'd recommend the latest stable (non beta) drivers available for your vid card.

____________


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 938
Credit: 23,774,871
RAC: 72,611
Australia
Message 842940 - Posted: 21 Dec 2008, 8:34:23 UTC - in response to Message 842905.

To both MarkJ and Raistmer.......

What version of NVIDIA drivers are you using?

Raistmer, what version do you recommend? I have both 180.48 and 180.84 available here.


I used 180.48. Before that I was running the one supplied on CD with the card, which was an even earlier version. Things seemed to go a little faster after I upgraded.
____________
BOINC blog

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 938
Credit: 23,774,871
RAC: 72,611
Australia
Message 842943 - Posted: 21 Dec 2008, 8:43:14 UTC - in response to Message 842844.
Last modified: 21 Dec 2008, 8:45:04 UTC

I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being.


I gave it (the Raistmer version) a bit more of a run in 4+1 mode. However, after it had finished all its CUDA work, it tried to run 5 other tasks. That is, while it had CUDA work it ran other stuff (Astropulse & Einstein) plus the SETI CUDA task, but after running out of CUDA work it was still trying to run 5 tasks, which means 2 tasks have to compete for a core or take turns. They don't progress too well.

I have backed out the cc_config change, which of course gets me back to 3+1 mode. Still running the Raistmer version though to get some speed comparisons.
____________
BOINC blog

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 842950 - Posted: 21 Dec 2008, 9:02:34 UTC - in response to Message 842844.
Last modified: 21 Dec 2008, 9:44:59 UTC


Hi Raistmer,

I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being.


Probably you processed short WUs before and have now received longer ones.
Could you give a link to your host as factual evidence of this behavior?
I stress again that my build is the CUDA MB version plus a priority increase, nothing more. I can't imagine how it could be slower :) There is just one possibility: my CUDA sources are outdated and 6.06 has some speed gain over the previous build. It would be good to check this by looking at your host's log.

ADDITION: Yesterday my GPU finished a task in ~20 min of wall-clock time.
Today it finished a task in just 7 minutes of wall-clock time.
This means (just as with the CPU version) that GPU task run time varies widely with the AR (angle range) of the WU. So look at the claimed credit for each task and at the task result itself (you need the true angle range field) to check whether you are comparing tasks with the same AR or with different ones.
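To act on this advice, elapsed times can be compared only between tasks of similar angle range. A small sketch, assuming the AR and wall-clock minutes have been copied by hand from each task's result page; the sample numbers are made up for illustration, and grouping AR to one decimal place is an arbitrary choice:

```python
from statistics import mean


def mean_minutes_by_ar(tasks, ndigits=1):
    """Average wall-clock minutes per AR bucket; tasks is a list of (ar, minutes)."""
    buckets = {}
    for ar, minutes in tasks:
        # Tasks whose AR rounds to the same value are treated as comparable.
        buckets.setdefault(round(ar, ndigits), []).append(minutes)
    return {ar: mean(times) for ar, times in buckets.items()}


def compare_by_ar(stock, modified, ndigits=1):
    """Pair stock and modified averages, keeping only AR buckets both runs covered."""
    s = mean_minutes_by_ar(stock, ndigits)
    m = mean_minutes_by_ar(modified, ndigits)
    return {ar: (s[ar], m[ar]) for ar in sorted(s.keys() & m.keys())}


if __name__ == "__main__":
    # Hypothetical (AR, minutes) samples, not real results from this thread.
    stock = [(0.41, 20.0), (0.43, 22.0), (2.71, 7.0)]
    modified = [(0.42, 19.0), (2.70, 7.5), (2.72, 6.5), (1.30, 12.0)]
    for ar, (s_min, m_min) in compare_by_ar(stock, modified).items():
        print(f"AR ~{ar}: stock {s_min:.1f} min, modified {m_min:.1f} min")
```

A bucket that only one run covered (AR ~1.3 in the sample) is dropped, since it says nothing about relative speed.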

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 842952 - Posted: 21 Dec 2008, 9:06:05 UTC - in response to Message 842943.
Last modified: 21 Dec 2008, 9:08:38 UTC

I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being.


I gave it (the Raistmer version) a bit more of a run in 4+1 mode. However after it had fnished all its cuda work it then tried to run 5 other tasks. That is while it had cuda work it did other stuff (Astropulse & Einstein) plus the Seti cuda, but after running out of cuda work it was still trying to run 5 tasks, which would mean 2 tasks have to compete for a core or take it in turns. They don't progress too well.

I have backed out the cc_config change, which of course gets me back to 3+1 mode. Still running the Raistmer version though to get some speed comparisons.

Sure it will do that (not because of my version, but because BOINC is running with a modified ncpus value). BOINC thinks your host has 5 CPUs instead of 4 and behaves accordingly. This artificial increase in CPU count is just a workaround for BOINC's inability to schedule CUDA tasks correctly. Of course this should be fixed in BOINC itself so that such hacks aren't needed.

And about how well or badly they progress: do you have some numbers? What relative slowdown do you see? Just remember you have 4 cores. So if these 4 cores run 5 tasks, the wall-clock time of each task will of course be longer, but you will report 5 tasks at once, not just 4. Look at the CPU times of the CPU tasks: have they increased?
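The reasoning above can be checked with idealized arithmetic. The sketch assumes perfectly fair time-sharing and ignores context-switch and cache overhead; the 60-minute task length is an arbitrary example, and the helper name time_sharing is hypothetical:

```python
def time_sharing(cores: int, tasks: int, cpu_minutes: float):
    """Ideal fair sharing of `tasks` equal CPU-bound tasks over `cores` cores.

    Returns (core share per task, wall-clock minutes per task, tasks finished per minute).
    """
    share = min(1.0, cores / tasks)   # fraction of one core each task receives
    wall = cpu_minutes / share        # each task still needs the same CPU time
    throughput = tasks / wall         # all tasks finish together in this model
    return share, wall, throughput


if __name__ == "__main__":
    for n in (4, 5):
        share, wall, rate = time_sharing(cores=4, tasks=n, cpu_minutes=60.0)
        print(f"{n} tasks: share {share:.2f}, wall {wall:.1f} min, {rate * 60:.2f} tasks/hour")
```

With 5 tasks on 4 cores each task's wall-clock time rises by 25% while throughput stays at 4 tasks per hour, so wall-clock time alone is misleading here. CPU time per task is unchanged in this model; if measured CPU times do go up, that points to overhead the model leaves out.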

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 842954 - Posted: 21 Dec 2008, 9:13:33 UTC - in response to Message 842905.
Last modified: 21 Dec 2008, 9:42:23 UTC

To both MarkJ and Raistmer.......

What version of NVIDIA drivers are you using?

Raistmer, what version do you recommend? I have both 180.48 and 180.84 available here.


I can't recommend anything regarding drivers because I haven't made any comparison. My host is using 180.48 now. It works pretty well, but VLAR tasks cause a driver crash and restart under Vista x86.

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1151
Credit: 3,936,993
RAC: 0
United States
Message 842981 - Posted: 21 Dec 2008, 10:30:21 UTC
Last modified: 21 Dec 2008, 10:32:09 UTC

Hi Raistmer,

Ran Einstein and CUDA with the new modified app; it ran smoothly, no problems to report. Much like my runs on Beta with the priority set to real time.

Thanks, and appreciate the hard work you're putting in.
____________

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 842996 - Posted: 21 Dec 2008, 11:09:02 UTC - in response to Message 842981.

Hi Raistmer,

Ran Einstein and CUDA with the new modified app, ran smooth, no problems to report. Much like the runs I had in Beta setting the priority to real time.

Thanks, and appreciate the hard work you're putting in.

Thanks. A report without any problems is a good one too :) I had a few BSODs with RealTime priority on my host, so I even reverted to the default priority for a few days. The current build with increased priority has already run here without any problems for much longer than it used to take to get a BSOD. And the results are mostly valid (main is now a better place for experiments than beta: we can see validation results here, while none of my results on beta has a defined status yet). The exception is computation errors on VLAR, but that's a 6.06 CUDA MB bug/feature ;) as well.

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 938
Credit: 23,774,871
RAC: 72,611
Australia
Message 842997 - Posted: 21 Dec 2008, 11:09:31 UTC - in response to Message 842950.




Hi Raistmer,

I was looking in the Messages tab in BOINC and had to find the start/finish times for the tasks. As you say, they vary by angle range, so possibly I had a bunch of big ones. I did notice some 7-minute ones earlier today. At the moment I'm running your app, but in 3+1 mode. I can adjust as needed, so let me know if you'd like me to do a run in different modes (this would be right up Fred's alley for graphing).
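Reading start/finish times off the Messages tab by hand is error-prone. A small sketch, assuming the timestamps use the same layout as the BOINC log line George quotes later in this thread (12/21/2008 7:03:04 AM); the client prints dates in the local format, so the format string may need adjusting on other locales. The helper name elapsed_minutes is hypothetical:

```python
from datetime import datetime

# Assumed timestamp layout, matching "12/21/2008 7:03:04 AM" from a BOINC log
# line in this thread; adjust for other locales.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def elapsed_minutes(started: str, finished: str) -> float:
    """Wall-clock minutes between a task's start and finish timestamps."""
    t0 = datetime.strptime(started, TIMESTAMP_FORMAT)
    t1 = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (t1 - t0).total_seconds() / 60.0


if __name__ == "__main__":
    # Hypothetical start/finish pair for one CUDA task, crossing noon.
    print(elapsed_minutes("12/21/2008 11:55:00 AM", "12/21/2008 12:15:00 PM"))  # 20.0
```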

My host with the cuda card is called Qui-Gon (from Star Wars) and is here. I added the card on the 20th, so ignore any work before the 20th of Dec.

I had a bunch of WUs error out yesterday, but this was with the 6.05 app (i.e. stock), and I understand it doesn't cope too well with VLAR or VHAR, which is probably why you had video driver issues. I also had some just hang. The only way to get them going was to shut down and restart BOINC, and then they errored out. Nothing you've done, just bugs in the current CUDA app. Hopefully 6.06 will correct these issues.

BOINC 6.5 is supposed to address the 4+1 scheduling issue. This is mentioned in a couple of other message threads. Looks like in their haste to release a CUDA-aware BOINC they didn't get around to the scheduling issues.

Thanks for all your efforts with this one and the other optimized apps.
____________
BOINC blog

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 843001 - Posted: 21 Dec 2008, 11:21:15 UTC - in response to Message 842997.


.... (this would be right up Fred's alley for graphing).


Looks like I might have to revisit Seti_Pal in the New Year. At the moment it turns up its toes when it sees an AP WU. It should be OK if the CPU cores are being used for a different project altogether, though.

F.
____________

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 843009 - Posted: 21 Dec 2008, 11:36:47 UTC - in response to Message 842997.

At the moment i'm running your app but in 3+1 mode. I can adjust as needed, so let me know if you'd like to do a run in different modes (this would be right up Fred's alley for graphing).


Well, 3+1 mode should run about the same with the stock CUDA app and with my mod.
The key purpose of the mod is to allow adequate GPU feeding while all CPU cores are busy with other BOINC tasks. In that configuration the stock CUDA app suffers from a lack of CPU: its wall-clock times increase a lot and the GPU becomes almost idle (you can see this by watching the GPU temperature). My mod keeps the GPU busy in that configuration too. If you free a whole core to feed the GPU, you don't need any mod. But you lose a whole core...
So the mod's target configuration is 4+1, not 3+1.
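The effect described above can be pictured with a back-of-envelope model. This is an assumption for illustration, not a description of how the CUDA app actually schedules its work: each GPU step first needs the feeder thread to get onto a CPU core (wait), then prepare the work (feed), and only then does the GPU compute. A higher thread priority mainly shrinks the wait when all cores are busy. The function name and all timings are hypothetical:

```python
def gpu_utilization(kernel_ms: float, feed_ms: float, wait_ms: float) -> float:
    """Fraction of time the GPU is busy in a serial wait -> feed -> compute loop.

    Toy model: the GPU sits idle while the feeder thread waits for a core
    (wait_ms) and while it prepares the next batch (feed_ms).
    """
    return kernel_ms / (kernel_ms + feed_ms + wait_ms)


if __name__ == "__main__":
    # Hypothetical timings: feeder gets a core at once vs. queues behind CPU tasks.
    for label, wait in (("high priority", 0.0), ("normal priority", 20.0)):
        print(f"{label}: GPU busy {gpu_utilization(10.0, 1.0, wait):.0%}")
```

In this model a freed core (3+1) also drives the wait toward zero, which matches the remark that 3+1 needs no mod, at the cost of that core.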

George
Volunteer tester
Joined: 14 Oct 08
Posts: 100
Credit: 435,680
RAC: 0
United States
Message 843017 - Posted: 21 Dec 2008, 12:19:36 UTC
Last modified: 21 Dec 2008, 12:19:56 UTC

I have this working on a dual core with two GPUs.
Going well so far: it does two APs and two MBs at the same time.
Look here

and
12/21/2008 7:03:04 AM||CUDA devices: GeForce GTX 260, nForce 780a SLI
Thanks for all the hard work.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,484,296
RAC: 8,232
Russia
Message 843021 - Posted: 21 Dec 2008, 12:35:33 UTC - in response to Message 843017.

Fine!
It's just as it was intended to be: many cores, many co-processors, and all of them busy with SETI. Very nice picture :)

George
Volunteer tester
Joined: 14 Oct 08
Posts: 100
Credit: 435,680
RAC: 0
United States
Message 843022 - Posted: 21 Dec 2008, 12:43:04 UTC

Yeah, once I get two 8600 GTs back from RMA I will have 4 GPUs in this computer. Will that be too much for the CPU cores to handle, or should I put them in a different computer?

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 938
Credit: 23,774,871
RAC: 72,611
Australia
Message 843026 - Posted: 21 Dec 2008, 13:08:37 UTC - in response to Message 843009.

At the moment i'm running your app but in 3+1 mode. I can adjust as needed, so let me know if you'd like to do a run in different modes (this would be right up Fred's alley for graphing).


Well 3+1 mode should run just similar for stock CUDA app and my mod.
The key meaning of mod is to allow adequate GPU feeding while all CPU cores are busy with other BOINC tasks. Stock CUDA app suffer in this mode from lack of CPU. Wall clock times for stock app increase very much and GPU becomes almost idle (you can see it measuring GPU temperature). My mod will keep GPU busy in such config too. If you free whole core to feed GPU you no need any mods. But you will lose whole core...
So mod's targed config is 4+1, not 3+1.


Interestingly, in 3+1 mode the GPU is at 61 degrees; when I switched back to 4+1 mode it dropped to 59 degrees, so obviously I get a little more out of it when there is less load on the CPU.

We'll see how my remaining Einsteins fare when it tries to run 5 of them. Normally they take 8 hrs 38 mins, so we'll see how it handles them when it runs out of CUDA work.
____________
BOINC blog

George
Volunteer tester
Joined: 14 Oct 08
Posts: 100
Credit: 435,680
RAC: 0
United States
Message 843027 - Posted: 21 Dec 2008, 13:15:26 UTC - in response to Message 843026.

Interestingly 3+1 mode has the gpu at 61 degrees, when I switched back to 4+1 mode its down to 59 degrees so obviously get a little bit more out of it when less load on the cpu.

Will see how my remaining Einsteins fare when they try and run 5 of them, normally they take 8 hrs 38 mins so will see how it handles them when it runs out of cuda work.

I was beaten to this, but I found a fix for the GPU not getting the data it needs.
There is this line in the app_info.xml file; just change
<max_ncpus>0.040000</max_ncpus>
to
<max_ncpus>0.080000</max_ncpus>
It appears in there 3 times.
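This edit can also be scripted. A minimal sketch using Python's standard ElementTree, assuming app_info.xml is well-formed XML containing max_ncpus elements as quoted above. Stop BOINC first, as for installation, and keep a backup, since ElementTree discards XML comments when it writes the file back. The helper names are hypothetical, and whether the higher value actually helps is George's report, not something the script checks:

```python
import xml.etree.ElementTree as ET


def set_max_ncpus(app_info_xml: str, value: float) -> tuple[str, int]:
    """Return (new XML text, number of max_ncpus elements changed)."""
    root = ET.fromstring(app_info_xml)
    changed = 0
    for elem in root.iter("max_ncpus"):   # every occurrence, at any depth
        elem.text = f"{value:.6f}"        # keep the six-decimal style of the original
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed


def update_file(path: str, value: float) -> int:
    """Rewrite app_info.xml in place; stop BOINC and back up the file first."""
    with open(path, encoding="utf-8") as f:
        new_text, changed = set_max_ncpus(f.read(), value)
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return changed


if __name__ == "__main__":
    sample = "<app_info><app_version><max_ncpus>0.040000</max_ncpus></app_version></app_info>"
    print(set_max_ncpus(sample, 0.08)[0])
```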

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 938
Credit: 23,774,871
RAC: 72,611
Australia
Message 843042 - Posted: 21 Dec 2008, 13:59:35 UTC

I have raised Trac ticket #802 for the BOINC developers to look at the scheduler with regard to 4+1 scheduling. Don't hold your breath.

Link is here
____________
BOINC blog



Copyright © 2014 University of California