Modified SETI MB CUDA + opt AP package for full GPU utilization

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 842760 - Posted: 20 Dec 2008, 22:20:16 UTC Last modified: 20 Dec 2008, 23:09:10 UTC

1) This package (Raistmer's_opt_package.rar) can be downloaded from http://files.mail.ru/CIXVXO . It can also be downloaded from this post on the Lunatics forums: http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.msg12177.html;topicseen#msg12177 Target hosts: Windows x86, with SSE3 support for AP and CUDA support for MB.
2) It consists of the modified SETI MB CUDA binaries and the current SSE3 opt SETI AP binaries, with the corresponding app_info.xml file.
3) The modification I have made increases the priority of the CUDA worker thread in SETI MB CUDA, which allows fuller GPU usage while keeping all CPU cores busy too. That is, using this build can increase the total performance of your host for BOINC tasks.
4) The MB binaries are based on the CUDA MB sources received from Eric (with a small modification); opt AP is just a repack of the current Lunatics opt AP release (SSE3 build).
5) It's not an "official" Lunatics release, so you can blame only me (or yourself, or BOINC bugs, and so on and so forth) for any issues you encounter.
6) I still can't check AP+MB work (no AP tasks here), but it works just fine with the CUDA MB + einstein@home combination.
7) For best CPU and GPU usage I recommend setting the number of processors available to BOINC to real_number_of_cores+1. This will mitigate the current BOINC bug with CPU+CUDA scheduling and will allow the CPU and GPU to be fully loaded.
8) Installation instructions are the same as for any opt app: stop BOINC, decompress all files in the archive into the SETI project directory, restart BOINC.

ID: 842760 ·
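Editor's sketch for point 7: the "number of processors" override is normally set through the <ncpus> option in BOINC's cc_config.xml (a later post in this thread refers to it as "the cc_config change"). The Python below only illustrates building that file's contents. The helper name build_cc_config is hypothetical, and copying the result into the BOINC data directory and restarting the client are left to the reader.

```python
import os
import xml.etree.ElementTree as ET


def build_cc_config(real_cores: int) -> str:
    """Return cc_config.xml text asking BOINC to assume real_cores + 1 CPUs.

    build_cc_config is a hypothetical helper for illustration, not part of BOINC.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # The "+1" is the workaround from point 7: one extra virtual CPU so that
    # the CUDA task does not take a slot away from a CPU task.
    ET.SubElement(options, "ncpus").text = str(real_cores + 1)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # os.cpu_count() reports logical processors, which may differ from
    # the number of physical cores on hyper-threaded hosts.
    cores = os.cpu_count() or 1
    print(build_cc_config(cores))
```

On a quad-core host this yields an <ncpus> value of 5, i.e. the "4+1" mode discussed below. Removing the option returns the client to its detected CPU count.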

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842844 - Posted: 21 Dec 2008, 2:36:36 UTC - in response to Message 842760. Last modified: 21 Dec 2008, 2:37:57 UTC

Hi Raistmer, I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being. BOINC blog ID: 842844 ·

Geek@Play Volunteer tester Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 842905 - Posted: 21 Dec 2008, 5:49:34 UTC Last modified: 21 Dec 2008, 5:50:01 UTC To both MarkJ and Raistmer....... What version of NVIDIA drivers are you using? Raistmer, what version do you recommend? I have both 180.48 and 180.84 available here. Boinc....Boinc....Boinc....Boinc.... ID: 842905 ·

popandbob Volunteer tester Send message Joined: 19 Mar 05 Posts: 551 Credit: 4,673,015 RAC: 0	Message 842928 - Posted: 21 Dec 2008, 7:21:38 UTC I'd recommend the latest stable (non beta) drivers available for your vid card. Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957 Or Good Shop? http://www.goodshop.com/?charityid=888957 ID: 842928 ·

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842940 - Posted: 21 Dec 2008, 8:34:23 UTC - in response to Message 842905. To both MarkJ and Raistmer....... What version of NVIDIA drivers are you using? Raistmer, what version do you recommend? I have both 180.48 and 180.84 available here. I used 180.48. Before that I was running the one supplied on CD with the card, which was an even earlier version. Things seemed to go a little faster after I upgraded. BOINC blog ID: 842940 ·

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842943 - Posted: 21 Dec 2008, 8:43:14 UTC - in response to Message 842844. Last modified: 21 Dec 2008, 8:45:04 UTC

"I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being."

I gave it (the Raistmer version) a bit more of a run in 4+1 mode. However, after it had finished all its cuda work it then tried to run 5 other tasks. That is, while it had cuda work it did other stuff (Astropulse & Einstein) plus the Seti cuda, but after running out of cuda work it was still trying to run 5 tasks, which means 2 tasks have to compete for a core or take it in turns. They don't progress too well. I have backed out the cc_config change, which of course gets me back to 3+1 mode. Still running the Raistmer version though to get some speed comparisons. BOINC blog ID: 842943 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 842950 - Posted: 21 Dec 2008, 9:02:34 UTC - in response to Message 842844. Last modified: 21 Dec 2008, 9:44:59 UTC

"Hi Raistmer, I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being."

You probably did short WUs before and have now received longer ones. Could you give a link to your host as some factual evidence of such behavior? I stress again that my build is the CUDA MB version plus a priority increase, nothing more. I can't imagine how it could be slower :) There is just one possibility: I have outdated CUDA sources and 6.06 has some speed gain over the previous build. It would be nice to check this by looking at your host log.

ADDITION: Yesterday my GPU finished a task in ~20 min of wall-clock time. Today it finishes a task in just 7 minutes of wall-clock time. This means that (just as with the CPU version) GPU task run time varies widely with the AR of the WU. So look at the claimed credit for the task and at the task result itself (you need the true angle range field) to check whether the tasks you are comparing have the same or different AR. ID: 842950 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 842952 - Posted: 21 Dec 2008, 9:06:05 UTC - in response to Message 842943. Last modified: 21 Dec 2008, 9:08:38 UTC

"I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being."

"I gave it (the Raistmer version) a bit more of a run in 4+1 mode. However, after it had finished all its cuda work it then tried to run 5 other tasks. That is, while it had cuda work it did other stuff (Astropulse & Einstein) plus the Seti cuda, but after running out of cuda work it was still trying to run 5 tasks, which means 2 tasks have to compete for a core or take it in turns. They don't progress too well. I have backed out the cc_config change, which of course gets me back to 3+1 mode. Still running the Raistmer version though to get some speed comparisons."

Sure it will do that (not my version, but BOINC with the modified ncpu value). BOINC thinks your host has 5 CPUs instead of 4 and behaves accordingly. This artificial increase of the CPU number is just a workaround for BOINC's inability to schedule CUDA tasks correctly. Surely it should be corrected in BOINC itself to avoid such hacks.

And about progressing too well or not too well: do you have some numbers? What relative slowdown do you see? Just remember you have 4 cores. So if these 4 cores do 5 tasks, wall-clock times will surely be longer for each task, but you will report 5 tasks at once, not just four. Look at the CPU times for the CPU tasks: have the CPU times increased? ID: 842952 ·
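Editor's sketch of the arithmetic in the post above: if 5 CPU tasks share 4 cores, each task takes longer in wall-clock terms, but the number of tasks completed per hour need not drop. The function below is a rough model only. It assumes ideal, equal time-slicing with no context-switch or cache overhead, and the function name and the 8-hour task length are illustrative assumptions, not figures from the thread.

```python
def oversubscribed_times(cpu_hours_per_task: float, n_tasks: int, n_cores: int):
    """Estimate wall-clock hours per task and tasks completed per hour.

    Assumes every task gets an equal share of the cores and that sharing
    costs nothing, which real hosts will not quite achieve.
    """
    # Each task is stretched by n_tasks / n_cores once cores are oversubscribed.
    stretch = max(1.0, n_tasks / n_cores)
    wall_hours = cpu_hours_per_task * stretch
    tasks_per_hour = n_tasks / wall_hours
    return wall_hours, tasks_per_hour


if __name__ == "__main__":
    for tasks in (4, 5):
        wall, rate = oversubscribed_times(8.0, tasks, 4)
        print(f"{tasks} tasks on 4 cores: {wall:.1f} h each, {rate:.2f} tasks/h")
```

Under these assumptions an 8-hour task stretches to 10 hours when 5 tasks run on 4 cores, while throughput stays at 0.5 tasks per hour. That is why the post asks about CPU time rather than wall-clock time: only an increase in CPU time per task would indicate a real loss.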

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 842954 - Posted: 21 Dec 2008, 9:13:33 UTC - in response to Message 842905. Last modified: 21 Dec 2008, 9:42:23 UTC

"To both MarkJ and Raistmer....... What version of NVIDIA drivers are you using? Raistmer, what version do you recommend? I have both 180.48 and 180.84 available here."

I can't recommend anything regarding drivers because I haven't made any comparison. My host is using 180.48 now. It works pretty well, but VLAR tasks cause a driver crash with restart under Vista x86. ID: 842954 ·

Byron S Goodgame Volunteer tester Send message Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0	Message 842981 - Posted: 21 Dec 2008, 10:30:21 UTC Last modified: 21 Dec 2008, 10:32:09 UTC Hi Raistmer, Ran Einstein and CUDA with the new modified app, ran smooth, no problems to report. Much like the runs I had in Beta setting the priority to real time. Thanks, and appreciate the hard work you're putting in. ID: 842981 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 842996 - Posted: 21 Dec 2008, 11:09:02 UTC - in response to Message 842981.

"Hi Raistmer, Ran Einstein and CUDA with the new modified app, ran smooth, no problems to report. Much like the runs I had in Beta setting the priority to real time. Thanks, and appreciate the hard work you're putting in."

Thanks. A report without any problems is a good one too :) I had a few BSODs with RealTime priority mode on my host, so I even reverted to default priority for a few days. The current build with increased priority has already been running here much longer than it used to take to get a BSOD, with no problems. And results are mostly valid (the exception is computation errors on VLAR, but that is a 6.06 CUDA MB bug/feature ;) too). Main is now a better place for experiments than beta: we can see validation results here, while none of my results on beta has a defined status yet. ID: 842996 ·

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842997 - Posted: 21 Dec 2008, 11:09:31 UTC - in response to Message 842950.

"Hi Raistmer, I gave it a run this morning and while it works the wall-clock (or elapsed time) seems to be longer than the stock 6.05 cuda app. I tried both as 3+1 and 4+1 on my quaddie. I have reverted back to stock for the time being."

"You probably did short WUs before and have now received longer ones. Could you give a link to your host as some factual evidence of such behavior? I stress again that my build is the CUDA MB version plus a priority increase, nothing more. I can't imagine how it could be slower :) There is just one possibility: I have outdated CUDA sources and 6.06 has some speed gain over the previous build. It would be nice to check this by looking at your host log. ADDITION: Yesterday my GPU finished a task in ~20 min of wall-clock time. Today it finishes a task in just 7 minutes of wall-clock time. This means that (just as with the CPU version) GPU task run time varies widely with the AR of the WU. So look at the claimed credit for the task and at the task result itself (you need the true angle range field) to check whether the tasks you are comparing have the same or different AR."

Hi Raistmer, I was looking in the Messages tab in BOINC and had to find the start/finish times for the tasks. As you say, they vary by angle range, so possibly I had a bunch of big ones. I did notice some 7-minute ones earlier today. At the moment I'm running your app, but in 3+1 mode. I can adjust as needed, so let me know if you'd like me to do a run in different modes (this would be right up Fred's alley for graphing). My host with the cuda card is called Qui-Gon (from Star Wars) and is here. I added the card on the 20th, so ignore any work before the 20th of Dec.

I had a bunch of WUs error out yesterday, but this was using the 6.05 app (i.e. stock), and I understand it doesn't cope too well with VLAR or VHAR, which is probably why you had video driver issues. I also had some just hang. The only way to get them going was to shut down and restart BOINC, and then they errored out. Nothing you've done, just bugs in the current cuda app. Hopefully 6.06 will correct these issues.

BOINC 6.5 is supposed to address the issue of 4+1 scheduling. This is mentioned in a couple of other message threads. It looks like in their haste to release a cuda-aware BOINC they didn't get around to the scheduling issues. Thanks for all your efforts with this one and the other optimized apps. BOINC blog ID: 842997 ·

Fred W Volunteer tester Send message Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0	Message 843001 - Posted: 21 Dec 2008, 11:21:15 UTC - in response to Message 842997. .... (this would be right up Fred's alley for graphing). Looks like I might have to re-visit Seti_Pal in the New Year. ATM it turns up its toes when it sees an AP WU. Should be OK if the CPU cores are being used for a different project altogether though. F. ID: 843001 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 843009 - Posted: 21 Dec 2008, 11:36:47 UTC - in response to Message 842997.

"At the moment I'm running your app, but in 3+1 mode. I can adjust as needed, so let me know if you'd like me to do a run in different modes (this would be right up Fred's alley for graphing)."

Well, in 3+1 mode the stock CUDA app and my mod should run about the same. The whole point of the mod is to keep the GPU adequately fed while all CPU cores are busy with other BOINC tasks. The stock CUDA app suffers from lack of CPU in this mode: its wall-clock times increase a great deal and the GPU becomes almost idle (you can see this by measuring the GPU temperature). My mod will keep the GPU busy in such a config too. If you free a whole core to feed the GPU, you don't need any mods. But you will lose a whole core... So the mod's target config is 4+1, not 3+1. ID: 843009 ·

George Volunteer tester Send message Joined: 14 Oct 08 Posts: 100 Credit: 435,680 RAC: 0	Message 843017 - Posted: 21 Dec 2008, 12:19:36 UTC Last modified: 21 Dec 2008, 12:19:56 UTC

I have this working on a dual core with two GPUs. Going good so far: it does two APs and two MBs at the same time. Look here, and:

12/21/2008 7:03:04 AM||CUDA devices: GeForce GTX 260, nForce 780a SLI

Ty for all the hard work. ID: 843017 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 843021 - Posted: 21 Dec 2008, 12:35:33 UTC - in response to Message 843017. Fine! It's just as it intended to be - many cores, many co-processors and all busy with SETI, very nice picture :) ID: 843021 ·

George Volunteer tester Send message Joined: 14 Oct 08 Posts: 100 Credit: 435,680 RAC: 0	Message 843022 - Posted: 21 Dec 2008, 12:43:04 UTC

Ya, once I get two 8600 GTS cards back from an RMA I will have 4 GPUs in this computer. Will that be too much for the CPU cores to handle? Or should I put them in a different computer? ID: 843022 ·

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 843026 - Posted: 21 Dec 2008, 13:08:37 UTC - in response to Message 843009.

"At the moment I'm running your app, but in 3+1 mode. I can adjust as needed, so let me know if you'd like me to do a run in different modes (this would be right up Fred's alley for graphing)."

"Well, in 3+1 mode the stock CUDA app and my mod should run about the same. The whole point of the mod is to keep the GPU adequately fed while all CPU cores are busy with other BOINC tasks. The stock CUDA app suffers from lack of CPU in this mode: its wall-clock times increase a great deal and the GPU becomes almost idle (you can see this by measuring the GPU temperature). My mod will keep the GPU busy in such a config too. If you free a whole core to feed the GPU, you don't need any mods. But you will lose a whole core... So the mod's target config is 4+1, not 3+1."

Interestingly, 3+1 mode has the gpu at 61 degrees; when I switched back to 4+1 mode it went down to 59 degrees, so obviously you get a little bit more out of it when there is less load on the cpu. I will see how my remaining Einsteins fare when it tries to run 5 of them. Normally they take 8 hrs 38 mins, so I will see how it handles them when it runs out of cuda work. BOINC blog ID: 843026 ·

George Volunteer tester Send message Joined: 14 Oct 08 Posts: 100 Credit: 435,680 RAC: 0	Message 843027 - Posted: 21 Dec 2008, 13:15:26 UTC - in response to Message 843026.

"Interestingly, 3+1 mode has the gpu at 61 degrees; when I switched back to 4+1 mode it went down to 59 degrees, so obviously you get a little bit more out of it when there is less load on the cpu. I will see how my remaining Einsteins fare when it tries to run 5 of them. Normally they take 8 hrs 38 mins, so I will see how it handles them when it runs out of cuda work."

I was beaten to this, but I found a fix for the GPU not getting the data it needs. There is this line in the file app_info.xml; just change <max_ncpus>0.040000</max_ncpus> to <max_ncpus>0.080000</max_ncpus>. It's in there 3 times. ID: 843027 ·
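Editor's sketch of the edit George describes: changing every <max_ncpus>0.040000</max_ncpus> in app_info.xml to 0.080000. The helper name raise_max_ncpus and the sample fragment are hypothetical, and the real app_info.xml in the package contains more fields than shown. Stop BOINC and keep a backup of the file before applying anything like this to it.

```python
import re


def raise_max_ncpus(xml_text: str, old: str = "0.040000", new: str = "0.080000"):
    """Replace every <max_ncpus>old</max_ncpus> with the new value.

    Returns a tuple of (edited text, number of replacements made).
    """
    # Match the tag text directly so the rest of the file is left byte-for-byte intact.
    pattern = re.compile(r"<max_ncpus>\s*" + re.escape(old) + r"\s*</max_ncpus>")
    return pattern.subn(f"<max_ncpus>{new}</max_ncpus>", xml_text)


if __name__ == "__main__":
    # Illustrative fragment only, not the contents of the packaged app_info.xml.
    sample = "<app_version><max_ncpus>0.040000</max_ncpus></app_version>\n" * 3
    edited, count = raise_max_ncpus(sample)
    print(f"replaced {count} occurrence(s)")
    print(edited)
```

Checking that the returned count is 3 is a quick way to confirm that all of the occurrences George mentions were found.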

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 843042 - Posted: 21 Dec 2008, 13:59:35 UTC

I have raised Trac ticket #802 for the BOINC guys to look at the scheduler in regard to 4+1 scheduling. Don't hold your breath. Link is here BOINC blog ID: 843042 ·
