Summarize Cuda?

Euan Holton Joined: 4 Sep 99 Posts: 65 Credit: 17,441,343 RAC: 0	Message 842203 - Posted: 19 Dec 2008, 21:31:20 UTC - in response to Message 842152.
"Note that s@h on Boinc should really be considered as 'experimental'. We were warned (on these forums at least) that the CUDA stuff was very new and 'exciting' (in all ways)!"
Very true; I thought I'd give it a try, and found it far more 'exciting' than I'd like my science contribution to be. I didn't really expect a lot out of it, so my disappointment isn't too great, but perhaps some hope for a reasonably polished launch is an acceptable thing to ask?
The two biggest issues for me, really, are the DCF and the fact that only one Enhanced WU can be processed at a time. They are, in fact, dealbreakers, as I'm fairly well convinced now that my machine can process more overall SETI data using the optimised apps, plus the DCF issue causes real problems for those who want to crunch Astropulse as well (as I'm sure any reasonably well informed contributor with an acceptably strong computer would want to). I might join the beta project to see how 6.06 is going, but more likely I'll just hold off on CUDA until it can demonstrably improve overall SETI crunching efficiency for me.
"However, the present exposure should promote some rapid development and fixes."
"more likely it's a combination of Nvidia's PR and the donation drive to attract as many users that fall for the gpu hype and squeeze money out of them ..."
Both very good points. Hopefully there'll be enough donations related to the latter point to enable the rapid development hoped for in the first!

Paul D. Buck Volunteer tester Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 842279 - Posted: 19 Dec 2008, 23:31:35 UTC
The use of CUDA is breaking the "Resource Share" model, as I have mentioned elsewhere, and this is why the tracking numbers like DCF and others are not up to the task. Also, there is no apparent mechanism to track the time used in the co-processor (GPU); this will potentially require changes to the API or a separate tracking method. Lastly, we are going to need to expand the way in which we can allocate the CPU AND GPU to be used on one project, instead of the either/or model that seems to currently exist.
Well, it is still early days. I got more work from GPU Grid and now I need to see how it goes with the next couple of tasks, and then I may take a shot at running SaH for a bit to see how the two work together ... and then I can try the other two GPU tests we considered ...

Paul D. Buck Volunteer tester Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 842282 - Posted: 19 Dec 2008, 23:35:37 UTC
"I might join the beta project to see how 6.06 is going, but more likely I'll just hold off on CUDA until it can demonstrably improve overall SETI crunching efficiency for me."
Instead of hours to run, I disposed of 11 tasks at about 9 minutes per task. So far 4 have completely validated, one failed to compare with the standard app, and the rest are still waiting for wingman action.
I do agree that it is early days. But we need to play and take the risks so that the buglets and problems can be identified. Though it DOES make sense to keep your head down until the early adopters have gotten the worst of it fixed ...

Euan Holton Joined: 4 Sep 99 Posts: 65 Credit: 17,441,343 RAC: 0	Message 842314 - Posted: 20 Dec 2008, 1:28:04 UTC - in response to Message 842279.
"The use of CUDA is breaking the "Resource Share" model, as I have mentioned elsewhere, and this is why the tracking numbers like DCF and others are not up to the task. Also, there is no apparent mechanism to track the time used in the co-processor (GPU); this will potentially require changes to the API or a separate tracking method. Lastly, we are going to need to expand the way in which we can allocate the CPU AND GPU to be used on one project, instead of the either/or model that seems to currently exist."
Thanks for your comment. From what you say, it sounds like I managed to fall foul of the broad issues that you highlight in this post.
"Instead of hours to run, I disposed of 11 tasks at about 9 minutes per task. So far 4 have completely validated, one failed to compare with the standard app, and the rest are still waiting for wingman action."
So about 99 minutes for the lot? My CUDA capable box, using the AK SSE4.1 app, eats a typical ~15 credit S@H Enhanced task, similar to the majority of CUDA WUs I saw, in about 18 minutes per CPU thread. My CPU typically has four threads processing S@H Enhanced WUs out of the eight it has available (fewer as I write, as it's currently chewing on 4 Astropulse WUs), giving a time for 11 WUs of around an hour. I can expect all 11 of the WUs processed by the AK app to validate properly as well, unlike my CUDA experience.
"But we need to play and take the risks so that the buglets and problems can be identified."
Well, I played, and got burned. But I do hope that my feedback has been of some use.
I am, by the way, very much for GPGPU technologies, but I will reserve the right to choose the processing method that allows the Monolith to make the greatest contribution!
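The comparison in the post above comes down to simple batch arithmetic, and a minimal Python sketch can make it explicit. The function name `batch_minutes` and the assumption that tasks run in fixed-length rounds are illustrative only; the per-task times (9 and 18 minutes) and the slot counts (1 GPU task at a time, 4 CPU threads) are the figures quoted in the thread.

```python
import math

def batch_minutes(tasks, minutes_per_task, parallel_slots):
    """Wall-clock minutes to finish `tasks` when `parallel_slots` tasks run at once.

    Simplifying assumption: every task takes the same time, so the batch
    finishes in ceil(tasks / parallel_slots) equal-length rounds.
    """
    rounds = math.ceil(tasks / parallel_slots)
    return rounds * minutes_per_task

# Figures quoted in the thread: 11 work units in each case.
gpu_total = batch_minutes(tasks=11, minutes_per_task=9, parallel_slots=1)
cpu_total = batch_minutes(tasks=11, minutes_per_task=18, parallel_slots=4)

print(f"CUDA, one task at a time: {gpu_total} minutes")
print(f"CPU app, four threads:    {cpu_total} minutes")
```

Under these assumptions the single GPU slot needs 99 minutes and the four CPU threads need 54 minutes, which matches the "about 99 minutes" and "around an hour" estimates in the post. The sketch ignores validation failures and any CPU time the CUDA app itself consumes.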

Paul D. Buck Volunteer tester Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 842340 - Posted: 20 Dec 2008, 3:01:51 UTC - in response to Message 842314.
"I am, by the way, very much for GPGPU technologies, but I will reserve the right to choose the processing method that allows the Monolith to make the greatest contribution!"
I did not mean to imply otherwise ... :)
As to the other comments, yeah, wall clock time was about 9 minutes per task, with CPU time pegged at roughly a minute.
I installed 6.5.0 and finally got some more GPU Grid tasks, and one of those is now happily in work. In that all the tasks that failed died virtually immediately, I have hopes.
I DID move the other CUDA capable card, a lower end 8500, into the one computer, and when I fired up the rig I actually had 10 tasks in work. SO, it would seem that if I did get two more GPU cards I could have the one machine running 11 tasks on 8 virtual CPUs and 3 GPUs (unless I tied the GPUs into one processing beast ... which is likely more common) ...
Sadly, the low end card has now developed a REAL loud fan, so I put it back in the other machine, where I need to see if I can get the fan rattle down to a dull roar (or replace the dratted card) ...

Euan Holton Joined: 4 Sep 99 Posts: 65 Credit: 17,441,343 RAC: 0	Message 842358 - Posted: 20 Dec 2008, 4:16:20 UTC - in response to Message 842340.
"I am, by the way, very much for GPGPU technologies, but I will reserve the right to choose the processing method that allows the Monolith to make the greatest contribution!"
"I did not mean to imply otherwise ... :)"
Well, when I was writing that post, I realised that I may have been sounding more negative about the CUDA effort than I actually feel, so I wanted to set the record straight in at least one of the threads on the matter that I've contributed to.
Good luck with your GPU fan woes!

Paul D. Buck Volunteer tester Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 842371 - Posted: 20 Dec 2008, 5:31:10 UTC - in response to Message 842358.
"Good luck with your GPU fan woes!"
Thanks ... :) It seems to have settled down ... it was a $19 card (if I recall correctly), so it would not be a big deal to replace.
Sadly, I will not be able to run out to Frys to buy a couple of those inexpensive 9800 GT cards with 1G VRAM, as Nancy is up north till Tuesday and I am not sure if the day she is back will be long enough for me to squeeze in a trip ...
4:26 in and the GPU Grid task is still running, and I am still running 9 tasks, so the world is well ...

MarkJ Volunteer tester Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842398 - Posted: 20 Dec 2008, 7:07:27 UTC - in response to Message 841539.
"The ncpus fields are something to do with boinc's application scheduling that doesn't quite seem to work properly yet, but I gather they determine how many apps to run (in concert with the coproc section). Leaving them out would probably imply the app needs a whole cpu core, so with that extra, it probably will default to run separate apps on all the cores + the GPU every time properly. Unfortunately it doesn't seem to work within the same application version domain, so on a quad that would probably mean 4 x Astropulse + 1 x MBCuda."
After a bit of experimenting, it does indeed use a whole CPU if you leave it out; in fact, I only had 2+1 running on my quaddie without it. I put it back in and it's back to 3+1 at the moment (3 cpu + 1 gpu). I have Einsteins and Astropulses running on the cpu and Multibeam on the gpu at the moment.
"I believe the coproc stuff & extra fields have a minimum boincapi version at which they were introduced, so require functionality from Boinc not found in earlier versions. I dug this one out of the client state, or similar location, but it seems the reasonable explanation."
I left the <boincapi> out. It doesn't seem to need it.
"The <flops> has no effect I can discern yet, but I'd expect it to ultimately, in an updated boinc, be used for scheduling between multiple apps for the same & other projects, by calculating the best throughput combo available with the given CPUs & coprocessors installed. ... doesn't work yet AFAICT."
I agree with you on this one, maybe for future use. I left that out too.
In one of the other message threads the guys were suggesting <ncpus>5 in the cc_config.xml. I was going to experiment with that one next to see if I can get to 4+1.
Speedwise it doesn't appear to be that fast, but it's quite good at trashing a heap of work units :)
BOINC blog
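For readers unfamiliar with the <ncpus>5 suggestion mentioned above, here is a minimal Python sketch that generates such a cc_config.xml fragment. It assumes the usual layout in which `<ncpus>` sits inside an `<options>` element under `<cc_config>`; the helper name `build_cc_config` is made up for illustration, and where the file belongs (normally the BOINC data directory) depends on the installation. This overrides the CPU count the client believes it has, and later posts in this thread report that doing so on a quad slows CUDA tasks from minutes to hours.

```python
import xml.etree.ElementTree as ET

def build_cc_config(ncpus):
    """Return cc_config.xml text telling the BOINC client to act as if it had `ncpus` CPUs.

    Hypothetical helper for illustration; it only builds the text and
    does not write to any BOINC directory.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")

# A quad-core box told to pretend it has 5 CPUs, aiming for 4 CPU tasks + 1 GPU task.
xml_text = build_cc_config(5)
print(xml_text)
```

The output is a single-line XML document; save it as cc_config.xml and restart the client (or have it re-read its config file) for the setting to take effect.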

MarkJ Volunteer tester Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842482 - Posted: 20 Dec 2008, 10:53:07 UTC - in response to Message 842398.
"In one of the other message threads the guys were suggesting <ncpus>5 in the cc_config.xml. I was going to experiment with that one next to see if I can get to 4+1."
I tried this. It is now running 4+1. I will look at the times and see how they go. I also want to see what it does when it runs out of cuda work; hopefully it goes back to only 4 tasks. I will report back later.
BOINC blog

mr.kjellen Volunteer tester Joined: 4 Jan 01 Posts: 195 Credit: 71,324,196 RAC: 0	Message 842513 - Posted: 20 Dec 2008, 12:55:12 UTC - in response to Message 842482.
If that works out as it did in beta, you will have completion times in hours instead of minutes. In my experience CUDA needs its own core.

MarkJ Volunteer tester Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842522 - Posted: 20 Dec 2008, 13:13:03 UTC - in response to Message 842513.
"If that works out as it did in beta, you will have completion times in hours instead of minutes. In my experience CUDA needs its own core."
Umm yep, it does indeed. So for those of you who want to try: don't bother.
BOINC blog

Jord Volunteer tester Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 842529 - Posted: 20 Dec 2008, 13:21:32 UTC - in response to Message 842513.
"If that works out as it did in beta, you will have completion times in hours instead of minutes. In my experience CUDA needs its own core."
What would be nice is an option in BOINC Manager to set the percentage of CPU cycles you want to reserve for the GPU to work with, while BOINC takes up the rest of the CPU for BOINC-related work.
I am running Folding@Home on my ATI card, set to use 30% of my CPU(s). This translates into FAH_Core11.exe (the GPU client) using up to 25% of my CPUs, while the rest goes to the BOINC client.
But alas, first there will be an option to set BOINC to use only the CPUs or the GPU (in test now at Seti Beta, if I am not mistaken). Then there will be a BOINC in which you can set this through the cc_config.xml file. And then way, way later, there will once upon a time be a version of BOINC that hopefully can do both at the same time.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 842534 - Posted: 20 Dec 2008, 13:32:41 UTC - in response to Message 842513.
"If that works out as it did in beta, you will have completion times in hours instead of minutes. In my experience CUDA needs its own core."
Lunatics are experimenting with a potential fix for this, which could be incorporated into the SETI application (if the project staff are listening, give Raistmer a call). That would allow the 4+1, 8+1 etc. settings to work properly without waiting for BOINC support.

Geek@Play Volunteer tester Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 842678 - Posted: 20 Dec 2008, 20:08:18 UTC
I'm the kind of person who has to learn something for himself. I installed the app_info file recommended by others and got my NVIDIA CUDA working. After about 6 hours I became frustrated with the erratic operation of the system. Actually, it's just plain flaky.
Reverted back to the trusted Lunatics setup for AP and MB. I hear they are working on incorporating CUDA as a sort of co-processor to their already fine work. Hope they are successful. Go Lunatics!!
Boinc....Boinc....Boinc....Boinc....

Sutaru Tsureku Volunteer tester Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 842691 - Posted: 20 Dec 2008, 20:29:22 UTC - in response to Message 842534.
"If that works out as it did in beta, you will have completion times in hours instead of minutes. In my experience CUDA needs its own core."
"Lunatics are experimenting with a potential fix for this, which could be incorporated into the SETI application (if the project staff are listening, give Raistmer a call). That would allow the 4+1, 8+1 etc. settings to work properly without waiting for BOINC support."
So maybe it would be possible to run 5 threads of Enhanced (MB) on a quad with one GPU? Or will it only be possible to run AP on the CPU?

The Gas Giant Volunteer tester Joined: 22 Nov 01 Posts: 1904 Credit: 2,646,654 RAC: 0	Message 842692 - Posted: 20 Dec 2008, 20:30:19 UTC
I'm currently running 6.5.0 on my quaddie with stock settings, and BOINC is running 4 CPUs (Malaria Control) and 1 GPU (SETI@home). Now what would be really good is if I could get 4 CPUs (optimised SETI@home app) and 1 GPU (SETI@home). That would be SMOKIN'!

MarkJ Volunteer tester Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 842851 - Posted: 21 Dec 2008, 3:06:21 UTC - in response to Message 842691. Last modified: 21 Dec 2008, 3:07:17 UTC
"If that works out as it did in beta, you will have completion times in hours instead of minutes. In my experience CUDA needs its own core."
"Lunatics are experimenting with a potential fix for this, which could be incorporated into the SETI application (if the project staff are listening, give Raistmer a call). That would allow the 4+1, 8+1 etc. settings to work properly without waiting for BOINC support."
"So maybe it would be possible to run 5 threads of Enhanced (MB) on a quad with one GPU? Or will it only be possible to run AP on the CPU?"
The potential fix is described in this message.
It will run AP (or other projects) on the cpu and Seti MB on the gpu. The current MB app (6.05) only runs on the gpu, or you can stick with the 6.03 app running on the cpu, but you cannot run both MB apps.
I saw in another message thread an email from Eric to Dr A asking for the dual functionality, and Dr A was saying it wouldn't be in BOINC for at least a month. Besides that, they (Seti) would need to either have 2 science apps (as we do now) or merge them into one that could run on either the cpu or gpu.
BOINC blog

