CUDA MB V12b rebuild supposed to work with Fermi GPUs

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 990268 - Posted: 18 Apr 2010, 22:12:01 UTC Last modified: 18 Apr 2010, 22:20:27 UTC I changed the coprocessor count in the app_info.xml file to 0.5 and am now processing 2 WUs per GPU (6 total) at the same time. Thus far the impact has been that each WU takes about 3m30s - 4m longer to complete, and CPU load has also risen to about 44% on average (each task has its own core). I have set the processor priority to "Above Average" since I am running Server 2k8. Processor affinity has not been changed at this time, but might be in the future to remove OS overhead. To put this into perspective, the CPUs are dual socket quad core Xeon x5470s running at 3.8GHz (FSB of 380MHz), so there is no lack of performance from the CPUs. Average time to complete WUs previously was around 10m - 10m30s. It appears that the GPUs previously were not worked as hard as they could have been with just one running task per card. ID: 990268 ·
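As a rough check on the figures in this post, the sketch below compares per-GPU throughput before and after the change. It uses only the times quoted above; pairing the shorter baseline with the shorter slowdown (and the longer with the longer) is an assumption, and the function name is made up for illustration.

```python
# Times taken from the post above, in seconds.
single_task = (10 * 60, 10 * 60 + 30)  # 10m to 10m30s with one task per GPU
extra = (3 * 60 + 30, 4 * 60)          # 3m30s to 4m longer with two tasks per GPU


def tasks_per_hour(concurrent, seconds_per_task):
    """Completed tasks per hour for one GPU running `concurrent` tasks at once."""
    return concurrent * 3600 / seconds_per_task


# Assumption: the shortest baseline pairs with the shortest slowdown, and so on.
before = [tasks_per_hour(1, t) for t in single_task]
after = [tasks_per_hour(2, t + e) for t, e in zip(single_task, extra)]

for label, b, a in zip(("best case", "worst case"), before, after):
    print(f"{label}: {b:.2f} -> {a:.2f} tasks/hour per GPU ({a / b:.2f}x)")
```

On these numbers, two tasks per card works out to roughly 1.45x to 1.48x the single-task throughput, which supports the impression that one task per card left the GPUs underused.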

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 990322 - Posted: 19 Apr 2010, 3:18:42 UTC - in response to Message 990268. Last modified: 19 Apr 2010, 3:18:57 UTC They definitely didn't work at their best. Sometimes a kernel call can't load all the available cores, even on a 9600GSO, so it's no wonder Fermi was underloaded. But with Fermi's ability to run different kernels simultaneously, the situation could improve a lot. Running 2 tasks per GPU added too much overhead to be effective on my GPUs (9400GT/9600GSO), but it seems that could change with Fermi. It seems you don't run CPU-only SETI apps? Then maybe it's worth running with an even lower coproc value, like 0.33 or even 0.25. ID: 990322 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 990328 - Posted: 19 Apr 2010, 3:48:06 UTC - in response to Message 990322. Considering that twice as much work was done in an extra 3-4 min I would say that was accurate. Right now I'm not running any CPU apps until we get things together for the GPUs - right now that seems more important given the amount of work that they can do. Better to load them up as effectively as possible - better bang for the buck. I'll change the coproc level when I log back into the machine and see how it goes. Todd ID: 990328 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 990330 - Posted: 19 Apr 2010, 3:57:40 UTC The increase to .25 was too much I think given that we are allocating 1 core to each cuda task - my cpu usage went to 98% so I backed it down to .375 (1 core to each task) and now it is running about 65% - 70% load. Processing times seem about the same as before but it will take a bit of monitoring to see an average. ID: 990330 ·
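The coproc values tried in this thread can be turned into task counts with a little arithmetic. The thread does not say whether the client fills each card independently or treats the three cards as one pool, so the sketch below shows both readings; both models, and the function names, are assumptions for illustration. The 3 GPUs, 8 cores, and one core per task come from the posts above.

```python
import math

GPUS = 3   # three cards, per the thread
CORES = 8  # two quad-core Xeons, per the thread


def per_card(count):
    """Tasks per GPU if each card is filled independently (assumed model)."""
    return math.floor(1.0 / count + 1e-9)


def pooled(count, gpus=GPUS):
    """Total tasks if the cards are treated as one pool (assumed model)."""
    return math.floor(gpus / count + 1e-9)


for count in (1.0, 0.5, 0.375, 0.33, 0.25):
    a = per_card(count) * GPUS
    b = pooled(count)
    # One CPU core per GPU task, as described in the post; capped at 100%.
    print(f"count={count}: per-card {a} tasks ({min(1.0, a / CORES):.0%} of cores), "
          f"pooled {b} tasks ({min(1.0, b / CORES):.0%} of cores)")
```

Under either reading, a count of 0.25 asks for 12 tasks on 8 cores, which is consistent with CPU usage hitting 98%. The two readings differ at 0.375 (6 tasks versus 8), so the observed task count on the host would show which one applies.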

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 990473 - Posted: 19 Apr 2010, 17:49:56 UTC Last modified: 19 Apr 2010, 17:50:43 UTC So do I understand correctly that the best operation is currently with 3 active tasks per GPU (9 total on the 3 cards) using the V13 experimental app ? Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 990473 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 990477 - Posted: 19 Apr 2010, 17:58:44 UTC Good day Jason - I changed it back to .5 after seeing a large amount of tasks completing in 25sec - but this could have been caused from not rebooting for a while. I don't know if there are memory leakage issues or not that might be a factor. With the current batch of tasks that are processing they are running 3m50s (if they are running high priority) and 8m30s if not. But overall it does seem to be running well - of course looking at the tasks with a trained eye might reveal something different. Todd ID: 990477 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 990485 - Posted: 19 Apr 2010, 18:10:03 UTC - in response to Message 990477. I don't know if there are memory leakage issues or not that might be a factor... Possible, but not likely in the build, since I run the 2.3 equivalent myself without seeing it crop up. Definitely possible in the drivers though. I read in your most recently reported tasks: clockRate = 810000 Is this figure correct ? If so I take it that's OC'd. Be aware that if you're loading multiple tasks extra heat might be generated through heavy resource usage, and cache contention within the GPUs might also make more heat and stress the RAM more. If this is occurring you might be able to drop the clocks a fraction yet see things speed up a bit. Counterintuitive I know, but I have read OCer reports that the memory controller on the cards might be a weak link at this point in the game. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 990485 ·
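For reading the clockRate figure: CUDA's device properties report `clockRate` in kilohertz. Assuming the task output prints that property unchanged (the thread does not confirm this), the conversion is a one-liner. The helper name is hypothetical.

```python
def clock_khz_to_mhz(clock_rate_khz):
    """Convert a clock rate in kHz to MHz."""
    return clock_rate_khz / 1000.0


reported = 810000  # value quoted from the task output in the post above
print(f"{clock_khz_to_mhz(reported):.0f} MHz")
```

Under that assumption the reported value corresponds to 810 MHz.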

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 990505 - Posted: 19 Apr 2010, 19:00:57 UTC - in response to Message 990485. Ok - was just going off on a tangent with the memory leak idea. Mainly because a reboot corrected the problem and everything was fine after that. The CPU speed is 3.8GHz, and since these are the previous generation of Xeon processors (x5470), the memory controller is not in the CPU and the RAM is running locked at 667MHz. The GPUs on the other hand are running stock frequencies and voltages - those have not been changed. Before doing any sort of OC'ing of those the first rule would be for it to be stable at stock settings. Cooling shouldn't be a significant issue since the cards run at about 65C at load on each - the CPUs have much lower utilization than they did when running the CPU apps and even ramp down via SpeedStep from 3.8GHz to 2.2GHz. (OC max stable was 4.4GHz on the CPU x 8 cores) Really there was more heat/stress in the previous configuration. Overall this has been an incredibly stable machine (it better be!) with nothing but the top notch components that can be had with ample headroom taken into consideration. Todd ID: 990505 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 990506 - Posted: 19 Apr 2010, 19:05:54 UTC - in response to Message 990505. All good, I'll watch the stats/results for a few days & see if I can spot patterns suggesting which way to go with development... while reading up on the optimisation guidelines for these things. Cheers, Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 990506 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 992217 - Posted: 27 Apr 2010, 4:11:41 UTC Hey Guys, The system seems to be running as it should for the last week since we went with the V13 build and I just wanted to check in with you to see if there had been any further developments on the code you were working on. Hope you all have a good week - Todd ID: 992217 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 992224 - Posted: 27 Apr 2010, 5:30:58 UTC - in response to Message 992217. Last modified: 27 Apr 2010, 5:58:27 UTC Hey Guys, The system seems to be running as it should for the last week since we went with the V13 build and I just wanted to check in with you to see if there had been any further developements on the code you were working on. Hope you all have a good week - Todd Hi Todd, Have you tried the fermi app in beta for comparison yet ? From what we've seen it looks pretty good too, and would use less cpu than my experimental V13. My V13 was a branch off Raistmer's V12 builds for me exploring the VLAR problems, and served its purpose narrowing down & understanding a lot of pulsefinding and other related issues toward further development, but wasn't specifically designed or optimised for fermi (or any specific hardware in particular). It happened to be the first build we had that appeared to work on your cards. AFAIK nVidia haven't released/posted their newest code yet so we can see what changes they made to stock to make it work, and those changes might make V12b functional soon. I'll be continuing down some exploratory paths from my branch, but expect those to be extended experiments, with lots of wrong turns & roadblocks, and be travelled in slow motion due to personal time constraints. Cheers, Jason [Edit:] I do see you've had a whopping run of validated VLAR tasks, which do work on V13 but possibly won't be the best way to maximise throughput for the cards. Perhaps you could try rescheduling those to a CPU app using Marius' rescheduler 1.9. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 992224 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 992316 - Posted: 27 Apr 2010, 15:14:45 UTC Jason, where can I get the Fermi beta app? My thought is to move two of the Fermi cards to a different machine to cut down on heat and noise at my home. Not to mention that at my office I do get free power/and cooling (included in my rent) With summer here in the states fast approaching I'm not that interested in paying a fortune to keep my home office cool enough to run this monster 24/7. Right now I'm not running any CPU apps on that machine so there is a great deal of headroom for performance to increase. Also I'm getting my 32 Core + 32 Hyperthread machine ready to deploy and the Fermi cards would go very nicely in that box - there is no way I'm going to run that at home either as it just makes too much noise. Although I should expect that since it is a quad socket server :) Either way I am going to be making some changes when time allows - it has been a very busy spring here with many projects taking priority. Thanks for the info ID: 992316 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 992317 - Posted: 27 Apr 2010, 15:22:54 UTC - in response to Message 992316. Last modified: 27 Apr 2010, 16:16:59 UTC Jason, where can I get the Fermi beta app? ... Good question, I lost my bookmarks to the download servers & will go looking for them momentarily. In the meantime hopefully someone will beat me to the punch, and has the server link handy. Yes those kind of monster machines would be well out of my budget, and even household power capacity & climate tolerance. Best of luck with getting them going. Jason [Later:] should be http://boinc2.ssl.berkeley.edu/beta/download I reckon, but seems to be down atm :S, so not sure. You would need the application exe, and likely the same fftw dll as previous apps (except V13 didn't need that), and the two appropriate Cuda v3 dlls. It might be easier to attach the host to beta & see what it grabs & runs when beta has some work. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 992317 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 992325 - Posted: 27 Apr 2010, 16:20:26 UTC - in response to Message 992317. should be http://boinc2.ssl.berkeley.edu/beta/download I reckon, but seems to be down atm :S, so not sure. You would need the application exe, and likely the same fftw dll as previous apps (except V13 didn't need that), and the two appropriate Cuda v3 dlls. That looks like the right path, but I'm getting a 403 forbidden when I try to see the directory listing. Ever since the web server hack a couple of months ago, I think they've been ultra-cautious with the directory permissions. So you can download something if you know exactly what you're asking for, but you need to know the filename in advance. It would be useful if someone with a Fermi card could attach to beta, allow the automated download of the stock app, and then post the exact filename here for people to use (or the fully qualified path+name from client_state). Otherwise, every Fermi card installed here is just going to waste time, energy, workunits and database space with the incompatible apps supplied at the moment. ID: 992325 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 992326 - Posted: 27 Apr 2010, 16:22:19 UTC We are getting ready for the weekly downtime - so it may be offline at the moment. ID: 992326 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 992330 - Posted: 27 Apr 2010, 16:24:12 UTC - in response to Message 992326. We are getting ready for the weekly downtime - so it may be offline at the moment. That would be a 404 not found. 403 forbidden is a deliberate (probably permanent) project choice. ID: 992330 ·

Todd Hebert Volunteer tester Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0	Message 992332 - Posted: 27 Apr 2010, 16:26:13 UTC - in response to Message 992330. This is what I am getting The service is not available. Please try again later. ID: 992332 ·
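The status codes being compared here can be looked up with Python's standard library. The sketch below prints the standard reason phrases for 403 and 404, plus 503, which is what a "service is not available" page usually corresponds to; the thread does not show the actual code returned, so the 503 is an assumption.

```python
from http import HTTPStatus


def describe(code):
    """Return the numeric code with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (403, 404, 503):
    print(describe(code))
```

This prints `403 Forbidden`, `404 Not Found` and `503 Service Unavailable`. A 403 means the server understood the request and refused it, which fits a deliberate permissions choice on the directory listing.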

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 992335 - Posted: 27 Apr 2010, 16:29:02 UTC - in response to Message 992332. Me too. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 992335 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 992337 - Posted: 27 Apr 2010, 16:30:19 UTC - in response to Message 992332. Last modified: 27 Apr 2010, 16:30:51 UTC Actually, uploads and downloads are normally allowed during maintenance: you just don't see any downloads, because you can't contact the scheduler to get a work allocation. ID: 992337 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 992344 - Posted: 27 Apr 2010, 21:52:13 UTC The name is either setiathome_6.09_windows_intelx86__cuda_fermi.exe or setiathome_6.09_windows_intelx86_cuda_fermi.exe (one underscore less than the previous one). Yet both will yield the "Service is not available. Please try again later." message. ID: 992344 ·
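Since the directory listing is blocked and the file has to be requested by exact name, the two candidate names can be combined with the download path suggested earlier in the thread. That the files sit directly under that path is an assumption (the thread only says it "should be" there), and the helper names are made up for illustration. `probe` is defined but not called, so nothing touches the network unless you invoke it yourself.

```python
import urllib.error
import urllib.request

# Base path as suggested in the thread; not confirmed to be correct.
BASE_URL = "http://boinc2.ssl.berkeley.edu/beta/download"

# The two candidate filenames from the post above (double vs single underscore).
CANDIDATES = (
    "setiathome_6.09_windows_intelx86__cuda_fermi.exe",
    "setiathome_6.09_windows_intelx86_cuda_fermi.exe",
)


def candidate_urls(base=BASE_URL, names=CANDIDATES):
    """Join the base path with each candidate filename."""
    return [f"{base.rstrip('/')}/{name}" for name in names]


def probe(url, timeout=10):
    """Send a HEAD request and return the HTTP status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403, 404 or 503
    except urllib.error.URLError:
        return None      # DNS failure, refused connection, timeout, ...


for url in candidate_urls():
    print(url)
```

Calling `probe(url)` on each candidate would show which status the server returns for each name, without downloading the executable.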
