Message boards :
Number crunching :
CUDA MB V12b rebuild supposed to work with Fermi GPUs
Author | Message |
---|---|
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Well, AR 0.44, a midrange one that's usually the best for GPUs, fails with -9 on the 480, and the Lunatics site is down (at least I can't reach it). 197.45 doesn't support the GTX470 or GTX480 according to NVIDIA's doco. They recommend 197.41 for the 400 series at the moment. BOINC blog |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks for the info. It stays disabled then. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
@Todd Hebert Please look at the PM with a link to Jason's V13 hybrid build, linked against the CUDA 3.0 SDK. Maybe it will help with the overflow issue. |
Todd Hebert Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
OK, will do here shortly and post back the behavior found with the build. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I looked at the latest results - it seems V13 works OK; it produces correct results for the ARs that errored with V12b. The next step is to gauge current host productivity. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
fascinating :D "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
W-H-Oami Send message Joined: 6 Mar 10 Posts: 15 Credit: 168,510 RAC: 0 |
Question: The new Fermi GPUs are multi-tasking, multi-core, etc. Does that mean they will be able to run multiple CUDA23 apps at the same time??? If so, how many??? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Question: The new Fermi GPUs are multi-tasking, multi-core, etc. Does that mean they will be able to run multiple CUDA23 apps at the same time??? 'Probably', but the overhead is unknown at this stage .. theoretically up to 16, but there may not be enough RAM for that & the overheads may be too high. Most likely the best approach will involve implementing concurrent kernel execution within a single Fermi-specific app (increasing locality and reducing overhead). This is because the device is almost certainly under-utilised with current builds. I've been looking at how to approach that for quite some time (basically since the Fermi whitepaper release last September or so), and once basic operation is confirmed/fixed then we can explore the options a bit more. Jason |
Todd Hebert Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
I don't believe that would be possible - it would be very challenging to isolate the cores at that level. Just think how long it has taken ordinary applications to use multi-core CPUs correctly. But given the right SDK anything is possible - it would just take a long time - and then a new method would come along. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I don't believe that would be possible - it would be very challenging to isolate the cores at that level. Just think how long it has taken ordinary applications to use multi-core CPUs correctly. But given the right SDK anything is possible - it would just take a long time - and then a new method would come along. When the appropriate non-default-mode concurrent streams are used (which they aren't yet), the driver & hardware are 'supposed' to pack the kernels 'Tetris-style' to fill the execution resources. That's one of the major design progressions in this architecture over any before it; earlier architectures could only execute one device kernel at a time. That meant most of the cores sat idle on, for example, a GTX285 running pulsefind kernels in a very low angle range task, which can get as narrow in execution width as one part of a core, and very long in execution time. |
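The non-default-stream pattern described above can be sketched in CUDA roughly like this. Everything here is an illustrative placeholder (the kernel, sizes, and stream count), not the actual multibeam code: kernels launched into distinct non-default streams are eligible to overlap on Fermi, whereas launches into the default stream serialise.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for a narrow workload (e.g. a
// pulsefind-style kernel that occupies only part of the device).
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            data[i] = data[i] * 1.0001f + 0.0001f;
}

int main()
{
    const int nStreams = 4;   // illustrative; Fermi can track up to 16 concurrent kernels
    const int n = 1 << 16;
    float *buf[nStreams];
    cudaStream_t stream[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaMalloc(&buf[s], n * sizeof(float));
        cudaStreamCreate(&stream[s]);   // non-default streams may overlap
    }

    // Launches into distinct non-default streams are *eligible* to run
    // concurrently; launches into stream 0 (the default) would serialise.
    for (int s = 0; s < nStreams; ++s)
        busyKernel<<<(n + 255) / 256, 256, 0, stream[s]>>>(buf[s], n);

    cudaThreadSynchronize();   // CUDA 3.0-era name; later renamed cudaDeviceSynchronize

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(buf[s]);
    }
    return 0;
}
```

Note that concurrency is an opportunity the hardware scheduler may take when per-kernel resource usage allows, not a guarantee.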
Todd Hebert Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
Everything changes, doesn't it :) The technology must progress, and with it the complexity. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
BTW, for now it would be interesting to set the coproc count to 0.5 and see how Fermi performs with two V13 apps at once. |
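For reference, that fractional-GPU setting is the `<count>` field of the `<coproc>` block in an anonymous-platform app_info.xml. A minimal sketch; the app name and version number here are illustrative and must match the actual installation:

```xml
<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>608</version_num>
  <coproc>
    <type>CUDA</type>
    <count>0.5</count> <!-- 0.5 GPU per task: BOINC runs two tasks per GPU -->
  </coproc>
</app_version>
```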
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
LoL, let it run for a while, see if it actually works, then fiddle after ;) |
Todd Hebert Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
Let me know if you would like it changed - it only takes a second. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
LoL, let it run for a while, see if it actually works, then fiddle after ;) Yeah, a day or two in the current mode to see stability and base performance, then tweaking :) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
That's up to you (your machine ;) ). A smattering of different angle ranges might paint some sort of picture we could analyse with the current setup. I suggest a day as is, then trying 0.5 (then, if that works, maybe 0.25 ... ), so we can determine where any change occurs & gauge its size (if any). Already I've figured out that the high CPU usage is due to the PTX JIT compiler in the driver being used ... Embedding Fermi-native kernels in future builds will reduce that time. |
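For context, whether the driver's JIT gets involved depends on which targets the build embeds: if only PTX is present, the driver JIT-compiles it for the installed GPU at load time, while embedding sm_20 machine code lets a Fermi (compute capability 2.0) card load a native binary instead. A hedged sketch of the nvcc flags involved (file names illustrative):

```shell
# Embed native code for compute capability 1.0 and 2.0 (Fermi),
# plus 2.0 PTX so newer, unknown architectures can still be JIT-compiled.
nvcc -gencode arch=compute_10,code=sm_10 \
     -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_20,code=compute_20 \
     -c cuda_kernels.cu -o cuda_kernels.o
```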
Todd Hebert Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
I think for the moment I will leave it as is and make a change later tonight, so for the next 6-8 hours it will stay the same, to maintain stability. I can tell you this much - the fans on these cards are LOUD when running at 100% - not something I would want to sit next to all day. And my ears are tempered from working in server rooms. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
When I built V12b with CUDA 3.0, two targets were generated, for the 1.0 and 2.0 compute capabilities. So it seems the JIT compiler is not used for Fermi, at least if one builds using the provided build rule file. |
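One way to confirm which targets a given binary actually carries is cuobjdump, which ships with later CUDA toolkits (the file name below is illustrative):

```shell
# Lists the embedded native (ELF) and PTX targets, e.g. sm_10, sm_20.
cuobjdump --list-elf --list-ptx setiathome_CUDA_V12b.exe
```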
W-H-Oami Send message Joined: 6 Mar 10 Posts: 15 Credit: 168,510 RAC: 0 |
'Probably', but the overhead is unknown at this stage .. theoretically up to 16, but there may not be enough RAM for that & the overheads may be too high. Perhaps we should ask NVIDIA to design a Fermi with 256 or 512 KB of RAM per SM. |
Todd Hebert Send message Joined: 16 Jun 00 Posts: 648 Credit: 228,292,957 RAC: 0 |
Wow! That would be expensive with 512 SMs per GPU, and the target market would be very limited. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.