Validation question
Author | Message |
---|---|
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182 |
Since I brought this computer back online, it seems to be doing quite well. Perhaps I am oversensitive about errors, but is it possible that my result is correct on this one and the two validated results are both incorrect? Just asking. http://setiathome.berkeley.edu/workunit.php?wuid=1697885327 SETI@home classic workunits 4,019 SETI@home classic CPU time 34,348 hours |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
With the other two both overflowing on the cuda32 app, on GTX9x0 hardware? Yes, I think there's every likelihood that yours was the correct solution. |
Bill G Joined: 1 Jun 01 Posts: 1282 Credit: 187,688,550 RAC: 182 |
For me that is good to hear; for the project, maybe not so good. Thanks. |
Zule Joined: 1 Jul 06 Posts: 52 Credit: 84,436,096 RAC: 0 |
Might this relate to a post I just made? http://setiathome.berkeley.edu/forum_thread.php?id=76676 Maybe there is a problem with cuda32 on 9x0 hardware? Both those machines and mine are failing all cuda32 tasks with -9 overflows. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
I've sent a PM to the application developer, who is perhaps in the best position to advise. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Looking into it. I'm not aware of any specific issue running Cuda 3.2 (apart from it being slower for these GPUs), though things can change or break. I'll need to re-examine in the context of recent drivers etc. The 980 in the other thread seems to be failing on Cuda 5.0 as well. From recent experience in the GPU users group, it could be a power supply issue, though I've not looked into it enough. Though extremely efficient, this generation seems to want good-quality clean power, and some headroom over the base gaming spec is warranted for number crunching. [Edit:] Yes, different applications can load different circuits... differently, and marginal operation is often OK for games etc. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Zule's managed to rule out power via PM (I'm convinced, thanks), and some error codes there appear to indicate executable damage (either Cuda DLLs, system, drivers, or app executable). Since different apps don't show the symptoms there, and the user's switching to Lunatics for a fixed app version, which should overwrite the DLLs, that should sort that out (fingers crossed). Now onto the OP's line... |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"I've sent a PM to the application developer, who is perhaps in the best position to advise." Richard, this looks like a good one to grab (if available), with signals close to threshold. There are minute precision differences between Cuda 3.2's FFT library and newer ones (which are improved). A lot of signals around threshold, particularly with an overflow result, can also expose small differences elsewhere. For example, Cuda 3.2 predates some instructions that the recent driver would be compiling to. In all, if that is the case here, it's more a reflection of the 'threshold problem' that Eric's aware of, and which might be addressed in multibeam 8 (not sure if you were in on that). Running some tests here, but in essence all three overflow results would then be technically 'correct', while pushing the precision limits of the reporting and validation mechanism itself. |
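To make the 'threshold problem' described above concrete, here is a minimal Python sketch. It is not the SETI@home application or validator code: `THRESHOLD`, `MAX_SIGNALS`, `to_float32` and `count_signals` are hypothetical names and values chosen only for illustration. It assumes a detector that reports every power value above a fixed threshold and flags an overflow once a cap on reported signals is reached, and it compares the same values carried in double precision against the same values rounded to single precision.

```python
import struct

THRESHOLD = 24.0   # hypothetical detection threshold (illustrative value only)
MAX_SIGNALS = 30   # hypothetical cap on reported signals (illustrative value only)


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def count_signals(powers, threshold=THRESHOLD, max_signals=MAX_SIGNALS):
    """Count values strictly above the threshold.

    Returns (count, overflowed). Counting stops as soon as the cap is reached,
    which is this sketch's stand-in for an overflow result.
    """
    count = 0
    for p in powers:
        if p > threshold:
            count += 1
            if count >= max_signals:
                return count, True
    return count, False


# Power values that sit a hair above the threshold in double precision.
powers = [THRESHOLD + 1e-9] * MAX_SIGNALS

# The same values after a single-precision rounding step: each collapses to 24.0.
powers_single = [to_float32(p) for p in powers]

print(count_signals(powers))         # every value counts, so the cap is hit
print(count_signals(powers_single))  # no value is strictly above the threshold
```

In this contrived case neither path is 'broken': both apply the same rule to values that differ only far below the precision being carried, yet one reports an overflow and the other reports nothing. Real results would differ far less dramatically, but the mechanism is the same one described in the post above.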
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Since bringing this computer back online it seems to be doing quite well. Perhaps I am oversensitive about errors but is it possible I am correct on this one and the validated ones are both incorrect?" I'd definitely keep an eye on it if it happens again, especially with increasing frequency. In the genuine overflow result case it's a tough call, as described to Richard in a little more detail. If it's not something that's 'broken' per se, but is instead exposing design limitations, then there's nothing to worry about. Still setting up the checks here all the same. [Edit:] Going to take a couple of hours, since I need to run some fresh CPU reference results for some test WUs. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"I've sent a PM to the application developer, who is perhaps in the best position to advise." Hold the phone, something going on here :) [Edit:] Mailed Eric while my test continues munching away. Hi Eric, [Edit2:] Eric's responded that sure is possible, so will probably happen. I'll be digging deeper over the weekend, in between a fair number of responsibilities. [Edit3:] Stock 4.2 and 5.0, along with various unreleased builds, appear to be unaffected. |