CUDA and the BLUE SCREEN OF DEATH
Author | Message |
---|---|
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I distinctly disagree with any suggestion that CUDA should be paired with CUDA now; that would put too many doubtful results in the master science database. Once the bugs are fixed I doubt many will care what pairings happen, but more flexibility would at least make sense. Joe By "doubtful" I meant that the bugs have not yet been fully characterized; many of the CUDA results do validate against CPU results, but a significant fraction don't. Having comparisons with CPU results provides both the validation needed to keep the science database reasonably clean and the data needed to determine which bugs need to be squashed. What constitutes "too many" is of course open to judgement, but it's clearly better to have 3% of 3% uncertainty rather than the straight 3% which would come from always pairing CUDA with CUDA. As a client-side participant, I never expected to be able to run the project and can't even begin to answer "why" questions. I simply don't know enough. I did express my doubts that having the CUDA app here was sensible immediately after it appeared, but I don't think any amount of complaining by the small minority of participants who frequent these forums is going to affect the project's decisions. I'm sure the project would be glad to have an application for the TI-84 if you think you can produce one which will meet deadlines. Almost any hand calculator has sufficient accuracy for the calculations; it's primarily a matter of handling large amounts of data quickly enough. If you could even come close, the project might well be willing to produce smaller WUs more suited to the hardware. When the Boincoid programmers inquired about reduced S@H WUs for their JAVA/Android version, which may run on some cell phones, Eric asked how much reduction they thought would be appropriate. Joe |
kittyman Send message Joined: 9 Jul 00 Posts: 51492 Credit: 1,018,363,574 RAC: 1,004 |
I distinctly disagree with any suggestion that CUDA should be paired with CUDA now; that would put too many doubtful results in the master science database. Once the bugs are fixed I doubt many will care what pairings happen, but more flexibility would at least make sense. Joe Geez......... I can't get reproducible results fast enough on my slide rule to make the date.......... will Seti slow down enough for me to fit in??? "Time is simply the mechanism that keeps everything from happening all at once." |
-= Vyper =- Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
Well, I tried to get the most out of the system with both CPU and CUDA running together, and yesterday it BSODed on me. And what do you know, my profile got erased, so Vista automatically made me a new one with the files that I had on the desktop. Other things got damaged too, for instance the c:\programdata\boinc folder, so when I got back into the darn system my client detected a faulty client_state file and automatically detached everything. With Shadow Copy I managed to restore the file, and I am starting to empty my cache just as it was before. I get occasional red messages saying that the result I just crunched returned positive before, and if I have 100 results and send them in I don't get any errors, but I can't see whether the server changes the "detached" line to the date it received the file instead. I just wonder: do the S@H servers know when I restore files like this? Now that I have everything back and have started to crunch and send in results, will the results be validated later on EVEN if over a thousand lines say "client detached", so that I'll be awarded credits later on? Does someone know how that works? This is my very first "Darn CUDA" remark that I need to make. Kind regards Vyper _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Nope, once the project says 'Client Detached' they are dead for you. Since you said you like to get everything there is to be had from your host, I'd detach/reattach again for real to get rid of the clinkers. This will clear out the SAH directory, so you'd have to reinstall your opti apps, custom app_info file, etc as well. If there aren't that many, you can go through them manually and abort the ones which are showing as detached on the project. Alinator |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
but if you don't ask you don't get. It's a good idea, so I'll put the suggestion in and we will see. Response to the request for a throttling mechanism from the "Nvidia developer" (whoever that is). Can the GPU be throttled in another way? (BOINC uses a pause system to throttle CPU calculations, if set by the user. It then pauses all of BOINC for the duration of a second or more. I'm wondering what effect that has on the GPU's lifespan.) In other words, can the GPU be set to use only half its capacity (comparable to 50% CPU cycles) or not? BOINC blog |
Francois Piednoel Send message Joined: 14 Jun 00 Posts: 898 Credit: 5,969,361 RAC: 0 |
I expect more and more chips to go bye-bye; they are not designed to run for too long. See Charlie's articles on The Inquirer about this. This is my personal opinion. Who? |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Current BSoD issues are in general not connected with GPU overheating or any hardware trouble. It's just imperfect OS-driver interaction, that is, a software issue, not a hardware one. I still haven't seen any reports of hardware failures. Who? just goes for blame and panic (as usual :P ) |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmm... Perhaps you should review the literature some more. In this case, Francois' observation about the current hardware difficulties of nVidia's GPU family is well taken (IMHO). Of course, that doesn't mean some of the troubles we are seeing here on SAH are a direct result of that, but it sure can't be helping. Alinator |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hmmm... Sure, hardware failures are possible, but most of them will come from weak or poor fans on cheap boards. And I stress, it's too early to say that "more chip failures will come". There can't be a second when there is still no first. |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Links please! :-) ..where I can read this about CUDA-GPUs? |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmm... Agreed, to a point. However, one big difference here is the form factor you are trying to cram that 'heat factory' into. A graphics card is a far more constrained environment to be pumping 100 watts plus steady state into (a lot plus in some cases). One only has to look at how well some of the ultra-thin laptops handle having the hammer dropped when running BOINC flat out to see a similar effect for CPUs. Not all of them were what I would call 'cheap'. @ Sutaro: Just look for Charlie Demerjian's articles in 'The Inquirer'. You can find more if you root around a bit. Alinator |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Hmmm... I don't want to advertise for a particular manufacturer, but for example 'Zotac' overclocks its GPUs in the AMP! series.. Maybe it's not recommended to buy an OCed GPU for CUDA crunching? Hey, to my knowledge they have a 5-year warranty.. ;-D ..if a GPU can't calculate well with CUDA, is it a warranty matter? |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Point well taken there. I'm sure the problem is dependent on the particular card vendor to some extent. However, the take away point from a lot of the articles is that this is really new ground the GPU manufacturers are moving into. Perhaps this is a case where the ATi-AMD merger has a real positive benefit, since the kinds of hot spot thermal management issues mentioned are something CPU manufacturers have had to deal with from day one out of necessity. Alinator |
Edboard Send message Joined: 4 Jun 08 Posts: 9 Credit: 1,043,577 RAC: 0 |
I have been trying SETI@home CUDA units on two PCs (each one with a GTX 280, not overclocked) and almost every day I get a reset or video driver failure with some unit. The temperatures are relatively low (compared with the temps I get crunching for Folding and GPUGRID, which are higher and cause no problems). I think I'll wait until the units delivered are safer for crunching. |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Raistmer has released a new version that he says fixes the VLAR bug, if you would like to try it: http://setiathome.berkeley.edu/forum_thread.php?id=50829 PROUD MEMBER OF Team Starfire World BOINC |
Edboard Send message Joined: 4 Jun 08 Posts: 9 Credit: 1,043,577 RAC: 0 |
Installed V5 opt. One of my PCs has just done a reset. It seems that it doesn't work for me. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
Installed V5 opt. One of my PCs has just done a reset. It seems that it doesn't work for me. Details please? Especially model of CUDA card installed, and operating system version? |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
The version I linked is version 6. It has been vastly improved over ver 5. PROUD MEMBER OF Team Starfire World BOINC |
Edboard Send message Joined: 4 Jun 08 Posts: 9 Credit: 1,043,577 RAC: 0 |
Both PCs have a GTX 280 and run Windows Vista Home Premium 32-bit. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
Both PCs have a GTX 280 and run Windows Vista Home Premium 32-bit. Give the v6 a try then. That build (as yet relatively untested) does at least run through to the end of the difficult tasks. Be prepared for long-ish running times for some tasks, and perhaps more screen lag while they run, but the latest bug fix has been deliberately designed to avoid the driver reset problem. |