Anonymous host throwing only errors, 3223 right now
Author | Message |
---|---|
Yanivicious Joined: 29 Mar 12 Posts: 157 Credit: 15,529,301 RAC: 0 |
oh boy it looks like im a culprit now! i changed the app_info in my 3 computers that are running GT 430's so that they will run 2 tasks at a time. one of these computers is now spitting out errors from what i can see (computer 6606314). i live in an apt during the week and these computers are at my house so i don't have physical access to them at the moment. can somebody look into the errors that system is putting out and tell me if they know what seems to be the cause? i'd like to fix it ASAP, thanks. |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I think sometimes the cards just freak out and go on a killing spree. My GT 8500 does this about once a month without any consistent reason. I just reboot and I'm good until it occurs again. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Not only have I changed my app_info.xml file to run 1 MB WU per NVIDIA GPU, I've also tuned the GTX 470 and 480 down, by 20% and 10%. My ATI 5870 GPUs are also doing 1 MB and AstroPulse WU per GPU, although the 3 systems weren't making errors or throwing inconclusives. It's just easier when comparing runtimes with other GPUs and with the 'stock' CUDA app. And I haven't seen much gain from running multiple WUs on a GPU just because it's possible with Fermi/Kepler cards and the (5000/6000/7000 series) ATIs. It's not that difficult to take a look once in a while to check they're working as they should. Well, the good thing is, you spotted them yourself ;-) The host that's making these errors is the Pentium Dual-Core CPU E5300 @ 2.60GHz [Family 6 Model 23 Stepping 10]. All of them are -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS. From the stderr: In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device cannot be used Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs) Preemptively Acknowledging temporary exit -> boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> |
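A side note on the exit code quoted above: -226 and 0xffffffffffffff1e look like the same number shown two ways, the second being the 64-bit two's-complement form of the first. A minimal Python sketch of that conversion follows; the helper names are made up for illustration and are not part of BOINC.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # all 64 bits set


def to_unsigned_64(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 64-bit value."""
    return code & MASK_64


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed exit code."""
    # If the top bit is set, the value represents a negative number.
    return value - (1 << 64) if value & (1 << 63) else value


print(hex(to_unsigned_64(-226)))          # 0xffffffffffffff1e
print(to_signed_64(0xFFFFFFFFFFFFFF1E))   # -226
```

So the hex value in the task listing is not a second error; it is the same ERR_TOO_MANY_EXITS code.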
Fred E. Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
Might be a problem with your edit of count in app_info.xml. STDERR contains these lines: Cuda error 'Couldn't get cuda device count ' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected. setiathome_CUDA: cudaGetDeviceCount() call failed. setiathome_CUDA: No CUDA devices found. There are lots of other lines, but they repeat until it exits with too many temporary exits: -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS |
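For anyone wanting to double-check a count edit like the one discussed above, here is a rough Python sketch that reads the GPU count out of an app_info.xml and reports how many tasks per GPU it implies. It assumes the usual anonymous-platform layout, where each app_version has a coproc block with a count element and a count of 0.5 means two tasks share one GPU. The sample XML, including the app name, version number and plan class, is a made-up placeholder and not the poster's actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only; real files contain more
# elements (file_info, file_ref, ...) and different names and versions.
SAMPLE_APP_INFO = """
<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <plan_class>cuda_fermi</plan_class>
    <coproc>
      <type>CUDA</type>
      <count>0.5</count>
    </coproc>
  </app_version>
</app_info>
"""


def tasks_per_gpu(app_info_xml: str) -> dict:
    """Map (app_name, plan_class) to the number of tasks run per GPU."""
    root = ET.fromstring(app_info_xml.strip())
    result = {}
    for version in root.iter("app_version"):
        count_text = version.findtext("coproc/count")
        if count_text is None:
            continue  # CPU-only app_version, nothing to report
        count = float(count_text)
        if count <= 0:
            raise ValueError(f"count must be positive, got {count_text!r}")
        name = version.findtext("app_name", default="?").strip()
        plan = version.findtext("plan_class", default="").strip()
        result[(name, plan)] = 1.0 / count
    return result


print(tasks_per_gpu(SAMPLE_APP_INFO))
```

A typo in the count value would show up here as a ValueError or an unexpected tasks-per-GPU figure, which is quicker than waiting for tasks to error out.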
Area 51 Joined: 31 Jan 04 Posts: 965 Credit: 42,193,520 RAC: 0 |
You've immediately differentiated yourself from the 'others' by spotting your problem and asking for help ;-). That puts you orders of magnitude in front of those who shall not be mentioned, as far as I am concerned! Posting your app_info.xml here would be a good start. Also, did you recently update your drivers, and if so, exactly how did you go about it? |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
He answered that already........ |
Yanivicious Joined: 29 Mar 12 Posts: 157 Credit: 15,529,301 RAC: 0 |
i hear ya :) yes, i actually updated the nvidia drivers for all 3 cards/computers through windows update the same night i edited the app_info. everything seemed to be running fine when i left them though (i stuck around long enough to see all 3 computers finishing tasks with the new GPU drivers/app_info edits, and monitored them with GPU-Z to make sure everything looked okay). the only difference between the computer getting the errors and the other two, which seem to be fine, is that the two without errors are set to run CPU and GPU tasks 100% of the time, and those computers are completely dedicated to S@H. the computer that is giving the errors (e5300 pentium dual core) is set to run gpu and cpu tasks only after the computer has been idle for 2 minutes, because my father uses that computer for a few hours each day and i didn't want to slow it down while he is actively using it. i have 8 computers that are dedicated to S@H and are never used for anything else, and i have 3 that are used actively but run S@H just about 24/7 as well. the erroneous computer is one of the latter. |
Yanivicious Joined: 29 Mar 12 Posts: 157 Credit: 15,529,301 RAC: 0 |
to rephrase that, the only one of my 11 computers that suspends tasks when the computer is actively being used, happens to be the one giving out errors. i wonder if this has something to do with it. but it only started giving the errors AFTER i edited the app_info for the GPU to run 2 tasks simultaneously, and updated the driver for the GPU as well. |
Fred E. Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
That driver 295.73 has the sleep bug. See Claggy's sticky thread. If the monitor goes to sleep and is DVI connected, the driver disappears. But I thought the error messages were different with that bug. Guess you can't get to it to post that part of the app_info.xml. If you use venues for those computers, you could turn off GPU fetch in your website project preferences for that venue until you can get to it and troubleshoot. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
I can tell you precisely what is causing the error. It is the CUDA sleep bug in the 295.xx and 296.xx drivers. Tell your dad to go into Power Options, click on "Change plan settings" for the selected plan, and change the "Turn off the display" option to "Never". He will have to physically turn off the monitor for the time being, but the computer will continue to process work that way. After you get back there, update to the newest 301.xx driver, as that does not have the bug. |
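With several machines to look after, the rule of thumb above (295.xx and 296.xx affected, 301.xx fixed) can be turned into a quick check. This is only a sketch of what was said in this thread, not an official NVIDIA list, and the function names are invented for illustration.

```python
# Driver series reported in this thread as having the CUDA sleep bug.
# Assumption: the bug is identified by the major series number alone.
AFFECTED_SERIES = {295, 296}


def driver_series(version: str) -> int:
    """Return the major series of a driver version string such as '295.73'."""
    return int(version.strip().split(".")[0])


def may_have_sleep_bug(version: str) -> bool:
    """True if the driver belongs to a series reported here as affected."""
    return driver_series(version) in AFFECTED_SERIES


for v in ("275.33", "295.73", "301.42"):
    print(v, "affected" if may_have_sleep_bug(v) else "not reported as affected")
```

A driver not flagged here is simply not one of the series named in the thread; it says nothing about other problems with that driver.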
Yanivicious Joined: 29 Mar 12 Posts: 157 Credit: 15,529,301 RAC: 0 |
thanks guys, that must be it because that computer is set to have the monitor power down, while my other systems are not. i didn't know there is a newer driver than that. should i just roll back the drivers to the previous ones which were working fine on that computer, or update further? |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Going back to the 'old driver' is OK, unless you want to play the latest games or run other software requiring the latest drivers. I tend to stick with drivers that work OK and seldom choose the 'latest drivers'. I'm still using 275.33 and 280.xx, which work just fine. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Found this MB WU: both results validated, but the outcome is not convincing. MB WU 1011301279. Am I missing something, apart from no canonical result being established? (I think they'll validate and make a canonical result.) |
Yanivicious Joined: 29 Mar 12 Posts: 157 Credit: 15,529,301 RAC: 0 |
alright i updated the driver on the offending computer to 301.42. hopefully this will solve the problem |
Wiggo Joined: 24 Jan 00 Posts: 36783 Credit: 261,360,520 RAC: 489 |
Just as I start to get on top of those computers I listed before (except for Computer 3378825, which just keeps coming back at me time after time), I get another lot throwing great clumps of inconclusives into my pendings. :( Computer 5932466, Computer 5389162, Computer 1901895, Computer 5348349, Computer 6598140, Computer 5877728, Computer 6204067, Computer 5218485, Computer 6249533, Computer 6236663, Computer 6462813. But I guess that I'll eventually wear most of them down as well, except for maybe Computer 3378825, which seems to be relentless, with a pending list that just keeps increasing (I bet some would like a RAC with those numbers). Cheers. |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
not bad for a pending number, my total pending is 5,688 ... L. |
Wiggo Joined: 24 Jan 00 Posts: 36783 Credit: 261,360,520 RAC: 489 |
At the rate it's going it'll have 3x yours by the end of the weekend. :o Cheers. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
OMG, what did I start? Maybe a (-) negative RAC will help?!? |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
But I do think it'll help. I mean, looking at your results and also your wingmen, I now often see 3 hosts needed to get 2 valid results. Host 6642577, 1 month old, is already making errors with the CPU and stock app. Hope participants will pay more attention, seeing this. |
Wiggo Joined: 24 Jan 00 Posts: 36783 Credit: 261,360,520 RAC: 489 |
Computer 1334363. Now this is getting a bit far past the funny side. I checked in this morning to find that my pendings had taken another leap upwards, so I checked them all again, only to find that I've been hit by a new lot of problematic hosts. :( Computer 4799081, Computer 5821256, Computer 5935372, Computer 6028874, Computer 6128026, Computer 6175649, Computer 6189897, Computer 6247549, Computer 6387513, Computer 6585255, Computer 6641972, Computer 6649287. That makes 63 that I've teamed, double teamed, triple teamed and quadruple teamed with this month alone. When will this slop stop? Cheers. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.