Anonymous host throwing only errors, 3223 right now

Yanivicious
Joined: 29 Mar 12
Posts: 157
Credit: 15,529,301
RAC: 0
United States
Message 1246550 - Posted: 15 Jun 2012, 19:58:23 UTC

oh boy it looks like im a culprit now! i changed the app_info in my 3 computers that are running GT 430's so that they will run 2 tasks at a time. one of these computers is now spitting out errors from what i can see (computer 6606314). i live in an apt during the week and these computers are at my house so i don't have physical access to them at the moment. can somebody look into the errors that system is putting out and tell me if they know what seems to be the cause? i'd like to fix it ASAP, thanks.
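
For readers unfamiliar with the edit described here: in BOINC's anonymous-platform app_info.xml, the number of tasks that share a GPU is normally set by the <count> value inside an app version's <coproc> block, so a count of 0.5 means two tasks per card. The poster's actual file never appears in the thread, so the fragment below is a made-up minimal example (app name and version number are placeholders), and the helper name tasks_per_gpu is hypothetical. It is only a sketch of how the setting can be sanity-checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical app_info.xml fragment; names and numbers are placeholders,
# not the poster's real file.
SAMPLE_APP_INFO = """
<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <coproc>
      <type>CUDA</type>
      <count>0.5</count>
    </coproc>
  </app_version>
</app_info>
"""


def tasks_per_gpu(app_info_xml: str) -> dict:
    """Map each GPU app version to how many tasks would share one GPU."""
    root = ET.fromstring(app_info_xml)
    result = {}
    for version in root.iter("app_version"):
        name = version.findtext("app_name", default="?").strip()
        for coproc in version.iter("coproc"):
            count = float(coproc.findtext("count", default="1"))
            # A count of 0.5 GPUs per task means 1 / 0.5 = 2 tasks per GPU.
            result[name] = round(1 / count)
    return result


print(tasks_per_gpu(SAMPLE_APP_INFO))
```

A malformed <coproc> block or a typo in <count> is one of the first things worth ruling out after an edit like this, which is why several replies ask to see the file.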

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1246552 - Posted: 15 Jun 2012, 20:08:00 UTC - in response to Message 1246550.  

I think sometimes the cards just freak out and go on a killing spree. My GT 8500 does this about once a month without any consistent reason. I just reboot and I'm good until it occurs again.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1246559 - Posted: 15 Jun 2012, 20:20:35 UTC - in response to Message 1246550.  
Last modified: 15 Jun 2012, 20:52:58 UTC

Not only have I changed my app_info.xml file to do 1 MB WU per NVidia
GPU, I've also tuned both the GTX470 and 480 down, by 20% and 10%.

My ATI 5870 GPUs are also doing 1 MB and AstroPulse WU per GPU, although
the 3 systems weren't making errors or throwing inconclusives.

It's just easier when comparing runtimes with other GPUs and the 'stock'
CUDA app.
And I haven't seen much gain from running multiple WUs on a GPU just because
it's possible with FERMI/KEPLERs and (5000/6000/7000 series of) ATIs.
It's not that difficult to take a look (once in a while) to see if they're working
as they should.

Well the good thing is, you spotted them yourself ;-)
The host that's making these errors is the Pentium Dual-Core CPU E5300 @ 2.60GHz [Family 6 Model 23 Stepping 10].
-226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS, all of them.

In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
Preemptively Acknowledging temporary exit -> boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->

</stderr_txt>
]]>
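
A side note on the exit code quoted in this post: the hex value in parentheses is not a second error, it is simply -226 written as a 64-bit two's-complement bit pattern. A minimal Python sketch of that conversion follows; the helper names are made up for illustration.

```python
MASK_64 = (1 << 64) - 1  # all 64 bits set


def as_unsigned_64(value: int) -> str:
    """Format a signed integer as its 64-bit two's-complement hex pattern."""
    return hex(value & MASK_64)


def from_unsigned_64(pattern: int) -> int:
    """Interpret a 64-bit pattern as a signed integer."""
    return pattern - (1 << 64) if pattern >= (1 << 63) else pattern


hex_form = as_unsigned_64(-226)
print(hex_form)                              # hex pattern for -226
print(from_unsigned_64(int(hex_form, 16)))   # round trip back to the signed value
```

So when a task list shows either form, it is the same ERR_TOO_MANY_EXITS result.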

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1246562 - Posted: 15 Jun 2012, 20:29:36 UTC

Might be a problem with your edit of count in app_info.xml: STDERR contains these lines:

Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.
setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found


lots of other lines, but they repeat until it exits with too many temporary exits:

-226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
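
For anyone checking a host's task pages by hand, the stderr lines quoted so far in this thread can be searched for mechanically. The sketch below is only an illustration: the marker strings are copied from the excerpts above, the function name is hypothetical, and a match says nothing more than "the CUDA app could not see a usable device", not why.

```python
# Markers copied from the stderr excerpts quoted in this thread.
CUDA_INIT_FAILURE_MARKERS = (
    "Device cannot be used",
    "Cuda initialisation FAILED",
    "no CUDA-capable device is detected",
    "No CUDA devices found",
)


def cuda_init_failures(stderr_text: str) -> list:
    """Return the known failure markers that appear in a task's stderr."""
    return [m for m in CUDA_INIT_FAILURE_MARKERS if m in stderr_text]


sample = """In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
Cuda initialisation FAILED, Initiating Boinc temporary exit (180 secs)
"""

print(cuda_init_failures(sample))
```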

Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1246566 - Posted: 15 Jun 2012, 20:45:40 UTC - in response to Message 1246550.  

You've immediately differentiated yourself from the 'others' by spotting your problem and asking for help ;-). That puts you orders of magnitude in front of those who shall not be mentioned, as far as I am concerned!

Posting your app_info.xml here would be a good start. Also, did you recently update your drivers and if so, exactly how did you go about it?

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1246572 - Posted: 15 Jun 2012, 20:55:30 UTC - in response to Message 1246550.  
Last modified: 15 Jun 2012, 20:57:03 UTC

He answered that already........

Yanivicious
Joined: 29 Mar 12
Posts: 157
Credit: 15,529,301
RAC: 0
United States
Message 1246575 - Posted: 15 Jun 2012, 20:57:28 UTC - in response to Message 1246566.  

i hear ya :)

yes i actually updated the nvidia drivers for all 3 cards/computers through windows update the same night i edited the app_info. everything seemed to be running fine though when i left them (i stuck around long enough to see all 3 computers finishing tasks with the new GPU drivers/app_info edits, and monitored them with GPU-Z to make sure everything looked okay).

the only difference between the computer getting the errors and the other two which seem to be fine is that the two that are not having errors are set to run CPU and GPU tasks 100% of the time, and those computers are completely dedicated to S@H. the computer that is giving the errors (e5300 pentium dual core) is set to run tasks with gpu and cpu only after the computer is idle for 2 minutes because my father uses that computer for a few hours each day and i didn't want to slow it down while he is actively using it. i have 8 computers that are dedicated to S@H and are never used for anything else, and i have 3 that are used actively but run S@H just about 24/7 as well. the erroneous computer is one of the latter.

Yanivicious
Joined: 29 Mar 12
Posts: 157
Credit: 15,529,301
RAC: 0
United States
Message 1246578 - Posted: 15 Jun 2012, 21:02:04 UTC

to rephrase that, the only one of my 11 computers that suspends tasks when the computer is actively being used, happens to be the one giving out errors. i wonder if this has something to do with it. but it only started giving the errors AFTER i edited the app_info for the GPU to run 2 tasks simultaneously, and updated the driver for the GPU as well.

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1246583 - Posted: 15 Jun 2012, 21:11:42 UTC

That driver, 295.73, has the sleep bug - see Claggy's sticky thread. If the monitor goes to sleep & is DVI connected, the driver disappears. But I thought the error messages were different with that bug.

Guess you can't get to it to post that part of the app_info.xml. If you use venues for those computers, you could turn off GPU fetch in your website project preferences for that venue until you can get to it & troubleshoot.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1246585 - Posted: 15 Jun 2012, 21:13:17 UTC - in response to Message 1246575.  

I can tell you precisely what is causing the error.

It is the CUDA sleep bug in the 295.xx and 296.xx drivers.

Tell your dad to go into Power Options, click on "Change plan settings" for the selected plan, and then have him change the "Turn off the display" option to "Never".

He will physically have to turn off the monitor for the time being but it will continue to process work that way.
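
If clicking through Power Options is awkward to explain over the phone, the same display timeout can usually be set from a command prompt. This is a sketch only, assuming the standard powercfg utility on Windows Vista/7 or later, where a timeout of 0 minutes is treated as "never". The helper name is hypothetical, and the script just prints the command rather than running it.

```python
def display_timeout_command(minutes: int = 0, on_battery: bool = False) -> list:
    """Build a powercfg call; a timeout of 0 is treated as 'never turn off'."""
    setting = "monitor-timeout-dc" if on_battery else "monitor-timeout-ac"
    return ["powercfg", "/change", setting, str(minutes)]


# Print the command for a desktop on mains power; run it by hand on the host.
print(" ".join(display_timeout_command(0)))
```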

After you get back there, update to the newest 301.xx driver as that does not have the bug.
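
With 11 machines to look after, a quick check of which installed driver versions fall in the affected branches may save a trip. The sketch below encodes only what this thread states (295.xx and 296.xx affected, 301.xx reported fixed); it is not an official NVIDIA list, and the function name is made up.

```python
# Driver branches named in this thread as having the CUDA sleep bug.
AFFECTED_BRANCHES = {295, 296}


def has_cuda_sleep_bug(driver_version: str) -> bool:
    """True if the driver's major branch is one the thread reports as affected."""
    major = int(driver_version.split(".")[0])
    return major in AFFECTED_BRANCHES


# Versions mentioned by posters in this thread.
for version in ("275.33", "295.73", "301.42"):
    print(version, has_cuda_sleep_bug(version))
```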

Yanivicious
Joined: 29 Mar 12
Posts: 157
Credit: 15,529,301
RAC: 0
United States
Message 1246589 - Posted: 15 Jun 2012, 21:16:45 UTC - in response to Message 1246585.  

thanks guys, that must be it because that computer is set to have the monitor power down, while my other systems are not. i didn't know there is a newer driver than that. should i just roll back the drivers to the previous ones which were working fine on that computer, or update further?

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1246606 - Posted: 15 Jun 2012, 22:07:33 UTC - in response to Message 1246589.  
Last modified: 15 Jun 2012, 22:09:38 UTC

Going back to the 'old driver' is OK, unless you want to play the latest
games or run other software requiring the latest drivers.
I tend to stick with drivers that work OK and seldom choose the 'latest drivers'.
Still using 275.33 and 280.xx, both working just fine.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1246608 - Posted: 15 Jun 2012, 22:14:46 UTC - in response to Message 1246606.  
Last modified: 15 Jun 2012, 22:25:16 UTC

Found this MB WU (1011301279): both results validated, but the outcome is not convincing.

Am I missing something, apart from no canonical result being established?
(I think they'll validate and make a canonical result.)

Yanivicious
Joined: 29 Mar 12
Posts: 157
Credit: 15,529,301
RAC: 0
United States
Message 1246686 - Posted: 16 Jun 2012, 1:57:51 UTC

alright i updated the driver on the offending computer to 301.42. hopefully this will solve the problem

Wiggo
Joined: 24 Jan 00
Posts: 34773
Credit: 261,360,520
RAC: 489
Australia
Message 1246732 - Posted: 16 Jun 2012, 4:50:38 UTC - in response to Message 1244923.  

Just as I start to get on top of those computers I listed before (except for Computer 3378825 which just keeps coming back at me time after time) I get another lot throwing great clumps of inconclusives into my pendings. :(

Computer 5932466
Computer 5389162
Computer 1901895
Computer 5348349
Computer 6598140
Computer 5877728
Computer 6204067
Computer 5218485
Computer 6249533
Computer 6236663
Computer 6462813

But I guess that I'll eventually wear most of them down as well except for maybe Computer 3378825 which seems to be relentless with a pending list with a number that seems to just keep increasing (I bet some would like a RAC with those numbers).

Cheers.

Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1246828 - Posted: 16 Jun 2012, 7:46:19 UTC - in response to Message 1246732.  

not bad for a pending number, my total pending is 5,688 ...

L.

Wiggo
Joined: 24 Jan 00
Posts: 34773
Credit: 261,360,520
RAC: 489
Australia
Message 1246876 - Posted: 16 Jun 2012, 10:09:04 UTC - in response to Message 1246828.  

At the rate it's going it'll have 3x yours by the end of the weekend. :o

Cheers.

Fred J. Verster
Volunteer tester
Send message
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1246882 - Posted: 16 Jun 2012, 11:05:24 UTC - in response to Message 1246876.  

OMG, what did I start? Maybe a (-) negative RAC will help?!?


Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1247088 - Posted: 16 Jun 2012, 20:09:01 UTC - in response to Message 1246882.  

But I do think it'll help. I mean, looking at your results and also your
wingmen's, I now often see 3 hosts needed to get 2 valid results.

Host 6642577, 1 month old, is already making errors with the CPU and stock app.

Hope participants will pay more attention, seeing this.


Wiggo
Joined: 24 Jan 00
Posts: 34773
Credit: 261,360,520
RAC: 489
Australia
Message 1247242 - Posted: 17 Jun 2012, 2:17:36 UTC - in response to Message 1247088.  

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.