Error

Message boards : Number crunching : Error

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1337791 - Posted: 13 Feb 2013, 16:59:57 UTC

Just noticed one machine had a problem yesterday and errored 97 tasks.

Anyone know what this error is?

http://setiathome.berkeley.edu/result.php?resultid=2832633964


ID: 1337791
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1337811 - Posted: 13 Feb 2013, 17:34:38 UTC - in response to Message 1337791.  

Were all these GPU WUs? Were they all consecutive, and did they all die rapid-fire, one after the other?

Just a guess - did your graphics card have a problem? Maybe the fan is failing and the card overheated?

Try rebooting your machine and see if that cures it. Try getting Speedfan (freeware) and checking the temps.
ID: 1337811
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1337847 - Posted: 13 Feb 2013, 19:03:59 UTC

Yes they were all GPU and yes they all errored in seconds.

First thing I did was reboot.

I am running GPU Tweak and it reports a steady 65°C at 30% fan.
ID: 1337847
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1337907 - Posted: 13 Feb 2013, 21:07:45 UTC - in response to Message 1337847.  

OK, I'm 0 for 1 so far.

Had you done anything to the environment like install new graphics drivers? A newer version of BOINC? Are you running the stock apps or Lunatics?

(I'm thinking maybe some sort of CUDA library mismatch that has been discussed here recently - I don't remember the exact situations - search the forums here and try to find it)
ID: 1337907
Glen
Joined: 5 Feb 13
Posts: 8
Credit: 26,875
RAC: 0
Australia
Message 1338240 - Posted: 14 Feb 2013, 22:00:41 UTC

Just a small problem. I have my SETI settings set not to receive Astropulse units, especially the CUDA ones, as I have already lost one graphics card, and I'm on a fixed income and cannot afford to replace hardware every couple of months. But this morning, while updating SETI, it started to download an Astropulse CUDA unit. Why did this happen? I hope I can trust SETI not to push my system to the point it breaks. If you have no normal units, I do not wish to receive Astropulse; I'll wait till you have more units and start doing another project.
ID: 1338240
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1338256 - Posted: 14 Feb 2013, 22:45:45 UTC - in response to Message 1338240.  

Just a small problem. I have my SETI settings set not to receive Astropulse units, especially the CUDA ones, as I have already lost one graphics card, and I'm on a fixed income and cannot afford to replace hardware every couple of months. But this morning, while updating SETI, it started to download an Astropulse CUDA unit. Why did this happen? I hope I can trust SETI not to push my system to the point it breaks. If you have no normal units, I do not wish to receive Astropulse; I'll wait till you have more units and start doing another project.

That will happen to a card if you stick it in a poorly ventilated case and/or let its cooling fan/s run too slowly.

I'm a pensioner but that doesn't stop me from running my video cards at full bore though I do have them in very well ventilated cases and their fan speeds are set not to drop below 60%.

Cheers.
ID: 1338256
Glen
Joined: 5 Feb 13
Posts: 8
Credit: 26,875
RAC: 0
Australia
Message 1338292 - Posted: 15 Feb 2013, 1:13:10 UTC - in response to Message 1338256.  

So why did SETI send me an Astropulse unit when the settings are set not to? Good on you for running your graphics card full bore. For the record, you can't have better ventilation than having the side covers removed, which I have, and I have been doing SETI longer than you have, bro. I originally joined in 1999, so I think I've got a bit more experience than you running SETI. You're a legend, mate. Keep up the good work, and keep a few bucks spare just in case that card of yours does burn out.
ID: 1338292
Glen
Joined: 5 Feb 13
Posts: 8
Credit: 26,875
RAC: 0
Australia
Message 1338299 - Posted: 15 Feb 2013, 1:38:20 UTC - in response to Message 1338256.  

One last thing: you will find my stats under Glenn. I'm in 2019th place for the country, with a points score of 350,000+ before the graphics card shit itself and I stopped doing it; I rejoined only recently as a new user. Your stats tell me you have a computer set up to do only BOINC, doing 67,000 points a day, whereas I use my one computer for everything, not just SETI, and Astropulse hangs the system too much to be able to use it (opening client takes 3 seconds). Normal CUDA units don't do this; I only have trouble if I'm watching a movie, so I turn the GPU off but leave the CPUs running with no problems. I'm not a newbie or somebody who doesn't know much about computers; I'm very knowledgeable and have formal training.
ID: 1338299
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1338376 - Posted: 15 Feb 2013, 7:18:55 UTC - in response to Message 1338299.  

So why did SETI send me an Astropulse unit when the settings are set not to? Good on you for running your graphics card full bore. For the record, you can't have better ventilation than having the side covers removed, which I have, and I have been doing SETI longer than you have, bro. I originally joined in 1999, so I think I've got a bit more experience than you running SETI. You're a legend, mate. Keep up the good work, and keep a few bucks spare just in case that card of yours does burn out.

As to why you were sent APs I cannot say, but better than having the side off is to have a 200mm fan in the side blowing over 2 or 3 cards. My main 2 rigs are housed in CoolerMaster 942HAF cases, so ventilation is well above just having a side off.

Plus I have done SETI since it 1st started but the original account was lost due to an expired email addy but that's another story and the old AMD 33MHz 386 couldn't really do much so those few months were really no loss.

Actually I'm considering another 2 new video cards myself so that I can retire my old E6300 and 3 old 9800GT's.

One last thing: you will find my stats under Glenn. I'm in 2019th place for the country, with a points score of 350,000+ before the graphics card shit itself and I stopped doing it; I rejoined only recently as a new user. Your stats tell me you have a computer set up to do only BOINC, doing 67,000 points a day, whereas I use my one computer for everything, not just SETI, and Astropulse hangs the system too much to be able to use it (opening client takes 3 seconds). Normal CUDA units don't do this; I only have trouble if I'm watching a movie, so I turn the GPU off but leave the CPUs running with no problems. I'm not a newbie or somebody who doesn't know much about computers; I'm very knowledgeable and have formal training.

Well, I won't tell you my position in the country as it may be too much for you. :)

I will tell you that SETI is my main project, though due to supply & demand I have 3 other backup projects to take up the slack when SETI can't keep me supplied with work, which seems to be a day or 2 each week these days, especially for the GPUs, but my 2500K is also hard to keep fed.

I do pause my crunching when doing audio/video editing, as my programs use both CPU and GPUs, but for anything else I let them keep going, and I do use all 3 every day so none are totally dedicated crunchers.

I will say that if I did do OpenCL AP's on video card I would at least reserve a CPU core just to feed them so that lag wouldn't be a problem.

I was not trying to dis you and I have no formal training but I have been building my own PC's since well before SETI started.

I also noticed that you don't belong to a team but if you do want to join one then please consider team Australia.

Cheers.


ID: 1338376
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1338388 - Posted: 15 Feb 2013, 7:54:01 UTC - in response to Message 1338292.  
Last modified: 15 Feb 2013, 7:57:27 UTC

so why did seti send me a Astropluse unit when the settings are set not to ???

Since you didn't copy/post the exact settings we can only wonder how they are set.

Instead of "the settings are set not to" more informative is to just copy the lines, e.g.:

SETI@home Enhanced: yes
Astropulse v505: no
SETI@home v7: no
AstroPulse v6: yes
If no work for selected applications is available, accept work from other applications? yes


And without exact information from you I can only guess you forgot to set:
If no work for selected applications is available, accept work from other applications? NO
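
A minimal sketch of how that fallback preference could interact with the per-application choices. This is a toy model for illustration only, not the project's scheduler code; the function name `apps_allowed`, its parameters, and the two-application list are all hypothetical.

```python
def apps_allowed(selected, accept_other, all_apps, apps_with_work):
    """Toy model: which applications a host may be sent work for.

    selected       -- dict mapping app name to True/False (the yes/no lines)
    accept_other   -- the "accept work from other applications?" answer
    all_apps       -- every application the project offers
    apps_with_work -- applications that currently have work available
    """
    wanted = [app for app in all_apps if selected.get(app, False)]
    available = [app for app in wanted if app in apps_with_work]
    if available or not accept_other:
        return available
    # Fallback: nothing is available for the selected apps, and the user
    # agreed to accept other work, so any app with work qualifies.
    return [app for app in all_apps if app in apps_with_work]


ALL_APPS = ["SETI@home Enhanced", "AstroPulse v6"]
prefs = {"SETI@home Enhanced": True, "AstroPulse v6": False}

# Only AstroPulse work is available at the moment.
with_fallback = apps_allowed(prefs, True, ALL_APPS, {"AstroPulse v6"})
without_fallback = apps_allowed(prefs, False, ALL_APPS, {"AstroPulse v6"})
print(with_fallback)
print(without_fallback)
```

Under these assumptions, a host that deselected AstroPulse can still be offered it while the fallback is set to yes, and gets nothing while it is set to no.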


P.S.
To protect your CPU/GPU from overheating use TThrottle
http://www.efmer.eu/boinc/


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1338388
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1339473 - Posted: 19 Feb 2013, 22:35:27 UTC

Seems like I have more problems, all the GPU units in this machine error.

http://setiathome.berkeley.edu/results.php?hostid=5708417&offset=0&show_names=0&state=6&appid=

I did a re-boot when I saw the problem then noticed this in the log


SETI@home 19/02/2013 22:25:09 Aborting task 08dc12af.32581.62739.7.10.36_0: exceeded elapsed time limit 2143.66 (53690.16G/25.05G)

I have just changed to BOINC 7.0.52, I have 4 other machines using it and not reporting errors

Any help would be appreciated
ID: 1339473
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1339475 - Posted: 19 Feb 2013, 22:47:56 UTC - in response to Message 1339473.  
Last modified: 19 Feb 2013, 22:49:41 UTC

Seems like I have more problems, all the GPU units in this machine error.

http://setiathome.berkeley.edu/results.php?hostid=5708417&offset=0&show_names=0&state=6&appid=

I did a re-boot when I saw the problem then noticed this in the log


SETI@home 19/02/2013 22:25:09 Aborting task 08dc12af.32581.62739.7.10.36_0: exceeded elapsed time limit 2143.66 (53690.16G/25.05G)

I have just changed to BOINC 7.0.52, I have 4 other machines using it and not reporting errors

Any help would be appreciated

This is because of a change in Boinc 7.0.33: the flops value for GPU tasks gets increased by a factor of 10 (when no flops values have been supplied in the app_info), so existing GPU tasks will be put on the verge of going Maximum time exceeded:

client: when estimating FLOPS for an anonymous-platform app version for which no estimate has been supplied by user, use (CPU speed)*(cpu_usage + 10*gpu_usage) (--> add the 10*)


Fresh GPU tasks will get revised <rsc_fpops_est> and <rsc_fpops_bound> values and will complete O.K. You can either reset the project and get your tasks resent,
or use the -177 protection option in BoincRescheduler (but that'll mess up the time estimates).

Claggy
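
To make the arithmetic concrete, here is a small sketch that reproduces the limit from the log line quoted above. It assumes the two figures in parentheses are the task's `<rsc_fpops_bound>` and the client's flops estimate (both in units of 1e9), and that the limit is simply their quotient. The "old" estimate is a hypothetical value one tenth of the new one, to illustrate the 10x change.

```python
# Figures taken from the log line (G = 1e9).
FPOPS_BOUND = 53690.16e9  # assumed to be <rsc_fpops_bound> for the task
FLOPS_NEW = 25.05e9       # speed estimate in use when the task was aborted


def elapsed_time_limit(fpops_bound: float, flops: float) -> float:
    """Seconds a task may run before it is aborted, under the assumption above."""
    return fpops_bound / flops


# Hypothetical: the estimate in force when the task was issued,
# ten times lower than the one the newer client computes.
FLOPS_OLD = FLOPS_NEW / 10

limit_new = elapsed_time_limit(FPOPS_BOUND, FLOPS_NEW)
limit_old = elapsed_time_limit(FPOPS_BOUND, FLOPS_OLD)

print(f"limit with new estimate: {limit_new:8.0f} s")
print(f"limit with old estimate: {limit_old:8.0f} s")
```

The first figure lands within a second of the 2143.66 in the log (the log's inputs are rounded). The second is ten times larger, which would explain why tasks fetched before the upgrade suddenly ran out of time.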
ID: 1339475
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1339476 - Posted: 19 Feb 2013, 22:58:18 UTC
Last modified: 19 Feb 2013, 22:59:34 UTC

Thanks I knew it had to be 7.0.52 but did not know what!!

Why did it only affect this machine?

PS I did a reset and the tasks are being downloaded now.
ID: 1339476
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1339479 - Posted: 19 Feb 2013, 23:04:14 UTC - in response to Message 1339476.  
Last modified: 19 Feb 2013, 23:06:02 UTC

Thanks I knew it had to be 7.0.52 but did not know what!!

Why did it only affect this machine?

PS I did a reset and the tasks are being downloaded now.

It has affected both machines that run anonymous platform and Boinc 7.0.52:

Error tasks for computer 5708417

Error tasks for computer 6851722

the other machine is running the Stock apps, the project supplies a flops value for that,

Claggy
ID: 1339479
Floyd
Joined: 19 May 11
Posts: 524
Credit: 1,870,625
RAC: 0
United States
Message 1339565 - Posted: 20 Feb 2013, 5:30:47 UTC
Last modified: 20 Feb 2013, 5:43:41 UTC

I guess this is the ERROR Thread...
So I have this "Error"

http://setiathome.berkeley.edu/result.php?resultid=2842392732

That same WU has 2 inconclusives and 2 error while computing... ???
Maybe just a corrupted file ?
ID: 1339565
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1339599 - Posted: 20 Feb 2013, 6:44:05 UTC - in response to Message 1339565.  

....That same WU has 2 inconclusives and 2 error while computing... ???
Maybe just a corrupted file ?

The answer is in the bottom line of the stderr_txt output file.

cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel"


Yes, they are noisy units :-)

T.A.
ID: 1339599
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1339643 - Posted: 20 Feb 2013, 11:26:20 UTC - in response to Message 1339565.  
Last modified: 20 Feb 2013, 11:26:44 UTC

I guess this is the ERROR Thread...
So I have this "Error"

http://setiathome.berkeley.edu/result.php?resultid=2842392732

That same WU has 2 inconclusives and 2 error while computing... ???
Maybe just a corrupted file ?


It's not a generic error thread, Bernie just wasn't very specific with his thread title ;)

That WU shows very nicely where we stand with the optimised CUDA app.

The error itself is as old as the CUDA app. I call it a bug (because it throws an error), Jason begs to differ and calls it a design flaw.

The original CUDA application cannot handle certain triplet conditions and will error with a -12.

You are running x41g where Jason already removed quite a few (but not all) of the -12 causes.

Host 6269362 belonging to Juan is running the latest x41zc, which IIRC does not do any -12 errors.

Why did the unit go inconclusive then?

If you look at stderr you'll see that the CPU found 31 pulses and the GPU 26 pulses and 5 triplets. Both WUs reported as -9 (overflow) but they reported a different subset of signals because of different processing order between the CPU and the GPU app.
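
The effect can be sketched with a toy model. Everything here is an assumption made for illustration: the cap of 30 signals, the signal names, and the two processing orders are hypothetical, chosen only so the counts match the ones quoted above. It is not how either app is actually written.

```python
MAX_SIGNALS = 30  # assumed cap; the thread only says both runs overflowed


def run_until_overflow(signals_in_order, cap=MAX_SIGNALS):
    """Collect signals in the order they are found; stop once the cap is exceeded."""
    reported = []
    for signal in signals_in_order:
        reported.append(signal)
        if len(reported) > cap:
            break  # overflow: stop processing and report what was found so far
    return reported


# Hypothetical noisy task containing far more signals than the cap.
pulses = [f"pulse{i}" for i in range(40)]
triplets = [f"triplet{i}" for i in range(10)]

# Assumed orders: one app reaches all the pulses first, the other
# reaches the triplets after its first 26 pulses.
cpu_order = pulses + triplets
gpu_order = pulses[:26] + triplets + pulses[26:]

cpu = run_until_overflow(cpu_order)
gpu = run_until_overflow(gpu_order)


def count(kind, signals):
    """Number of reported signals whose name starts with the given kind."""
    return sum(1 for s in signals if s.startswith(kind))


print("CPU:", count("pulse", cpu), "pulses,", count("triplet", cpu), "triplets")
print("GPU:", count("pulse", gpu), "pulses,", count("triplet", gpu), "triplets")
```

Both runs stop at the same total, yet the two reports contain different signals, so a validator comparing them signal by signal has good reason to call the pair inconclusive.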

As I said a very nice example of how the app matured and what the issues left for further development are.

William the Silent
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1339643


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.