What Are Computational Errors?

Questions and Answers : Windows : What Are Computational Errors?

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1323410 - Posted: 1 Jan 2013, 23:48:24 UTC

Since the power outages, I have been getting a lot of these, and I cannot seem to report finished results.


John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 140
United States
Message 1323445 - Posted: 2 Jan 2013, 2:23:08 UTC

A computational error means that something went wrong with the computation, so it could not be completed on your system. If you get one of these occasionally, it is nothing to worry about. If you get a constant stream of them, you need to figure out what is wrong with your system.

The most likely culprit is overheating.




BOINC WIKI

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1323943 - Posted: 3 Jan 2013, 0:08:07 UTC
Last modified: 3 Jan 2013, 0:10:29 UTC

About half of the tasks get errors. As far as I know, nothing is wrong with my system, as Milky Way breezes right through them with never an error. How would I know anyway? Also, in 12 years, I can't ever remember an error. Now, since the recent crashes, I am getting lots of these.


Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1324728 - Posted: 4 Jan 2013, 22:58:59 UTC - in response to Message 1323943.

Maybe this will help.


Ageless
Joined: 9 Jun 99
Posts: 13810
Credit: 3,269,733
RAC: 0
Netherlands
Message 1324731 - Posted: 4 Jan 2013, 23:23:26 UTC - in response to Message 1324728.

Not really.

However, if you go to your account, open your tasks list, and select to show only the Errors (like so), you can click on any of the task IDs.

This opens the stderr.txt output for that task, which records what BOINC saw happen to it.
E.g. http://setiathome.berkeley.edu/result.php?resultid=2775068596 shows

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce GT 640 
           totalGlobalMem = -2147483648 
           sharedMemPerBlock = 49152 
           regsPerBlock = 65536 
           warpSize = 32 
           memPitch = 2147483647 
           maxThreadsPerBlock = 1024 
           clockRate = 1045500 
           totalConstMem = 65536 
           major = 3 
           minor = 0 
           textureAlignment = 512 
           deviceOverlap = 1 
           multiProcessorCount = 2 
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GT 640 is okay
SETI@home using CUDA accelerated device GeForce GT 640
setiathome_enhanced 6.09 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is :  2.702953
Optimal function choices:
-----------------------------------------------------
name                
-----------------------------------------------------
              v_BaseLineSmooth (no other)
            v_GetPowerSpectrum 0.00022 0.00000 
                   v_ChirpData 0.01324 0.00000 
                  v_Transpose4 0.00728 0.00000 
               FPU opt folding 0.00191 0.00000 
CUFFT error in file 'd:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62.

</stderr_txt>
]]>
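For anyone who wants to skim a lot of these error pages, here is a minimal Python sketch of pulling the interesting bits out of a task's stderr text. The sample text is abridged from the output above; the function names are hypothetical, and reading the negative totalGlobalMem as a 32-bit overflow of a 2 GiB card is an assumption, not something the log states.

```python
import re

# Abridged from the stderr output quoted in this post.
SAMPLE_STDERR = """\
Incorrect function. (0x1) - exit code 1 (0x1)
setiathome_CUDA: Found 1 CUDA device(s):
   Device 1 : GeForce GT 640
           totalGlobalMem = -2147483648
           multiProcessorCount = 2
CUFFT error in file 'd:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62.
"""


def summarize_stderr(text):
    """Collect the exit code, 'name = integer' device properties and CUFFT errors."""
    summary = {"exit_code": None, "props": {}, "cufft_errors": []}

    match = re.search(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)", text)
    if match:
        summary["exit_code"] = int(match.group(1))

    # Device property lines look like "   totalGlobalMem = -2147483648 ".
    prop_pattern = r"^[ \t]*(\w+) = (-?\d+)[ \t]*$"
    for name, value in re.findall(prop_pattern, text, flags=re.MULTILINE):
        summary["props"][name] = int(value)

    error_pattern = r"CUFFT error in file '([^']+)' in line (\d+)"
    for path, line in re.findall(error_pattern, text):
        summary["cufft_errors"].append((path, int(line)))

    return summary


def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (assumed overflow in the log)."""
    return value & 0xFFFFFFFF


summary = summarize_stderr(SAMPLE_STDERR)
print(summary["exit_code"], summary["cufft_errors"])
print(as_unsigned32(summary["props"]["totalGlobalMem"]) / 1024**3, "GiB")
```

The CUFFT line is the one that matters here: the task did start on the GPU and then failed inside the FFT code, which fits the explanation that follows.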

Seeing how you have a Kepler type GPU, I suspect that you're being bitten by the fact that the present Seti CUDA application being sent to your GPU is enhanced for use on its predecessor, the Fermi type GPU. There are workarounds for being able to use this application, as specified in this sticky thread in the GPU forum, where it states:
2) Kepler cards unsupported

Drivers affected: All drivers from 304.48 (BETA) onwards.
Hardware affected: nVidia 'Kepler' - GT 6xx and GTX 6xx cards.

Symptom: All tasks end in errors when using the stock v6.10 'cuda_fermi' application.

Solution/workaround:
a) Use an optimised Cuda application, where available.
b) Downgrade to the 301.42 driver - only possible on GTX 670/680/690 cards, not on newer releases like the 650/660 or their Ti variants.
c) Set an environment variable, as shown below.

The environment variable to use is

CUDA_GRID_SIZE_COMPAT

and it needs to have the value 1 (one)

You need to be running in Administrator mode, and you need to be running at least driver 306.02 (BETA) or 306.23 (WHQL)
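As a rough way to double-check option c), the Python sketch below tests the two conditions quoted above: the variable must have the value 1, and the driver must be at least 306.02. The function names are hypothetical, and the driver version has to be typed in by hand; the sketch does not query the driver. It also only sees the environment of the Python process itself, which is an assumption about what the BOINC client will see (a variable set after BOINC started is not picked up until BOINC is restarted).

```python
import os

REQUIRED_VAR = "CUDA_GRID_SIZE_COMPAT"
MIN_DRIVER = (306, 2)  # 306.02, the minimum driver quoted in the sticky thread


def parse_driver(version):
    """Turn a driver string such as '306.23' into a comparable tuple (306, 23)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def workaround_problems(driver_version, environ=None):
    """Return a list of reasons why the workaround would not be in effect."""
    environ = os.environ if environ is None else environ
    problems = []
    if environ.get(REQUIRED_VAR) != "1":
        problems.append(f"{REQUIRED_VAR} is not set to 1")
    if parse_driver(driver_version) < MIN_DRIVER:
        problems.append(f"driver {driver_version} is older than 306.02")
    return problems


# Example: driver 310.70 with whatever environment this process has.
for problem in workaround_problems("310.70"):
    print("Problem:", problem)
```

An empty list means both conditions hold as far as this process can tell.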

Here's how to reach the setting screen:

Windows 7/Vista: [screenshot not preserved]

Windows XP: [screenshot not preserved]



The options given are either/or. In other words, choose one and try it; there's no need to use all three.

Optimized applications --if you don't have one yet on another machine-- aren't available at this time due to copyright problems with one of the compilers.

Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1324755 - Posted: 5 Jan 2013, 1:29:01 UTC - in response to Message 1324731.

Thanks. I did add the CUDA_GRID_SIZE_COMPAT variable and will try it. However, this is new, out of nowhere, nothing on my system changed, it just started happening AFTER the problem with the server outages.


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,591,107
RAC: 1,278
Bulgaria
Message 1325191 - Posted: 6 Jan 2013, 8:19:43 UTC - in response to Message 1324755.
Last modified: 6 Jan 2013, 8:27:02 UTC

Thanks. I did add the CUDA_GRID_SIZE_COMPAT variable and will try it. However, this is new, out of nowhere, nothing on my system changed, it just started happening AFTER the problem with the server outages.

Even the video driver version?

The "server outages" can't do that; the app has been the same since:
6.10 (cuda_fermi) 8 Jun 2010

http://setiathome.berkeley.edu/apps.php

(New, improved CUDA apps are in the works and being tested on SETI@home-Beta:
http://setiweb.ssl.berkeley.edu/beta/
http://setiweb.ssl.berkeley.edu/beta/apps.php
)





- ALF - "Find out what you don't do well ..... then don't do it!" :)

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1325608 - Posted: 7 Jan 2013, 21:48:38 UTC - in response to Message 1325191.

Yes, everything was the same. Nothing has changed in my machine at all, until the SETI server problems, then it just would not complete half the jobs, or threw that error. I did add the code above, and now it completes some, but never transfers them. I give up. I can't code my box around this software. I'll just do the MilkyWay jobs. They run, load and upload fine.


John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 140
United States
Message 1325669 - Posted: 8 Jan 2013, 3:15:36 UTC - in response to Message 1325608.

Yes, everything was the same. Nothing has changed in my machine at all, until the SETI server problems, then it just would not complete half the jobs, or threw that error. I did add the code above, and now it completes some, but never transfers them. I give up. I can't code my box around this software. I'll just do the MilkyWay jobs. They run, load and upload fine.

They should upload eventually. BOINC will try again no later than 24 hours after completion, if not sooner.


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,591,107
RAC: 1,278
Bulgaria
Message 1325731 - Posted: 8 Jan 2013, 8:07:51 UTC - in response to Message 1325608.

Nothing has changed in my machine at all, until the SETI server problems

The "SETI server problems" and your computing problems happening at the same time is just a coincidence.
And if you have some automatic updates turned ON you can't be sure that "Nothing has changed in my machine at all".

I'll just do the MilkyWay jobs. They run, load and upload fine.

Not really, there are some CPU task errors:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=439930&offset=0&show_names=0&state=5&appid=

Probably CPU is overclocked or overheating?





Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1326250 - Posted: 10 Jan 2013, 1:26:08 UTC - in response to Message 1325731.

Probably CPU is overclocked or overheating?


Not overclocked and has never run hot as far as I know. I have overtemp alarms that do work and nothing has gone off.

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 140
United States
Message 1326330 - Posted: 10 Jan 2013, 11:58:03 UTC - in response to Message 1326250.

Probably CPU is overclocked or overheating?


Not overclocked and has never run hot as far as I know. I have overtemp alarms that do work and nothing has gone off.


Is the GPU overheating?

When was the last time the dust bunnies were blown out of the case?


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,591,107
RAC: 1,278
Bulgaria
Message 1326400 - Posted: 10 Jan 2013, 17:34:34 UTC - in response to Message 1326250.


http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=378072214

The "exit code -1073740940 (0xc0000374)" seems to be about "heap corruption"
http://forums.iis.net/t/1150912.aspx
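The link between the negative exit code and the hex value can be checked with a short Python sketch. BOINC reports the Windows exit status as a signed 32-bit number, so masking it to 32 bits gives the NTSTATUS value. The lookup table is deliberately tiny and only lists two status codes I am sure of; the function names are hypothetical.

```python
# A deliberately small table of Windows NTSTATUS values; not a complete list.
KNOWN_STATUS = {
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code):
    """Return the hex form of the exit code and its name, if it is in the table."""
    status = to_unsigned32(exit_code)
    return f"0x{status:08x}", KNOWN_STATUS.get(status, "unknown")


print(describe_exit_code(-1073740940))  # ('0xc0000374', 'STATUS_HEAP_CORRUPTION')
```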

So this may be a faulty app, as some tasks also error out on other computers:
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=293007522

Do other computing programs or games on your computer crash often (e.g. several crashes per day at random moments/places)?





Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 962,035
RAC: 0
United States
Message 1334142 - Posted: 3 Feb 2013, 1:24:54 UTC - in response to Message 1326400.

When was the last time the dust bunnies were blown out of the case?


LOL, I vacuum it every couple weeks. I live in a dustbin of a house. Never seen anything like it. I'd love to know where this stuff comes from.

Do other computing programs or games on your computer crash often (e.g. several crashes per day in random moments/places)


Not since I started using XP Pro about, um, let's see.... I can't remember that far back. But I only use Intel and don't play games.

However, a few days back I reset the preferences from 90% CPU usage to, I think, 50%, and have not seen a "Computation Error" since. I was beginning to see some from MilkyWay also, but they have all stopped.

Time to plan my next box though, this one is 3 years old now.

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 140
United States
Message 1334159 - Posted: 3 Feb 2013, 2:41:48 UTC - in response to Message 1334142.

When was the last time the dust bunnies were blown out of the case?


LOL, I vacuum it every couple weeks. I live in a dustbin of a house. Never seen anything like it. I'd love to know where this stuff comes from.

Do other computing programs or games on your computer crash often (e.g. several crashes per day in random moments/places)


Not since I started using XP Pro about, um, let's see.... I can't remember that far back. But I only use Intel and don't play games.

However, a few days back I reset the preferences from 90% CPU usage to, I think, 50%, and have not seen a "Computation Error" since. I was beginning to see some from MilkyWay also, but they have all stopped.

Time to plan my next box though, this one is 3 years old now.

That would indicate an overheating problem.

Suspects are fans that are wearing out and a heat sink that is no longer in good contact. If you are sufficiently capable mechanically, you can pull the heat sink off the CPU, scrape the old thermal paste off, and apply new paste.


J.D. Gallaway
Joined: 11 Aug 08
Posts: 2
Credit: 380,397
RAC: 1,089
United States
Message 1342057 - Posted: 1 Mar 2013, 16:27:30 UTC

I'm also getting this error, in fact out of ~50 projects only two passed through to "Complete".

I'm running a PhenomII x6, 16gb Corsair ram with a PNY GTX650. I am running prime95 right now stressing all six cores to 100% and the temps are sitting still at 146F/64C, which, while warm, isn't that hot, I wouldn't think, for six cores being maxed out.

Anyone have an idea on how to check the temps on the GPU? I'm running a PNY built card, but using the stock nvidia drivers because PNY's disc wouldn't install to Windows 8.
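One possible way to read the GPU temperature from a script is NVIDIA's nvidia-smi command-line tool, which ships with the driver. The Python sketch below assumes nvidia-smi is on the PATH and that this card and driver report a temperature through it; neither is confirmed anywhere in this thread, and the function names are hypothetical.

```python
import subprocess


def parse_temps(output):
    """Parse nvidia-smi CSV output (one temperature per line) into integers."""
    return [int(line) for line in output.splitlines() if line.strip()]


def read_gpu_temps_c():
    """Ask nvidia-smi for the temperature of every GPU, in degrees Celsius.

    Assumes nvidia-smi is installed and on the PATH. A card that does not
    report a temperature will make parse_temps raise ValueError.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def c_to_f(celsius):
    """Convert Celsius to Fahrenheit, to compare with readings given in F."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    for index, temp in enumerate(read_gpu_temps_c()):
        print(f"GPU {index}: {temp} C / {c_to_f(temp):.0f} F")
```

Running it while a GPU task is crunching gives a more useful reading than running it at idle.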

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 140
United States
Message 1342123 - Posted: 1 Mar 2013, 18:20:34 UTC - in response to Message 1342057.

I'm also getting this error, in fact out of ~50 projects only two passed through to "Complete".

I'm running a PhenomII x6, 16gb Corsair ram with a PNY GTX650. I am running prime95 right now stressing all six cores to 100% and the temps are sitting still at 146F/64C, which, while warm, isn't that hot, I wouldn't think, for six cores being maxed out.

Anyone have an idea on how to check the temps on the GPU? I'm running a PNY built card, but using the stock nvidia drivers because PNY's disc wouldn't install to Windows 8.

Have you overclocked the machine?


J.D. Gallaway
Joined: 11 Aug 08
Posts: 2
Credit: 380,397
RAC: 1,089
United States
Message 1342288 - Posted: 2 Mar 2013, 3:45:34 UTC

Negative to the overclock. While I went to work, I left my system running Prime95 on all 6 cores, my CoreTemp log shows a max CPU temp of 152F during the run.


Any thoughts?

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 754,585
RAC: 140
United States
Message 1342383 - Posted: 2 Mar 2013, 15:25:33 UTC - in response to Message 1342288.

Negative to the overclock. While I went to work, I left my system running Prime95 on all 6 cores, my CoreTemp log shows a max CPU temp of 152F during the run.


Any thoughts?

What version of BOINC? Are they GPU tasks that are failing, or CPU tasks?


BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3654
Credit: 8,591,107
RAC: 1,278
Bulgaria
Message 1342393 - Posted: 2 Mar 2013, 15:39:43 UTC - in response to Message 1342057.
Last modified: 2 Mar 2013, 15:44:09 UTC


You are using the 'unfortunate' combination of Kepler GPU + new drivers + stock CUDA app:
NVIDIA GeForce GTX 650 (1024MB) driver: 310.70
SETI@home Enhanced v6.10 (cuda_fermi)

The stock CUDA app is a few years old and until they fix it:
You need Solution 2) c) from here:
http://setiathome.berkeley.edu/forum_thread.php?id=69735

(Set an environment variable CUDA_GRID_SIZE_COMPAT)
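The 'unfortunate' combination can be written down as a small Python check. This is only a sketch built from the wording of the sticky thread (GT 6xx and GTX 6xx cards, drivers from 304.48 onwards, stock cuda_fermi app). Matching on the card name is a heuristic, since the name alone does not prove the chip is a Kepler, and the function name is hypothetical.

```python
import re

FIRST_AFFECTED_DRIVER = (304, 48)  # 304.48 (BETA), per the sticky thread
STOCK_PLAN_CLASS = "cuda_fermi"


def looks_affected(gpu_name, driver_version, plan_class):
    """Return True if the host matches the affected combination described above."""
    # Heuristic: any 'GT 6xx' or 'GTX 6xx' model name counts as a 6xx card.
    is_6xx_card = re.search(r"\bGTX? 6\d\d\b", gpu_name) is not None

    major, minor = driver_version.split(".")
    has_new_driver = (int(major), int(minor)) >= FIRST_AFFECTED_DRIVER

    return is_6xx_card and has_new_driver and plan_class == STOCK_PLAN_CLASS


# The host from this post: GTX 650, driver 310.70, stock cuda_fermi app.
print(looks_affected("NVIDIA GeForce GTX 650 (1024MB)", "310.70", "cuda_fermi"))  # True
```

If this prints True and tasks are erroring out, the environment variable workaround is the first thing to try.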

And if you had read the posts - this same advice was already given in the beginning of this thread
http://setiathome.berkeley.edu/forum_thread.php?id=70473&postid=1324731#1324731







 