What Are Computational Errors?



Questions and Answers : Windows : What Are Computational Errors?

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1323410 - Posted: 1 Jan 2013, 23:48:24 UTC

Since the power outages, I have been getting a lot of these, and I cannot seem to report finished results.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24687
Credit: 522,659
RAC: 19
United States
Message 1323445 - Posted: 2 Jan 2013, 2:23:08 UTC

Computational errors means that something went wrong with the computation such that it could not be completed on your system. If you get an occasional one of these, it is nothing to worry about. If you have a constant stream of these, you need to figure out what is wrong with your system.

The most likely culprit is overheating.
____________


BOINC WIKI

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1323943 - Posted: 3 Jan 2013, 0:08:07 UTC
Last modified: 3 Jan 2013, 0:10:29 UTC

About half of the tasks get errors. As far as I know, nothing is wrong with my system, as Milky Way breezes right through them with never an error. How would I know anyway? Also, in 12 years, I can't ever remember an error. Now, since the recent crashes, I am getting lots of these.
____________

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1324728 - Posted: 4 Jan 2013, 22:58:59 UTC - in response to Message 1323943.

Maybe this will help. [screenshot]

____________

Ageless
Joined: 9 Jun 99
Posts: 12324
Credit: 2,629,532
RAC: 1,085
Netherlands
Message 1324731 - Posted: 4 Jan 2013, 23:23:26 UTC - in response to Message 1324728.

Not really.

However, if you go to your account, open your tasks list and choose to show only the tasks with Errors, you can then click on any of the task IDs.

This opens the task's result page, which includes its stderr.txt: the output BOINC recorded about what happened to the task.
E.g. http://setiathome.berkeley.edu/result.php?resultid=2775068596 shows

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1 : GeForce GT 640
    totalGlobalMem = -2147483648
    sharedMemPerBlock = 49152
    regsPerBlock = 65536
    warpSize = 32
    memPitch = 2147483647
    maxThreadsPerBlock = 1024
    clockRate = 1045500
    totalConstMem = 65536
    major = 3
    minor = 0
    textureAlignment = 512
    deviceOverlap = 1
    multiProcessorCount = 2
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GT 640 is okay
SETI@home using CUDA accelerated device GeForce GT 640
setiathome_enhanced 6.09 Visual Studio/Microsoft C++

libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 2.702953

Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
  v_BaseLineSmooth (no other)
  v_GetPowerSpectrum 0.00022 0.00000
  v_ChirpData 0.01324 0.00000
  v_Transpose4 0.00728 0.00000
  FPU opt folding 0.00191 0.00000
CUFFT error in file 'd:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62.
</stderr_txt>
]]>
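To read a stderr dump like this one a little more systematically, here is a minimal Python sketch that pulls out the <message> line, the reported CUDA compute capability and any lines mentioning "error". The helper name `summarize_stderr` and the architecture table are illustrative assumptions, not part of BOINC. Compute capability 2.x corresponds to Fermi and 3.x to Kepler, which is how the "major = 3" line above identifies this GT 640 as a Kepler card.

```python
import re

# Illustrative mapping from CUDA compute capability (major version) to GPU family.
ARCH_BY_MAJOR = {1: "Tesla", 2: "Fermi", 3: "Kepler"}


def summarize_stderr(text: str) -> dict:
    """Extract a few diagnostic fields from a BOINC task's stderr output."""
    summary = {}
    # The <message> element holds the exit status BOINC recorded for the task.
    msg = re.search(r"<message>\s*(.*?)\s*</message>", text, re.S)
    if msg:
        summary["message"] = msg.group(1)
    # The CUDA app prints the device's compute capability as "major = N ... minor = M".
    cc = re.search(r"major = (\d+)\s+minor = (\d+)", text)
    if cc:
        major, minor = int(cc.group(1)), int(cc.group(2))
        summary["compute_capability"] = f"{major}.{minor}"
        summary["architecture"] = ARCH_BY_MAJOR.get(major, "unknown")
    # Every whole line that mentions "error" (case-insensitive), e.g. the CUFFT failure.
    summary["errors"] = re.findall(r"^.*\berror\b.*$", text, re.M | re.I)
    return summary


# Excerpt of the log above, used as sample input.
sample = """<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
    major = 3
    minor = 0
CUFFT error in file 'd:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62.
</stderr_txt>"""

print(summarize_stderr(sample))
```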

Seeing as you have a Kepler-type GPU, I suspect you're being bitten by the fact that the current Seti CUDA application sent to your GPU was optimized for its predecessor, the Fermi-type GPU. There are workarounds that let you use this application, described in this sticky thread in the GPU forum, which states:
2) Kepler cards unsupported

Drivers affected: All drivers from 304.48 (BETA) onwards.
Hardware affected: nVidia 'Kepler' - GT 6xx and GTX 6xx cards.

Symptom: All tasks end in errors when using the stock v6.10 'cuda_fermi' application.

Solution/workaround:
a) Use an optimised Cuda application, where available.
b) Downgrade to the 301.42 driver - only possible on GTX 670/680/690 cards, not on newer releases like the 650/660 or their Ti variants.
c) Set an environment variable, as shown below.

The environment variable to use is

CUDA_GRID_SIZE_COMPAT

and it needs to have the value 1 (one)
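One way to set the variable system-wide on Windows Vista/7 is `setx CUDA_GRID_SIZE_COMPAT 1 /M` from a command prompt run as Administrator (`/M` writes the machine-wide value). A program only inherits the variables that existed when it started, so BOINC has to be restarted afterwards. The minimal Python sketch below, with a hypothetical helper name, just checks whether a process started after the change actually sees the value 1.

```python
import os
from typing import Mapping, Optional


def grid_size_compat_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if CUDA_GRID_SIZE_COMPAT is set to 1 in the given environment.

    Defaults to this process's own environment, which reflects only the
    variables that existed when the process was launched.
    """
    env = os.environ if env is None else env
    return env.get("CUDA_GRID_SIZE_COMPAT", "").strip() == "1"


print("CUDA_GRID_SIZE_COMPAT=1 visible:", grid_size_compat_enabled())
```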

You need to be running in Administrator mode, and you need to be running at least driver 306.02 (BETA) or 306.23 (WHQL)
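The two driver thresholds in this sticky can be checked mechanically. This is a hypothetical helper, assuming NVIDIA's usual `major.minor` version strings such as the 310.70 reported later in this thread; it only encodes the rules quoted above, which apply to Kepler cards.

```python
from typing import Tuple

# Thresholds taken from the sticky post quoted above (Kepler cards only).
AFFECTED_FROM = (304, 48)   # "All drivers from 304.48 (BETA) onwards"
WORKAROUND_FROM = (306, 2)  # CUDA_GRID_SIZE_COMPAT needs 306.02 (BETA) or 306.23 (WHQL)


def parse_driver(version: str) -> Tuple[int, int]:
    """Split an NVIDIA driver version such as "310.70" into (310, 70)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def kepler_driver_status(version: str) -> str:
    v = parse_driver(version)
    if v < AFFECTED_FROM:
        return "not affected"
    if v < WORKAROUND_FROM:
        return "affected; driver too old for CUDA_GRID_SIZE_COMPAT"
    return "affected; set CUDA_GRID_SIZE_COMPAT=1"


for ver in ("301.42", "304.48", "310.70"):
    print(ver, "->", kepler_driver_status(ver))
```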

Here's how to reach the setting screen:

Windows 7/Vista: [screenshot]

Windows XP: [screenshot]


The options given are either/or: choose one and try it. There's no need to use all three.

Optimized applications (if you don't already have one on another machine) aren't available at this time due to copyright problems with one of the compilers.

____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1324755 - Posted: 5 Jan 2013, 1:29:01 UTC - in response to Message 1324731.

Thanks. I did add the CUDA_GRID_SIZE_COMPAT variable and will try it. However, this is new, out of nowhere, nothing on my system changed, it just started happening AFTER the problem with the server outages.
____________

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2795
Credit: 6,313,733
RAC: 7,568
Bulgaria
Message 1325191 - Posted: 6 Jan 2013, 8:19:43 UTC - in response to Message 1324755.
Last modified: 6 Jan 2013, 8:27:02 UTC

Thanks. I did add the CUDA_GRID_SIZE_COMPAT variable and will try it. However, this is new, out of nowhere, nothing on my system changed, it just started happening AFTER the problem with the server outages.

Even the video driver version?

The "server outages" can't cause that; the app has been the same since:
6.10 (cuda_fermi) 8 Jun 2010

http://setiathome.berkeley.edu/apps.php

(New improved CUDA apps are in works/tests on SETI@home-Beta:
http://setiweb.ssl.berkeley.edu/beta/
http://setiweb.ssl.berkeley.edu/beta/apps.php
)


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1325608 - Posted: 7 Jan 2013, 21:48:38 UTC - in response to Message 1325191.

Yes, everything was the same. Nothing has changed in my machine at all, until the SETI server problems, then it just would not complete half the jobs, or threw that error. I did add the code above, and now it completes some, but never transfers them. I give up. I can't code my box around this software. I'll just do the MilkyWay jobs. They run, load and upload fine.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24687
Credit: 522,659
RAC: 19
United States
Message 1325669 - Posted: 8 Jan 2013, 3:15:36 UTC - in response to Message 1325608.

Yes, everything was the same. Nothing has changed in my machine at all, until the SETI server problems, then it just would not complete half the jobs, or threw that error. I did add the code above, and now it completes some, but never transfers them. I give up. I can't code my box around this software. I'll just do the MilkyWay jobs. They run, load and upload fine.

They should upload eventually. BOINC will try to report them no later than 24 hours after completion, if not sooner.
____________


BOINC WIKI

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2795
Credit: 6,313,733
RAC: 7,568
Bulgaria
Message 1325731 - Posted: 8 Jan 2013, 8:07:51 UTC - in response to Message 1325608.

Nothing has changed in my machine at all, until the SETI server problems

That your computing problems started at the same time as the "SETI server problems" is just a coincidence.
And if you have automatic updates turned ON, you can't be sure that "Nothing has changed in my machine at all".

I'll just do the MilkyWay jobs. They run, load and upload fine.

Not really; there are some CPU task errors:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=439930&offset=0&show_names=0&state=5&appid=

Perhaps the CPU is overclocked or overheating?


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1326250 - Posted: 10 Jan 2013, 1:26:08 UTC - in response to Message 1325731.

Perhaps the CPU is overclocked or overheating?


Not overclocked and has never run hot as far as I know. I have overtemp alarms that do work and nothing has gone off.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24687
Credit: 522,659
RAC: 19
United States
Message 1326330 - Posted: 10 Jan 2013, 11:58:03 UTC - in response to Message 1326250.

Perhaps the CPU is overclocked or overheating?


Not overclocked and has never run hot as far as I know. I have overtemp alarms that do work and nothing has gone off.


Is the GPU overheating?

When was the last time the dust bunnies were blown out of the case?
____________


BOINC WIKI

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2795
Credit: 6,313,733
RAC: 7,568
Bulgaria
Message 1326400 - Posted: 10 Jan 2013, 17:34:34 UTC - in response to Message 1326250.


http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=378072214

The "exit code -1073740940 (0xc0000374)" appears to indicate "heap corruption":
http://forums.iis.net/t/1150912.aspx
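The two numbers in that exit code are the same 32-bit value: BOINC printed it as a signed integer, while Windows status codes are normally written as unsigned hex, and 0xC0000374 is the Windows status STATUS_HEAP_CORRUPTION. A minimal Python sketch of the conversion (the lookup table holds just two well-known codes and is illustrative, not exhaustive):

```python
# Illustrative subset of Windows NTSTATUS codes; not an exhaustive table.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex value Windows uses."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def describe_exit_code(code: int) -> str:
    return KNOWN_STATUS.get(code & 0xFFFFFFFF, "unknown")


print(exit_code_to_hex(-1073740940), describe_exit_code(-1073740940))
# prints: 0xc0000374 STATUS_HEAP_CORRUPTION
```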

So this may be a faulty app, as some tasks error out on different computers:
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=293007522
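The reasoning here (the same workunit failing on several different hosts points at the application or the data rather than at one machine) can be written down as a rough heuristic. This is a hypothetical sketch: the outcome labels, host IDs and the two-host threshold are assumptions for illustration, not a BOINC rule.

```python
from typing import List, Tuple


def triage(results: List[Tuple[int, str]]) -> str:
    """Rough guess at the cause of failures for ONE workunit.

    results: (host_id, outcome) pairs, outcome being "success" or "error".
    The two-host threshold is an illustrative assumption.
    """
    error_hosts = {host for host, outcome in results if outcome == "error"}
    ok_hosts = {host for host, outcome in results if outcome == "success"}
    if len(error_hosts) >= 2:
        return "app or workunit suspected"
    if error_hosts and ok_hosts:
        return "host suspected"
    return "inconclusive"


# Hypothetical host IDs: two different machines failed the same workunit.
print(triage([(101, "error"), (102, "error"), (103, "success")]))
```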

Do other computing programs or games on your computer crash often (e.g. several crashes per day at random moments/places)?


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Mitch
Joined: 27 Jun 01
Posts: 16
Credit: 780,228
RAC: 660
United States
Message 1334142 - Posted: 3 Feb 2013, 1:24:54 UTC - in response to Message 1326400.

When was the last time the dust bunnies were blown out of the case?


LOL, I vacuum it every couple weeks. I live in a dustbin of a house. Never seen anything like it. I'd love to know where this stuff comes from.

Do other computing programs or games on your computer crash often (e.g. several crashes per day at random moments/places)?


Not since I started using XP Pro about, um, let's see.... I can't remember that far back. But I only use Intel and don't play games.

However, a few days back I reset the preferences from 90% CPU usage to, I think, 50% and have not seen a "Computation Error" since. I was beginning to see some from MilkyWay also, but they have all stopped.

Time to plan my next box though, this one is 3 years old now.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24687
Credit: 522,659
RAC: 19
United States
Message 1334159 - Posted: 3 Feb 2013, 2:41:48 UTC - in response to Message 1334142.

When was the last time the dust bunnies were blown out of the case?


LOL, I vacuum it every couple weeks. I live in a dustbin of a house. Never seen anything like it. I'd love to know where this stuff comes from.

Do other computing programs or games on your computer crash often (e.g. several crashes per day at random moments/places)?


Not since I started using XP Pro about, um, let's see.... I can't remember that far back. But I only use Intel and don't play games.

However, a few days back I reset the preferences from 90% CPU usage to, I think, 50% and have not seen a "Computation Error" since. I was beginning to see some from MilkyWay also, but they have all stopped.

Time to plan my next box though, this one is 3 years old now.

That would indicate an overheating problem.

Suspects are fans that are wearing out and a heat sink that is no longer in good contact. If you are sufficiently capable mechanically, you can pull the heat sink off the CPU, scrape off the old thermal paste, and replace it with new paste.
____________


BOINC WIKI

J.D. Gallaway
Joined: 11 Aug 08
Posts: 2
Credit: 251,605
RAC: 0
United States
Message 1342057 - Posted: 1 Mar 2013, 16:27:30 UTC

I'm also getting this error, in fact out of ~50 projects only two passed through to "Complete".

I'm running a Phenom II X6, 16 GB Corsair RAM with a PNY GTX 650. I am running Prime95 right now stressing all six cores to 100% and the temps are holding steady at 146F/64C, which, while warm, isn't that hot for six cores being maxed out, I would think.

Does anyone have an idea how to check the temps on the GPU? I'm running a PNY-built card, but using the stock NVIDIA drivers because PNY's disc wouldn't install on Windows 8.
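One option for reading the GPU temperature is NVIDIA's `nvidia-smi` command-line tool, which is installed with the driver. The Python sketch below assumes `nvidia-smi` is on the PATH and that the installed driver supports its `--query-gpu` option; the helper names are hypothetical.

```python
import subprocess
from typing import List


def parse_temps(csv_text: str) -> List[int]:
    """Parse one temperature (degrees C) per line, as printed with noheader,nounits."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def gpu_temps_c() -> List[int]:
    """Query the current temperature of every NVIDIA GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_temps(out)


try:
    print("GPU temperatures (C):", gpu_temps_c())
except (FileNotFoundError, subprocess.CalledProcessError) as exc:
    print("nvidia-smi not available:", exc)
```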

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24687
Credit: 522,659
RAC: 19
United States
Message 1342123 - Posted: 1 Mar 2013, 18:20:34 UTC - in response to Message 1342057.

I'm also getting this error, in fact out of ~50 projects only two passed through to "Complete".

I'm running a Phenom II X6, 16 GB Corsair RAM with a PNY GTX 650. I am running Prime95 right now stressing all six cores to 100% and the temps are holding steady at 146F/64C, which, while warm, isn't that hot for six cores being maxed out, I would think.

Does anyone have an idea how to check the temps on the GPU? I'm running a PNY-built card, but using the stock NVIDIA drivers because PNY's disc wouldn't install on Windows 8.

Have you overclocked the machine?
____________


BOINC WIKI

J.D. Gallaway
Joined: 11 Aug 08
Posts: 2
Credit: 251,605
RAC: 0
United States
Message 1342288 - Posted: 2 Mar 2013, 3:45:34 UTC

Negative to the overclock. While I went to work, I left my system running Prime95 on all 6 cores, my CoreTemp log shows a max CPU temp of 152F during the run.


Any thoughts?
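For readers who think in Celsius, the Fahrenheit readings in this thread convert with C = (F - 32) x 5/9, so 146F is about 63.3C and the 152F peak is about 66.7C. A trivial Python sketch of the conversion:

```python
def f_to_c(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


for f in (146, 152):
    print(f"{f}F = {f_to_c(f):.1f}C")
# prints:
# 146F = 63.3C
# 152F = 66.7C
```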

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24687
Credit: 522,659
RAC: 19
United States
Message 1342383 - Posted: 2 Mar 2013, 15:25:33 UTC - in response to Message 1342288.

Negative to the overclock. While I went to work, I left my system running Prime95 on all 6 cores, my CoreTemp log shows a max CPU temp of 152F during the run.


Any thoughts?

What version of BOINC? Are they GPU tasks that are failing, or CPU tasks?
____________


BOINC WIKI

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2795
Credit: 6,313,733
RAC: 7,568
Bulgaria
Message 1342393 - Posted: 2 Mar 2013, 15:39:43 UTC - in response to Message 1342057.
Last modified: 2 Mar 2013, 15:44:09 UTC


You are using the 'unfortunate' combination of Kepler GPU + new drivers + stock CUDA app:
NVIDIA GeForce GTX 650 (1024MB) driver: 310.70
SETI@home Enhanced v6.10 (cuda_fermi)

The stock CUDA app is a few years old and until they fix it:
You need Solution 2) c) from here:
http://setiathome.berkeley.edu/forum_thread.php?id=69735

(Set an environment variable CUDA_GRID_SIZE_COMPAT)

And if you had read the earlier posts: this same advice was already given at the beginning of this thread
http://setiathome.berkeley.edu/forum_thread.php?id=70473&postid=1324731#1324731


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)


Copyright © 2014 University of California