Well, too many "Computation Error" tasks abound lately. Time to give up.

CAHess-Den
Joined: 20 May 99
Posts: 21
Credit: 2,575,162
RAC: 0
United States
Message 1395321 - Posted: 26 Jul 2013, 20:56:37 UTC

I've been loaning background cycles to SETI@Home from several personal computers since May 1999, and easily dealt with minor problems/changes along the way. However now it is no longer worth it.

At least half of my WUs are now popping up as "Computation Error"s since the v7 changes.

From what I read back when it started happening (when the v7 changes were made) it had to do with memory allocation "problems." (Alas I can't find the topic/thread I read about it back then.)

Heating up my GPUs just to throw away the results isn't my cup of tea. Hence I've played with the occasional switch-over to "No new tasks," but now am giving up on that and have turned all my cycles over to Einstein@Home (which I was running in parallel.)

I'll be checking back at SETI@Home now and again to see if this problem ever gets resolved, or when I get my next machine (with enough memory?) that can handle WUs.

As the Terminator says: "I'll be back [someday]."

ID: 1395321
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1395339 - Posted: 26 Jul 2013, 21:19:14 UTC - in response to Message 1395321.  
Last modified: 26 Jul 2013, 21:21:03 UTC

Your memory problem MAY not be a simple lack of RAM. On one of my machines (an AMD 6-core with 2 GTX 460s running 2 WUs each) - with a total of 9 WUs running at the same time - I have 4GB of RAM, and it never goes above 3GB in use total, including the OS (Win 7) and everything else that might be in memory. Including watching football in HD sometimes on NFL.com over the Web.

You might want to check out your CPU and GPU (if you are using it) temps, and, if it is an older machine(s), clean out all the dust bunnies and (maybe?) pet hair inside.
ID: 1395339
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1395367 - Posted: 26 Jul 2013, 22:10:15 UTC

Dunno 'cause your computers are hidden but this thread may be helpful:

http://setiathome.berkeley.edu/forum_thread.php?id=71953

The workaround got rid of errors for me. But again, with your computers hidden I've no idea if this is the problem you are suffering from.

If not you can unhide your computers and let the resident gurus work their magic, if you wish.
ID: 1395367
Vicki
Joined: 30 Nov 01
Posts: 65
Credit: 1,640,576
RAC: 46
New Zealand
Message 1395412 - Posted: 27 Jul 2013, 0:18:46 UTC

If you are using an older machine, my guess is that you are using a 32-bit version of Windows. If this is the case, Windows will only use a max of 4 GB of RAM and ignore anything above that. Still, it is probably worth opening up your PC and dusting all the fans, the circuit boards, and add-on cards such as your graphics card. You might also want to remove all your RAM sticks and then put them back in, in case any have vibrated loose somehow, and check your SATA/IDE cables. Remember to use an anti-static wrist strap when handling RAM etc. and clip it to the desktop case. I use a pastry brush to dust my PC once a month or so and find it does make a difference. The other thing to try is perhaps running 1 less work unit at a time and seeing if that helps. Good luck
Rae
A city destroyed by an earthquake is an opportunity to rebuild, redesign & make it a better place to be. Better, stronger, faster like the 6 Million Dollar Man
ID: 1395412
bill
Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1395427 - Posted: 27 Jul 2013, 1:06:33 UTC - in response to Message 1395412.  

Rae, Alex, and jravin,

It speaks well of you that you're trying to help
a fellow cruncher. Even though he hasn't provided enough
information to figure out his problem and has his computers
hidden.

He's been around long enough to know better and has indicated
that he's already left for greener pastures. I can only
wonder why he even bothered posting.
ID: 1395427
Vicki
Joined: 30 Nov 01
Posts: 65
Credit: 1,640,576
RAC: 46
New Zealand
Message 1395557 - Posted: 27 Jul 2013, 11:10:36 UTC - in response to Message 1395427.  

The advice may well help some other crunchers, Bill, who might search the topic. It might even help him with his other project(s). The only other thing none of us suggested at the time was a project reset and/or reinstalling BOINC.
A city destroyed by an earthquake is an opportunity to rebuild, redesign & make it a better place to be. Better, stronger, faster like the 6 Million Dollar Man
ID: 1395557
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1395559 - Posted: 27 Jul 2013, 11:58:51 UTC
Last modified: 27 Jul 2013, 12:12:27 UTC

It could be a hardware problem, but nothing points that way because everything works on E@H... so it must be a software/configuration problem.

If you look at E@H, the computer is not hidden there.

That shows a GenuineIntel Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7] (4 processors) with [2] NVIDIA GeForce 9600 GT (512MB) driver: 30697 on Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)

The first thing I note is the driver version, 30697. I'm not sure if it has the "sleeping bug" (more info in other threads on the forum), so the first thing to do is update the driver. Download the latest driver from the NVIDIA site only, and always do a clean installation.

And don't forget that BOINC can't be installed as a service if you use GPU crunching on Win 7. Just in case that mistake was made, reinstall BOINC itself.

If the above is OK, in theory the source of the memory allocation "problems" could be the size of the RAM on the GPUs (512MB). If you run something else at the same time that uses part of the VRAM, it will easily exhaust the RAM capacity of the GPU. Even trying to run 2 SETI WUs at a time on each GPU could do that.

If everything works on E@H, why doesn't it work on SETI? Let's think... E@H uses about 50MB of VRAM to crunch its WU, compared with more than 256MB for SETI, so it's easy to see what I mean.
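To put rough numbers on the VRAM argument above, here is a minimal Python sketch. It only uses the figures quoted in this post (512MB per card, more than 256MB per SETI task, about 50MB per E@H task); the function names and the 300MB "other usage" figure are made up for illustration, and none of this is measured on the actual host.

```python
# Rough VRAM budget check using the figures quoted in the post above.
# All numbers are estimates from the thread, not measurements.

GPU_VRAM_MB = 512            # GeForce 9600 GT, per the host listing
SETI_TASK_VRAM_MB = 256      # "more than 256MB" per SETI task, so a lower bound
EINSTEIN_TASK_VRAM_MB = 50   # "about 50MB" per E@H task


def vram_headroom_mb(total_mb, per_task_mb, tasks, other_usage_mb=0):
    """VRAM left after `tasks` concurrent tasks plus other usage.

    A negative result means the card would be over budget.
    """
    return total_mb - per_task_mb * tasks - other_usage_mb


def max_concurrent_tasks(total_mb, per_task_mb, other_usage_mb=0):
    """Largest number of tasks that fit next to `other_usage_mb` of other VRAM use."""
    available = total_mb - other_usage_mb
    return max(available // per_task_mb, 0)


if __name__ == "__main__":
    # other_usage_mb=300 is a hypothetical desktop/browser/game load.
    for name, per_task in (("SETI", SETI_TASK_VRAM_MB), ("E@H", EINSTEIN_TASK_VRAM_MB)):
        idle = max_concurrent_tasks(GPU_VRAM_MB, per_task)
        busy = max_concurrent_tasks(GPU_VRAM_MB, per_task, other_usage_mb=300)
        print(f"{name}: {idle} task(s) fit on an idle card, {busy} with 300MB in use")
```

Since the SETI figure is a lower bound, the real margin is tighter than this suggests, which is why testing with one WU per GPU is the safer starting point.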

Of course more info is needed to debug but that could be a path to follow to fix the problem.

Some questions:

- Do you try to run more than 1 SETI WU at the same time on the GPUs?
> If yes, change to 1 WU at a time.

- In the SETI preferences, did you select "leave applications in memory when suspended"?
> If yes, try disabling that.

- Are you trying to run SETI in parallel with another program like a game, video editing software, rendering, etc. that heavily uses the GPU VRAM?
> If yes, stop BOINC when you run that kind of software.

- Do you open a large number of windows when you run SETI in parallel?
> If yes, use fewer windows (2 or 3 max) or stop BOINC when you do that.

Of course it could be a completely different problem, but you need to help by giving more data, especially by unhiding the host in SETI.

Just a tip: nobody can do anything with the info if you unhide the host; not even your IP is actually shown to the rest of us.

Besides the number of hosts and their configurations, the only important thing you allow us to see is the Stderr output, which helps us try to understand what is happening with your WUs.

So if you need help, the first thing to do is unhide the host. Then the rest of the community can get some clues from the WU log and will surely try to give you a hand.
ID: 1395559
CAHess-Den
Joined: 20 May 99
Posts: 21
Credit: 2,575,162
RAC: 0
United States
Message 1396453 - Posted: 30 Jul 2013, 3:30:08 UTC - in response to Message 1395559.  

Well, I'll try to respond top to bottom, but frankly: It's NOT MY FAULT.
The fact that these failures are occurring with such regularity *all of a sudden/since v7 update* tells me changes were made in SETI@Home for which my machine "no longer fits the bill" for SETI WUs.

Why should I have to go debugging things given this fact?

Anywho, on with the show:

(Just rambling replies here, sorry....)

Thanks for putting up my system specs, juan. I was simply too lazy/careless about seeking a solution.

No heat problems or the like. 'Gets checked every-other month and air-dusted as necessary. It usually requires dusting about twice a year. (I build and service computers....)

Windows 7 Home Premium 64-bit SP-1 (Fully tweaked)

I've been running SETI@Home "nonstop" since 2000, and once the failures began (sometime after v7 update) I added Einstein so as to keep donating at a steady rate. That was Dec 2012. I do not know exactly when I first noticed the plethora of errors, but it's been several months, to be sure.

I'm going to try running S@H alone ("SETI@home alone" heh...) by halting Einstein tasks for a while to perhaps tease out the possibility of it simply being a "duality" problem. (Again, users should not have to confront this, especially since it was working fine for months....)

Nvidia's track record with both feature access and stability has never met my standards, and I load only their WHQL-certified drivers via Micro$oft Update, and even then I do not update every time. (They've dumbed-down the control so much....) I will take this opportunity to do an update after some testing of the previous.

From juan:
Some questions:

- Do you try to run more than 1 SETI WU at the same time on the GPUs?
> If yes, change to 1 WU at a time.
Me: When they run, I have run 3 cores and both GPUs (running 1 each) without trouble since early 2008.

- In the SETI preferences, did you select "leave applications in memory when suspended"?
> If yes, try disabling that.
Me: 'Not gonna happen. I'll turn down/suspend my WU workload instead. Sorry. I do a lot of work on this computer too.... Sorry.

- Are you trying to run SETI in parallel with another program like a game, video editing software, rendering, etc. that heavily uses the GPU VRAM?
> If yes, stop BOINC when you run that kind of software.
Me: Once again, this would seem to be a shame to have to do, since it's all been working fine for so long. (Repeating: I'll try the "S@H-only" test for a while first tho.)

- Do you open a large number of windows when you run SETI in parallel?
> If yes, use fewer windows (2 or 3 max) or stop BOINC when you do that.
Me: Well, I guess you're telling me to "stop trying to run WUs if you're doing something else too" and since things had been working fine for years and years....
I say again: Why should I have to adapt like that?

I see two things happening here:
1) I see two distinct audiences of BOINC users here: "casual" and "serious." Serious users will expend more effort to shepherd their WUs along, while casual users will make do with what they're given, perhaps with a little tweaking. I would hope that all users are given the least-common-denominator instance, however, so as to engage as many of the "casual" users to participate as possible, since it is likely the larger audience/greater opportunity.
2) Complexity of the system may possibly be outpacing the above, larger group's ability, and instead of contributing their "possible maximum" they end up having to contribute to the lower-level of "at least it works out of the box" without ever reaching their best (safe) potential.

To each their own, I suppose, eh?

I'll be back in a few weeks after trying the experiments above.

Thanks for all your input! It's great to have "volunteers" like you folks to help raise the bar and keep things running, user-wise!



ID: 1396453
Wiggo
Joined: 24 Jan 00
Posts: 34761
Credit: 261,360,520
RAC: 489
Australia
Message 1396483 - Posted: 30 Jul 2013, 4:23:29 UTC - in response to Message 1396453.  

Before you come back can you un-hide your computer so that we can actually see the errors that you are generating (no one can see your important details, take a look at mine and see) as that will help heaps.

Cheers.
ID: 1396483
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1396561 - Posted: 30 Jul 2013, 8:29:37 UTC - in response to Message 1396453.  
Last modified: 30 Jul 2013, 9:26:45 UTC


Nvidia's track record with both feature access and stability has never met my standards, and I load only their WHQL-certified drivers via Micro$oft Update, and even then I do not update every time. (They've dumbed-down the control so much....) I will take this opportunity to do an update after some testing of the previous.

Just a tip: never allow M$ Update to update your NVIDIA driver; that could easily be the source of your problems. Why? The simple answer is that it doesn't do a clean installation. It installs the new driver over the old one, so a lot of "garbage" is left behind, and that always causes problems. That makes old, already working systems stop working, exactly as you report.

If you do GPU work (like SETI or E@H), the correct and safe way to update the NVIDIA drivers is to download them from the NVIDIA site only and always do a clean installation to remove all the old stuff first. Newer NVIDIA drivers have an update program that helps you with that job. As a precaution, I even manually stop BOINC completely before I start to update the driver. Yes, I'm extreme, but you know, too many hosts/configurations and sh*** always happens.

Nobody said it's your fault. What we say is that when you ask for help, the first thing you need to do is unhide the hosts so the community can see the error log of the WUs and, by looking at that, get some clues that show why the errors happen and suggest a fix. We have a large community of volunteers, and normally if there is an error it happens on several hosts, so it is easy to track and debug.

When I say try, I'm not saying stop doing what you have done for years; it is just to test a theory, and the answer could provide the clue that fixes the problem. Keeping tasks in memory when suspended is the normal way BOINC works, and it must work that way, but I'm looking for a clue that shows whether the VRAM capacity is really exhausted. I'm trying to follow exactly what you said: "I do a lot of work on this computer too" and "Why should I have to adapt". Newer drivers use more VRAM than old ones, so that could be the source of the problem.

And yes, 512 MB could be very close to the edge if you simultaneously run a lot of other stuff that uses VRAM while doing SETI GPU crunching, and not when running E@H, because E@H uses less VRAM to do its work. The only way to be sure is by testing; remember each system configuration/usage is unique.

As you can see there are too many "could be"s, so to debug we need to test them one at a time in order to try to fix the problem. I agree with you, in theory BOINC must work without this stuff, but there are so many possible configurations and sources of problems that sometimes a lot of digging is needed.

Have a good day/night
ID: 1396561
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1396587 - Posted: 30 Jul 2013, 11:58:14 UTC

It's very difficult to troubleshoot without seeing the errors firsthand, so please unhide. And be advised that from the moment of unhiding it takes a few hours for the machines to actually show up.
If you don't want to show all of your hosts you could provide the link to the problem host only.

As an offshoot - and that's really not my area of expertise - have you tried reseating the card/cleaning the contacts?

Einstein and SETI utilise the cards differently - but usually it's not a problem to have the projects coexist.

I take it the problem is with the GPU work only? It makes a vast difference if you are looking at GPU errors or whole system errors.

There are still quite a few 9600 around so the card itself and the amount of VRAM shouldn't be a problem.

BTW what BOINC version? That is important information, too.

To be fair, for most people it works out of the box. And of those where it doesn't work, a lot either don't notice or walk away. Some will lurk and find the solution to their problem that way. A very few will post their problem. We can only try to do our best to help those few.

And of course even though the problems started with V7 the card may just be reaching the end of its lifespan, the heavier utilisation from optimised V7 accelerating the process. And that you don't get errors with Einstein doesn't mean that the card is sound - different app, different stress.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1396587
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1396709 - Posted: 30 Jul 2013, 22:34:44 UTC - in response to Message 1396453.  

I've been running SETI@Home "nonstop" since 2000, and once the failures began (sometime after v7 update) I added Einstein so as to keep donating at a steady rate. That was Dec 2012. I do not know exactly when I first noticed the plethora of errors, but it's been several months, to be sure.

When you are talking v7, are you talking Boinc v7, or the Setiathome v7 apps?

Setiathome v7 only went live a couple of months ago, so if you had trouble last December, it wasn't with the setiathome v7 apps, it was with the v6 apps, or with Boinc v7.

Claggy
ID: 1396709
CAHess-Den
Joined: 20 May 99
Posts: 21
Credit: 2,575,162
RAC: 0
United States
Message 1397176 - Posted: 31 Jul 2013, 19:49:05 UTC - in response to Message 1396709.  

Well, I let all my Einstein tasks complete, and that went fine. (All my SETI work was already done/gone while the Einstein work ran, BTW.)

Einstein was on "no new tasks," hence my Project list was now empty.

Then I Resumed SETI and within minutes got scads of WUs. (Couple dozen maybe?)

Within minutes the first ten immediately cried "Computation error," so I aborted the remainder and Suspended SETI as well.

My machine should now be visible once I reboot my machine (shortly.)


ID: 1397176
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1397232 - Posted: 31 Jul 2013, 20:40:06 UTC - in response to Message 1397176.  

If you don't have any work for Seti on board at the moment just go and reset the project. If you have any complete but not yet reported work on board you'll want to report those first. Also untick v6/enhanced in your preferences.

As Alex Storey hinted your problems are caused by CUDA DLLs used by v6 app that aren't compatible with v7 cuda22/23 app. Resetting the project gets rid of those DLLs.
ID: 1397232
CAHess-Den
Joined: 20 May 99
Posts: 21
Credit: 2,575,162
RAC: 0
United States
Message 1397239 - Posted: 31 Jul 2013, 20:47:19 UTC - in response to Message 1397176.  
Last modified: 31 Jul 2013, 20:54:17 UTC

S@H now running smoothly with a fresh chunk of newly-allocated memory. (e.g. Apres reboot.)

2 on GPU
3 on CPU core
23 in queue
===========
28 total

Addendum:
>Sigh< Just saw the notice. Will abort 23 tasks and reset once existing work completes.

I'll bet that was the crux of all this.... >Slap-head<
ID: 1397239
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1397245 - Posted: 31 Jul 2013, 20:58:00 UTC - in response to Message 1397239.  

Will abort 23 tasks and reset once existing work completes.

You don't need to abort any tasks. You'll get them resent after the reset.
ID: 1397245
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1397250 - Posted: 31 Jul 2013, 21:00:49 UTC - in response to Message 1397239.  
Last modified: 31 Jul 2013, 21:11:47 UTC

>Sigh< Just saw the notice. Will abort 23 tasks and reset once existing work completes.

No need to abort anything. You don't have any cuda22 or cuda23 tasks in your queue. Everything you have now will run just fine. Just set for No New Tasks, then once your current queue is drained, deselect Seti Enhanced (v6) and then Reset the project. Then, any cuda22/23 tasks you get in the future should run just fine.

Edit: Oh, well. I guess Juha and I were too slow to stop the carnage!
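
For anyone who prefers the command line, the drain-then-reset steps above can be sketched in Python around `boinccmd`. This is only a sketch under assumptions: it assumes `boinccmd` is on your PATH and that the client is attached under the project URL shown, and the helper names are made up for illustration. The "update" step comes from Juha's advice to report finished work first, and the final "allowmorework" step is an addition of mine. Deselecting v6 is a website preference and stays a manual step.

```python
# Assumptions: boinccmd is on PATH, and the client is attached to SETI@home
# under exactly this project URL. Adjust both for your own install.
import subprocess

PROJECT_URL = "http://setiathome.berkeley.edu/"


def reset_steps(project_url=PROJECT_URL):
    """Return (description, argv) pairs; argv is None for manual steps."""
    def cmd(op):
        return ["boinccmd", "--project", project_url, op]

    return [
        ("Set No New Tasks", cmd("nomorework")),
        ("Wait until the current queue has drained", None),
        ("Report any finished work", cmd("update")),
        ("Deselect SETI Enhanced (v6) in the website preferences", None),
        ("Reset the project", cmd("reset")),
        ("Allow new tasks again", cmd("allowmorework")),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; only call boinccmd when dry_run is False."""
    for description, argv in steps:
        if argv is None:
            print(f"MANUAL: {description}")
            if not dry_run:
                input("Press Enter when done... ")
        elif dry_run:
            print(f"WOULD RUN: {' '.join(argv)}  # {description}")
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is touched until you opt in.
    run_steps(reset_steps())
```

Run as-is it only prints the plan; pass `dry_run=False` to `run_steps` once you are happy with it. Nothing here aborts tasks, which matches the advice in this post.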
ID: 1397250
Wiggo
Joined: 24 Jan 00
Posts: 34761
Credit: 261,360,520
RAC: 489
Australia
Message 1397264 - Posted: 31 Jul 2013, 21:23:57 UTC - in response to Message 1397250.  
Last modified: 31 Jul 2013, 21:27:39 UTC

By the looks of it he's aborted the work (cuda32) that should've worked, his errors were cuda23 work with the old "too many exit(0)s" (which has a thread here somewhere).

Please let the work run, don't abort any more work.

[edit] Here's the thread, v7 cuda23 WUs getting ERR_TOO_MANY_EXITS [/edit]

Cheers.
ID: 1397264
Wiggo
Joined: 24 Jan 00
Posts: 34761
Credit: 261,360,520
RAC: 489
Australia
Message 1397785 - Posted: 2 Aug 2013, 3:12:15 UTC - in response to Message 1397264.  

It looks like CAHess-Den may have got a handle on his problem as he's now producing valid cuda32 work instead of aborting it.

Cheers.
ID: 1397785
CAHess-Den
Joined: 20 May 99
Posts: 21
Credit: 2,575,162
RAC: 0
United States
Message 1399152 - Posted: 5 Aug 2013, 19:56:51 UTC - in response to Message 1397785.  
Last modified: 5 Aug 2013, 20:17:07 UTC

So after all that "cleansing" I ran several dozen-odd S@H WUs without a problem. Then I set "No new tasks" and let them all run to complete and Suspended S@H.

Then I ran Einstein@H. All I got were Computation errors, so I let that play out and Suspended E@H.

Just now I started up S@H again, and now all I'm getting are Computation errors again.

I'm GUESSING this is a memory problem, so I'll let the remaining S@H's crap out/complete and then reboot my machine. (As opposed to just shutting down BOINC and seeing what happens.)

Since I only picked up E@H to fill in the time when S@H was down so much (back months ago now), and S@H seems to be reliable again, I'll probably not make use of E@H until/unless S@H breaks down again.
ID: 1399152


 