Aborted Tasks


Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1296101 - Posted: 17 Oct 2012, 8:58:48 UTC

I've had to abort over 800 ATI tasks on one of my machines.

Over the last day or two, 800 ATI tasks suddenly downloaded, which is very strange, as I usually have at most 20 waiting and 3 running.

I noticed later that these tasks began to hang at 100% in BOINC Manager.

I'm not prepared to mince around with this, as 800 ATI tasks should not have downloaded in the first place.

I normally get anywhere from 1 to 30, with my settings of 5.00 and 0.01 days of work.

CPU tasks are unaffected.

My other PC, which processes CPU and ATI tasks, is also unaffected.

I've therefore aborted these tasks, and will await further tasks and see how they progress.

I've never had this occurrence before.

Apologies to wingpeople, but I simply have not got time to faff around with this.

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1296488 - Posted: 18 Oct 2012, 8:34:13 UTC

It happened again.

What's going on here?

Another 800 ATI GPU tasks downloaded today.

I should only get around 30 maximum.

2 of these 800 tasks have reached 100% and just stay there.

They appear to hang, preventing further ATI tasks from starting.

I'm in the dark here, so I've reset the project this time, and will wait to see what happens tomorrow.

SETI's servers won't allow any more, saying the computer "has finished a daily quota of 1 tasks".

Whatever that means?

Can anyone throw any light on this annoying problem?

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8491
Credit: 49,745,454
RAC: 55,475
United Kingdom
Message 1296490 - Posted: 18 Oct 2012, 8:59:26 UTC

You probably need to set the <sched_op_debug> logging flag in cc_config.xml, so you can see in a little more detail how much work (how many seconds) your computer is requesting, and how long the work you receive in response to those requests is estimated (again, by your computer) to take.
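For anyone following along, here is a minimal sketch of the cc_config.xml content this refers to, generated from Python so it can be checked. The element names cc_config, log_flags and sched_op_debug are the standard BOINC ones as far as I know; the helper name build_cc_config is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["sched_op_debug"])
print(config_text)
```

The resulting text would be saved as cc_config.xml in the BOINC data directory, after which the client needs to re-read its config files or be restarted before the extra messages appear in the event log.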

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1296573 - Posted: 18 Oct 2012, 16:02:36 UTC

800 isn't "a lot". I assume your GPU is getting these WUs because SETI thinks it can crunch them before the deadline. A newer GPU can run through 300+ a day.
I assume yours isn't the latest and greatest.
If you think you are getting too many WUs, then reduce your cache to 1-2 days. If that is still too much, then change the "contact server every" setting to 1 day or less.

Just a demonstration: I have 1 PC that currently has 4800 WUs in progress. My PCs with GPUs all have large caches.

____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1296706 - Posted: 18 Oct 2012, 21:30:25 UTC

Guys,

I appreciate what has been said, but what has happened of late is abnormal for my setup, and these tasks hang at 100%, preventing other tasks from running.

I never get this many ATI GPU tasks.

The other computer is almost identical to this one, and it's running as this one was until a day or 2 ago.

I had changed nothing on my computer before these tasks downloaded.

Anyway, I've taken drastic action on this computer by detaching from the project and deleting all BOINC and SETI files, including those in the program files and data directories.

I reinstalled BOINC, reattached SETI, and optimised it for ATI GPU tasks with my backed-up copy of L*****c.

I can't get any ATI tasks at present, as the SETI server is saying "this computer has finished a daily quota of 10 tasks".

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1296859 - Posted: 19 Oct 2012, 11:02:13 UTC - in response to Message 1296706.

Don't stress. I've aborted 5000 at least twice and 3000 another time due to sheer stupidity on my part.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1296888 - Posted: 19 Oct 2012, 14:19:27 UTC

The clean install of BOINC etc. made no difference.

600+ ATI GPU tasks downloaded this morning on this computer.

I've set the "no more tasks" parameter till I get rid of this lot.

I'll see what happens when I re-enable this parameter in the future.

:)

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8491
Credit: 49,745,454
RAC: 55,475
United Kingdom
Message 1296890 - Posted: 19 Oct 2012, 14:35:07 UTC - in response to Message 1296888.

What did the event log say about the amount of work requested and received?

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1296923 - Posted: 19 Oct 2012, 17:17:04 UTC

Unfortunately, Richard, the event log has been reset.

The log only goes back 2 hours, and there's no work request data logged, because I'm now not requesting any.

When I start requesting work again, I'll see what's been logged.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8491
Credit: 49,745,454
RAC: 55,475
United Kingdom
Message 1296928 - Posted: 19 Oct 2012, 17:31:13 UTC - in response to Message 1296923.

Unfortunately, Richard, the event log has been reset.

The log only goes back 2 hours, and there's no work request data logged, because I'm now not requesting any.

When I start requesting work again, I'll see what's been logged.

Older messages, predating the last BOINC restart, can usually be found in the file stdoutdae.txt in the BOINC data directory (for location, see the startup messages in the current message log).
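If that file is large, a short script can pull out just the scheduler lines. This is only a sketch: it assumes the lines written by the sched_op_debug flag carry a "[sched_op]" tag (adjust the marker if your log looks different), and both the function name filter_log and the relative path below are placeholders, not anything BOINC defines.

```python
from pathlib import Path


def filter_log(lines, marker="[sched_op]"):
    """Keep only the lines that contain the marker text."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Hypothetical location: use the data directory named in your own startup messages.
log_path = Path("stdoutdae.txt")

if log_path.exists():
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        for line in filter_log(handle):
            print(line)
```

Passing a different marker, for example a date string, narrows the output to the day the tasks arrived.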

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,371,128
RAC: 25,178
United States
Message 1297002 - Posted: 19 Oct 2012, 22:05:54 UTC - in response to Message 1296923.
Last modified: 19 Oct 2012, 22:06:49 UTC

Just what manufacturer, make and model is the graphics card on the machine in question?

Example: Gigabyte GTX560Ti GV-N560OC-1GI

Just what do you have set for "Minimum work buffer"?

Just what do you have set for "Max additional work buffer"?

Christoph
Volunteer tester
Joined: 21 Apr 03
Posts: 76
Credit: 262,351
RAC: 0
Germany
Message 1297004 - Posted: 19 Oct 2012, 22:27:15 UTC
Last modified: 19 Oct 2012, 22:29:44 UTC

If I have tasks which continue running after reaching 100% (no, I don't mean just 10 minutes), I have to reboot my machine.
If I only close BOINC and restart it, it will resume from the last checkpoint but still not complete the task.
I will write to the Alpha mailing list about it tomorrow or so to find out which log flags to set.

And then wait for it to happen again in a couple of weeks...

Addition: did the estimated time to completion go down significantly? Then you need to wait and complete at least 10 tasks (which need to validate) so that the server will recalculate the time correction factor.
____________
Christoph

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1297151 - Posted: 20 Oct 2012, 11:44:24 UTC

I'm using a Radeon HD 6670.

Min work buffer is 5.00 days, Max additional work buffer is 0.01.

I've reset the app_info count setting from .33 back to the default of 1.

It seems to be reporting OK now, but I did get another 200 ATI files today.

Funny, my other computer has the same graphics card and virtually the same setup, and that is running 3 ATI tasks OK using a count setting of .33.

This computer was also running OK until a few days ago using the count setting of .33.

I'll leave both running with a default count setting of 1.

I'll see if things settle down, then adjust the count settings back to .33.

Running 3 ATI tasks on the card didn't seem to affect my setup, other than causing video to stutter occasionally.

I therefore included settings in BOINC exclusive apps to suspend BOINC if I am using iexplore.exe, vlc.exe and googleearth.exe.

Temperatures are about 48C for the other computer's GPU and about 56C for this one.

CPU temps are about the same as the GPU.


bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,371,128
RAC: 25,178
United States
Message 1297365 - Posted: 20 Oct 2012, 22:50:13 UTC - in response to Message 1297151.

I'm using Radeon HD 6670

Min work buffer is 5.00 days, Max additional work buffer is 0.01
(sniporest)


OK, there was a huge shorty storm a while back, which may have caused your experience. If your card was capable of crunching 7 work units an hour when set to crunch 3 at a time, 24 hours a day with a 5-day cache, 800+ work units in your cache is a reasonable number (7 x 24 x 5 = 840).

My wild-ass guess is that the servers saw you going like gangbusters through work units (shorties, remember) and at the same time something was done at the servers to increase the number of downloads per request. There were posts of people getting 100+ work units per download, so what you saw was your cache FINALLY being filled up to its maximum of 5+ days' worth of work.

I can see where that might be unsettling to someone who has never experienced that sort of thing before.

What you can do is change the setting for how many work units run at one time from .33 to 1 (which I see you have done). This should drop the number of downloads by two-thirds, BUT NOT IMMEDIATELY. There is some time lag in the system, so this would take a few days. Quicker would be to set your cache down to 1 day max and .01 additional. This will probably cause SETI to go into high priority mode to clear out the excess. DON'T WORRY about it, just be patient and give BOINC a week to settle down.

The real problem was not that you got a metric buttload of work units all at once, but that you weren't getting all the work units you were requesting in the first place.

To figure out the cache you want, take the number of work units your GPU and CPU can do per hour (on average) and multiply that by how many hours a day you are going to crunch work units.
Say 5 WU per hour for the GPU and 2 for the CPU: (5 + 2) x 24 hrs = 168 per day.
If you want a 5-day cache, expect to see about 5 x 168 = 840 work units in your cache if everything works as it should.
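The same back-of-envelope sum as a small Python sketch, in case you want to plug in your own rates. The function name expected_cache and its parameter names are just for illustration, and it only models the simple rate x hours x days estimate above, not how the scheduler really decides what to send.

```python
def expected_cache(gpu_per_hour, cpu_per_hour, hours_per_day, cache_days):
    """Rough number of work units needed to fill a cache of cache_days."""
    per_day = (gpu_per_hour + cpu_per_hour) * hours_per_day
    return per_day * cache_days


# Figures from the example: 5 WU/hour on the GPU, 2 on the CPU, 24 hours a day, 5-day cache.
print(expected_cache(5, 2, 24, 5))  # 840
```

With a GPU doing 7 per hour and the CPU left out, expected_cache(7, 0, 24, 5) also comes to 840, which is why 800+ tasks is in the right range for a 5-day cache.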

In the meantime, patience: there is quite a bit of lag built into BOINC.
Don't worry about High Priority.
Don't worry about missed deadlines.
Don't worry about wingmen.
Whatever you don't get done will be done by somebody else eventually.
This is not a race. Just give your system a few days to weeks to settle down. As long as you are not producing errors, everything will adjust itself eventually.

Now here is where others tell you where I'm wrong. The good thing is that between me and them, you'll get the answer you want. :)

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1297546 - Posted: 21 Oct 2012, 12:06:07 UTC

Thanks for those comments, Bill.

I'll run with a GPU count of 1 for a week, and then see if I can move to .33 again.

I was more worried about WUs that reached 100% staying there and not moving from BOINC Manager, preventing others from starting.

Anyway, I'm currently running with app_info set to count 1 and everything is running and reporting OK.

Again, thanks Bill.

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,371,128
RAC: 25,178
United States
Message 1297688 - Posted: 21 Oct 2012, 19:01:00 UTC - in response to Message 1297546.

Thanks for those comments, Bill.

I'll run with a GPU count of 1 for a week, and then see if I can move to .33 again.

I was more worried about WUs that reached 100% staying there and not moving from BOINC Manager, preventing others from starting.

Anyway, I'm currently running with app_info set to count 1 and everything is running and reporting OK.

Again, thanks Bill.



You're welcome.

I would wait at most an hour on the ones that seem to be hung at 100%, then abort them. Maybe it's the work unit or maybe it's the machine, but there's no sense letting it run forever. Like I said, it will be crunched by somebody eventually if it's crunchable. If all of the work units hang, then I'd look into a computer solution. If it's a bad work unit, a history of more than one person aborting it will show it. Aborting a work unit is not a crime.

musicplayer
Joined: 17 May 10
Posts: 1431
Credit: 687,186
RAC: 0
Message 1297698 - Posted: 21 Oct 2012, 19:44:59 UTC

Apparently too late to validate.

http://setiathome.berkeley.edu/workunit.php?wuid=1091613861

http://setiathome.berkeley.edu/result.php?resultid=2661068021


Copyright © 2014 University of California