Not requesting tasks

elbea64
Joined: 16 Aug 99
Posts: 114
Credit: 6,352,198
RAC: 0
Germany
Message 895837 - Posted: 17 May 2009, 9:23:33 UTC

I have a Core i7 that suddenly doesn't request new tasks. I have only 3 spare tasks left, so I'm in a bit of a hurry :)

I have a clean cc_config, I've reset local overrides, and I have no further ideas about what I can do.
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 895853 - Posted: 17 May 2009, 10:37:47 UTC - in response to Message 895837.  

One of your i7s has had a run of the bad B3_P1 astropulse_v5 WUs; it just needs to get replacement WUs. Check whether No New Tasks is set. Does it have any tasks from another project?

The other i7 is now just getting compute errors with exit code -202, so you'll need to have a look at that one.
The BOINC FAQ Service says:

ERR_SHMEM_NAME -202

ftok() only works if there's a file at the given location.


Claggy
elbea64
Joined: 16 Aug 99
Posts: 114
Credit: 6,352,198
RAC: 0
Germany
Message 895865 - Posted: 17 May 2009, 11:20:19 UTC
Last modified: 17 May 2009, 11:21:36 UTC

The exit code -202 i7 was my error: I tried to run two instances of BOINC because of major scheduling problems between Seti and Milkyway. This PC is now running better than before.

The other one doesn't request work. Milkyway is attached too, and since Milkyway doesn't provide work all the time, there are sometimes tasks and sometimes not, but Seti doesn't request work either. I've reset all the settings I'm aware of. It's set to request work, and I believe it has no local settings at all. I changed the venue to be sure that isn't the problem. I think I've missed something.

I switched to version 6.6.28 again (I had some problems with it when it was a development version), but the problem persists: it's still not requesting work.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 895866 - Posted: 17 May 2009, 11:25:13 UTC

If you're really stuck and want answers, you could try setting <work_fetch_debug> in cc_config.xml.
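For reference, work_fetch_debug is one of the <log_flags> options in cc_config.xml. Below is a minimal Python sketch, using only the standard library, that adds the flag to existing cc_config.xml text. The helper name enable_log_flag is hypothetical (it is not part of BOINC), and the sketch assumes the file has a <cc_config> root element.

```python
import xml.etree.ElementTree as ET


def enable_log_flag(cc_config_xml: str, flag: str = "work_fetch_debug") -> str:
    """Return cc_config.xml text with <log_flags><flag>1</flag></log_flags> set."""
    root = ET.fromstring(cc_config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    # Reuse an existing <log_flags> section if there is one.
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    # Reuse the flag element too, so running this twice doesn't duplicate it.
    elem = log_flags.find(flag)
    if elem is None:
        elem = ET.SubElement(log_flags, flag)
    elem.text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enable_log_flag("<cc_config><options><ncpus>4</ncpus></options></cc_config>"))
```

The client reads cc_config.xml when it starts, so shut BOINC down before writing the result back to the file.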
elbea64
Joined: 16 Aug 99
Posts: 114
Credit: 6,352,198
RAC: 0
Germany
Message 895867 - Posted: 17 May 2009, 11:39:22 UTC

I did that a few minutes ago and couldn't make much use of the output, but there was a lot of information about debts. That reminded me that it's possible to reset them, so I did, even though the estimated times looked good. And see... Seti is requesting work again.

Thanks for your efforts :)
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 895873 - Posted: 17 May 2009, 11:59:41 UTC - in response to Message 895867.  

Good to hear you're back in business.

For future reference, 'debt' and 'estimated times' come from different parts of the BOINC universe - they aren't related to each other.

Debt controls which project will run next (short term debt - STD) or which will fetch work next (long term debt - LTD). You can see what they are by selecting a project in the 'Projects' tab and clicking the 'Properties' button - where STD is called "CPU scheduling priority" and LTD is called "work fetch priority" with separate entries for CPU and CUDA.

Estimated times are influenced by Duration Correction Factor (DCF) for all applications within a project, and by those fiddly FLOPs calculations in app_info.xml to get the balance between the different applications within the SETI project.
elbea64
Joined: 16 Aug 99
Posts: 114
Credit: 6,352,198
RAC: 0
Germany
Message 895884 - Posted: 17 May 2009, 12:48:51 UTC

But aren't they based on each other? Or at least, aren't debts based on the DCF/runtimes, especially the work fetch priority, since you need to know how much work is done in a given time to determine whether more work is needed? That would make sense in my case too, as it looks like Milkyway had such a high priority that Seti didn't have any priority to get work.

I don't know if I made clear what I want to say; I see a calculation chain:
FLOPS -> DCF -> Estimated times -> STD/LTD

OK, and there I see my error: STD/LTD depend on runtimes, but not vice versa.

I hope I'm right, at least at a cursory glance.
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 895907 - Posted: 17 May 2009, 13:49:06 UTC - in response to Message 895884.  
Last modified: 17 May 2009, 13:50:39 UTC

It's more like (simplified):

(FPOPs / FPBM) = Estimated Runtime
|
v
Actual Runtime / Estimated Runtime = TDCF (gets used by the backend for work assignment calcs)
|
v
Estimated Runtime * TDCF = Displayed Estimated Runtime

Where FPOPs is the Floating Point Operations estimate sent by the project and FPBM is the Floating Point Benchmark.
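Alinator's simplified chain can be written out as three small Python functions. This is a sketch of his simplification only: as far as I know, the real client updates the correction factor gradually over many results rather than taking one ratio, and the numbers in the example are made up for illustration.

```python
def raw_estimate_seconds(fpops_est: float, fp_benchmark: float) -> float:
    """FPOPs estimate sent by the project divided by the host's FP benchmark (ops/second)."""
    return fpops_est / fp_benchmark


def correction_factor(actual_seconds: float, raw_estimate: float) -> float:
    """Simplified TDCF: ratio of the actual runtime to the raw estimate."""
    return actual_seconds / raw_estimate


def displayed_estimate_seconds(raw_estimate: float, dcf: float) -> float:
    """Estimate shown to the user after applying the correction factor."""
    return raw_estimate * dcf


if __name__ == "__main__":
    raw = raw_estimate_seconds(4.0e13, 2.0e9)  # made-up numbers: 20000 s raw estimate
    dcf = correction_factor(5000.0, raw)        # the task actually took 5000 s
    print(raw, dcf, displayed_estimate_seconds(raw, dcf))
```

With a correction factor of 0.25, as here, new tasks of the same kind are displayed at a quarter of their raw estimate.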

As Richard mentioned and you surmised, debt is based strictly on CPU Time for 5.x CCs, and on some kind of evil mutation of Wall Time, CPU Time, Phases of the Moon, and voodoo for later 6.x CCs. :-)

Alinator
Zydor
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 895912 - Posted: 17 May 2009, 14:07:07 UTC - in response to Message 895907.  

There's the old saying behind the 80/20 rule: 80% of an outcome is achieved with 20% of the effort, and the last 20% of the outcome, the part that gets you to perfection, takes up 80% of the available time ..... I do sometimes wonder if, overall, too much effort is placed on total perfection, whereas quickly knocking off 80% of most problems produces a more productive overall "solution", followed by a rolling program of improvements that knocks off the detail as time goes on.

Rhetorical ramblings ..... :) ..... it's never easy to get the balance right with these things, and it's always easy to criticise .... I do sometimes wonder about the current strategy with debt calculations, though: hoops and loops over STDs/LTDs, yaddie yadda .... all good stuff, but is the outcome of resolving the "problem" worth the effort when set against all the other major issues that lurketh?? Or do we have a minor problem whose solution has gone overboard?

ramble ramble ..... mutter mutter ..... well it is a Sunday :)

Regards
Zy
elbea64
Joined: 16 Aug 99
Posts: 114
Credit: 6,352,198
RAC: 0
Germany
Message 895929 - Posted: 17 May 2009, 15:03:30 UTC - in response to Message 895907.  

Quoting Alinator: ... and some kind of evil mutation of Wall Time, CPU Time, Phases of the Moon, and voodoo for later 6x's. :-)


That fits my observations exactly, and I use Crystal Ball v2 for my observations, so you can consider it fact :D
elbea64
Joined: 16 Aug 99
Posts: 114
Credit: 6,352,198
RAC: 0
Germany
Message 895979 - Posted: 17 May 2009, 17:14:33 UTC
Last modified: 17 May 2009, 17:16:14 UTC

OK, Seti runs fine again, but with 6.6.28 Milkyway stops after the initial 2 tasks. I had this when 6.6.28 was a development version.

Reinstalling 6.6.20 gives an error message. I had this already when I was working around another issue.

I'm now back/forward to 6.6.23 and it works. I suspect it will only work until my Seti WUs are done, and then I'll have to do the whole crap again.

I should ask my crystal ball if there's any rational reason to keep running BOINC, as I can hardly imagine one :(
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 896802 - Posted: 19 May 2009, 1:58:31 UTC

If a project uses more resource time than it was allocated by its resource share, its debts drop. If the LTD is low, that means the project has used more than its share of resource time recently, and needs to allow the other project to get some time in.
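Here is a toy Python model of what John describes. Several things are assumptions made for illustration and do not come from the BOINC client's code. Over an interval, each project's debt changes by its resource-share portion of the available time minus the time it actually used. All debts are then shifted so that they average zero. Work fetch goes to the project with the highest long-term debt. The project names and numbers are also made up. The real 6.6.x calculation also includes the CUDA/CPU conversion mentioned in John's next paragraph, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_share: float
    long_term_debt: float = 0.0


def update_debts(projects: list[Project], used_seconds: dict[str, float], available_seconds: float) -> None:
    """Credit each project its share of the interval and debit the time it actually used."""
    total_share = sum(p.resource_share for p in projects)
    for p in projects:
        entitled = available_seconds * p.resource_share / total_share
        p.long_term_debt += entitled - used_seconds.get(p.name, 0.0)
    # Assumed normalisation: shift all debts so they average zero.
    mean = sum(p.long_term_debt for p in projects) / len(projects)
    for p in projects:
        p.long_term_debt -= mean


def next_work_fetch(projects: list[Project]) -> Project:
    """The project that is owed the most time (highest LTD) is asked for work first."""
    return max(projects, key=lambda p: p.long_term_debt)


if __name__ == "__main__":
    seti, milkyway = Project("SETI@home", 100.0), Project("Milkyway", 100.0)
    # One hour of CPU time in which Milkyway used far more than its 50% share.
    update_debts([seti, milkyway], {"SETI@home": 600.0, "Milkyway": 3000.0}, 3600.0)
    print(seti.long_term_debt, milkyway.long_term_debt, next_work_fetch([seti, milkyway]).name)
```

In this model, the project that overran its share ends up with negative debt and is asked for work last. That matches the situation earlier in the thread, where Seti stopped fetching work while Milkyway dominated.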

The evil part of the debt calculation in 6.6.x is the conversion between estimated CUDA time and estimated CPU time (there is a major fudge factor there). Other than that, it is pretty straightforward wall time for the CPU.


BOINC WIKI
OzzFan Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 896828 - Posted: 19 May 2009, 3:08:00 UTC

Have they fixed the scheduler issue in v6.6.x with long-running tasks, e.g. CPDN? I was running x.23 for a while and my cache almost ran dry on all my machines running CPDN because of some bug in the scheduler (at least, that's what I thought I read), and I was wondering if it's safe to upgrade to x.28 yet?
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 896926 - Posted: 19 May 2009, 11:58:30 UTC - in response to Message 896828.  

.28 is slightly better than .23, whole lot better than .20 ...
OzzFan Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 897102 - Posted: 20 May 2009, 2:45:32 UTC - in response to Message 896926.  

But is it better than 6.2.19?
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 897153 - Posted: 20 May 2009, 5:08:19 UTC - in response to Message 897102.  

Having never run that version I have no idea which suite of bugs you tangle with...

If you are doing CUDA, my experience says use 6.5.0 or 6.6.28... operationally 6.5.0 is probably better in that work fetch is not as badly hammered... 6.6.28 on the other hand more correctly shows elapsed times of CUDA work so it is easier to see completion times; and the Resource Scheduler is a little more stable with wide systems. The bad news is that there are significant issues remaining some of them from ancient history ...

The good news is that the stalwart developers are rapidly discounting all bug reports so they don't have to fix anything ... your opinion may differ ...
FalconFly
Joined: 5 Oct 99
Posts: 394
Credit: 18,053,892
RAC: 0
Germany
Message 897219 - Posted: 20 May 2009, 11:35:16 UTC - in response to Message 897153.  
Last modified: 20 May 2009, 11:44:47 UTC

Based on my experience, here's a list of things to look out for, and some possible tweaks:

Average turnaround time : x.xx days
Displayed in the SETI computer details tables, this value normally corresponds to whatever cache setting the user has chosen.
However, losing a chunk of workunits (e.g. through a system failure, HD crash or the like) apparently affects the value negatively. It's only my suspicion, but after I lost a whole bunch of Astropulse workunits, the Average turnaround time of one of my systems rocketed to more than 10 days. As a result, almost no cache was maintained anymore and single cores began to run empty. I assume that, for whatever reason, the deadlines of all the lost workunits were counted into the Average turnaround time, significantly increasing the value.
Symptom: BOINC states "requesting 0 seconds of work" even though CPU cores are starting to run idle and there are no other projects to run.
It took a week of normal system operation for the Average turnaround time to return to its normal value (~1 day). Since then, the cache problems have vanished.
Letting the system return a high number of Mainbeam workunits only, for a day, should reduce the value and its symptoms enough to allow a full Astropulse cache to be processed again the next day. Other than quickly returning batch after batch of results, though, nothing can be done about it.
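FalconFly's explanation is a suspicion, and the server's actual averaging formula is not given here. To illustrate the effect he describes, here is a Python sketch. It assumes the average turnaround is an exponentially weighted moving average with a hypothetical weight of 0.1, and that each lost Astropulse workunit counts as a turnaround of 25 days. Both numbers are hypothetical.

```python
def update_average(avg_days: float, sample_days: float, weight: float = 0.1) -> float:
    """Exponentially weighted moving average; `weight` is a hypothetical smoothing constant."""
    return (1.0 - weight) * avg_days + weight * sample_days


def feed(avg_days: float, sample_days: float, count: int) -> float:
    """Apply `count` identical turnaround samples to the running average."""
    for _ in range(count):
        avg_days = update_average(avg_days, sample_days)
    return avg_days


if __name__ == "__main__":
    avg = 1.0                  # normal turnaround of about one day
    avg = feed(avg, 25.0, 20)  # 20 lost workunits counted at a hypothetical 25-day deadline
    print(f"after losses: {avg:.1f} days")
    avg = feed(avg, 1.0, 60)   # 60 normal results returned within a day
    print(f"after recovery: {avg:.2f} days")
```

The point is only qualitative. A handful of very long samples pushes the average up quickly, and many normal-length results are needed to pull it back down. That is consistent with the week-long recovery described above.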


client_state.xml

<host_info>
<p_fpops>2228509013.328131</p_fpops>
<p_iops>4833206207.789764</p_iops>
Check whether the benchmark values are correct. If, for whatever reason, the CPU stayed in a power-saving mode during the benchmark, the client will only request new work according to the apparently lower performance. Manually triggering a benchmark should solve the problem.
If need be, manually increasing the values is a working way to increase the work cache, at least until the next automatic benchmark runs.
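To see why a low benchmark shrinks the cache, here is a deliberately simplified Python sketch. It assumes the client estimates each task as the project's FPOPs estimate divided by p_fpops, times the DCF, and fills a cache of a given number of seconds with whole tasks. The function name and the 4.0e13 FPOPs figure are hypothetical.

```python
import math


def tasks_to_fill(cache_seconds: float, fpops_est: float, p_fpops: float, dcf: float = 1.0) -> int:
    """Number of whole tasks needed to cover `cache_seconds` of estimated work on one core."""
    if p_fpops <= 0:
        raise ValueError("p_fpops must be positive")
    per_task = fpops_est / p_fpops * dcf  # estimated seconds per task
    return math.ceil(cache_seconds / per_task)


if __name__ == "__main__":
    one_day = 86400.0
    print(tasks_to_fill(one_day, 4.0e13, 2.0e9))  # benchmark taken at full speed
    print(tasks_to_fill(one_day, 4.0e13, 1.0e9))  # benchmark taken while throttled
```

In this model, halving the measured p_fpops doubles each task's estimated runtime, so a one-day cache holds 3 tasks instead of 5.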


<time_stats>
<on_frac>0.950683</on_frac>
<active_frac>0.999722</active_frac>
These numbers are multiplied by the desired cache size. If they are too low for whatever reason (a long vacation with the host turned off, or BOINC simply not running), they can significantly reduce the cache. Values close to 1.0 are ideal.
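Taking FalconFly's description at face value (the desired cache scaled by on_frac and active_frac), the arithmetic looks like this in Python. The example fractions are the ones quoted above, and the helper name is hypothetical. The real client applies these fractions inside its own work-fetch calculations, not through a function like this.

```python
def effective_cache_days(desired_days: float, on_frac: float, active_frac: float) -> float:
    """Desired cache scaled by the fraction of time BOINC is running and allowed to compute."""
    for name, value in (("on_frac", on_frac), ("active_frac", active_frac)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return desired_days * on_frac * active_frac


if __name__ == "__main__":
    print(round(effective_cache_days(1.0, 0.950683, 0.999722), 4))  # values from the post
    print(effective_cache_days(1.0, 0.30, 0.999722))                # after a long vacation
```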

<project>
<master_url>http://setiathome.berkeley.edu/</master_url>
<duration_correction_factor>0.177394</duration_correction_factor>
If BOINC returns workunits with an erroneous (or even negative) cpu_time, e.g. because of a series of faulty workunits, this value can go haywire very quickly.
A value of around 1.0 should be normal; with modern hosts, or when using optimized apps, it should be significantly below 1.0.


<long_term_debt>-2229.060553</long_term_debt>
This can be a naughty one. Check that none of the projects a host is attached to has a critical value (2000000 or higher).
Unless this year-old bug has been fixed in BOINC 6.x, it will effectively cripple the client scheduler when the value exceeds approx. 2500000s for any attached project (BOINC 5.x and BOINC 4.x experienced problems above approx. 1500000s).
At first, the LTD bug will reduce the total cache in a frantic attempt to favor refilling work for a work-starved project (e.g. LHC). Eventually, if left unattended, it will essentially prevent work from being downloaded for any project except the LTD-starved one.
Setting LTD (and short-term debt) to 0 for all projects will resolve any related deadlocks.
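One quick way to do that check is to scan client_state.xml for each project's <long_term_debt>. The Python sketch below uses the 2000000 threshold from the post and only reports values; it does not modify the file. It also treats large negative values as suspect, which is an assumption. The second project URL in the sample is a placeholder, not a real project.

```python
import xml.etree.ElementTree as ET


def suspicious_debts(client_state_xml: str, limit: float = 2_000_000.0) -> dict[str, float]:
    """Map master_url -> long_term_debt for projects whose LTD magnitude reaches `limit`."""
    root = ET.fromstring(client_state_xml)
    flagged: dict[str, float] = {}
    for project in root.iter("project"):
        ltd_text = project.findtext("long_term_debt")
        if ltd_text is None:
            continue  # no debt recorded for this project
        ltd = float(ltd_text)
        if abs(ltd) >= limit:
            flagged[project.findtext("master_url", default="?")] = ltd
    return flagged


SAMPLE = """<client_state>
  <project>
    <master_url>http://setiathome.berkeley.edu/</master_url>
    <long_term_debt>-2229.060553</long_term_debt>
  </project>
  <project>
    <master_url>http://example.invalid/other_project/</master_url>
    <long_term_debt>2600000.0</long_term_debt>
  </project>
</client_state>"""

if __name__ == "__main__":
    print(suspicious_debts(SAMPLE))
```

To check a real host, read the text of a copy of client_state.xml and pass it in. Only the placeholder project is flagged in the sample.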


cc_config.xml Trick
<cc_config>
<options>
<ncpus>10</ncpus>
</options>
</cc_config>

Manually boosting the number of CPUs for a short time (i.e. allow BOINC to download, then reset back to normal value or delete file) is a dirty workaround but very useful to refill an empty cache.
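A simplified shortfall calculation shows why the ncpus trick works. If the client wants roughly cache × ncpus seconds of queued work, raising ncpus raises the amount it asks for. This Python sketch is a model built on several assumptions: a flat cache target per CPU, plus hypothetical names and numbers. It is not BOINC's work-fetch code. It also shows how a queue that looks full results in a request for 0 seconds.

```python
SECONDS_PER_DAY = 86400.0


def work_shortfall_seconds(cache_days: float, ncpus: int, queued_seconds: float) -> float:
    """Seconds of work to request so that every CPU has `cache_days` of work queued."""
    wanted = cache_days * SECONDS_PER_DAY * ncpus
    return max(0.0, wanted - queued_seconds)


if __name__ == "__main__":
    queued = 600000.0  # hypothetical seconds of estimated work already on the host
    print(work_shortfall_seconds(1.0, 8, queued))   # 8 logical CPUs, e.g. a hyper-threaded quad-core i7
    print(work_shortfall_seconds(1.0, 10, queued))  # with the <ncpus>10</ncpus> override
    print(work_shortfall_seconds(1.0, 4, queued))   # queue already "full": nothing requested
```

In this model, leaving the override in place would keep the client asking for more work than the real CPUs can finish, which is why the post says to set it back afterwards.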

Remember that BOINC needs to be shut down before you edit any config files.
---------------------------------------------------------------
That's all the potential troublemakers I'm aware of and all tweaks I know to restore a normal cache size on a troubled host.
Hope that helps...
OzzFan Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 897396 - Posted: 20 May 2009, 20:38:02 UTC - in response to Message 897153.  

Quoting Paul D. Buck: If you are doing CUDA...

No, I'm exclusively ATi.
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 897445 - Posted: 20 May 2009, 22:19:16 UTC - in response to Message 897396.  

Then one begs pardon ... :)

I would really like to see OpenCL hit soon so that we can start the video card wars on mine is bigger ...

Interesting news, looks like MW may be close to releasing their application ...
OzzFan Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 897463 - Posted: 20 May 2009, 22:42:01 UTC - in response to Message 897445.  

Quoting Paul D. Buck: Then one begs pardon ... :)

Not necessary. I know you try your best to be helpful.

Quoting Paul D. Buck: I would really like to see OpenCL hit soon so that we can start the video card wars on mine is bigger ...


That would be nice. Then I can finally say, "I see your Schwartz is as big as mine!"
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.