Can't get work.

Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119729 - Posted: 6 Jun 2005, 9:17:12 UTC

Odd. I woke this morning to find all but my CPDN wu 100% finished and stuck at "Uploading". I discovered the connection from my router to the net was dead. I restarted it and all the wu's for all the projects uploaded.

Thing is, none of them have sent me any new work. I have just the CPDN unit on my machine now, which, this being an HT machine, means that one CPU is running the Idle process.

I have tried Updating and Resetting without any result, the messages look normal. I find it hard to believe that all 4 projects have nothing for me right now.

Any ideas?



Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119729 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119738 - Posted: 6 Jun 2005, 10:16:22 UTC
Last modified: 6 Jun 2005, 10:17:06 UTC

Have you checked with CPDN to see if they're down? Do you see anything under the messages tab that shows you requested work?
ID: 119738 · Report as offensive
Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119740 - Posted: 6 Jun 2005, 10:36:50 UTC

You misunderstand. I have a CPDN wu; it's been running for months. What I don't have are any S@H, P@H, Einstein or LHC wu's (although LHC has been quiet for a few days anyway).
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119740 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119746 - Posted: 6 Jun 2005, 11:08:44 UTC

It appears I did misunderstand. Sorry, haven't had coffee yet (still haven't). Do you know how to find your LTD numbers? If so, which projects have a "positive" LTD? Those should be the ones requesting work. John knows about the HT problem. The current scheduler will get WUs from any project, but only after it's completely dry.

The LTD numbers aren't easy to get, but they're found in the "client_state.xml" file.
ID: 119746 · Report as offensive
Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119748 - Posted: 6 Jun 2005, 11:16:54 UTC
Last modified: 6 Jun 2005, 11:19:48 UTC

I guess you mean the debt field in the .xml file. All 5 projects have this set to zero. If it was a different field, just say, I can look.

I have the 4.25 BOINC core.

Can I edit the .xml file to make these fields positive?
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119748 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119750 - Posted: 6 Jun 2005, 11:17:49 UTC
Last modified: 6 Jun 2005, 11:18:26 UTC

The latest BoincView lists LTD for you, or you can download a small utility called BOINCdv.

Debt is different from long-term debt.

ID: 119750 · Report as offensive
Profile RPMurphy
Volunteer tester
Joined: 2 Jun 00
Posts: 131
Credit: 622,641
RAC: 0
United States
Message 119752 - Posted: 6 Jun 2005, 11:23:05 UTC

As for P@H, they have a small side post on the front page mentioning a problem for Windows clients (all but mfold v 4.28). If you are on Windows and are running anything but mfold v 4.28, they say to abort the wu's.

My P@H WU's under 4.33 have been crunching for over 3 days with no progress.

Not a clue on the other two.
It is a sad sad day when someone takes your spoon away from you...
ID: 119752 · Report as offensive
Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119754 - Posted: 6 Jun 2005, 11:27:51 UTC
Last modified: 6 Jun 2005, 11:29:19 UTC



The BOINCdv gave this output.

I reverted to MFold 4.28 some time ago. They had problems with the screensaver interface - it lost touch with BOINC and just hung.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119754 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119755 - Posted: 6 Jun 2005, 11:28:47 UTC

Adrian, I dredged up this reply from JM7 to another user from the past; it might answer some questions. I'm also trying to find a link to an official BOINC explanation of the new scheduler for you.

John McLeod VII
Volunteer developer
Volunteer tester

Joined: Jul 15, 1999
Posts: 2340
ID: 9915
Credit: 62,820
RAC: 58
Message 108016 - Posted 6 May 2005 0:37:43 UTC - in response to Message ID 107647.


I've started this thread since the other one was getting too big to wade through and I couldn't easily see the answer to my question(s). I know 4.35 is development.
------------------------
A question on the debt, long_term_debt and resource share.

On my laptop I have 37 LHC wu's, three SAH wu's and no predictor wu's with the following set up.

Project      Resource Share
LHC          500
Predictor    250
SETI         250

client_state.xml shows the following for debt and long_term_debt.

Project      Debt     Long_term_debt
LHC          42440    28736
Predictor    0        -15053
SETI         0        -13683

Questions.

1. Are these debt numbers in seconds?


Yes.



2. BOINC has not downloaded Predictor in 24hrs and I have none in the cache (which defeats the purpose of a cache). Will BOINC download work when the long_term_debt gets to be above 0?


Yes.



3. Why does BOINC allow a cache to run down to zero work - doesn't this defeat the purpose of BOINC?


It doesn't really defeat the purpose of BOINC. This is done to ensure that deadlines are all met. BOINC is still doing multiple WUs during the same day. BOINC has never guaranteed that all projects would have work at the same time. One side effect of this is that if the debt is small and negative when the WU is completing, the report will be made fairly quickly.



4. How is the long_term_debt calculated?


The same way that debt is with the exception that LT debt is shifted so the average is always 0.



5. How often are the debt numbers calculated? Is it every time a project update happens?


Once a second. Debt is recalculated along with a check of whether anything else has to be done, such as downloading work, uploading files, or reporting work. There is a polling loop that does all of this once per second. It was already there; I just made a couple of modifications.



6. Do the long_term_debt numbers mean that BOINC will continue crunching LHC for another 28736 seconds?


No. The short-term debt in the state file determines which project runs next (highest wins), unless the CPU scheduler needs to be in Earliest Deadline First mode, in which case the earliest deadline is used instead.



7. How is the resource share used with the debt calculations?


The resource fraction and the CPU time used for each project determine the offset in the debt.

resource fraction for a project = resource share for the project / total resource share
debt += wall time * resource fraction for project - CPU time for project

All of the debts are shifted after they are all recalculated. ST debts are shifted so the smallest is 0, LT debts are shifted so the mean is 0.
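
A minimal Python sketch of the arithmetic described in this answer, for anyone who wants to play with the numbers. The function name `update_debts` and the dict-based layout are made up for illustration; this is not the BOINC client's actual code, just the formula as stated above applied once to every project, followed by the two shifts.

```python
def update_debts(shares, st_debt, lt_debt, cpu_time, wall_time):
    """Apply one accounting step; every dict is keyed by project name.

    shares    -- resource share per project
    st_debt   -- short-term debt in seconds before the step
    lt_debt   -- long-term debt in seconds before the step
    cpu_time  -- CPU seconds each project received during the step
    wall_time -- wall-clock seconds covered by the step
    """
    total_share = sum(shares.values())
    new_st, new_lt = {}, {}
    for name, share in shares.items():
        fraction = share / total_share
        # debt += wall time * resource fraction - CPU time
        delta = wall_time * fraction - cpu_time.get(name, 0.0)
        new_st[name] = st_debt.get(name, 0.0) + delta
        new_lt[name] = lt_debt.get(name, 0.0) + delta

    # ST debts are shifted so the smallest is 0.
    st_min = min(new_st.values())
    # LT debts are shifted so the mean is 0.
    lt_mean = sum(new_lt.values()) / len(new_lt)
    new_st = {name: debt - st_min for name, debt in new_st.items()}
    new_lt = {name: debt - lt_mean for name, debt in new_lt.items()}
    return new_st, new_lt


# Resource shares from the question; assume one CPU that ran LHC for an hour.
shares = {"LHC": 500, "Predictor": 250, "SETI": 250}
st, lt = update_debts(shares, {}, {}, {"LHC": 3600.0}, 3600.0)
print(st)  # {'LHC': 0.0, 'Predictor': 2700.0, 'SETI': 2700.0}
print(lt)  # {'LHC': -1800.0, 'Predictor': 900.0, 'SETI': 900.0}
```

In this made-up hour, LHC used more than its 50% share, so its long-term debt goes negative while the two idle projects build up positive debt.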



8. When BOINC eventually downloads Predictor work, its deadline period is 7 days, whereas both LHC and SETI are 14 days. Since the Predictor deadline will be sooner than both LHC and SETI, will BOINC only crunch Predictor until the Predictor wu's run out, then only crunch LHC and SETI to reduce their long_term_debt, and not download any more Predictor since it will have a negative long_term_debt?


Not necessarily. It depends on whether the CPU scheduler determines that a deadline is in danger of being missed if it does not use Earliest Deadline First mode. Normal mode (highest ST debt next) is preferred.



9. Will BOINC debt scheduling only really work properly once the LHC and SETI deadlines are within the same time frame as a Predictor wu, and only then will project resource sharing and wu caching work effectively? So in effect BOINC will be bouncing between having Predictor work and no Predictor work until the LHC and SETI wu's are 7 days old?


It should work just fine in just about all cases. There are a couple of pathological cases that are not protected, but I am hoping that these are rare. If a project is allowed to download work, and there is already some work that has a deadline of say 7 days, and the current work will take 2 days, and the new work that is downloaded will take 7 days and has a 7 day deadline, there will be trouble.

In general, normal mode is used, where the highest ST debt gets the next time slot (think about the way that 4.25 works; this is what normal mode is). Only if there is a danger of a WU being late will Earliest Deadline First be used.

The criteria for Earliest Deadline First:
1) A deadline is earlier than 24 hours from now.
2) A deadline is earlier than 2 * the queue size. This allows modem users to report work on time more often.
3) If you order the WUs by deadline and start adding the remaining processing times, is there any place in the chain where the sum is greater than 0.8 * (the deadline - now).
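
The three criteria above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are invented, each WU is reduced to a `(deadline, remaining_seconds)` pair, and "queue size" is read as the work queue length expressed in seconds. None of this is taken from the BOINC source.

```python
DAY = 24 * 3600  # seconds


def chain_at_risk(work_units, now, safety=0.8):
    """Criterion 3: walk the WUs in deadline order, summing remaining time."""
    total = 0.0
    for deadline, remaining in sorted(work_units):
        total += remaining
        if total > safety * (deadline - now):
            return True
    return False


def needs_edf(work_units, now, queue_seconds):
    """Return True if any of the three listed criteria holds.

    work_units    -- list of (deadline, remaining_seconds) tuples
    now           -- current time on the same clock as the deadlines
    queue_seconds -- assumed meaning of "queue size", in seconds
    """
    for deadline, _ in work_units:
        if deadline - now < DAY:                # criterion 1
            return True
        if deadline - now < 2 * queue_seconds:  # criterion 2
            return True
    return chain_at_risk(work_units, now)       # criterion 3


comfortable = [(7 * DAY, 2 * DAY), (14 * DAY, 3 * DAY)]
squeezed = [(7 * DAY, 2 * DAY), (7 * DAY, 7 * DAY)]
print(needs_edf(comfortable, now=0, queue_seconds=0.1 * DAY))  # False
print(needs_edf(squeezed, now=0, queue_seconds=0.1 * DAY))     # True
```

The second list mirrors the troublesome case mentioned earlier in this reply: 2 days of existing work plus 7 days of new work, all due in 7 days.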

The criteria for no more work from anywhere.
1) See #3 above.
2) More than a maximum number of projects on the host (default is 5, and it can be changed by editing the global_prefs.xml file).
3) If the sum of required time fractions is greater than 0.8. The time fraction for a WU is the processing time remaining / the wall time remaining.
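
As a rough illustration of criteria 2 and 3 in this list (criterion 1 is the deadline-chain check from the previous list and is left out here), a short Python sketch follows. The names `required_time_fraction` and `blocks_new_work` are hypothetical, and treating an already-passed deadline as an infinite fraction is my own assumption.

```python
def required_time_fraction(work_units, now):
    """Sum of (processing time remaining / wall time remaining) over all WUs.

    work_units -- list of (deadline, remaining_seconds) tuples
    """
    total = 0.0
    for deadline, remaining in work_units:
        wall_left = deadline - now
        if wall_left <= 0:
            return float("inf")  # assumption: a missed deadline blocks everything
        total += remaining / wall_left
    return total


def blocks_new_work(work_units, now, project_count, max_projects=5, limit=0.8):
    """True if criterion 2 or criterion 3 says no more work should be fetched."""
    if project_count > max_projects:  # criterion 2
        return True
    return required_time_fraction(work_units, now) > limit  # criterion 3


light = [(10 * 3600, 2 * 3600), (20 * 3600, 5 * 3600)]  # fractions 0.2 and 0.25
heavy = [(10 * 3600, 5 * 3600), (20 * 3600, 8 * 3600)]  # fractions 0.5 and 0.4
print(blocks_new_work(light, now=0, project_count=3))  # False
print(blocks_new_work(heavy, now=0, project_count=3))  # True
```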



Live long and crunch!



ID: 119755 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119757 - Posted: 6 Jun 2005, 11:32:35 UTC

I'm going to be gone a while trying to find the link.

It doesn't show any LTD? What's up with that? The LTD values are the important numbers here. You could just open client_state.xml and scroll to find them.

brb
ID: 119757 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119760 - Posted: 6 Jun 2005, 11:39:53 UTC

I couldn't find what I was looking for, but this might be better: it's the BOINC Wiki.
ID: 119760 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119761 - Posted: 6 Jun 2005, 11:44:21 UTC
Last modified: 6 Jun 2005, 11:49:30 UTC

Oh hell, I'm misunderstanding again. You are using BOINC core client 4.25? That's why there are no LTD numbers. The new scheduler was implemented with 4.35 and higher. Damn.

OK ..... hmmmm 4.25...... forget everything I've done so far.

You have an old version. Are you trying to manually update each project from the project page? If so, what do you see in the message log after each attempt? Is it requesting work?

[edit] Man, do I feel stupid. I made a wrong assumption from the start and did a boatload of searching and pasting for nothing. Sheesh, I'm going to sit back and have my coffee. [end edit]
ID: 119761 · Report as offensive
Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119781 - Posted: 6 Jun 2005, 13:34:45 UTC

Hope the coffee was good! When I press Update for the projects, the messages are...

06/06/2005 11:17:34|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
06/06/2005 11:17:35|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
06/06/2005 11:17:35|climateprediction.net|Host location: home
06/06/2005 11:17:35|climateprediction.net|Using your default project prefs
06/06/2005 15:30:32|Einstein@Home|Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
06/06/2005 15:30:34|Einstein@Home|Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
06/06/2005 15:30:34|Einstein@Home|Host location: home
06/06/2005 15:30:34|Einstein@Home|Using your default project prefs
06/06/2005 15:30:36|LHC@home|Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
06/06/2005 15:30:37|LHC@home|Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
06/06/2005 15:30:37|LHC@home|Host location: home
06/06/2005 15:30:37|LHC@home|Using your default project prefs
06/06/2005 15:30:39|ProteinPredictorAtHome|Sending request to scheduler: http://predictor.scripps.edu/predictor_cgi/cgi
06/06/2005 15:30:41|ProteinPredictorAtHome|Scheduler RPC to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
06/06/2005 15:30:41|ProteinPredictorAtHome|Host location: home
06/06/2005 15:30:41|ProteinPredictorAtHome|Using your default project prefs
06/06/2005 15:30:44|SETI@home|Sending request to scheduler: http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
06/06/2005 15:30:45|SETI@home|Scheduler RPC to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
06/06/2005 15:30:45|SETI@home|Using your default project prefs

I do not see it requesting work. Normally it says something like "May run out of work in 0.1 days, requesting x seconds of work" where x is a highly variable number.
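
For what it's worth, a check like the one I did by eye could be sketched in Python as below. It assumes the message log is saved as text in the `date|project|message` layout shown above, and that a work request shows up as a message containing the word "requesting"; that wording is from memory, so treat the `marker` argument as a guess. The function name is made up.

```python
def projects_requesting_work(log_text, marker="requesting"):
    """Split a pipe-separated message log into contacted and requesting projects.

    Returns two sets of project names: those that sent a scheduler request,
    and those with at least one message containing `marker`.
    """
    contacted, requested = set(), set()
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a date|project|message line
        _, project, message = parts
        if message.startswith("Sending request to scheduler"):
            contacted.add(project)
        if marker in message.lower():
            requested.add(project)
    return contacted, requested


sample = """\
06/06/2005 15:30:32|Einstein@Home|Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
06/06/2005 15:30:34|Einstein@Home|Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
06/06/2005 15:30:44|SETI@home|Sending request to scheduler: http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
"""
contacted, requested = projects_requesting_work(sample)
# Projects that talked to their scheduler without asking for work:
print(sorted(contacted - requested))  # ['Einstein@Home', 'SETI@home']
```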
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119781 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 119783 - Posted: 6 Jun 2005, 13:38:06 UTC

4.25 does not have or use LT debt. There is some other problem.


BOINC WIKI
ID: 119783 · Report as offensive
Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119786 - Posted: 6 Jun 2005, 13:57:51 UTC
Last modified: 6 Jun 2005, 14:12:49 UTC

As an experiment, I backed everything up, then edited the client_state.xml file to have a positive value in the debt field for P@H. Once I restarted, it straight away downloaded a wu from Predictor. So I edited the others (except LHC, where I know there is no work), and I now have a S@H and an Einstein as well.

I wonder how that happened? It has not done that before and I've been running it since last summer.

My BOINCdv now looks like this...



... I had manually set S@H, P@H and Einstein to 1.0, so they all changed.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119786 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 119798 - Posted: 6 Jun 2005, 14:28:00 UTC

Ah, at last ... BOINC dv ...

Now, if that tool is generalized we can set all the Server side parameters ... on the client as many people wish ...
ID: 119798 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119813 - Posted: 6 Jun 2005, 15:15:36 UTC
Last modified: 6 Jun 2005, 15:17:49 UTC

Adrian, that "fix" you found is new to me. It seems to take some effort that should be unnecessary. What is the "connect to" preference under your "general preferences" setting? Also eyeball your other preferences to see if something has changed from what you would like. Is it possible your "connect to" setting has been set to "0" or something near it?

also, are you seeing ANY fault messages in the log?

and, before I make any other assumptions, have you tried restarting the puter?
ID: 119813 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 119816 - Posted: 6 Jun 2005, 15:23:05 UTC

Adrian, I looked at your puter. Check your setting for # of CPUs. I think it's set to 1 instead of 2 for HT use.
ID: 119816 · Report as offensive
Profile adrianxw
Joined: 14 Jul 99
Posts: 173
Credit: 1,698,756
RAC: 3
Denmark
Message 119834 - Posted: 6 Jun 2005, 16:13:26 UTC
Last modified: 6 Jun 2005, 16:23:00 UTC

The settings look to me to be set to "Use 2 CPU's". What I had done was go into the BIOS and turn HT off when there was only the CPDN on the system. That way the wu gets the full 3.2GHz, rather than the lower performance you get when you are HT'ing but otherwise idle. It is certainly using both CPU's at the moment.

I have BIOS'd it back to HT again now I have wu's from the other projects. Even an LHC unit has arrived.

I did not see any errors in the messages. However, I have stopped and started several times so if there was anything hidden in the overnight session it will have gone now.

I think it was the fiddle of client_state.xml that fixed it. I was fiddling with all kinds of things though to try and get it going. It seemed to request work after I did that edit and restarted BOINC. I had rebooted and stopped/started BOINC; that was the first thing I tried. I should have mentioned that.

"Connect to..." is set to 0.1 days. Usually I have it set to that. If I need more work for whatever reason, I raise it to d/l wu's. I did that Friday, but it was working normally yesterday during the day. Whatever happened seemed to happen overnight.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 119834 · Report as offensive
Profile tekwyzrd
Volunteer tester
Joined: 21 Nov 01
Posts: 767
Credit: 30,009
RAC: 0
United States
Message 119965 - Posted: 6 Jun 2005, 22:57:09 UTC - in response to Message 119746.  

The current scheduler will get WUs from any project, but only after it's completely dry.



Not true (BOINC 4.43). I uploaded the results of my last einstein unit. I couldn't get any work until I zeroed out the long term debt.

Immediately after that it halted the running SETI units and started running two einstein units (dual P3).
ID: 119965 · Report as offensive