The 400 and 50 WU limits are way too small

Message boards : Number crunching : The 400 and 50 WU limits are way too small

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1198170 - Posted: 21 Feb 2012, 10:38:32 UTC

Actually, the AP creation rate is about what it has been for many months. Anywhere between 0.25-1.00 is "normal." It has spiked over 1.00 a few times, but not very often.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1198170 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1198192 - Posted: 21 Feb 2012, 11:57:19 UTC - in response to Message 1198170.  
Last modified: 21 Feb 2012, 12:51:52 UTC

How can a system reach the limit when it's just returned 3 tasks?

21/02/2012 11:47:52 | SETI@home | Reporting 3 completed tasks, requesting new tasks for CPU
21/02/2012 11:47:58 | SETI@home | Scheduler request completed: got 0 new tasks
21/02/2012 11:47:58 | SETI@home | No tasks sent
21/02/2012 11:47:58 | SETI@home | This computer has reached a limit on tasks in progress

I assume it's a race condition and the test is being done before the 3 just-returned tasks have been processed. The effect of this is to cause the CPU work fetch to back off and starve the system of WUs. This is my slow (18K per day) system, so it's not a big issue. It just amused me.
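A minimal sketch of the ordering red-ray suspects. This is my own hypothetical model, not actual BOINC scheduler code: if the in-progress limit is checked before the just-reported tasks are deducted from the host's count, a host sitting at the limit gets "No tasks sent" in the very RPC that frees up slots.

```python
TASK_LIMIT = 50  # the per-host CPU limit discussed in this thread

def tasks_granted(in_progress, just_reported, requested):
    """Suspected buggy order: limit checked before reports are credited.

    just_reported is deliberately ignored here, modelling the race."""
    if in_progress >= TASK_LIMIT:
        return 0  # -> "This computer has reached a limit on tasks in progress"
    return min(requested, TASK_LIMIT - in_progress)

def tasks_granted_fixed(in_progress, just_reported, requested):
    """Expected order: credit the reports first, then apply the limit."""
    in_progress -= just_reported
    if in_progress >= TASK_LIMIT:
        return 0
    return min(requested, TASK_LIMIT - in_progress)

# red-ray's case: host at the limit reports 3 tasks in the same RPC
print(tasks_granted(50, 3, 3))        # 0 -> "No tasks sent"
print(tasks_granted_fixed(50, 3, 3))  # 3
```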
ID: 1198192 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1198196 - Posted: 21 Feb 2012, 12:09:56 UTC - in response to Message 1198192.  

How can a system reach the limit when it's just returned 3 tasks?

21/02/2012 11:47:52 | SETI@home | Reporting 3 completed tasks, requesting new tasks for CPU
21/02/2012 11:47:58 | SETI@home | Scheduler request completed: got 0 new tasks
21/02/2012 11:47:58 | SETI@home | No tasks sent
21/02/2012 11:47:58 | SETI@home | This computer has reached a limit on tasks in progress



Good one :D

Maybe it reported GPU tasks - it was asking for CPU.
ID: 1198196 · Report as offensive
Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1198198 - Posted: 21 Feb 2012, 12:11:56 UTC - in response to Message 1198192.  

How can a system reach the limit when it's just returned 3 tasks?


It's easy: the scheduler just recalculates the time needed to crunch the WUs already received, and for some reason that time has increased, for example if you normally run BOINC 24/7 but it has been shut down for a while.
ID: 1198198 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1198200 - Posted: 21 Feb 2012, 12:25:55 UTC - in response to Message 1198198.  

How can a system reach the limit when it's just returned 3 tasks?


It's easy: the scheduler just recalculates the time needed to crunch the WUs already received, and for some reason that time has increased, for example if you normally run BOINC 24/7 but it has been shut down for a while.

No, the limit is by quantity, not by time.
ID: 1198200 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1198216 - Posted: 21 Feb 2012, 13:07:56 UTC - in response to Message 1197937.  

"The 400 and 50 WU limits are way too small" for some computers, way too big for most. We need BOINC to provide a time-based limit option; restricting each host to about 2 days of work "in progress" would be appropriate for this project currently.
                                                                  Joe


Provided that the project servers - which is where such a limit would have to be applied - have an accurate estimate of the runtime of the tasks cached. At the moment, during work allocation the servers ignore DCF entirely when calculating their own idea of how long newly-allocated work will run, and the well-rehearsed APR capping problem means that hosts are running some extreme DCFs, far from the 1.0000 taken for granted by CreditNew. (I've got a 0.0691)

But DCF is controlling the host's requests for work. MB CPU tasks on my computers reset the DCF to 1.00 +/- 0.1, but the MB GPU or AP tasks drive it low.

If I load up the CPU with AP work then the GPU tasks drive the DCF down to below 0.2 regularly. Of course, when the AP tasks complete and an MB task completes, the computer enters panic mode.
ID: 1198216 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1198224 - Posted: 21 Feb 2012, 13:30:16 UTC - in response to Message 1198216.  

But DCF is controlling the host's requests for work.

It may be controlling the request, but it isn't controlling the reply any more.

With my 0.0691, it's a case of "request 15 hours work, allocated 1 hour" - even if there's an infinite amount of work ready to send in the feeder queue.
ID: 1198224 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1198226 - Posted: 21 Feb 2012, 13:39:37 UTC - in response to Message 1198224.  

But DCF is controlling the host's requests for work.

It may be controlling the request, but it isn't controlling the reply any more.

With my 0.0691, it's a case of "request 15 hours work, allocated 1 hour" - even if there's an infinite amount of work ready to send in the feeder queue.

Then how is it I have to drive the DCF down to get even 1 day of work for the GPU?

At one stage I had about 8 days of work for the cpu thanks to ~20 ap's and only about 50 tasks total for the gpu.
ID: 1198226 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1198229 - Posted: 21 Feb 2012, 13:50:17 UTC - in response to Message 1198226.  
Last modified: 21 Feb 2012, 13:51:31 UTC

But DCF is controlling the host's requests for work.

It may be controlling the request, but it isn't controlling the reply any more.

With my 0.0691, it's a case of "request 15 hours work, allocated 1 hour" - even if there's an infinite amount of work ready to send in the feeder queue.

Then how is it I have to drive the DCF down to get even 1 day of work for the GPU?

At one stage I had about 8 days of work for the cpu thanks to ~20 ap's and only about 50 tasks total for the gpu.

I can't answer that, but try enabling <sched_op_debug> logging so you get the 'estimated total task duration' after a work allocation. If DCF is low, the (client estimate of the) work allocation will be low in the same proportion, at best.
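Richard's suggestion refers to the standard BOINC client log flag. A minimal cc_config.xml to enable it (standard BOINC client configuration; goes in the BOINC data directory and takes effect after re-reading the config file or restarting the client):

```xml
<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```

With this flag set, the client logs per-request details such as the estimated total duration of tasks received from the scheduler.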

Edit - and if DCF is high, you probably won't have a resource shortfall to trigger the work request in the first place.
ID: 1198229 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1198237 - Posted: 21 Feb 2012, 14:13:39 UTC - in response to Message 1198229.  

Edit - and if DCF is high, you probably won't have a resource shortfall to trigger the work request in the first place.


That's the problem: with only a single per-project DCF, the GPU tasks are estimated, when DCF = 1, at 4 or 5 times their actual runtime.
ID: 1198237 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1198244 - Posted: 21 Feb 2012, 14:53:56 UTC - in response to Message 1198237.  
Last modified: 21 Feb 2012, 14:54:49 UTC

I keep wondering why the average turnaround time can't be used to decide how many WUs you are allowed. Currently for my 980X it is 0.38 days and it has 731 pending, so it would need to be allowed about 6000 rather than 2200 to last 3 days. If each host had a quota that got adjusted to meet a target turnaround time, all the hosts would get a fair share of WUs and overall there would be far fewer results in progress. Ideally you would have separate GPU and CPU quotas and turnaround times.
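red-ray's figures can be checked directly: a host turning tasks round in 0.38 days with 731 in progress is completing roughly 731 / 0.38 tasks per day, so a 3-day quota works out close to the ~6000 he suggests.

```python
turnaround_days = 0.38   # red-ray's average turnaround
in_progress = 731        # tasks currently pending on the host
target_days = 3          # desired buffer

# Steady state: completions per day = tasks in flight / turnaround time
throughput_per_day = in_progress / turnaround_days
quota = throughput_per_day * target_days
print(round(throughput_per_day), round(quota))  # 1924 5771, i.e. ~6000
```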
ID: 1198244 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1198302 - Posted: 21 Feb 2012, 22:22:59 UTC - in response to Message 1198244.  

I keep wondering why the average turnaround time can't be used to decide how many WUs you are allowed. Currently for my 980X it is 0.38 days and it has 731 pending, so it would need to be allowed about 6000 rather than 2200 to last 3 days. If each host had a quota that got adjusted to meet a target turnaround time, all the hosts would get a fair share of WUs and overall there would be far fewer results in progress. Ideally you would have separate GPU and CPU quotas and turnaround times.

It might be possible, or they could already be doing it, to include that in the DCF calculations. Not that DCF isn't already messed up enough as it is.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1198302 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1198414 - Posted: 22 Feb 2012, 5:54:08 UTC - in response to Message 1198302.  
Last modified: 22 Feb 2012, 5:58:10 UTC

I keep wondering why the average turnaround time can't be used to decide how many WUs you are allowed. Currently for my 980X it is 0.38 days and it has 731 pending, so it would need to be allowed about 6000 rather than 2200 to last 3 days. If each host had a quota that got adjusted to meet a target turnaround time, all the hosts would get a fair share of WUs and overall there would be far fewer results in progress. Ideally you would have separate GPU and CPU quotas and turnaround times.

It might be possible, or they could already be doing it, to include that in the DCF calculations. Not that DCF isn't already messed up enough as it is.

Obviously it doesn't work for new or re-attached computers, but why not work out downloads based on RAC?
By definition that says everything about what the computer can crunch in an average day.

[edit] Silly me, I just remembered why it cannot happen: although Eric K did a pretty good job at it, the BOINC people cannot work out what the credits/time are.
ID: 1198414 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1198417 - Posted: 22 Feb 2012, 5:59:28 UTC - in response to Message 1198137.  
Last modified: 22 Feb 2012, 5:59:45 UTC


Yeah, I wish we could get things sorted and get back to BOINC calling the shots.
So depressing when I can't have the rigs weather a day or two of outage.


Just get them to send you a thumb-drive full of WUs like I did.
ID: 1198417 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1198448 - Posted: 22 Feb 2012, 9:35:46 UTC - in response to Message 1198414.  

Obviously it doesn't work for new or re-attached computers, but why not work out downloads based on RAC?
By definition that says everything about what the computer can crunch in an average day.


There would be a minimum quota to deal with new and returning systems.

No, using the RAC is not ideal. You could have a high RAC and be turning WUs round in 10 days. The regime should favour systems that return WUs before the turnaround target time, to try and keep the number of WUs in progress to a minimum.
ID: 1198448 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1198455 - Posted: 22 Feb 2012, 10:11:00 UTC - in response to Message 1198448.  

Obviously it doesn't work for new or re-attached computers, but why not work out downloads based on RAC?
By definition that says everything about what the computer can crunch in an average day.


There would be a minimum quota to deal with new and returning systems.

No, using the RAC is not ideal. You could have a high RAC and be turning WUs round in 10 days. The regime should favour systems that return WUs before the turnaround target time, to try and keep the number of WUs in progress to a minimum.

But they could still use RAC * "turnaround target", where RAC would be translated into flops/time in the formula to calculate the maximum work downloaded.
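A sketch of that RAC-to-flops conversion. It assumes the usual cobblestone definition (a host sustaining 1 GFLOPS earns about 200 credits per day); both the 200 constant and the example numbers are assumptions for illustration, not from the thread.

```python
CREDITS_PER_GFLOPS_DAY = 200.0  # canonical cobblestone rate (assumed here)
SECONDS_PER_DAY = 86400

def work_quota_gflop(rac, target_days):
    """Convert RAC to sustained GFLOPS, then to a quota of GFLOP of work
    the host may hold 'in progress' over the turnaround target."""
    sustained_gflops = rac / CREDITS_PER_GFLOPS_DAY
    return sustained_gflops * target_days * SECONDS_PER_DAY

# e.g. a hypothetical 20,000-RAC host with a 2-day turnaround target:
print(work_quota_gflop(20000, 2))  # 17280000.0 GFLOP of work in progress
```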
ID: 1198455 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1200511 - Posted: 28 Feb 2012, 1:35:40 UTC

I just got:

28/02/2012 01:27:34 | SETI@home | Reporting 2 completed tasks, requesting new tasks for NVIDIA GPU
28/02/2012 01:27:46 | SETI@home | Scheduler request completed: got 0 new tasks
28/02/2012 01:27:46 | SETI@home | No tasks sent
28/02/2012 01:27:46 | SETI@home | No tasks are available for Astropulse v5
28/02/2012 01:27:46 | SETI@home | No tasks are available for Astropulse v505
28/02/2012 01:27:46 | SETI@home | This computer has reached a limit on tasks in progress

But I only have 1,437 WUs. Given I have 4 GPUs I would be allowed 1,600 for the GPUs alone. Is there yet another limit?
ID: 1200511 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1200512 - Posted: 28 Feb 2012, 1:41:53 UTC - in response to Message 1200511.  

I just got:

28/02/2012 01:27:34 | SETI@home | Reporting 2 completed tasks, requesting new tasks for NVIDIA GPU
28/02/2012 01:27:46 | SETI@home | Scheduler request completed: got 0 new tasks
28/02/2012 01:27:46 | SETI@home | No tasks sent
28/02/2012 01:27:46 | SETI@home | No tasks are available for Astropulse v5
28/02/2012 01:27:46 | SETI@home | No tasks are available for Astropulse v505
28/02/2012 01:27:46 | SETI@home | This computer has reached a limit on tasks in progress

But I only have 1,437 WUs. Given I have 4 GPUs I would be allowed 1,600 for the GPUs alone. Is there yet another limit?

Probably ghost tasks, but that can't be checked right now with the results pages disabled.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1200512 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1200561 - Posted: 28 Feb 2012, 8:42:12 UTC - in response to Message 1200512.  
Last modified: 28 Feb 2012, 8:42:34 UTC

OK. Is there a way to expunge any ghost tasks that do exist, please?
ID: 1200561 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1200566 - Posted: 28 Feb 2012, 9:10:17 UTC - in response to Message 1198161.  

I hope Eric can get that assimilator sorted out for when the new download server arrives - after all, we'll want massive download congestion so we can see what it can do under real load (and give them a chance to fine tune the software).

Well at least the number to be assimilated has dropped but I wonder why the AP creation rate is so low.

Cheers.


I think you should do the honourable thing ... and move a bit further away ... :)


ID: 1200566 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.