The 400 and 50 WU limits are way too small


Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1197674 - Posted: 20 Feb 2012, 9:39:57 UTC
Last modified: 20 Feb 2012, 9:52:07 UTC

Yet again my GPUs are idle and my CPUs soon will be. 48 hours ago my cache was as full as it could be given the 400 and 50 WU limits, so I did some sums.
Recently my GTX 460 GPUs have been crunching lots of WUs in about 2 minutes each, so 400 WUs can last as little as 14 hours.
For my 980X things are not much better: 30 minutes per WU seems quite common, so that's 25 hours.
The GPUs emptied the WU cache in 36 hours and I expect the CPUs to do the same in 56 hours.
What do I need to do to get more WUs in my cache? Is this a server-imposed limit that needs to be corrected?
I first ran BOINC SETI@home 26 days ago and initially it worked well. The last 7 days, however, have changed this, with the stupid 6.12 back-offs and now these incorrect cache limits, and I am wondering why I bothered. It was fun writing code to sort out the 6.12 back-offs, so I hope I can do the same for the cache limits.
Edit: even with 400 things are bad!
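
For reference, a minimal Python sketch of those sums (the per-WU times are the figures quoted above, not measured values):

    # Back-of-the-envelope cache-drain times, using the figures
    # quoted in the post above (illustrative, not measured).
    GPU_LIMIT = 400           # WUs per GPU (server-side limit)
    CPU_LIMIT_PER_CORE = 50   # WUs per logical CPU

    gpu_minutes_per_wu = 2    # GTX 460 on the current short WUs
    cpu_minutes_per_wu = 30   # 980X, one WU per thread

    # Each GPU drains its own 400-WU allowance serially.
    gpu_hours = GPU_LIMIT * gpu_minutes_per_wu / 60
    # Each logical CPU drains its own 50-WU allowance, so the
    # per-core limit and per-core rate cancel out.
    cpu_hours = CPU_LIMIT_PER_CORE * cpu_minutes_per_wu / 60

    print(f"GPU cache lasts ~{gpu_hours:.0f} hours")  # ~13 hours
    print(f"CPU cache lasts ~{cpu_hours:.0f} hours")  # ~25 hours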
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2247
Credit: 8,597,427
RAC: 4,259
United States
Message 1197679 - Posted: 20 Feb 2012, 9:44:02 UTC

Yes, it is a server-side limit. No, there is nothing you can do about it. Sorry. There's a fix that we have all been waiting for so the limits can be removed, but we've been waiting for several months now.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8465
Credit: 48,951,506
RAC: 75,693
United Kingdom
Message 1197681 - Posted: 20 Feb 2012, 9:46:32 UTC - in response to Message 1197674.

The GPU limit is actually 400 - and the only host I have with a cache setting that tests that is currently showing 390 in progress. So some work is being made and getting out, even if not enough for everyone.

Yes, the limits are too low, and the temporary server bug which caused them to be introduced (back in September) is taking a very long time to be fixed. They know what needs doing, and we keep reminding them that it's on their 'to do' list - but that's a very long list, and other crises keep intervening. Too much to do, too few people to do it.

Profile Wiggo
Joined: 24 Jan 00
Posts: 6790
Credit: 93,090,960
RAC: 75,785
Australia
Message 1197683 - Posted: 20 Feb 2012, 9:48:01 UTC - in response to Message 1197674.

Actually the limits are 400 per GPU and 50 per CPU core, but with the number of errors that you have I'm not surprised that you can't reach those levels.

Errors (except for timeouts) lower the number of WUs you can obtain.

Cheers.
____________

Profile Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,559,292
RAC: 410
Germany
Message 1197686 - Posted: 20 Feb 2012, 9:56:23 UTC - in response to Message 1197674.

What do I need to do to get more WUs in my cache?

Well, you can't get more than the limits, but it might help if you stop aborting hundreds of WUs, as that's what most of your errors are.
____________

Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,332,781
RAC: 0
United States
Message 1197688 - Posted: 20 Feb 2012, 10:00:40 UTC - in response to Message 1197686.

Agree with those guys. You can't really complain about lack of work when you've aborted hundreds if not thousands of tasks.
____________

Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1197689 - Posted: 20 Feb 2012, 10:01:08 UTC - in response to Message 1197683.
Last modified: 20 Feb 2012, 10:20:44 UTC

If you look at the errors they are almost all aborted WUs on my QX6700 system. When my 980X first ran out 5 days ago I aborted all outstanding work and stopped running SETI@home. The crazy thing is that now the QX6700 gets more WUs than the 980X; it even got 30 GPU WUs this morning!

Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1197692 - Posted: 20 Feb 2012, 10:10:55 UTC - in response to Message 1197688.
Last modified: 20 Feb 2012, 10:27:33 UTC

Agree with those guys. You can't really complain about lack of work when you've aborted hundreds if not thousands of tasks.

Given my RAC has gone from 31,893 on 2012-02-17 to 38,013 currently, I feel this comment is either ill-considered, or I guess you are trying to get me to do the same again in the hope you get the WUs I abort!

I wish all comments were as informative as the one by Richard Haselgrove.

Profile Wiggo
Joined: 24 Jan 00
Posts: 6790
Credit: 93,090,960
RAC: 75,785
Australia
Message 1197697 - Posted: 20 Feb 2012, 10:29:23 UTC - in response to Message 1197692.

I think that you're being very short-sighted with those comments, but most of us have a few backup projects to fall back on when things here get lean.

But you are going to have to stop aborting work or you'll wind up with very little work at all when things come back after the regular Tuesday outage.

Peace out & Cheers.
____________

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1197699 - Posted: 20 Feb 2012, 10:34:05 UTC - in response to Message 1197689.

If you look at the errors they are almost all aborted WUs on my QX6700 system. When my 980X first ran out 5 days ago I aborted all outstanding work and stopped running SETI@home. The crazy thing is that now the QX6700 gets more WUs than the 980X; it even got 30 GPU WUs this morning!


I'm sorry, are you saying you aborted work on your second system when the first system ran out of work? What's the reasoning behind that?!

As others have noted, small systems usually have no problem staying fed - they do eventually receive the few WUs they need.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8630
Credit: 23,728,713
RAC: 19,366
United Kingdom
Message 1197701 - Posted: 20 Feb 2012, 10:43:29 UTC

As Richard said, this is a problem that I noticed and reported on the 8th of August last year, so it has been over 6 months now.

I might point out this is a BOINC problem, and the limits have been introduced by SETI to protect their servers and our computers.

I might add that the original problem on that computer, an incorrectly high initial calculation of the APR for the AP application, has still not corrected itself. The general trend is still downwards, but with large spikes when a result with 30 repeating pulses is found.

Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1197703 - Posted: 20 Feb 2012, 10:48:54 UTC - in response to Message 1197699.
Last modified: 20 Feb 2012, 10:50:07 UTC

If you look at the errors they are almost all aborted WUs on my QX6700 system. When my 980X first ran out 5 days ago I aborted all outstanding work and stopped running SETI@home. The crazy thing is that now the QX6700 gets more WUs than the 980X; it even got 30 GPU WUs this morning!


I'm sorry, are you saying you aborted work on your second system when the first system ran out of work? What's the reasoning behind that?!

As others have noted, small systems usually have no problem staying fed - they do eventually receive the few WUs they need.


There is no reason, it's just what happened.

All the comments about aborted WUs being the issue are incorrect. With the current limits my 980X system can hold at most 12 x 50 + 4 x 400 = 2,200 WUs, which at roughly 0.3 MB each is about 660 MB. At 09:00 on 18-Feb-2012 I had 720 MB in the cache, so it was 100% full. I started this thread in the hope of understanding why the limits were so low. Richard answered this well.
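
A quick sketch of the arithmetic behind that figure (the ~0.3 MB per WU is the value used above):

    # Maximum cache on the 980X box under the current limits,
    # using the ~0.3 MB-per-WU figure from the post above.
    cpu_wus = 12 * 50              # 12 logical CPUs x 50 WUs each
    gpu_wus = 4 * 400              # 4 GPUs x 400 WUs each
    total_wus = cpu_wus + gpu_wus  # 2200 WUs
    cache_mb = total_wus * 0.3     # ~660 MB on disk
    print(total_wus, cache_mb)     # 2200 660.0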

Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1197706 - Posted: 20 Feb 2012, 10:54:58 UTC

Actually, project admins never said why there are suddenly limits. The theory that it is to "protect your computer from over-fetch" is user speculation to which I personally don't subscribe. I'm convinced that this is permanent, and is there to protect the S@H servers from sillies with 4+ GPUs and 10-day caches, which then needed to fetch literally 10,000-20,000 (2-minute) workunits and caused huge strain on the servers at every scheduler request.
____________

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1197708 - Posted: 20 Feb 2012, 10:58:27 UTC - in response to Message 1197686.

What do I need to do to get more WUs in my cache?

Well, you can't get more than the limits, but it might help if you stop aborting hundreds of WUs, as that's what most of your errors are.


If those aborts are from the 12th of February, they are not going to influence getting tasks NOW. Indeed, checking the application details pages shows that he has plenty of quota on both machines.

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1197711 - Posted: 20 Feb 2012, 11:08:47 UTC - in response to Message 1197706.

Actually, project admins never said why there are suddenly limits. The theory that it is to "protect your computer from over-fetch" is user speculation to which I personally don't subscribe. I'm convinced that this is permanent, and is there to protect the S@H servers from sillies with 4+ GPUs and 10-day caches, which then needed to fetch literally 10,000-20,000 (2-minute) workunits and caused huge strain on the servers at every scheduler request.


No, it's not user speculation.

Trust me, I wrote the email that told the project to impose limits, when I first realised the implications of David's rash stop-gap coding.

That was of course assuming that that coding would be reversed soon enough...

We do have confirmation from Eric that he is working on the issue.

And BTW, even with limits the bandwidth is maxed out (when it's actually working) - the quota (vastly) reduces the cache on the hosts, but it does not significantly reduce throughput (at least when the servers are up).
Yes, with a bigger cache big rigs wouldn't run dry all the time, but we often seem to be crunching as fast as we can anyway...

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8465
Credit: 48,951,506
RAC: 75,693
United Kingdom
Message 1197712 - Posted: 20 Feb 2012, 11:15:31 UTC - in response to Message 1197706.

Actually, project admins never said why there are suddenly limits. The theory that it is to "protect your computer from over-fetch" is user speculation to which I personally don't subscribe. I'm convinced that this is permanent, and is there to protect the S@H servers from sillies with 4+ GPUs and 10-day caches, which then needed to fetch literally 10,000-20,000 (2-minute) workunits and caused huge strain on the servers at every scheduler request.

I can confirm that the original limits were actually introduced at user request, in response to the botched fix (by BOINC, not SETI) to the problem that WinterKnight spotted and reported. Believe it or not, there are many aspects of BOINC client processing which we experience and understand much better than the project staff - over the years, I've had that conversation privately with both Matt and Jeff behind the scenes. Conversely, they deal on a daily basis with server issues that we have only the vaguest understanding of.

Having said that, I suspect that the delay in reverting to the status quo ante may indeed be because they've seen that the project runs more smoothly with fewer dormant tasks cached. OK, so we've had a bit of a slowdown over the last couple of days - for an unrelated reason, which has been explained in the news threads - but apart from that the project has been sending out and receiving back as much work as it can possibly handle. From their point of view, there's no point in generating a whole lot more work if the only result is to increase the average turnaround time.

Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1197721 - Posted: 20 Feb 2012, 12:07:23 UTC
Last modified: 20 Feb 2012, 12:22:03 UTC

Me too, I find it stupid that my little PC, which crunches 2 WUs every 5-6 hours (2 cores) and a GPU WU every 15-18 minutes, always gets its work, all the time, every hour.

While my big one, which can do 4 WUs every 30 minutes to 1 hour (4 cores) and a GPU WU every 1.4 to 8 minutes, can't even get 1 WU in 23 hours.

And me too, I released 100 WUs last week, when people had been crying for work 3 days in a row while I was doing fine (mainly from my little PC, which takes an eternity to process them). I did that to share with people; I shared the love.

We can't take WUs from one PC and send them over the internal network to another one in need; that's not the way the system has been written.

Those like me, and so many other people, have more than 1 PC. We have BIG ones and little ones. Our little ones have a ridiculous ton of WUs and our BIG ones have 0.

The best fix: for BOINC 8.0.xx we need a server/client type of application.

You install a server on whichever PC you want; it is the only program allowed to communicate with the project. All WUs are "tagged" to this computer. The server takes care of asking for and sending back all WUs for all its clients. Its job is to keep a good, full queue of WUs to process for all its children.

Then on each PC you have (including the PC you've installed the server on), you install a client. The client asks the server: give me work (CPU or GPU) for this or that core or GPU. The server gives it a copy, the client processes it and sends the result back to the server (and asks for another one).

That way, the server keeps them all and hands them out one by one to whichever PC needs one (see the sketch at the end of this post). You don't end up with a little PC holding 500 WUs for 10 days while a big one has 0 to process for 6-8 days in a row.

I don't know what the stress on the project servers would be, but I'm sure some of the project servers would get fewer queries. Instead of having 6 client PCs asking for work, you have only 1 BOINC server asking every 5 minutes.

That's my 2 cents.

Because having all these BIG crunchers empty for days and weeks is just stupid, while you have a turtle on the other desk with 500 WUs. All your PCs would have as much work as they need.
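
A minimal sketch of that server/client split, assuming a hypothetical local work queue (none of this is real BOINC code; the class and method names are made up for illustration):

    # Hypothetical local work-dispatch server, sketching the idea above.
    # Only this process would talk to the project scheduler; clients on
    # the LAN would ask it for one WU at a time.
    from queue import Queue

    class LocalWorkServer:
        def __init__(self):
            self.pending = Queue()  # WUs fetched but not yet started

        def fetch_from_project(self, wus):
            # Only the server contacts the scheduler, so the
            # project sees 1 host instead of N.
            for wu in wus:
                self.pending.put(wu)

        def request_work(self):
            # Any client (a CPU core or GPU on any local PC) takes the
            # next WU, so fast boxes never idle while slow ones hoard.
            return None if self.pending.empty() else self.pending.get()

        def report_result(self, result):
            # The server collects and uploads results for everyone.
            print("uploading", result)

    server = LocalWorkServer()
    server.fetch_from_project(["wu-1", "wu-2", "wu-3"])
    wu = server.request_work()
    if wu is not None:
        server.report_result(wu + ":done")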
____________

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4081
Credit: 111,716,552
RAC: 147,303
United States
Message 1197722 - Posted: 20 Feb 2012, 12:20:09 UTC

The idea for a super host has been around for some time: http://boinc.berkeley.edu/trac/wiki/SuperHost.
Most of the time people want to do this for a machine to proxy all of their other hosts through.

I like the limits for the faster turnaround time, but dislike them the few times my machines have run out of work and gone over to their backup project, which is always when I am trying to do some long-term benchmarking to check out configuration changes.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Profile Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1197727 - Posted: 20 Feb 2012, 12:45:32 UTC - in response to Message 1197722.
Last modified: 20 Feb 2012, 13:24:53 UTC

The idea for a super host has been around for some time: http://boinc.berkeley.edu/trac/wiki/SuperHost.
Most of the time people want to do this for a machine to proxy all of their other hosts through.

I like the limits for the faster turnaround time, but dislike them the few times my machines have run out of work and gone over to their backup project, which is always when I am trying to do some long-term benchmarking to check out configuration changes.


I am too new here to have heard of this superhost discussion. Thanks for the info.

Take your situation, HAL: you have... 32 PCs? (unless you didn't merge them ^^). All your 32 PCs are asking for work 32 times every 5 minutes.
I don't need to type "Gimme work!" 32 times in here to make the point: instead of that you would have only 1 host asking for work, and you would get more consistency each time it asks.
I hope you don't need to walk the whole floor and check 32 PCs in 32 different rooms ^^

Imagine 1,000 people like you => 32,000 PCs asking for work instead of just 1,000 servers. That's stress.

Is it right that when we ask for work, we are downloading from a server that holds a buffer of only 1,000 WUs at a time? So even if the server status page says 500,000 WUs are ready to send, if that 1,000-WU buffer is empty you get: "Sorry, the project has no work available!"?
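
The scale of that difference, as a quick sketch using the numbers in this post:

    # Scheduler requests per minute, all hosts direct vs. one
    # proxy per user, using the illustrative numbers above.
    users = 1000
    hosts_per_user = 32
    interval_minutes = 5

    direct_rate = users * hosts_per_user / interval_minutes  # 6400/min
    proxied_rate = users / interval_minutes                  # 200/min
    print(direct_rate, proxied_rate)  # 6400.0 200.0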
____________

Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1197730 - Posted: 20 Feb 2012, 12:54:23 UTC - in response to Message 1197721.

I too would like to share WUs between all my PCs, and would also like GPUs to automatically use appropriate CPU WUs when there are no GPU ones available.
I used to do this with SETI@home classic and it worked very well for me. As I recall, in those days there was no cache at all, so I wrote my own.

SETI@home classic workunits 270,147
SETI@home classic CPU time 1,329,970 hours
