The 400 and 50 WU limits are way too small

Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1197674 - Posted: 20 Feb 2012, 9:39:57 UTC
Last modified: 20 Feb 2012, 9:52:07 UTC

Yet again my GPUs are idle and my CPUs soon will be. 48 hours ago my cache was as full as it could be given the 400 and 50 WU limits, so I did some sums.
Recently my GTX 460 GPUs have been munching lots of WUs in about 2 minutes each, so 400 WUs can last as little as 14 hours.
For my 980X things are not much better: 30 minutes per WU seems quite common, so that's 25 hours.
The GPUs emptied the WU cache in 36 hours and I expect the CPUs to do the same in 56 hours.
What do I need to do to get more WUs in my cache? Is this a server-imposed limit that needs to be corrected?
I first ran BOINC SETI@home 26 days ago and initially it worked well. The last 7 days, however, have changed this: with the stupid 6.12 back-offs and now these incorrect cache limits, I am wondering why I bothered. It was fun writing code to work around the 6.12 back-offs, so I hope I can do the same for the cache limits.
Edit: even with 400, things are bad!
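
For anyone who wants to check the sums, here is a minimal sketch of the arithmetic. The 2-minute and 30-minute per-WU times are the figures quoted above; the 12 threads assumed for the 980X are my guess at its configuration:

[code]
# Rough cache-drain arithmetic. Per-WU times are the figures quoted in the
# post above; the 980X thread count is an assumption.

def hours_until_empty(wu_limit, minutes_per_wu, parallel_units):
    """How long a full cache lasts when parallel_units devices crunch it."""
    return wu_limit * minutes_per_wu / parallel_units / 60.0

print(hours_until_empty(400, 2, 1))        # one GPU: ~13.3 h, "as little as 14 hours"
print(hours_until_empty(50 * 12, 30, 12))  # 12 CPU threads at 50 WUs each: 25 h
[/code]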
ID: 1197674 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1197679 - Posted: 20 Feb 2012, 9:44:02 UTC

Yes, it is a server-side limit. No, there is nothing you can do about it. Sorry. There's a fix that we have all been waiting for so the limits can be removed, but we've been waiting for several months now.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1197679 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1197681 - Posted: 20 Feb 2012, 9:46:32 UTC - in response to Message 1197674.  

The GPU limit is actually 400 - and the only host I have with a cache setting that tests that is currently showing 390 in progress. So some work is being made and getting out, even if not enough for everyone.

Yes, the limits are too low, and the temporary server bug which caused them to be introduced (back in September) is taking a very long time to be fixed. They know what needs doing, and we keep reminding them that it's on their 'to do' list - but that's a very long list, and other crises keep intervening. Too much to do, too few people to do it.
ID: 1197681 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1197683 - Posted: 20 Feb 2012, 9:48:01 UTC - in response to Message 1197674.  

Actually the limits are 400 per GPU and 50 per CPU core, but with the number of errors that you have I'm not surprised that you can't reach those levels.

Errors (except for timeouts) lower the amount of work you can obtain.
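
To make that point a little more concrete, here is a rough illustration of the kind of daily-quota behaviour being described. This is not the actual BOINC server code, and the starting quota of 100 is just a placeholder; the shape is simply "errors shrink the quota, validated results grow it back":

[code]
# Illustration only: NOT the real BOINC scheduler code. Shows the general
# shape of a per-device daily quota that shrinks on errors and recovers
# on validated results.

MAX_QUOTA = 100  # placeholder daily maximum per device

def update_quota(quota, result_ok):
    if result_ok:
        return min(MAX_QUOTA, quota * 2)  # recover quickly after good results
    return max(1, quota - 1)              # each error costs one unit of quota

quota = MAX_QUOTA
for outcome in [False] * 95 + [True] * 4:  # a long run of errors, then recovery
    quota = update_quota(quota, outcome)
print(quota)  # 80 - climbing back towards the maximum after a few valid results
[/code]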

Cheers.
ID: 1197683 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1197686 - Posted: 20 Feb 2012, 9:56:23 UTC - in response to Message 1197674.  

What do I need to do to get more WUs in my cache?

Well, you can't get more than the limits, but it might help if you stop aborting hundreds of WUs, as that's what most of your errors are.
ID: 1197686 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1197688 - Posted: 20 Feb 2012, 10:00:40 UTC - in response to Message 1197686.  

Agree with those guys. You can't really complain about lack of work when you've aborted hundreds if not thousands of tasks.
ID: 1197688 · Report as offensive
Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1197689 - Posted: 20 Feb 2012, 10:01:08 UTC - in response to Message 1197683.  
Last modified: 20 Feb 2012, 10:20:44 UTC

If you look at the errors, they are almost all aborted WUs on my QX6700 system. When my 980X first ran out 5 days ago I aborted all outstanding work and stopped running SETI@home. The crazy thing is that now the QX6700 gets more WUs than the 980X; it even got 30 GPU WUs this morning!
ID: 1197689 · Report as offensive
Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1197692 - Posted: 20 Feb 2012, 10:10:55 UTC - in response to Message 1197688.  
Last modified: 20 Feb 2012, 10:27:33 UTC

Agree with those guys. You can't really complain about lack of work when you've aborted hundreds if not thousands of tasks.

Given that my RAC has gone from 31,893 on 2012-02-17 to 38,013 currently, I feel this comment is either ill-considered, or I guess you are trying to get me to do the same again in the hope that you get the WUs I abort!

I wish all comments were as informative as the one by Richard Haselgrove.
ID: 1197692 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1197697 - Posted: 20 Feb 2012, 10:29:23 UTC - in response to Message 1197692.  

I think that you're being very short-sighted with those comments, but most of us have a few backup projects to fall back on when things here get lean.

But you are going to have to stop aborting work, or you'll wind up with very little work at all when things come back after the regular Tuesday outage.

Peace out & Cheers.
ID: 1197697 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1197699 - Posted: 20 Feb 2012, 10:34:05 UTC - in response to Message 1197689.  

If you look at the errors, they are almost all aborted WUs on my QX6700 system. When my 980X first ran out 5 days ago I aborted all outstanding work and stopped running SETI@home. The crazy thing is that now the QX6700 gets more WUs than the 980X; it even got 30 GPU WUs this morning!


I'm sorry, are you saying you aborted work on your second system when the first system ran out of work? What's the reasoning behind that?!

As others have noted, small systems usually have no problem staying fed - they do eventually receive the few WUs they need.
ID: 1197699 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1197701 - Posted: 20 Feb 2012, 10:43:29 UTC

As Richard said, this is a problem that I noticed and reported on the 8th of August last year, so it has been over 6 months now.

I might point out that this is a BOINC problem, and the limits have been introduced by SETI to protect their servers and our computers.

I might add that the original problem on that computer, an incorrectly high initial calculation of the APR for the AP application, has still not corrected itself. The general trend is still downwards, but with large spikes when 30 repeating pulses are found.
ID: 1197701 · Report as offensive
Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1197703 - Posted: 20 Feb 2012, 10:48:54 UTC - in response to Message 1197699.  
Last modified: 20 Feb 2012, 10:50:07 UTC

If you look at the errors, they are almost all aborted WUs on my QX6700 system. When my 980X first ran out 5 days ago I aborted all outstanding work and stopped running SETI@home. The crazy thing is that now the QX6700 gets more WUs than the 980X; it even got 30 GPU WUs this morning!


I'm sorry, are you saying you aborted work on your second system when the first system ran out of work? What's the reasoning behind that?!

As others have noted, small systems usually have no problem staying fed - they do eventually receive the few WUs they need.


There is no reason, it's just what happened.

All the comments about aborted WUs being the issue are incorrect. With the current limits my 980X is limited to 12 × 50 + 4 × 400 = 2,200 WUs, which at roughly 0.3 MB each is about 660 MB. At 09:00 on 18-Feb-2012 I had 720 MB in the cache, so it was 100% full. I started this thread in the hope of understanding why the limits were so low. Richard answered this well.
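
The same sum in a couple of lines of Python, for anyone checking it; the 0.3 MB per WU is the rough figure used above, not a measured value:

[code]
# The 980X cache-size estimate from the post above. 0.3 MB per WU is the
# rough figure used in the post, not a measured value.

cpu_threads, cpu_limit = 12, 50
gpus, gpu_limit = 4, 400

max_wus = cpu_threads * cpu_limit + gpus * gpu_limit  # 600 + 1600 = 2200
print(max_wus, "WUs,", max_wus * 0.3, "MB")           # 2200 WUs, ~660 MB
[/code]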
ID: 1197703 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1197706 - Posted: 20 Feb 2012, 10:54:58 UTC

Actually, the project admins never said why the limits suddenly appeared. The theory that they are there to "protect your computer from over-fetch" is user speculation, to which I personally don't subscribe. I'm convinced that this is permanent and is there to protect the S@H servers from sillies with 4+ GPUs and 10-day caches, which then needed to fetch literally 10,000-20,000 (2-minute) workunits and caused huge strain on the servers at every scheduler request.
ID: 1197706 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1197708 - Posted: 20 Feb 2012, 10:58:27 UTC - in response to Message 1197686.  

What do I need to do to get more WUs in my cache?

Well, you can't get more than the limits, but it might help if you stop aborting hundreds of WUs, as that's what most of your errors are.


If those aborts are from the 12th of February, they are not going to influence getting tasks NOW. Indeed, checking the application details page shows that he has plenty of quota on both machines.
ID: 1197708 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1197711 - Posted: 20 Feb 2012, 11:08:47 UTC - in response to Message 1197706.  

Actually, the project admins never said why the limits suddenly appeared. The theory that they are there to "protect your computer from over-fetch" is user speculation, to which I personally don't subscribe. I'm convinced that this is permanent and is there to protect the S@H servers from sillies with 4+ GPUs and 10-day caches, which then needed to fetch literally 10,000-20,000 (2-minute) workunits and caused huge strain on the servers at every scheduler request.


No, it's not user speculation.

Trust me, I wrote the email that told the project to impose limits, when I first realised the implications of David's rash stop-gap coding.

That was of course assuming that that coding would be reversed soon enough...

We do have confirmation from Eric that he is working on the issue.

And BTW, even with the limits the bandwidth is maxed out (when it's actually working) - the quota (vastly) reduces the cache on the hosts, but it does not significantly reduce throughput (at least when the servers are up).
Yes, with a bigger cache big rigs wouldn't run dry all the time, but we often seem to be crunching as fast as we can anyway...
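
For the sake of argument, a quick sketch of why a saturated download pipe caps throughput regardless of how big the caches are. Both numbers are assumptions for illustration: roughly 100 Mbit/s of project bandwidth and roughly 0.366 MB per multibeam WU download.

[code]
# Illustration only: assumed ~100 Mbit/s project link and ~0.366 MB per
# multibeam WU download. Shows the ceiling a maxed-out pipe puts on
# deliveries per day, independent of cache size.

link_mbit_per_s = 100
wu_mb = 0.366

wus_per_second = (link_mbit_per_s / 8) / wu_mb  # MB/s divided by MB per WU
print(wus_per_second)                           # ~34 WUs per second
print(wus_per_second * 86_400)                  # ~2.95 million WUs per day, tops
[/code]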
ID: 1197711 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1197712 - Posted: 20 Feb 2012, 11:15:31 UTC - in response to Message 1197706.  

Actually, the project admins never said why the limits suddenly appeared. The theory that they are there to "protect your computer from over-fetch" is user speculation, to which I personally don't subscribe. I'm convinced that this is permanent and is there to protect the S@H servers from sillies with 4+ GPUs and 10-day caches, which then needed to fetch literally 10,000-20,000 (2-minute) workunits and caused huge strain on the servers at every scheduler request.

I can confirm that the original limits were actually introduced at user request, in response to the botched fix (by BOINC, not SETI) to the problem that WinterKnight spotted and reported. Believe it or not, there are many aspects of BOINC client processing which we experience and understand much better than the project staff - over the years, I've had that conversation privately with both Matt and Jeff behind the scenes. Conversely, they deal on a daily basis with server issues that we have only the vaguest understanding of.

Having said that, I suspect that the delay in reverting to the status quo ante may indeed be because they've seen that the project runs more smoothly with fewer dormant tasks cached. OK, so we've had a bit of a slowdown over the last couple of days - for an unrelated reason, which has been explained in the news threads - but apart from that the project has been sending out and receiving back as much work as it can possibly handle. From their point of view, there's no point in generating a whole lot more work if the only result is to increase the average turnaround time.
ID: 1197712 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1197722 - Posted: 20 Feb 2012, 12:20:09 UTC

The idea for a super host has been around for some time: http://boinc.berkeley.edu/trac/wiki/SuperHost.
Most of the time people want this so that one machine can proxy all of their other hosts through it.

I like the limits for the faster turnaround time, but dislike them the few times my machines have run out of work and gone over to their backup project - which is always when I am trying to do some long-term benchmarking to check out configuration changes.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1197722 · Report as offensive
Profile red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1197730 - Posted: 20 Feb 2012, 12:54:23 UTC - in response to Message 1197721.  

I too would like to share WUs between all my PCs and would also like GPUs to automatically use appropriate CPU WUs when there are no GPU ones available.
I used to do this with SETI@home classic and it worked very well for me. As I recall, in those days there was no cache at all, so I wrote my own.

SETI@home classic workunits 270,147
SETI@home classic CPU time 1,329,970 hours
ID: 1197730 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1197731 - Posted: 20 Feb 2012, 12:56:35 UTC - in response to Message 1197711.  
Last modified: 20 Feb 2012, 13:02:09 UTC

Trust me, I wrote the email that told the project to impose limits, when I first realised the implications of David's rash stop-gap coding.

I can confirm that the original limits were actually introduced at user request
...

I stand corrected then, thanks for the explanation.
It's just that when the limits were imposed, the bandwidth/network contention situation improved a bit, so I assumed they liked it that way.

I just wish the GPU limit was a little higher than 400, though... Shorties are a killer. :(
ID: 1197731 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1197733 - Posted: 20 Feb 2012, 13:15:03 UTC - in response to Message 1197731.  

Trust me, I wrote the email that told the project to impose limits, when I first realised the implications of David's rash stop-gap coding.

I can confirm that the original limits were actually introduced at user request
...

I stand corrected then, thanks for the explanation.
It's just that when the limits were imposed, the bandwidth/network contention situation improved a bit, so I assumed they liked it that way.

I just wish the GPU limit was a little higher than 400, though... Shorties are a killer. :(

I think the biggest improvement, from their point of view, is "Results out in the field: 2,688,333". Before the limits, it could be three times that number: the reduction saves something on the order of a terabyte of temporary online storage for MB WU data files alone, let alone the reduction in the size of the BOINC database. Haven't we been noticing that the standard weekly backup/compaction outages have been completing more quickly recently?
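
A back-of-the-envelope check of that terabyte figure, for the curious. Only the 2,688,333 and the "three times that number" come from the post; the ~0.37 MB per MB task file and the two-results-per-workunit replication are my assumptions:

[code]
# Back-of-envelope check of the "order of a terabyte" claim. The per-file
# size and the 2-results-per-WU replication are assumptions; the result
# counts are from the post above.

results_now = 2_688_333
results_before = 3 * results_now
mb_per_wu_file = 0.37   # assumed typical multibeam task file size, in MB
results_per_wu = 2      # assumed replication: two results share one data file

extra_files = (results_before - results_now) / results_per_wu
print(extra_files * mb_per_wu_file / 1e6, "TB of extra WU data files")  # ~1 TB
[/code]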
ID: 1197733 · Report as offensive