More Bandwidth = Raised (or Dropped) Limits?

Message boards : Number crunching : More Bandwidth = Raised (or Dropped) Limits?

Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1354648 - Posted: 8 Apr 2013, 13:43:20 UTC

Is this in the near future?

Now that the bandwidth problem seems to have gone away (good job on a nice smooth transition - all thanks to the guys in Berkeley!!!) will they think about going back to (at least) the old per cpu core and per GPU limits, rather than the current 100/200 per machine limits?

Pretty please!

Let us pray!!!
ID: 1354648
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1354655 - Posted: 8 Apr 2013, 13:50:11 UTC - in response to Message 1354648.  

I understood that the limits were imposed due to database problems, nothing to do with bandwidth.
ID: 1354655
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1354656 - Posted: 8 Apr 2013, 13:51:45 UTC - in response to Message 1354648.  

Is this in the near future?

Now that the bandwidth problem seems to have gone away (good job on a nice smooth transition - all thanks to the guys in Berkeley!!!) will they think about going back to (at least) the old per cpu core and per GPU limits, rather than the current 100/200 per machine limits?

Pretty please!

Let us pray!!!

Unfortunately, the reason for the current limits is that the science database has reached the limits of its capacity to handle transactions, not network bandwidth.

Due to this, I can't see the limits being raised or removed in the near future.

T.A.
ID: 1354656
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1354665 - Posted: 8 Apr 2013, 14:29:34 UTC - in response to Message 1354655.  

I understood that the limits were imposed due to database problems, nothing to do with bandwidth.

I'd go further, and say that the smooth-flowing bandwidth and more robust infrastructure (power supply, cooling, remote access, staff available to deal with lockups and hardware failures) should - although it's early days yet - do away with whatever few justifications there were for holding large caches in the first place.
ID: 1354665
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1354672 - Posted: 8 Apr 2013, 14:50:54 UTC - in response to Message 1354665.  
Last modified: 8 Apr 2013, 14:51:58 UTC

I understood that the limits were imposed due to database problems, nothing to do with bandwidth.

I'd go further, and say that the smooth-flowing bandwidth and more robust infrastructure (power supply, cooling, remote access, staff available to deal with lockups and hardware failures) should - although it's early days yet - do away with whatever few justifications there were for holding large caches in the first place.


That's a good point, and one I hadn't thought of.

But maybe, if there is a planned outage again, they could allow us to load up in such circumstances? To bridge the anticipated gap, I mean.
ID: 1354672
Dr Grey
Joined: 27 May 99
Posts: 154
Credit: 104,147,344
RAC: 21
United Kingdom
Message 1354718 - Posted: 8 Apr 2013, 16:41:47 UTC

Now that the average turnaround time is under 36 hours, would it make sense to shorten the deadline for returning the workunits? What is it now, 8 weeks? That means some of those workunits are sitting in the database for many months, just politely waiting for a quorum. E.g. is this one ever coming in?

http://setiathome.berkeley.edu/workunit.php?wuid=1168264448

Would halving the deadline mean that the client cache size could be doubled with little effect on the size of the database? Does it work that way?
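For a rough feel of how it could work, here is a back-of-envelope model (every figure is invented for illustration, not an actual SETI@home server number): a result occupies a database row from the time it is split until it is returned and validated, so the in-flight count is roughly the issue rate times the average time a result stays outstanding, and the deadline only caps the slow tail.

# Back-of-envelope model of deadline vs. database size.
# Every number here is invented for illustration.

ISSUE_RATE = 100_000  # results sent out per day (assumed)

def results_in_flight(deadline_days, typical_return_days, late_fraction):
    """Rough steady-state count of result rows in the database.
    Most results come back quickly; a small fraction sits until the
    deadline before being reissued or timed out."""
    avg_days_outstanding = ((1 - late_fraction) * typical_return_days
                            + late_fraction * deadline_days)
    return ISSUE_RATE * avg_days_outstanding

print(round(results_in_flight(deadline_days=56, typical_return_days=1.5, late_fraction=0.05)))  # 422500
print(round(results_in_flight(deadline_days=28, typical_return_days=1.5, late_fraction=0.05)))  # 282500

On those made-up numbers, halving the deadline only trims whatever the slow tail contributes, while doubling everyone's cache would add rows back for every active host, so whether the trade works out depends on how fat that tail really is.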
ID: 1354718
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1354727 - Posted: 8 Apr 2013, 17:13:11 UTC - in response to Message 1354665.  

I understood that the limits were imposed due to database problems, nothing to do with bandwidth.

I'd go further, and say that the smooth-flowing bandwidth and more robust infrastructure (power supply, cooling, remote access, staff available to deal with lockups and hardware failures) should - although it's early days yet - do away with whatever few justifications there were for holding large caches in the first place.


Hi Richard,

The database size issue shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries.

It's also "smarter", since there are machines out there that will not do 100wu in two days, so they have too much cache while some of us have not enough. That's probably why the choice in BOINC was made for "days" instead of "units" in the first place.

And halving the number of CPU units allowed to 50 (or even lower), then giving us at least a 150wu (or more) cache for GPUs, could be done without changing the total number of database entries (since a GPU does work 10-20x faster than a CPU, it would make sense to compensate for that a little).
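To put toy numbers on the "days versus units" point (the throughputs below are invented for the example, not measured): the same flat 200-task cap means very different amounts of runway depending on the host, which is exactly what a days-based limit would avoid.

# Toy comparison of a flat per-host task cap vs. days of actual work.
# Tasks-per-day figures are invented for illustration.

HOSTS = {
    "slow CPU-only box": 12,    # tasks completed per day (assumed)
    "mid-range GPU rig": 150,
    "fast multi-GPU rig": 600,
}

CAP = 200  # current per-machine limit (100 CPU + 100 GPU)

for name, per_day in HOSTS.items():
    print(f"{name}: {CAP} tasks ~ {CAP / per_day:.1f} days of cache")

# Shifting the split to 50 CPU + 150 GPU keeps the same 200 rows per
# host in the database; it just moves the buffer to the faster devices.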

Why would 100wu not be enough for GPUs?

Because there is still the matter of "no work available" messages and the new, longer delays between requests.

Some of us could use a little more buffer against running out of work.

I'm not complaining; I'm just observing. Since the move, a couple of my machines have run dry (and some more than once) due to "no work available" messages. No big deal, but I'm guessing that some less ham-handed limits could eliminate that problem.

What I don't know is if there is something else, like a set proportion of CPU to GPU work allowed that's been programmed into the scheduler. I admit that it's likely I'm showing my ignorance.
ID: 1354727
Sakletare
Joined: 18 May 99
Posts: 132
Credit: 23,423,829
RAC: 0
Sweden
Message 1354737 - Posted: 8 Apr 2013, 17:57:52 UTC - in response to Message 1354727.  

In my opinion, the best way to reduce the size of the database is for the scheduler to match fast hosts with other fast hosts.
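A toy sketch of one way "matching" could work (a hypothetical rule, not how the actual scheduler behaves): when the second copy of a workunit goes out, prefer a host whose average turnaround is closest to the first host's, so neither result waits long for its wingman.

# Hypothetical pairing rule: give the second replica of a workunit to a
# host whose average turnaround is close to the first host's.

def pick_wingman(first_turnaround_days, candidates):
    """candidates: list of (host_id, avg_turnaround_days) tuples."""
    return min(candidates, key=lambda c: abs(c[1] - first_turnaround_days))[0]

candidates = [("host_a", 0.4), ("host_b", 6.0), ("host_c", 1.1)]
print(pick_wingman(0.5, candidates))  # host_a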
ID: 1354737
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1354746 - Posted: 8 Apr 2013, 18:15:51 UTC - in response to Message 1354727.  

The database size issue shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries.

That's been my thinking pretty much since the server-side limits came into place. That way the faster crunchers will be able to remain busy even during the usual weekly outage, and the size of the database will remain small.

Grant
Darwin NT
ID: 1354746
Tom*
Joined: 12 Aug 11
Posts: 127
Credit: 20,769,223
RAC: 9
United States
Message 1354756 - Posted: 8 Apr 2013, 18:28:45 UTC - in response to Message 1354737.  
Last modified: 8 Apr 2013, 18:55:47 UTC

BUMP +3

No need to try to match at the system level; just have the feeder feed two queues: a fast queue and a slow queue.
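A minimal sketch of the two-queue idea (the names, the 2-day threshold and the queue logic are all invented; the real BOINC feeder is not written this way): the feeder keeps both replicas of a workunit in the same pool, and the scheduler serves fast-returning hosts from one pool and everyone else from the other.

# Sketch of a two-queue feeder: both replicas of a workunit stay in the
# same pool, so quorum partners tend to have similar turnaround times.
# All names and thresholds are made up for illustration.

from collections import deque

FAST_TURNAROUND_DAYS = 2.0
fast_queue = deque()   # served to quick-returning hosts
slow_queue = deque()   # served to everyone else

def enqueue_workunit(replicas):
    """Feeder side: put all replicas of one workunit in the same pool."""
    target = fast_queue if len(fast_queue) <= len(slow_queue) else slow_queue
    target.extend(replicas)

def next_result_for(host_avg_turnaround_days):
    """Scheduler side: draw from the pool that matches the host."""
    fast_host = host_avg_turnaround_days <= FAST_TURNAROUND_DAYS
    primary, fallback = (fast_queue, slow_queue) if fast_host else (slow_queue, fast_queue)
    if primary:
        return primary.popleft()
    return fallback.popleft() if fallback else None

for i in range(3):
    enqueue_workunit([f"wu{i}_0", f"wu{i}_1"])

print(next_result_for(0.5))  # fast host gets wu0_0
print(next_result_for(5.0))  # slow host gets wu1_0

Because both copies in the fast pool go to fast hosts, those quorums complete quickly and their rows leave the database sooner, which is the whole point of the exercise.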
ID: 1354756
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1354788 - Posted: 8 Apr 2013, 20:18:59 UTC

I think the larger work units, once introduced, will be the solution for longer caches, which will probably also reduce the db load to the point that a limit higher than 100 can easily be chosen.

Hopefully once they feel that everything is happy in its new home they can get back to work on that, or maybe some of the other issues they have not had the time to work on.
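Rough arithmetic behind that (made-up figures only): for the same science throughput, the number of rows the database has to track scales inversely with how long each task runs.

# Illustrative only: same total throughput, longer tasks, fewer rows.
tasks_per_day = 100_000   # assumed current issue rate
avg_days_in_db = 4.0      # assumed average lifetime of a result row
length_factor = 4         # suppose tasks are made 4x longer

rows_now = tasks_per_day * avg_days_in_db
rows_after = (tasks_per_day / length_factor) * avg_days_in_db
print(rows_now, rows_after)  # 400000.0 100000.0

On the same made-up numbers, a 100-task cap would also represent about four times as many hours of cache, which is why longer tasks and a roomier effective limit go hand in hand.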
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1354788
Alaun
Joined: 29 Nov 05
Posts: 18
Credit: 9,310,773
RAC: 0
United States
Message 1354814 - Posted: 8 Apr 2013, 21:25:36 UTC

+1 on the larger workunits. That's the way to get more work done per database entry. Fewer files to handle, fewer server requests, and fewer load/unload breaks on GPUs. Hopefully a new workunit would keep a GPU loaded for many hours.
ID: 1354814
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1354984 - Posted: 9 Apr 2013, 11:54:22 UTC - in response to Message 1354727.  


The database size issue shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries.

It doesn't work that way.
The 'number of days' is a BOINC client cap. The limit on tasks in progress is a server cap. 'They' have no influence on the amount of work the client requests; AFAIK they can only limit 'tasks in progress'.
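A minimal sketch of that split, with invented names and numbers (the real BOINC client and scheduler code differ): the client decides how many seconds of work to ask for based on its days-of-cache preference, and the only lever on the project side is refusing to send more once a host has its quota in progress.

# Sketch of the two independent caps (invented names and values).

# Client side: size of the work request, driven by the cache preference.
def client_request_seconds(cache_days, seconds_already_queued, device_seconds_per_day):
    wanted = cache_days * device_seconds_per_day
    return max(0.0, wanted - seconds_already_queued)

# Server side: a hard cap on tasks a host may have in progress.
MAX_IN_PROGRESS = 100  # per host, per device class (assumed)

def tasks_to_send(request_seconds, in_progress, est_task_seconds):
    if in_progress >= MAX_IN_PROGRESS:
        return 0  # host already at its quota, request refused
    wanted = int(request_seconds // est_task_seconds) + 1
    return min(wanted, MAX_IN_PROGRESS - in_progress)

# A 10-day cache asks for far more than the server will ever hand out.
req = client_request_seconds(cache_days=10, seconds_already_queued=0,
                             device_seconds_per_day=86_400)
print(req)                                                         # 864000.0
print(tasks_to_send(req, in_progress=95, est_task_seconds=3_000))  # 5

So dropping the "number of days" would have to happen in each volunteer's own preferences; the project can only clamp the in-progress count.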
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1354984
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1355040 - Posted: 9 Apr 2013, 15:18:01 UTC - in response to Message 1354984.  


The database size issue shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries.

It doesn't work that way.
The 'number of days' is a BOINC client cap. The limit on tasks in progress is a server cap. 'They' have no influence on the amount of work the client requests; AFAIK they can only limit 'tasks in progress'.


That does complicate things. I was thinking that each project could customize the allowed values for that field, so that BOINC behaved properly with projects that have differing needs.

BOINC estimates how quickly a computer gets through any given project's tasks in order to know how much work "10 days" of work for that project is (in work units). There must be some way of "tricking" it, or setting it, so that it calculates SETI tasks as taking five times as long as they actually do.
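The toy arithmetic of that idea (runtimes invented for illustration): the client fills its cache until the estimated runtimes add up to its days setting, so inflating the estimate five-fold shrinks the number of tasks it holds to about a fifth.

# Toy illustration of the "make tasks look 5x longer" idea (made-up numbers).
cache_days = 10
device_seconds_per_day = 86_400
true_task_seconds = 3_000

for inflation in (1, 5):
    est_seconds = true_task_seconds * inflation
    tasks_held = cache_days * device_seconds_per_day / est_seconds
    print(inflation, round(tasks_held))  # prints: 1 288, then 5 58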

Either that, or set the servers to only serve 20% of what BOINC calls for, but then I guess BOINC would keep calling...

I suppose I really have no idea how to accomplish setting a two-day limit, so I should shut up.

So, the next-best ham-handed way of doing things would be to limit CPU work units per machine to 25 or 50 (whatever approximates a day's work) and raise the GPU limits by 50 or 75.

... think, think, think, bret....

Ok, ten days of work was too much, too big of a database... check.

We can't lower the number of days... check.

Ten days of work was thousands of tasks... check.

We're currently limited to 100 tasks ... check.

Raising the limits to 250 tasks should result in a database much smaller than the thousands of tasks that were formerly cached... check.

Conclusion: lower the hard number of CPU tasks and raise the freaking limits by some number of GPU tasks and see what happens. Slower machines will still be limited to 10 days and faster machines might keep work; if it doesn't behave, lower them again ...check.

ID: 1355040
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1355062 - Posted: 9 Apr 2013, 15:59:13 UTC - in response to Message 1355040.  


The database size issue shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries.

It doesn't work that way.
The 'number of days' is a BOINC client cap. The limit on tasks in progress is a server cap. 'They' have no influence on the amount of work the client requests; AFAIK they can only limit 'tasks in progress'.


That does complicate things. I was thinking that each project could customize the allowed values for that field, so that BOINC behaved properly with projects that have differing needs.

BOINC estimates how quickly a computer gets through any given project's tasks in order to know how much work "10 days" of work for that project is (in work units). There must be some way of "tricking" it, or setting it, so that it calculates SETI tasks as taking five times as long as they actually do.

Either that, or set the servers to only serve 20% of what BOINC calls for, but then I guess BOINC would keep calling...

I suppose I really have no idea how to accomplish setting a two-day limit, so I should shut up.

So, the next-best ham-handed way of doing things would be to limit CPU work units per machine to 25 or 50 (whatever approximates a day's work) and raise the GPU limits by 50 or 75.

... think, think, think, bret....

Ok, ten days of work was too much, too big of a database... check.

We can't lower the number of days... check.

Ten days of work was thousands of tasks... check.

We're currently limited to 100 tasks ... check.

Raising the limits to 250 tasks should result in a database much smaller than the thousands of tasks that were formerly cached... check.

Conclusion: lower the hard number of CPU tasks and raise the freaking limits by some number of GPU tasks and see what happens. Slower machines will still be limited to 10 days and faster machines might keep work; if it doesn't behave, lower them again ...check.


Why would you ask for my 24-core box to go from about a 7-hour cache with 100 tasks to even less? That is just plain rude, sir! :P

If they were to adjust the limits again, I would like something along the lines of how they were doing it last time, which was per CPU core / per GPU. IIRC they were using 50 for CPU & 100 for GPU. I imagine they didn't use those values again because they knew it wouldn't do the job.
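Putting numbers on the 7-hour figure and the old per-core style (the average CPU task runtime below is a guess, used only for illustration):

# Why a flat per-host cap starves a many-core box.
cores = 24
cpu_task_hours = 1.7   # assumed average runtime per CPU task
flat_cap = 100         # current per-host CPU limit

print(round(flat_cap / cores * cpu_task_hours, 1))  # ~7.1 hours of cache

# The old per-core style recalled above: 50 tasks per CPU core.
print(round(50 * cpu_task_hours))                   # ~85 hours (~3.5 days) per core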

As they seem to have control over the CPU & GPU limits independently, they might consider bumping the GPU limit by 50 or so to see how the db takes it. If not, they would need to back it down again.

Ideally, having the controls on the back end to consistently keep the db under control is the best answer. Larger jobs are another way, and probably easier to accomplish.

I think PrimeGrid also uses a limit of 100 in-progress tasks. However, some of their tasks can run 30+ hours on my machines that do SETI@Home work in ~2 hours.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1355062
