More Bandwidth = Raised (or Dropped) Limits?
Author | Message |
---|---|
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Is this in the near future? Now that the bandwidth problem seems to have gone away (good job on a nice smooth transition - all thanks to the guys in Berkeley!!!), will they think about going back to (at least) the old per-CPU-core and per-GPU limits, rather than the current 100/200 per-machine limits? Pretty please! Let us pray!!! |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I understood that the limits were imposed due to database problems, nothing to do with bandwidth. |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
"Is this in the near future?" Unfortunately, the reason for the current limits is that the science database program has reached the limit of its capacity to handle transactions, not network bandwidth. Because of this, I can't see any raising or removing of the limits in the near future. T.A. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"I understood that the limits were imposed due to database problems, nothing to do with bandwidth." I'd go further, and say that the smooth-flowing bandwidth and more robust infrastructure (power supply, cooling, remote access, staff available to deal with lockups and hardware failures) should - although it's early days yet - do away with whatever few justifications there were for holding large caches in the first place. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
"I understood that the limits were imposed due to database problems, nothing to do with bandwidth." That's a good point, and one I hadn't thought of. But maybe, if there is a planned outage again, they could allow us to load up in such circumstances? To bridge the anticipated gap, I mean. |
Dr Grey Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
Now that the average turnaround time is under 36 hours, would it make sense to shorten the deadline for returning workunits? What is it now, 8 weeks? That means some of those workunits are sitting in the database for many months, just politely waiting for a quorum. E.g., is this one ever coming in? http://setiathome.berkeley.edu/workunit.php?wuid=1168264448 Would halving the deadline mean that the client cache size could be doubled with little effect on the size of the database? Does it work that way? |
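
One rough way to reason about the deadline-versus-cache question is Little's law: the number of task rows in flight is roughly the issue rate times the average time a task stays outstanding. The sketch below is only an illustration under stated assumptions. The 36 hour turnaround and 8 week deadline come from the post; the issue rate, the 2% of tasks that never return, and the idea that doubling the cache doubles turnaround are all made up for the example. It also ignores the time a workunit spends waiting on its wingman for a quorum.

```python
def mean_residence_hours(turnaround_h, deadline_h, timeout_fraction):
    """Average time one task row stays 'in progress'.

    Returned tasks occupy a row for the turnaround time;
    tasks that never come back occupy it until the deadline.
    """
    returned = (1.0 - timeout_fraction) * turnaround_h
    abandoned = timeout_fraction * deadline_h
    return returned + abandoned


def tasks_in_progress(issue_rate_per_h, turnaround_h, deadline_h, timeout_fraction):
    """Little's law: rows in flight = arrival rate x mean residence time."""
    return issue_rate_per_h * mean_residence_hours(
        turnaround_h, deadline_h, timeout_fraction
    )


# Assumed values, not project figures.
ISSUE_RATE = 10_000        # tasks issued per hour (hypothetical)
TIMEOUT_FRACTION = 0.02    # share of tasks never returned (hypothetical)

# Today: ~36 h turnaround, ~8 week deadline (figures quoted in the thread).
current = tasks_in_progress(ISSUE_RATE, 36, 8 * 7 * 24, TIMEOUT_FRACTION)
# Proposal: halve the deadline; assume a doubled cache doubles turnaround.
proposed = tasks_in_progress(ISSUE_RATE, 72, 4 * 7 * 24, TIMEOUT_FRACTION)

print(f"current:  {current:,.0f} tasks in progress")
print(f"proposed: {proposed:,.0f} tasks in progress")
```

Under these particular assumptions the trade does not come out even, because most rows belong to tasks that are returned promptly rather than to the few that run to the deadline. A higher share of abandoned tasks would tilt the result the other way.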
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
"I understood that the limits were imposed due to database problems, nothing to do with bandwidth." Hi Richard, The database size shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries. It's also "smarter", since there are machines out there that will not do 100 WUs in two days, so they have too much cache while some of us have not enough. That's probably why the choice in BOINC was made for "days" instead of "units" in the first place. And halving the number of CPU units allowed to 50 (or even lower), then giving us at least a 150 WU (or larger) cache for GPUs, could be done without changing the number of database entries. (Since a GPU does work 10-20x faster than a CPU, it would make sense to compensate for that a little.) Why would 100 WUs not be enough for GPUs? Because there is still the matter of "no work available" messages and the new, longer delays between requests. Some of us could use a little more buffer against running out of work. I'm not complaining; I'm just observing. Since the move, a couple of my machines have run dry (and some more than once) due to "no work available" messages. No big deal, but I'm guessing that some less ham-handed limits could eliminate that problem. What I don't know is whether there is something else, like a set proportion of CPU to GPU work allowed, that's been programmed into the scheduler. I admit that it's likely I'm showing my ignorance. |
Sakletare Joined: 18 May 99 Posts: 132 Credit: 23,423,829 RAC: 0 |
In my opinion, the best way to reduce the size of the database is for the scheduler to match fast hosts with other fast hosts. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"The database size issue shouldn't be an issue if they just dropped the "number of days" down to two or so, from the standard ten. If we were bumping the limit at ten days, then allowing two days should significantly cut the number of database entries." Those have been my thoughts pretty much since the server-side limits came into place. That way the faster crunchers will be able to remain busy even during the usual weekly outage, and the size of the database will remain small. Grant Darwin NT |
Tom* Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
BUMP +3. No need to try to match at the system level; just have the feeder feed two queues, a fast queue and a slow queue. |
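
The two-queue idea can be sketched as follows. This is a toy model, not BOINC's feeder: the class name, the 48 hour cut-off, and the rule that every replica of a workunit lives in a single queue are assumptions made for illustration. The point is that a fast host's wingman is then always another fast host, so the quorum completes sooner and the workunit leaves the database earlier.

```python
from collections import deque

FAST_THRESHOLD_HOURS = 48  # assumed cut-off; not a real project setting


def speed_class(avg_turnaround_hours):
    """Label a host 'fast' or 'slow' from its average turnaround."""
    return "fast" if avg_turnaround_hours <= FAST_THRESHOLD_HOURS else "slow"


class TwoQueueFeeder:
    """Hypothetical feeder: all replicas of a workunit go to one queue."""

    def __init__(self):
        self.queues = {"fast": deque(), "slow": deque()}

    def add_workunit(self, wu_id, queue_name, replicas=2):
        for _ in range(replicas):
            self.queues[queue_name].append(wu_id)

    def next_task(self, avg_turnaround_hours, already_has=()):
        """Return the first task in the host's queue that it does not hold yet."""
        queue = self.queues[speed_class(avg_turnaround_hours)]
        for _ in range(len(queue)):
            wu_id = queue.popleft()
            if wu_id not in already_has:
                return wu_id
            queue.append(wu_id)  # a host must not receive both replicas
        return None


feeder = TwoQueueFeeder()
feeder.add_workunit("wu-1", "fast")
feeder.add_workunit("wu-2", "slow")

first = feeder.next_task(avg_turnaround_hours=6)     # a fast host
wingman = feeder.next_task(avg_turnaround_hours=12)  # another fast host
slow = feeder.next_task(avg_turnaround_hours=200)    # a slow host
print(first, wingman, slow)
```

A real feeder would also have to decide what happens when one queue runs dry, which this sketch leaves out.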
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I think the larger work units, once introduced, will be the solution for longer caches, and will probably also reduce the db load to a point where a limit higher than 100 can easily be chosen. Hopefully, once they feel that everything is happy in its new home, they can get back to work on that, or maybe on some of the other issues they have not had the time to work on. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
Alaun Joined: 29 Nov 05 Posts: 18 Credit: 9,310,773 RAC: 0 |
+1 on the larger workunits. That's the way to get more work done per database entry: fewer files to handle, fewer server requests, and fewer load/unload breaks on GPUs. I hope a new workunit would keep a GPU loaded for many hours. |
William Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
It doesn't work that way. The 'number of days' is a BOINC client cap. The limit on tasks in progress is a server cap. 'They' have no influence on the amount of work the client requests; AFAIK they can only limit 'tasks in progress'. A person who won't read has no advantage over one who can't read. (Mark Twain) |
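
The distinction between the two caps can be sketched like this. It is a simplified model, not BOINC's actual work-fetch or scheduler code: the function names and the example hosts are hypothetical. The client decides how many seconds of work to ask for from its own "days" setting, and the server can only cut the answer off at its in-progress limit.

```python
def client_request_seconds(buffer_days, ncpus, queued_seconds):
    """Client side: ask for whatever is missing from the local buffer."""
    target = buffer_days * 86_400 * ncpus
    return max(0, target - queued_seconds)


def tasks_granted(requested_seconds, est_task_seconds, in_progress, server_limit):
    """Server side: fill the request, but never exceed the in-progress cap."""
    wanted = -(-requested_seconds // est_task_seconds)  # ceiling division
    room = max(0, server_limit - in_progress)
    return min(wanted, room)


SERVER_LIMIT = 100  # the per-host task limit discussed in the thread

# Hypothetical fast host: 8 cores, 2 hour tasks, empty queue.
fast_10_days = tasks_granted(client_request_seconds(10, 8, 0), 7_200, 0, SERVER_LIMIT)
fast_2_days = tasks_granted(client_request_seconds(2, 8, 0), 7_200, 0, SERVER_LIMIT)

# Hypothetical slow host: 1 core, 10 hour tasks, empty queue.
slow_10_days = tasks_granted(client_request_seconds(10, 1, 0), 36_000, 0, SERVER_LIMIT)
slow_2_days = tasks_granted(client_request_seconds(2, 1, 0), 36_000, 0, SERVER_LIMIT)

print(fast_10_days, fast_2_days, slow_10_days, slow_2_days)
```

In this model the fast host hits the server cap whether it asks for two days or ten, while the slow host never gets near it, so only the slow host's share of the database changes with the "days" setting.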
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
That does complicate things. I was thinking that each project could customize the allowed values for that field, so that BOINC behaved well with projects that have differing needs. BOINC is estimating how much work a computer can do on any given science task in order to know how much "10 days" of work for that project is (in work units). There must be some way of "tricking" it, or setting it, so it calculates SETI tasks as taking five times as long as they actually do. Either that, or set the servers to only serve 20% of what BOINC calls for, but then I guess BOINC would keep calling... I suppose I really have no idea how to accomplish setting a two-day limit, so I should shut up. So, the next-best ham-handed way of doing things would be to limit CPU work units per machine to 25 or 50 (whatever approximates a day's work) and raise the GPU limits by 50 or 75. ... think, think, think, bret.... OK, ten days of work was too much, too big of a database... check. We can't lower the number of days... check. Ten days of work was thousands of tasks... check. We're currently limited to 100 tasks... check. Raising the limits to 250 tasks should result in a database much smaller than the thousands of tasks that were formerly cached... check. Conclusion: lower the hard number of CPU tasks, raise the freaking limits by some number of GPU tasks, and see what happens. Slower machines will still be limited to 10 days and faster machines might keep work; if it doesn't behave, lower them again... check. |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Why would you ask for my 24-core box to go from about a 7 hour cache with 100 tasks to even less? That is just plain rude, sir! :P If they were to adjust the limits again, I would like something along the lines of how they were doing it last time, which was per CPU/GPU core/processor. IIRC they were using 50 for CPU & 100 for GPU. I imagine they didn't use those values again as they knew it wouldn't do the job. As they seem to have control over CPU & GPU limits independently, they might consider bumping the GPU limit +50 or so to see how the db takes it. If it can't cope, they would need to back it down again. Ideally, having the controls on the back end to consistently keep the db under control is the best answer. Larger jobs are another way, which is probably easier to accomplish. I think PrimeGrid also uses a limit of 100 in-progress tasks. However, some of their tasks can run 30+ hours on mt machines that do SETI@Home work in ~2 hours. |
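
The cache figures in this last post can be checked with simple arithmetic. The sketch below assumes every core runs one task at a time and that all tasks take the same time; the function name is made up for the example. The 24 cores, the 100 task limit, the 7 hour cache, the old per-core value of 50, and the 30 hour task length are the figures quoted in the thread.

```python
def cache_hours(task_limit, task_hours, devices):
    """Hours until a host runs dry, holding task_limit tasks, one running per device."""
    return task_limit * task_hours / devices


CORES = 24
# Task length implied by "about a 7 hour cache with 100 tasks" on 24 cores.
implied_task_hours = 7 * CORES / 100

per_host_100 = cache_hours(100, implied_task_hours, CORES)         # current limit
per_host_50 = cache_hours(50, implied_task_hours, CORES)           # proposed CPU limit of 50
per_core_50 = cache_hours(50 * CORES, implied_task_hours, CORES)   # old style: 50 per core
long_tasks = cache_hours(100, 30, CORES)                           # 100 tasks of 30 h each

print(f"implied task length: {implied_task_hours:.2f} h")
print(f"100 per host: {per_host_100:.1f} h, 50 per host: {per_host_50:.1f} h")
print(f"50 per core:  {per_core_50:.1f} h, 30 h tasks:  {long_tasks:.1f} h")
```

With the same limit of 100, longer tasks stretch the cache without adding a single database row, which is the case made above for larger workunits.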