queuing question
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1
I wonder about the efficacy of allowing clients to stash so many tasks in their queues at any time, up to a week's worth as I recall. Consider the following:

* The effect of long queues is to increase the load on the servers, which right now hold about 5M placeholders in the various database tables, corresponding to the number of results in the field. That is a lot of unproductive overhead, when perhaps a considerably smaller number could be maintained, to the benefit of the SETI project.
* Similarly degrading productivity are the roughly 4M results currently waiting for validation. Shorter client queues would, it seems, reduce this number as well.
* Actual run times for the V8s range roughly from 2h down to 15m, depending on the client, estimated from my old Core2Quad rig and what the GPU-endowed people report. With the current result turnaround time being about 30h, this means the average task sits on the average client doing nothing 93% to 99% of the time.
* As a rule SETI appears to keep up to six hours' worth of results ready to send. If the clients' caches were reduced, then this number would have to increase by an uncertain amount.

I realize that people don't want to run out of datasets to work on. Fine. But keeping those people happy seems to come at the cost of maintaining a dynamic and efficient backend at SETI. And in the end, it is meaningless: SETI is up "most" of the time, so the risk of running out of tasks is smaller than feared or observed, so far as I can ascertain. And most serious number crunchers seem to have backup projects that BOINC switches to automatically.

Comments?
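The 93% to 99% idle figures in the third bullet follow from a quick calculation. A sketch, using the run times and 30 h turnaround quoted above (these are the post's estimates, not measured values):

```python
# Fraction of the turnaround time a task spends sitting idle on a
# client, for the run-time range quoted in the post.

TURNAROUND_H = 30.0  # quoted average result turnaround time, in hours

for label, run_h in [("slow CPU, ~2 h run", 2.0), ("fast GPU, ~15 min run", 0.25)]:
    idle_fraction = 1.0 - run_h / TURNAROUND_H
    print(f"{label}: idle {idle_fraction:.0%} of the turnaround window")
# slow CPU, ~2 h run: idle 93% of the turnaround window
# fast GPU, ~15 min run: idle 99% of the turnaround window
```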
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242
I run quad GPUs. Turnaround time is 0.2 days. That means roughly every 5 hours I return 400 work units. Nothing ever sits more than half a day (and only because SETI goes down for maintenance).

If you restrict the amount of data allowed to go out, you are going to place a HUGE demand on the servers as all the machines looking for data slam them to get work. It's like what happens right after we come back from maintenance: for the first few hours, the servers struggle to maintain flow. Once everyone's cache is filled, the demand drops and the servers are more responsive. Remove the ability to fill people's caches and the servers will be tied up 24/7.

Validation times will also go UP, not down. With fewer work units going out, fewer are coming back. Instead of weeks for validation, SETI could be looking at months.

Run times vary by machine. Mine run 8 minutes; an older machine might take 2 hours or more. It depends on the equipment. The answer isn't reducing capacity. Unfortunately, time, money, and manpower aren't things that are readily available to SETI.
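Zalster's numbers are self-consistent, which can be checked with Little's law (tasks in flight = throughput × turnaround). A sketch, assuming four GPUs each at the 100-task server-side limit mentioned later in the thread:

```python
# Little's law sanity check on the figures above: with ~400 tasks in
# flight and a 0.2-day turnaround, the implied throughput should match
# "roughly every 5 hours I return 400 work units".

in_flight = 4 * 100          # assumed: four GPUs x 100-task limit each
turnaround_hours = 0.2 * 24  # 0.2 days, as quoted

throughput_per_hour = in_flight / turnaround_hours
print(f"~{throughput_per_hour:.0f} tasks/hour, "
      f"i.e. ~{throughput_per_hour * 5:.0f} tasks every 5 hours")
```

About 83 tasks per hour, or roughly 400 every 5 hours, matching the quoted figure.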
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
If you are worried about turnaround time and the stress it puts on the servers, the quick and easy fix would be to shorten the task deadlines to something around the nominal return time we see currently. This has been discussed at length in another thread.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association)
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489
Maybe a bit more homework on this subject should've been done before posting. For starters, any rig that can store a week's worth of work is a slow rig. There are hard limits of 100 CPU tasks and 100 tasks per GPU for every rig, and even with those limits I'm very lucky to make it through the weekly outage without running out of work (I regularly look to my backup projects before the end of an outage).

A better place to look for a solution would be stopping those people who join, suck up a full cache's worth of work, and then find out that their computers (especially those with usable GPUs) aren't responding the way they should, so they dump the program and leave all their remaining tasks to time out. Fixing this problem would certainly take a real load off the servers. ;-)

Cheers.
PhonAcq Send message Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1
Oh, yes. I forgot about the 100-task limits. That needs to play into it. But in your case, those limits sound small.

Yes, killing off zombies should help, and could be attacked separately. I think I remember there is a feature in BOINC that could help with this, but SETI never supported it. I can't remember the details, though.

Yes, shortening the turnaround time seems a good idea, possibly coupled with limiting the amount of work permitted to be cached. My guess is that shortening it gradually will have no effect on results received, up to a point; that point is close to where the new cutoff should be set.

I don't understand Zalster's comment. A modest reduction in maximum queue length, say from 10 days to 2 days, shouldn't hurt many clients but would cut down the load of work units waiting around on the clients and on the server. Perhaps that change could be tested on a willing client, to see if reducing the local queue to two days would change the average output. One fly in the ointment might be that 100-task limit, I suppose. In Zalster's case, care would be needed to look at the results in terms of individual compute node performance, because he has so many compute nodes!

My point is that the flow of work should have minimal latency, though what counts as minimal may require some empiricism. Buffering vast amounts of work on the clients on purpose makes no sense and probably harms the project by increasing overhead at the servers. Buying larger disks and faster servers isn't the answer, at least not the prudent answer, in my mind.
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
I believe you will find that the majority of fast crunchers have their cache settings very low. I have mine set at 2 days, with 0.1 additional days of work cached. That is more than enough to keep the 100 tasks per CPU and per GPU at the limit. The small additional-days setting forces BOINC to ask for replacement work at every scheduler check-in. We have plenty of trouble keeping the crunchers fed through a project maintenance day or an unscheduled outage, because of the fast turnaround time and throughput.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association)
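The effect of that small additional-days setting can be sketched with a toy model. This is hypothetical and not the actual BOINC work-fetch code; the function names and the exact refill rule are made up for illustration:

```python
def should_fetch(buffered_days, min_days=2.0, extra_days=0.1):
    """Ask the scheduler for work once the buffer dips below the target.

    A busy host that is constantly finishing tasks drops below
    min_days + extra_days almost immediately when extra_days is small,
    so it ends up requesting replacement work at nearly every check-in.
    """
    return buffered_days < min_days + extra_days

def request_days(buffered_days, min_days=2.0, extra_days=0.1):
    """How much work (in days) to request to top the buffer back up."""
    return max(0.0, min_days + extra_days - buffered_days)

print(should_fetch(2.05))                 # True: below the 2.1-day target
print(f"{request_days(2.05):.2f} days")   # 0.05 days
```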
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304
I believe you will find that the majority of fast crunchers have their cache work load set very low.

The faster the system, the less the cache setting means; only slow systems can hold a cache of work larger than 24 hours. Limiting the maximum cache to, say, 4 days, and then raising the server-side limits so that the freed-up WUs can go to faster systems, would be nice, since those systems wouldn't be out of work for as long each week. But I suspect the actual impact on the number of results in progress would be minimal.

Grant Darwin NT
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.