Message boards :
Technical News :
Upwards and Onwards (May 28 2009)
Author | Message |
---|---|
W-K 666 Joined: 18 May 99 Posts: 19060 Credit: 40,757,560 RAC: 67 |
Don't forget that the first AP task that user gets will be run with the stock app (so ~80 hours on that machine), but BOINC will estimate it at DCF = 1.0 rather than the DCF of ~0.4 typical for stock AP. DCF may be a kludge, but it is probably one that needs to stay. The benchmarks only really test the CPU core, so when running real tasks there can be a large difference in performance between CPUs with similar benchmarks. As we know, AMDs don't, in general, do very well here on SETI, but there is also quite a big difference in performance between Intel CPUs at the same clock speed and with similar benchmarks but differing amounts of cache memory. |
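As a rough illustration of W-K 666's point: in simplified form, BOINC's runtime estimate is the task's estimated floating-point operation count (the workunit's `rsc_fpops_est`) divided by the host's benchmarked speed, multiplied by DCF. This sketch assumes exactly that simplified formula (a real client applies further adjustments), and the operation count and benchmark figure are invented so that the DCF = 0.4 estimate lands near the ~80 hours mentioned above.

```python
def estimated_runtime_hours(fpops_est: float, benchmark_flops: float, dcf: float) -> float:
    """Simplified runtime estimate: operations / benchmarked speed, scaled by DCF.

    Assumption: ignores the other adjustments a real BOINC client applies.
    """
    seconds = fpops_est / benchmark_flops * dcf
    return seconds / 3600.0


# Invented figures for a single Astropulse task on one host.
FPOPS_EST = 7.2e14    # estimated floating-point operations for the task
BENCH_FLOPS = 1.0e9   # host's benchmarked floating-point speed, FLOPS

fresh = estimated_runtime_hours(FPOPS_EST, BENCH_FLOPS, dcf=1.0)    # newly attached host
settled = estimated_runtime_hours(FPOPS_EST, BENCH_FLOPS, dcf=0.4)  # after DCF has dialed in
print(f"fresh host, DCF = 1.0:   {fresh:.0f} h")    # 200 h
print(f"settled host, DCF = 0.4: {settled:.0f} h")  # 80 h
```

With these invented numbers a new host is told 200 hours for a task that actually takes about 80. The same formula produces an under-estimate instead whenever a host's real speed on the application is worse than its benchmark suggests.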
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
Quoting W-K 666: "DCF may be a kludge, but it is probably one that needs to stay. The benchmarks only really test the CPU core, so when running real tasks there can be a large difference in performance between CPUs with similar benchmarks." The DCF definitely is a kludge, but that's because it's per project and not per application. What's being discussed here is not the DCF but the initial estimate, as Nicolas pointed out. Regards, Gundolf. Computers aren't everything in life. (Just kidding.) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours |
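Gundolf's objection, that DCF is kept per project rather than per application, can be shown with a toy model. This sketch assumes a plain exponential moving average (not the client's actual update rule) and invented actual-to-estimate ratios for a MultiBeam and an Astropulse application, with tasks from the two apps alternating.

```python
from collections import defaultdict

# Invented "true" ratios of actual runtime to uncorrected estimate, per application.
TRUE_RATIO = {"multibeam": 1.0, "astropulse": 0.4}
ALPHA = 0.1  # smoothing weight (assumption; not BOINC's exact rule)


def smooth(old: float, observed: float) -> float:
    """Move a correction factor a fraction ALPHA of the way toward the observed ratio."""
    return (1 - ALPHA) * old + ALPHA * observed


project_dcf = 1.0                   # one factor shared by every app in the project
app_dcf = defaultdict(lambda: 1.0)  # one factor per application

# Alternate tasks from the two applications.
for i in range(200):
    app = "multibeam" if i % 2 == 0 else "astropulse"
    observed = TRUE_RATIO[app]
    project_dcf = smooth(project_dcf, observed)
    app_dcf[app] = smooth(app_dcf[app], observed)

print(f"per-project DCF: {project_dcf:.2f}")
for app, dcf in sorted(app_dcf.items()):
    print(f"per-app DCF[{app}]: {dcf:.2f}  (true {TRUE_RATIO[app]})")
```

The shared factor keeps getting pulled back and forth and settles between the two true ratios, so it fits neither app; the per-app factors each converge to their own value.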
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Quoting W-K 666: "DCF may be a kludge, but it is probably one that needs to stay. The benchmarks only really test the CPU core, so when running real tasks there can be a large difference in performance between CPUs with similar benchmarks." The problem is in fact the initial estimate. But let me tell you about my newest cruncher. This machine exists primarily to run web statistics for hosting customers. It is neither very busy nor very big; it was selected solely for its low power consumption. It's a VIA C7 at 1.5 GHz. The first work unit it received was AP. Crunching 24 hours a day with the stock Astropulse application, it missed the deadline by five days, and another machine completed the reissued work. The estimate said it had plenty of time. I mention the C7 because it has a particularly poor FPU: an estimate that is reasonable for other processors is going to be way off on this one. If BOINC over-estimates the run time and the actual time is shorter, everything is reasonably okay. If BOINC under-estimates and downloads more work than the host can do, there is a problem. ... and no matter how you try to fix this, someone is going to find some issue: if you raise the estimates, you get safer scheduling while DCF dials in; if you force the queue to be artificially small during the first few work units, people will complain that the queue can't be filled. |
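The asymmetry 1mp0£173 describes (over-estimates are harmless, under-estimates are not) is why DCF is usually described as rising quickly and falling slowly. This is a minimal sketch of such a rule; the 1.1 threshold, the 0.9/0.1 weights and the 0.01 to 100 clamp are illustrative assumptions, not a quotation of BOINC's client code. Note that the correction can only happen after a task finishes, which is why, on a host like the C7, it is the first estimate that causes the trouble.

```python
def update_dcf(dcf: float, actual: float, uncorrected_estimate: float) -> float:
    """Asymmetric duration-correction update (illustrative thresholds and weights).

    Rises at once when a task ran long (an under-estimate is the dangerous case),
    but only creeps down when a task ran short (an over-estimate is merely safe).
    """
    raw_ratio = actual / uncorrected_estimate                 # DCF that would have been exact
    corrected_ratio = actual / (uncorrected_estimate * dcf)   # how far off the estimate was
    if corrected_ratio > 1.1:
        new = raw_ratio                       # jump straight up to the observed ratio
    else:
        new = 0.9 * dcf + 0.1 * raw_ratio     # decay slowly toward the observed ratio
    return min(max(new, 0.01), 100.0)         # keep within a sane range


# A slow-FPU host starting at DCF = 1.0 whose tasks really take 3x the estimate.
dcf = update_dcf(1.0, actual=300.0, uncorrected_estimate=100.0)
print(dcf)  # 3.0: one long task is enough to raise it

# A fast host starting at DCF = 1.0 whose tasks take 0.4x the estimate.
dcf = 1.0
for _ in range(10):
    dcf = update_dcf(dcf, actual=40.0, uncorrected_estimate=100.0)
print(round(dcf, 3))  # 0.609: still well above 0.4 after ten short tasks
```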
AlphaLaser Joined: 6 Jul 03 Posts: 262 Credit: 4,430,487 RAC: 0 |
Yeah, it's not so much a problem with DCF itself but rather that it should initialize to near 1.0. Isn't there a built-in mechanism on the server side to automatically adjust the estimation data sent to newly attached hosts, based on work previously returned by other hosts? Or perhaps doing that adds too much load? Otherwise, it would seem like a nice feature to have for some projects. Still, DCF would work much better on a per-app basis. I would go even further and say that perhaps projects should be able to define categories of work (for SETI, that would mean one for each group of angle ranges, or ARs), which would give flexibility to projects that have apps with different modes of operation or different categories of input. |
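AlphaLaser's proposal could look something like this on the server: keep a running mean of the actual-to-estimated runtime ratios reported by all hosts, grouped by application and by work category, and use it to scale the estimate sent to a newly attached host. This is a hypothetical design, not existing BOINC server code; `record_result`, `initial_estimate`, `ar_bucket` and the AR bucket boundaries are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RatioStats:
    """Running mean of actual/estimated runtime ratios for one (app, category)."""
    count: int = 0
    mean: float = 1.0  # neutral value until the first result arrives

    def add(self, ratio: float) -> None:
        self.count += 1
        self.mean += (ratio - self.mean) / self.count  # incremental mean


def ar_bucket(angle_range: float) -> str:
    """Hypothetical grouping of SETI@home workunits by angle range (AR)."""
    if angle_range < 0.2:
        return "low-AR"
    if angle_range < 1.0:
        return "mid-AR"
    return "high-AR"


# (app name, AR bucket) -> RatioStats, filled from results returned by all hosts
stats = defaultdict(RatioStats)


def record_result(app: str, angle_range: float, actual_s: float, estimated_s: float) -> None:
    """Fold one returned result into the statistics for its category."""
    stats[(app, ar_bucket(angle_range))].add(actual_s / estimated_s)


def initial_estimate(app: str, angle_range: float, raw_estimate_s: float) -> float:
    """Runtime estimate for a newly attached host, scaled by other hosts' experience."""
    s = stats.get((app, ar_bucket(angle_range)))  # .get() avoids creating empty entries
    factor = s.mean if s is not None and s.count > 0 else 1.0
    return raw_estimate_s * factor


# Hypothetical returned results: two Astropulse tasks in the same AR bucket.
record_result("astropulse", 0.42, actual_s=80 * 3600, estimated_s=200 * 3600)
record_result("astropulse", 0.45, actual_s=90 * 3600, estimated_s=200 * 3600)
print(initial_estimate("astropulse", 0.40, raw_estimate_s=200 * 3600) / 3600)  # about 85 h
print(initial_estimate("multibeam", 0.40, raw_estimate_s=3 * 3600) / 3600)     # no data: 3.0
```

Grouping by category as well as by app is what would let a project keep separate corrections for work whose runtime differs by category on the same application.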
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Regarding the missed deadline, does BOINC have a method to ping the host to verify progress? If so, wouldn't that be a better and universal way to decide about reissuing, at least at first? (This question should probably be in NC, sorry.) |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Quoting PhonAcq: "Regarding the missed deadline, does BOINC have a method to ping the host to verify progress? If so, wouldn't that be a better and universal way to decide about reissuing, at least at first? (This question should probably be in NC, sorry.)" Are you asking if the SETI servers can "ping" the client on the volunteer's PC to verify progress? If so, the answer is no, and I don't foresee that changing, because it would require an incoming connection into the user's PC. Firewalls would have to be configured to allow this, and I can't imagine that many corporate donors would be happy with that idea, nor would a few other users. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Quoting PhonAcq: "Regarding the missed deadline, does BOINC have a method to ping the host to verify progress? If so, wouldn't that be a better and universal way to decide about reissuing, at least at first? (This question should probably be in NC, sorry.)" The way to do this would be for the BOINC client to "check in" and report progress periodically. There are two problems: it adds another mechanism, with more work on the servers and more fields in the database, and it doesn't really help for machines with intermittent connections (dialup, portables, etc.). If you can't verify that a machine is making progress, you don't know whether it is broken or just lacks connectivity, so you still have to wait for the deadline. |
Dena Wiltsie Joined: 19 Apr 01 Posts: 1628 Credit: 24,230,968 RAC: 26 |
You still do have some information in that the last contact time is recorded. While this will not tell you what the status is, it will tell you they are still out there and connecting to the project. If they have not connected in a while, they could have quit the project or gone on vacation without draining their work. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I see. So if the host has checked in but is late, you might give it another quantum of time, and repeat this a couple of times with smaller quanta, in order to save the efforts of the slower hosts. Of course, this makes no sense if these hosts keep getting WUs that are 'too much' for them. |
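PhonAcq's idea of shrinking extensions can be sketched as a small function: grant each extra grace period only if the host contacted the server during the window that just closed, and halve the period each time. Everything here (the function name, the 48-hour first extension, the halving, the three-extension cap and the 336-hour deadline in the example) is hypothetical. As later replies point out, SETI@home's database could not afford such checks, and a recent contact does not prove the task ever reached the host.

```python
from typing import Sequence


def final_deadline(
    deadline_h: float,
    check_ins_h: Sequence[float],
    first_grace_h: float = 48.0,
    shrink: float = 0.5,
    max_extensions: int = 3,
) -> float:
    """Hypothetical reissue time under a 'shrinking grace period' policy.

    deadline_h:  original deadline, in hours since the task was issued.
    check_ins_h: times (hours since issue) at which the host contacted the server.
    Each extension is granted only if the host checked in during the window that
    just closed; every later extension is `shrink` times the previous one.
    """
    cutoff = deadline_h
    grace = first_grace_h
    window_start = 0.0
    for _ in range(max_extensions):
        if not any(window_start < t <= cutoff for t in check_ins_h):
            break  # host went quiet: reissue at the current cutoff
        window_start = cutoff
        cutoff += grace
        grace *= shrink
    return cutoff


# Host checked in at 300, 350 and 370 hours, then went silent.
print(final_deadline(336.0, [300.0, 350.0, 370.0]))  # 408.0
```

In the example the host earns two extensions (48 h, then 24 h) and the task is reissued at 408 hours; a host that never checks in is reissued at the original deadline.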
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Quoting Dena Wiltsie: "You still do have some information in that the last contact time is recorded. While this will not tell you what the status is, it will tell you they are still out there and connecting to the project. If they have not connected in a while, they could have quit the project or gone on vacation without draining their work." Exactly. You have information "if" the machine is checking in. As you point out, the person can be on vacation, or they could have quit. The machine could have been destroyed, or the hard drive might have failed. ... or it could be crunching away happily, but not connected to the net. If it hasn't checked in for a while, that's all you really know. It could disappear for a week and resurface just before the due date. It could even report late, but before the work has been reassigned and reissued, complete the quorum, and move on. |
Ingleside Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
Quoting PhonAcq: "I see. So if the host has checked in but is late, you might give it another quantum of time, and repeat this a couple of times with smaller quanta, in order to save the efforts of the slower hosts. Of course, this makes no sense if these hosts keep getting WUs that are 'too much' for them." For a project to get clients to connect again, it can just set <next_rpc_delay>, and apart from computers that are disconnected for some reason, it will get a scheduler RPC. But the problem for SETI@home is that this would be useless: the database server can't handle the extra load of checking for reissues and aborts, so it wouldn't have the capacity to do extra checks, as part of scheduler requests, for tasks in danger of missing their deadline either. The Transitioner will check and update the task when the deadline is reached, but you can't extend the deadline based on the last time the host contacted the project, since you have no way of knowing whether the task is just a little late or is a "ghost WU" that never made it to the host in the first place, and reissuing "lost" work is disabled because it puts too much load on the database... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
I had forgotten about that little limitation imposed by the database engineering. So many rough edges still after so many years. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Quoting PhonAcq: "I had forgotten about that little limitation imposed by the database engineering. So many rough edges still after so many years." Probably less of a "rough edge" and more a case of just trying to carry a really big load on largely "recycled" equipment. If they had the money for faster servers, it might be different, but part of what BOINC tries to do is minimize cost. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.