Technical News: Small Word (Sep 20 2007)
Author | Message |
---|---|
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
How about this solution: whenever there is no new work to issue, generate extra results from the oldest work units that have not yet been completed and have not had any reissues, whether due to no-reply results or to this new system. That way, a work unit won't stall for too long because of someone who panics and reformats after a rootkit infection (which, if written well enough, requires nothing short of a reformat and possibly a BIOS flash to be 100% cured) or after malware so new that the antivirus software doesn't know how to deal with it, or because of some rude vacationer who doesn't abort his work. And since those work units have already been split, there is no need to wait for the splitter to produce the new results. |
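Jesse's proposal can be expressed as a simple selection rule. The following is a minimal Python sketch, not SETI@home's or BOINC's scheduler code: the `WorkUnit` fields and the `pick_reissues` helper are hypothetical, and it assumes a work unit is eligible only if it is uncompleted and has never been reissued, with at most one extra issue per work unit.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class WorkUnit:
    # Hypothetical fields for illustration; not the BOINC database schema.
    wu_id: int
    created: datetime
    completed: bool = False
    reissued: bool = False

def pick_reissues(workunits: List[WorkUnit], fresh_available: int, slots: int) -> List[int]:
    """When no freshly split work is available, fill the free slots with extra
    results for the oldest uncompleted work units that were never reissued."""
    if fresh_available > 0:
        return []  # normal path: send newly split work instead
    candidates = [wu for wu in workunits if not wu.completed and not wu.reissued]
    candidates.sort(key=lambda wu: wu.created)  # oldest first
    chosen = candidates[:slots]
    for wu in chosen:
        wu.reissued = True  # at most one extra issue per WU under this scheme
    return [wu.wu_id for wu in chosen]

# Invented example data.
wus = [
    WorkUnit(1, datetime(2007, 8, 14)),
    WorkUnit(2, datetime(2007, 8, 1), completed=True),
    WorkUnit(3, datetime(2007, 7, 30)),
    WorkUnit(4, datetime(2007, 9, 1), reissued=True),
]
print(pick_reissues(wus, fresh_available=0, slots=2))  # [3, 1]
```

Because no new splitting is needed, the extra results come straight from work that is already in the database, which is the advantage Jesse points out.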
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
@n7rfa: I agree that outgoing traffic would be larger with a replication level greater than 2. If it were set to 3, there would be an immediate 50% increase in outgoing downloads, and 33% of the outgoing bandwidth would be 'wasted'. As each WU is validated, the one result still not returned will need to be cancelled, and that cancellation has little impact on network bandwidth. I think the analysis stops there, since the same logic applies to every WU. With a replication of 4, we'd 'waste' 50% of the outgoing bandwidth, and so on. On the plus side, I think the server-side database requirements might shrink, because each WU would persist in the database for less time. So although more results would be registered in the tables, they would stay there for a shorter period, at least if we all ran BOINC 5.10.20. @Jesse: You are proposing a band-aid for outliers. It could run without changing anything else that runs today, and it seems easy enough to implement. At the cost of increased complexity, I suppose one could also issue WUs to clients that have similar tpt's: slow clients matched with slow clients, and so on. But that would require considerably more data analysis on the servers than I suspect the sysadmins there want. One positive effect is that it would reduce the average size of the databases, because it should trim the pending-credit distribution. |
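PhonAcq's percentages follow from simple ratios: with a quorum of 2, sending r copies means r/2 - 1 extra downloads, and (r - 2)/r of all downloads are wasted. A small Python check of that arithmetic, assuming (as the post does) that every copy beyond the quorum is eventually discarded and ignoring the cancellation traffic itself; `replication_overhead` is an illustrative helper name:

```python
from fractions import Fraction

def replication_overhead(replication: int, quorum: int = 2):
    """Return (extra downloads relative to sending `quorum` copies,
    fraction of all downloads that end up wasted).

    Assumes every copy beyond the quorum is cancelled or discarded."""
    if replication < quorum:
        raise ValueError("replication must be >= quorum")
    increase = Fraction(replication, quorum) - 1
    wasted = Fraction(replication - quorum, replication)
    return increase, wasted

for r in (2, 3, 4):
    inc, waste = replication_overhead(r)
    print(f"replication {r}: +{float(inc):.0%} downloads, {float(waste):.0%} wasted")
```

This prints +0%/0% for a replication of 2, +50%/33% for 3, and +100%/50% for 4, matching the figures in the post.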
n7rfa Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0 |
You're assuming that they will be cancelled immediately. They aren't cancelled until the client connects to the server, and then only if the BOINC client supports cancellation and hasn't already started crunching the result. A third of my pending WUs are pending because the other client hasn't connected, and I'm only looking at the pending work for August. Sending out more results will not resolve this aspect of the problem; only a shorter deadline will help here. Now let's consider the slow clients. As long as they are matched with two fast clients, they will be continually cancelling and downloading work. Oh, they will be crunching, but they will also be continually wasting network bandwidth. Remember, client downloads have been affected recently by the number of splitters running, and if we hit another run of "short" WUs, network response will suffer even more. In my opinion, shortening the deadline to 3-4 weeks is the best all-around solution to the "problem". |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
As Joe Segur has pointed out, the current formula gives a 13x range between the shortest and the longest deadlines: 8.68 days to ~113 days. Yet my research suggested that the actual variation in run time was closer to 8x even for the rare extreme outliers, and nearer 6x for the more common angle ranges. That suggests it would be perfectly reasonable to compress the range of deadlines, keeping the shortest at 8.68 days but bringing the longest down to, say, 50 days. |
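The post doesn't give the deadline formula, so the following is only a hypothetical illustration of what "compressing the range" could mean: a linear rescale that keeps 8.68 days fixed and maps ~113 days onto the proposed 50 days. `compress_deadline` is an invented helper, not the project's formula; the constants are the values quoted in the post.

```python
D_MIN = 8.68    # shortest current deadline in days (from the post)
D_MAX = 113.0   # longest current deadline in days (approximate, from the post)
NEW_MAX = 50.0  # proposed longest deadline in days (from the post)

def compress_deadline(days: float) -> float:
    """Hypothetical linear rescale of [D_MIN, D_MAX] onto [D_MIN, NEW_MAX].

    Not the real deadline formula: it only shows how the shortest deadline
    can stay fixed while the longest one is pulled in, preserving order."""
    if not D_MIN <= days <= D_MAX:
        raise ValueError("deadline outside the current range")
    scale = (NEW_MAX - D_MIN) / (D_MAX - D_MIN)
    return D_MIN + (days - D_MIN) * scale

print(f"current range: {D_MAX / D_MIN:.1f}x")
for d in (8.68, 30.0, 60.0, 113.0):
    print(f"{d:6.2f} days -> {compress_deadline(d):6.2f} days")
```

A linear map is just the simplest choice; a rescale in log space, or simply capping the existing formula at 50 days, would be equally plausible readings of the proposal.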
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
@n7rfa: Yes, indeed, you are right. I'm assuming continuous connection! So my suggestion is not as desirable as I had hoped. Regarding your comment about slow processors in my scheme: 1) statistically, the slow will not be matched with the fast until the fast are the dominant species, and even then the fast hosts' caches will tend to grow so that they look slow; and 2) when the slow simply cannot get credit because they are not fast, account holders will either change projects or upgrade their hardware, and in the latter case we all win. In general, I would not be in favor of shortening the deadlines. It's a philosophical objection, I guess, but that should count for something. Seti was supposed to be a noble, egalitarian project, after all. Slow processors are a source of pride for some of us, and we don't give a hoot about more objective arguments to the contrary. Instead, as Jesse more or less suggested below, why not just run a process that finds the outlier WUs with a pending result that is way overdue, and then issue a redundant result or two in cases that significantly exceed the norm for that client? The tpt data for each client is available, so why not use it, at least for the outliers? Perhaps this is too big a programming challenge? |
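PhonAcq's outlier idea amounts to comparing each pending result's age with its host's usual turnaround. A minimal sketch under stated assumptions: `PendingResult`, the `avg_turnaround` mapping, and the threshold `factor=3.0` are all hypothetical, and a real server would read per-host turnaround statistics from its own database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

@dataclass
class PendingResult:
    # Hypothetical record; field names are illustrative, not the BOINC schema.
    result_id: int
    host_id: int
    sent_at: datetime

def overdue_results(pending: List[PendingResult],
                    avg_turnaround: Dict[int, timedelta],
                    now: datetime,
                    factor: float = 3.0) -> List[int]:
    """Return IDs of results outstanding for more than `factor` times their
    host's average turnaround. Hosts with no turnaround history are skipped."""
    flagged = []
    for r in pending:
        typical = avg_turnaround.get(r.host_id)
        if typical is None:
            continue
        if now - r.sent_at > typical * factor:
            flagged.append(r.result_id)
    return flagged

# Invented example: host 10 normally turns work around in 5 days.
now = datetime(2007, 9, 20)
pending = [PendingResult(1, 10, datetime(2007, 8, 14)),
           PendingResult(2, 11, datetime(2007, 9, 15))]
avg = {10: timedelta(days=5), 11: timedelta(days=5)}
print(overdue_results(pending, avg, now))  # [1]
```

Each flagged ID would be a candidate for the redundant reissue PhonAcq suggests; comparing against the host's own history, rather than one global cutoff, is what keeps slow but reliable hosts from being flagged.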
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
And so where has Matt been? I'm addicted to his updates on this board, and he seems to have disappeared! Probably in some basement cutting a CD (or do you burn those? Probably burn the bad ones.) |
Mentor397 Joined: 16 May 99 Posts: 25 Credit: 6,794,344 RAC: 108 |
Okay, perhaps I haven't been reading this right, but let's pretend that 5 WUs are sent out and the quorum remains two. When two WUs are returned, the other three are cancelled, but what if they are being crunched when they are cancelled? I'm thinking that, once again, the faster computers can run circles around the slower ones (which already happens) AND prevent them from doing any work at all. Eventually, the pending credit thing will even out; it cannot keep rising indefinitely. As I write this, my pending credit is 2,182. It will even out, and the numbers will rise. However, since we are still technically recovering from a long period of Seti blues, that equilibrium hasn't been tested. I don't feel bad that my cache is eight days. I've seen others set to the full ten, and with the difficulties the project is experiencing, I'd like to have a cushion in case something goes wrong, either on my end or on Seti's. That being said, I don't crunch for anyone else and don't really care to. While I'm sure there is plenty of useful science out there, I'm with this project because Seti interests me far more than saving the planet or curing disease. (Wow, that made me sound heartless!) But up until this past March, my computers had ALWAYS sucked. In fact, the computer before this one was a P3-450. Apparently computers don't like to be dropped. Write that down. I knew way back when I signed up that Seti was going to be about numbers for some people. While EVERYONE wants to contribute more than anyone else, there are going to be people who want to tweak the rules so they get their numbers first. But think about it: how much pending credit is lost? AFAIK, none. It may take a while, it may take a LONG while, but eventually people get the credit they want. Your computer(s) still do the same amount of work every single day whether your pending credit is 30 or 3000; it just means you have more out there that you haven't gotten credit for YET.
The ONLY justification I can see for crunchers to reset the deadlines is so they get their credit sooner, thereby eliminating the slower computers. Whew, sorry this is so long. My first post on this board, tee hee! I'm all a-twitter! |
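Mentor's claim that pending credit levels off rather than rising forever is an instance of Little's law (L = λW): at equilibrium, the average amount pending equals the rate at which credit is claimed multiplied by the average wait for validation. A tiny illustration with made-up host numbers; `steady_state_pending` is an illustrative helper, and the model assumes a steady claim rate and a stable average wait.

```python
def steady_state_pending(credit_per_day: float, mean_wait_days: float) -> float:
    """Little's law, L = lambda * W: average pending credit at equilibrium,
    given the daily credit claimed and the mean days a result waits to validate."""
    if credit_per_day < 0 or mean_wait_days < 0:
        raise ValueError("rates and waits must be non-negative")
    return credit_per_day * mean_wait_days

# Hypothetical host claiming 100 credits/day.
print(steady_state_pending(100.0, 20.0))  # 2000.0 with a 20-day average wait
print(steady_state_pending(100.0, 10.0))  # 1000.0 with a 10-day average wait
```

In this model a shorter wait lowers the plateau, but once the system is in equilibrium the credit granted per day equals the credit claimed per day either way, which is the point that no pending credit is actually lost.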
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
My computers are far from the fastest in the world, but they are all I can afford. They do the job and keep crunching along. http://setiathome.berkeley.edu/hosts_user.php?userid=258982 I keep them up to date with the latest BOINC and SETI apps so that they can at least try to keep up. The only problem I have with the long completion dates is that at least two of my pending wingmen have gone AWOL without aborting their WUs or detaching from the project. One of them expires on the 5th of October; he got the WUs on Aug. 14 and has not been heard from since. Another one doesn't expire until sometime in Nov. That leaves the WU hanging for two months before it even goes out to someone else to complete. I really hope someone comes up with a way to identify people like this, so that the work can progress the way it is supposed to instead of just sitting there waiting for an answer that is never coming. PROUD MEMBER OF Team Starfire World BOINC |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"Okay, perhaps I haven't been reading this right, but let's pretend that 5 WU's are sent out and the quorum remains two. When two WU's are returned, the other three are cancelled, but what if they are being crunched as they are cancelled? I'm thinking that once again, the faster computers can run circles around the slower ones (which indeed happens already) AND prevent them from doing any work at all." As the system is currently set up, once a quorum has been met, the servers will attempt to cancel the workunits sent out to all other hosts. If a host is already crunching a workunit that has been marked for cancellation, the system will let it keep crunching until completion. The scientific value of that work is zilch, since the quorum has already been met, so the host is now crunching purely for the credit; the server will still grant credit for work done before the WU deadline. |
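The cancellation behaviour OzzFan and n7rfa describe can be summarized as a small decision rule evaluated when a host next contacts the server. This is a hypothetical Python sketch of that description only, not BOINC's actual scheduler logic; the `State` values, the `on_host_contact` function, and the `client_supports_abort` flag are illustrative names.

```python
from enum import Enum, auto

class State(Enum):
    UNSTARTED = auto()  # downloaded but not yet started
    RUNNING = auto()    # crunching in progress
    RETURNED = auto()   # result already reported

def on_host_contact(state: State, quorum_met: bool, client_supports_abort: bool) -> str:
    """What happens to one outstanding result when its host next contacts
    the server (hypothetical sketch of the behaviour described in this thread)."""
    if not quorum_met or state is State.RETURNED:
        return "no action"
    if state is State.UNSTARTED and client_supports_abort:
        return "cancel"  # dropped before any CPU time is spent on it
    # Already running, or the client can't handle cancellation: the work is
    # finished anyway. Per the thread it still earns credit if returned before
    # the deadline, although it adds nothing to the science.
    return "crunch to completion"

print(on_host_contact(State.RUNNING, quorum_met=True, client_supports_abort=True))
print(on_host_contact(State.UNSTARTED, quorum_met=True, client_supports_abort=True))
```

The rule makes n7rfa's point explicit: nothing happens until the host actually connects, so a host that never connects again keeps its result tied up until the deadline regardless of replication level.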
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"Okay, perhaps I haven't been reading this right, but let's pretend that 5 WU's are sent out and the quorum remains two. When two WU's are returned, the other three are cancelled, but what if they are being crunched as they are cancelled? I'm thinking that once again, the faster computers can run circles around the slower ones (which indeed happens already) AND prevent them from doing any work at all." And your point is......? The system is working perfectly. Before the change to an initial issue of 2, hosts spent much more time crunching WUs for nothing but cobblestones. Now most of the crunching being done, even by dog-slow rigs, is valid science. As the kitties have written, as it shall be done. "Time is simply the mechanism that keeps everything from happening all at once." |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"Okay, perhaps I haven't been reading this right, but let's pretend that 5 WU's are sent out and the quorum remains two. When two WU's are returned, the other three are cancelled, but what if they are being crunched as they are cancelled? I'm thinking that once again, the faster computers can run circles around the slower ones (which indeed happens already) AND prevent them from doing any work at all." My point was to answer Mentor's question about what happens when a quorum is met, a workunit is scheduled for deletion, and a host is currently crunching that workunit. I thought my point was quite obvious, since I even quoted the portion of text I was replying to. |
Heflin Joined: 22 Sep 99 Posts: 81 Credit: 640,242 RAC: 0 |
I guess I'm ABNORMAL. I **LIKE** pending credit! It is like... my computers have worked faster than my wingmen's. Or I'm gonna get something for nothing in the near future. Kind of like Christmas morning: I'm up and see the presents but have not opened them yet. Neither is really accurate, but it's a more positive "feeling". Maybe we should have a competition to see who can get the HIGHEST pending credit score? Maybe a page listing folks by pending credit? Sure, actual credit may be different, but they still don't have any cash value. SETI@home since 1999 "Set it, and Forget it!" |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
"(for those that don't remember, anything with an AR of 1.2 or above would have a [relatively] short crunching time and a very short deadline [some in the one week range], a cache full of these WU's would put the computer they were assigned to into EDF [and 'no New Tasks'] for the duration of processing the cache!)" The solution is simple. Don't eat what you can't chew - don't accept short WUs if they would land at the end of your cache. Don't get given what you can't eat - the server doesn't hand out short WUs if your cache is too big. Spit out what you can't eat - return WUs for someone else if you can't meet the deadline, or disappear. No evil nasty horrible EDF required! What's the problem with EDF anyway? You've got enough on your plate, and you want more? No wonder the West has a weight problem! |
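Andy's "don't get given what you can't eat" rule boils down to a feasibility check before a work unit is sent. A minimal sketch, assuming the host works through its cache first in, first out; `can_meet_deadline` is a hypothetical helper, and `on_fraction` (the share of wall-clock time the host actually crunches) and the 0.9 safety margin are invented parameters, not BOINC settings.

```python
def can_meet_deadline(queued_hours: float, wu_hours: float, deadline_hours: float,
                      on_fraction: float = 1.0, safety: float = 0.9) -> bool:
    """Hypothetical server-side check: would this WU finish before its deadline
    if it joined the back of the host's cache (first in, first out)?

    queued_hours:   estimated CPU hours already in the host's cache
    wu_hours:       estimated CPU hours for the new WU
    deadline_hours: time until the new WU's deadline
    on_fraction:    share of wall-clock time the host actually crunches, in (0, 1]
    safety:         margin so a WU that would only just make it isn't sent
    """
    if not 0 < on_fraction <= 1:
        raise ValueError("on_fraction must be in (0, 1]")
    finish_after = (queued_hours + wu_hours) / on_fraction
    return finish_after <= deadline_hours * safety

deadline = 8.68 * 24  # shortest deadline quoted in this thread, in hours
print(can_meet_deadline(240, 1, deadline))  # False: 10 days of work queued
print(can_meet_deadline(48, 1, deadline))   # True: 2 days of work queued
```

A host that crunches only part of the day would pass a smaller `on_fraction`, which stretches its finishing time; that is how a check like this can keep part-time machines in the project without pushing them into EDF.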
Scrooge McDuck Joined: 26 Nov 99 Posts: 1191 Credit: 1,674,173 RAC: 54 |
I want to thank Mentor for his very good post. It summarizes the whole discussion perfectly. One may come down on those who grab large amounts of WUs and then simply go away, never to be seen again. But it's very important for the project that a normal user without a state-of-the-art system is able to contribute, even if his system runs only a few hours a day. He doesn't want to crunch simply for the credit if the quorum has already been reached for his WU. I will never buy or run a fast host just for SETI, but it's the perfect use for the idle cycles on my normal desktop machines. If hard deadlines prevented me from doing so, I would leave the project, and I think a lot of other people would do the same. This may not reduce the computing power significantly, but every user tells his friends and buddies about the project. So it's a simple question: what audience is Seti@home addressing? Is it only a group of maybe 10K power users running farms of new systems at their companies 24/7? Seti@home grew to its current state through millions of people using the simple and nice Seti Classic, which had no deadlines, and a fantastic team at Berkeley who knew their work was appreciated by all those people worldwide. I think we should keep it this way; the power users will get their credits some days or weeks later. |
Zentrallabor Joined: 8 Jan 00 Posts: 6 Credit: 70,525 RAC: 0 |
@perryjay .. I'm far from coming up with a solution to your problem, but the reason (at least for Josh's machine not sending back the results for WUs from Aug-14) seems simple: IMO he experimented with WinVista, had problems, and now runs his machines on WinXP again ;) It would have been best if he had cancelled the downloaded results before abandoning his WinVista experiment, but now it may be too late. :-( I also think such abandoned machines/clients are the main reason some users are annoyed by their pending credits (not because of the credits themselves, but because lagging results blow up the database). This is not specifically a WinVista problem but a common one with users trying out BOINC or new/unknown BOINC projects. A lot of people seem to try out CPDN and then cancel the project because of the long running times of a WU (or the requested HD space). Has anyone heard from the team whether the long duration of such partly abandoned WUs is _really_ an issue for the S@H servers? Maybe the servers can easily get along with 2 megs of open results. ;-) Right now my client was only able to DL 3 WUs after trying for more than 30 minutes (with only one manually triggered request); such repeated requests also put unnecessary load on the servers and network. Looking at the server stats, I think it's a WU-creation problem: some weeks ago only 3-4 mb-splitters were creating 10-20 WUs/second, and now 6 mb-splitters seem to create fewer WUs?? Regards, Chris P.S. Thanks to OzzFan and the others for their replies to my earlier post. If there were good reasons for abandoning RRI, I won't waste another minute on it (it's a smaller part of the "pending credits" (better: unvalidated results) problem; results will only stay unvalidated a bit longer, at most 24h). |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
"A lot of people seem to try out CPDN - and then cancel the project because of the long running-times of a WU (or the requested HD-space)." I'm guessing you're right on this issue. Although attrition is a much lower percentage of users on SETI than on CPDN, I think this explains it well. I too have been in that situation: after you uninstall BOINC, it's too late to abort the WU. I suppose a super-user could log into the website and manually abort a WU, if such a form/button existed on their results page. That might make the BOINC system as a whole more complicated, though. "Right now my client was able to DL 3 WUs only after trying (only once manually triggered) for more than 30 minutes - such multi-requests are also something causing unneccessary load on the servers and network. When looking at the server-stats I think it's a problem of WU-creation: some weeks ago only 3-4 mb-splitters worked with 10-20 WUs/second creation, now 6 mb-splitters seem to create fewer WUs??" Yes, the problem all along this year has been I/O contention. More splitters run, but each of them runs slower, because they are competing for the same disk. I'm sure Matt has been playing with the number of splitters to find the best overall configuration. |
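DJStarfox's explanation (more splitters, each running slower, all competing for one disk) can be illustrated with a toy model in which every extra concurrent reader costs some disk throughput, standing in for seek overhead. The function name and all three default parameters are invented for illustration and are not measurements of the SETI@home servers:

```python
def total_wu_rate(n_splitters: int, cpu_rate: float = 5.0,
                  disk_rate_one_stream: float = 30.0,
                  contention: float = 0.3) -> float:
    """Toy model: aggregate WUs/second from n splitters sharing one disk.

    cpu_rate:             WUs/s one splitter could make if the disk were idle (hypothetical)
    disk_rate_one_stream: WUs/s the disk can feed a single reader (hypothetical)
    contention:           fractional loss of disk throughput per extra
                          concurrent reader, standing in for seek overhead (hypothetical)
    """
    if n_splitters < 1:
        raise ValueError("need at least one splitter")
    disk_cap = disk_rate_one_stream / (1 + contention * (n_splitters - 1))
    return min(n_splitters * cpu_rate, disk_cap)

for n in range(1, 7):
    print(n, round(total_wu_rate(n), 2))
```

With these made-up numbers, total output peaks at 4 splitters (about 15.8 WUs/s) and 6 splitters produce only 12 WUs/s, less than 3 or 4 do, which is the kind of behaviour Chris describes. The real optimum depends entirely on the actual hardware, which is why tuning the splitter count by experiment makes sense.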
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"A lot of people seem to try out CPDN - and then cancel the project because of the long running-times of a WU (or the requested HD-space)." Matt has also been playing games with moving some of the HD capacity to different volumes. BOINC WIKI |
Invisible Man Joined: 24 Jun 01 Posts: 22 Credit: 1,129,336 RAC: 0 |
Help - Come back Matt, from wherever you are. Reason: project is down. Viv. |
edjcox Joined: 20 May 99 Posts: 96 Credit: 5,878,353 RAC: 0 |
Noted that LIDOS got a lot of press recently about the discovery of a series of repetitive emanations from the vicinity of a pulsar cluster. They are, of course, capitalizing on this and turning a "discovery" into a request and justification for more funding. So what, if anything, does SETI have in its database about that region of space? Or are we once again unable to view that area of space due to Arecibo's narrow sliver of the sky? Anyway, I would like to hear from some "experts" about what they make of the report. Never engage stupid people at their level, they then have the home court advantage..... |
Invisible Man Joined: 24 Jun 01 Posts: 22 Credit: 1,129,336 RAC: 0 |
Many thanks somebody. All the red status blocks are now green again. Well done. Viv. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.