Post-Weekend Roundup (Feb 05 2007)
Author | Message |
---|---|
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"Additional information on Zombie creation. The records do not seem to be processed in report date order. They seem to be processed in download order instead. I watched this happen several times and have recorded an event as an example. In order, the jobs to be run were dated May 10, April 21 and May 10, with run times of 7, 2 and 7 hours. When the next job was selected, the first in the list (May 10) was processed. The system has been running for several days now and is an Apple OS X PPC system. This is not a problem for me because I limit data to only one day's worth of processing, but people loading a week or more worth of data could have problems with this and create Zombies by having jobs time out before being processed." If you have three results to process, and only three, and you've got "connect every 1 days", then these three work units represent no deadline pressure at all. They will be done in the order they were assigned. If BOINC sees that it can't return a work unit before the deadline, it processes in order by deadline instead -- and as long as the work estimate is reasonably accurate (the 2 hour work unit really takes 2 hours) and the machine is on for a reasonable amount of time per day, everything will get reported on time. There are only a couple of reasons that might not be true: 1) Really big cache and lots of short deadlines. 2) Change in the operating patterns of this machine. But with 16 hours of work, connect every day, and the closest deadline more than a week away, the odds of something getting orphaned are very small. |
Dena Wiltsie Joined: 19 Apr 01 Posts: 1628 Credit: 24,230,968 RAC: 26 |
Avoiding orphaned data is the reason I limit myself to one day of work. Processing in received order can still bite. I had a World Community Grid work unit that uncovered a bug and required over 35 hours to complete when it should have taken less than 4 hours. It forced my system to stop swapping jobs until the work unit was complete. I just think it would make the code much cleaner to process by Report Deadline instead of placing special conditions on when the hurry-up code can be used. If the special code miscalculates when the deadline code should be used, you end up with expired work units. You can also have people who turn their system off over the weekend, which would cause the prediction software to fail because it was unable to run while the system was powered off. You would buy more time by processing early deadlines first. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Dena, the big problem is CPDN. Deadlines for CPDN are on the order of a year for most folks. If you processed strictly in deadline order and gave CPDN a 50% resource share, you'd do no work on CPDN until debt or deadlines got close, then nothing but CPDN for a very long time, and then no more CPDN for months. Otherwise, I'd agree -- but that only works if the work units are all measured in hours or days, not months and years. -- Ned |
Aurora Borealis Joined: 14 Jan 01 Posts: 3075 Credit: 5,631,463 RAC: 0 |
I've had a project make an error of a factor of 65 in its crunching estimate. 3 hours of crunch time slowly became 197 hrs, on a two week deadline. I had work on hand for all the other six projects I was connected to at the time. Boinc managed to process everything else I had on the computer on time and managed to return the errant WU only 3 or 4 hrs later than my due time. By then, the project had noticed the problem and made adjustments at their end to accept and credit the past due WU. That was on my slow Duron system. As long as you keep a reasonably sized cache, there is little reason to worry that Boinc will have problems under normal circumstances, and it can even adapt to some degree when the projects mess up. I've never lost any credits or wasted CPU time because Boinc wasn't able to process and return work on time. As far as I'm concerned, the Boinc work scheduler has always been extremely efficient at doing its job as intended. I can only tip my hat to JM7 and the other developers who are continuously tweaking and improving it to remove the few 'special situations' glitches that remain. Boinc V7.2.42 Win7 i5 3.33G 4GB, GTX470 |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
I had a situation as follows: I had just set up a PIII 750 laptop for Boinc/Seti. My standard "home" computer at the time was an AMD Sempron 2400+, with a 4.5 day cache. For some reason, the BOINC client (5.4.11) thought that the LT had the same processing time as the "home" machine and downloaded results commensurate with the home machine's processing time. (About 20 WU's, each in the 7 hour range, for my "home" machine.) Unfortunately, those WU's take about 24-25 hours on the LT! Once I ran by deadline (10 days for all the WU's), a) the BOINC client had a better idea of how long a WU takes to crunch on the LT; and b) I manually cancelled all the WU's left on the machine (they showed as "compute error" in Berkeley), reset the project, got more WU's, and haven't had a problem since. (This was back in January of this year.) [edit to add] My home computer has been upgraded since then, BTW [/edit] . Hello, from Albany, CA!... |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
"I had a situation as follows: ..." Did you copy the entire BOINC directory, or just the account_*.files? Or was it a clean install? |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
"I had a situation as follows: ..." Entirely clean install of BOINC on the LT... the LT came to me (it was used...) with Win XP Pro installed (upgraded from Win Me) and Office 2K3 on it (although the installation seems faulty, the few times I've tried Office on the LT...). The home machine runs XP Home Ed and Win XP Pro x64, BTW (I dual boot...). Boinc is installed on x64, but I only run ClimatePrediction on x64... . Hello, from Albany, CA!... |
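
The policy Ned describes in the thread (run work in the order it was received unless that would miss a deadline, then switch to deadline order) can be sketched roughly as below. This is not the BOINC client's actual code. The `Job` class, the function names, and the deadline values in hours are all assumptions made up for illustration; only the run times (7, 2 and 7 hours; 20 work units at about 24.5 hours against a 10 day deadline) come from the posts.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    received: int      # download order (lower = downloaded earlier)
    deadline_h: float  # hours from now until the report deadline (assumed values)
    est_h: float       # estimated CPU hours still needed


def misses_deadline(jobs, hours_per_day=24.0):
    """Return True if running jobs in the given order makes any job late."""
    elapsed = 0.0
    for job in jobs:
        # Convert CPU hours to wall-clock hours for a machine that is
        # only on for hours_per_day hours each day.
        elapsed += job.est_h * 24.0 / hours_per_day
        if elapsed > job.deadline_h:
            return True
    return False


def run_order(jobs, hours_per_day=24.0):
    """Received order if it is safe, otherwise earliest deadline first."""
    fifo = sorted(jobs, key=lambda j: j.received)
    if not misses_deadline(fifo, hours_per_day):
        return fifo
    return sorted(jobs, key=lambda j: j.deadline_h)


if __name__ == "__main__":
    # Three jobs with run times of 7, 2 and 7 hours; deadlines are hypothetical.
    relaxed = [Job("A", 1, 648, 7), Job("B", 2, 192, 2), Job("C", 3, 648, 7)]
    print([j.name for j in run_order(relaxed)])  # ['A', 'B', 'C']

    # Same jobs, but B is now due in 8 hours, so received order would be late.
    tight = [Job("A", 1, 648, 7), Job("B", 2, 8, 2), Job("C", 3, 648, 7)]
    print([j.name for j in run_order(tight)])    # ['B', 'A', 'C']
```

The same check shows why the laptop case could not be rescued by reordering alone: 20 work units at roughly 24.5 hours each is about 490 hours of work against a 240 hour deadline, so `misses_deadline` is true in any order, whereas at 7 hours each (140 hours) the same cache fits comfortably.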