Back Again (Aug 13 2009)

Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 925872 - Posted: 13 Aug 2009, 20:22:28 UTC I was actually out the past couple of days. Family stuff, including an adventure where we had to tow our Prius almost 100 miles back to Oakland (it freaked out and lost power on I-5). It's in the shop now. Luckily these newfangled cars store debugging information, so the shop was able to locate the problem: a flaky potentiometer was sending erratic accelerator information, and as a failsafe the Prius cut its own power. Anyway, during the past couple of days Jeff and Bob handled the Tuesday outage, and Eric tackled a couple of general network issues as well. The upload server somehow got misconfigured and was dropping excess connections, and then the assimilators were dead in the water for a while, causing the queue to back up, the workunit disks to fill up, and finally the splitters to shut down, which is why we ran out of work to send out last night. All seems much better now, albeit jammed with traffic. In better news, we finally got the first two data drives from Arecibo recorded under normal operations by the upgraded data recorder and the new external drive docks. So we're not going to run out of raw data after all, or at least not just yet. I'm copying those raw data files onto our local drives as I type. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 925880 - Posted: 13 Aug 2009, 20:50:23 UTC - in response to Message 925872. Thanks for the update Matt, Claggy

zpm Volunteer tester Joined: 25 Apr 08 Posts: 284 Credit: 1,659,024 RAC: 0	Message 925885 - Posted: 13 Aug 2009, 21:41:17 UTC - in response to Message 925872. "I'm copying those raw data files onto our local drives as I type." - Matt Any idea when these will go into crunch mode? I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/ Go Georgia Tech.

B-Man Volunteer tester Joined: 11 Feb 01 Posts: 253 Credit: 147,366 RAC: 0	Message 925886 - Posted: 13 Aug 2009, 21:52:22 UTC So the pipes will be full for a while, and uploads will be jammed until we get all the AP WUs out of the way. Judging by the server status page there are 5-10 tapes, so expect a blockage of around 24 hours. Thanks for the update, Matt.

DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2	Message 925889 - Posted: 13 Aug 2009, 22:02:30 UTC - in response to Message 925872. Hooray for fresh radio data! A new, shiny, eco-friendly Prius had to be towed by a diesel-burning gas guzzler for 100 miles? Wow, I need to tell all the cap-and-trade-promoting congressmen about this. :)

1mp0£173 Volunteer tester Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 925901 - Posted: 13 Aug 2009, 23:41:56 UTC - in response to Message 925885. "I'm copying those raw data files onto our local drives as I type. - Matt" "Any idea when these will go into crunch mode?" I see "tapes" dated July 2009 in the splitter queue.

Speedy Volunteer tester Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89	Message 925945 - Posted: 14 Aug 2009, 5:01:27 UTC - in response to Message 925872. "I'm copying those raw data files onto our local drives as I type." Thanks for the update; nice to hear you took some time to spend with family. How long does it take to copy raw data to your local drives? Now that fresh data is here, is new AP work on the way? Server Status says four AP channels are in progress, but at the moment all work has been grabbed, according to the "Ready to send" figure as of 14 Aug 2009 4:40:11 UTC.

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22327 Credit: 416,307,556 RAC: 380	Message 926467 - Posted: 16 Aug 2009, 5:34:02 UTC Wot? No jobs? Since about 06:30 BST my PCs have been reporting "Message from server (Project has no jobs available)". Depending on which PC I look at, this message is being sent anywhere from every few minutes to about once an hour. Are there no jobs on the servers, or is this a problem with a server? Looking at the server status page all looks well, but no jobs have been forthcoming at my end of the string for the last 24 hours. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?

Speedy Volunteer tester Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89	Message 926470 - Posted: 16 Aug 2009, 5:56:29 UTC - in response to Message 926467. I know that a bunch of short tasks passed through about two days ago; all of my shorties have been validated. Although everything except a few MB splitters is green on the Server Status page, I think something is not working correctly somewhere. I expect things to get moving again on Monday Berkeley time, which is Tuesday New Zealand time. I'm out of SETI tasks, so I'm helping World Community Grid out.

Norwich Gadfly Joined: 29 Dec 08 Posts: 100 Credit: 488,414 RAC: 0	Message 926486 - Posted: 16 Aug 2009, 8:10:31 UTC - in response to Message 926470. "I'm out of Seti tasks so I'm helping World Community Grid out." Another project which helps humanity is Malaria Control Net http://www.malariacontrol.net/

tullio Volunteer tester Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1	Message 926489 - Posted: 16 Aug 2009, 8:22:29 UTC I am running only CPDN models, one long-range and one medium-range. None of my other 5 BOINC projects is sending any work. Einstein had a filesystem crash. Tullio

John McLeod VII Volunteer developer Volunteer tester Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 926566 - Posted: 16 Aug 2009, 19:10:20 UTC - in response to Message 926489. "I am running only CPDN models, one long range and one medium range. All my other 5 BOINC projects are not sending any work. Einstein had a filesystem crash. Tullio" CPDN has deadlines of its own, and BOINC can decide that CPDN needs all of the CPU time now to meet the deadline, in which case it will block downloads from everywhere else. The CPU time will be made up later by preventing CPDN from downloading work until the other projects have gotten their share of CPU time. BOINC WIKI
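A toy sketch of the kind of deadline check John describes, to show why tullio's numbers should not trigger it. This is a hypothetical simplification, not BOINC's actual code: it assumes a task receives only its resource-share fraction of one CPU until its deadline, and the roughly 8000-hour figure for a July 2010 deadline (seen from mid-August 2009) is an approximation. `Task` and `would_miss_deadline` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_cpu_hours: float  # estimated CPU time still needed
    hours_to_deadline: float    # wall-clock time until the report deadline


def would_miss_deadline(task: Task, share_fraction: float) -> bool:
    """Hypothetical simplification, not BOINC's real scheduler: assume the
    task only gets `share_fraction` of one CPU until its deadline and check
    whether that leaves enough CPU time to finish."""
    cpu_hours_available = task.hours_to_deadline * share_fraction
    return task.remaining_cpu_hours > cpu_hours_available


if __name__ == "__main__":
    # tullio's hadam3p: about 130 h total, 25% done; deadline assumed to be
    # roughly 8000 h away; 5 projects with equal resource shares.
    hadam3p = Task("hadam3p", remaining_cpu_hours=130 * 0.75, hours_to_deadline=8000)
    print(would_miss_deadline(hadam3p, share_fraction=1 / 5))
```

For tullio's hadam3p (about 97.5 of 130 hours left, one fifth of a CPU) this prints `False`: roughly 1600 CPU hours are available against fewer than 100 needed, so genuine deadline pressure alone does not explain the blocked work fetch. In this toy model, multiplying `remaining_cpu_hours` by a large correction factor, such as the TDCF of 40 tullio reports for AQUA, is one way a short task can flip to `True`.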

tullio Volunteer tester Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1	Message 926569 - Posted: 16 Aug 2009, 19:23:51 UTC Last modified: 16 Aug 2009, 19:26:01 UTC My CPDN deadlines are December 2011 for hadcm3 and July 2010 for hadam3p. All my TDCFs are less than 1 except for AQUA, where it is 40. I was running all 5 projects until a few days ago; hadcm3 was already running when I downloaded hadam3p, which should take 130 hours on my Linux box. I have completed 15% of hadcm3 and 25% of hadam3p. Tullio

Gary Charpentier Volunteer tester Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 926667 - Posted: 17 Aug 2009, 1:31:16 UTC - in response to Message 926569. "My CPDN deadlines are December 2011 for hadcm3 and July 2010 for hadam3p. ..." I've noticed that issue with CPDN as well. As soon as I got my most recent CPDN task, most of the projects refused to send work, despite the CPDN deadline being 1 year in the future and the crunch time being under a week. I think something in the work fetch was changed, and now it fails to realize it has plenty of time to crunch the CPDN models. I'm sure this is a BOINC bug, likely a bug in the design and not the implementation. Reading some of the Wiki for it, I'm pretty sure the work fetch was designed to work only when projects are set with the same resource share, have work units that are similar in crunch time, and have deadlines that are about the same. I don't think the designers ever anticipated a project like LHC, which 99% of the time has no work but which a user might set to a 99% share so that he gets a work unit if one ever becomes available. This breaks the assumptions behind the work fetch, and it simply misbehaves. I suspect the fetch algorithm is backing off in two places. First, when it calculates how much time will be available, it assumes the queue will be filled by other projects, without this check being made before they fill the queue. Of course they do, hence the second backoff. The second problem is that the short- and long-term debts aren't adjusted when a project doesn't send work because it thinks it won't finish in time.
Schedulers are a PITA, and I think the work fetch needs a going-over with a fine-tooth comb and some limit testing. I think the improvements made to it over time have introduced subtle bugs. There is also a bug when a user changes his work priorities: the short- and long-term debts aren't changed along with them, which means that work won't be crunched in the ratio the user just set.
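Gary's point about short- and long-term debt can be pictured with a toy bookkeeping model. This is a hypothetical sketch, not BOINC's actual debt formula: each project is assumed to be owed its resource-share fraction of the elapsed CPU time minus the CPU time it actually received, and the client is assumed to prefer the project with the largest debt. The function names are illustrative.

```python
def update_long_term_debt(debts, shares, cpu_hours_used, elapsed_cpu_hours):
    """Toy debt model (hypothetical, not BOINC's real formula): each project
    is owed its resource-share fraction of the elapsed CPU time, minus the
    CPU time it actually received. Updates `debts` in place and returns it."""
    total_share = sum(shares.values())
    for project, share in shares.items():
        entitled = elapsed_cpu_hours * share / total_share
        received = cpu_hours_used.get(project, 0.0)
        debts[project] = debts.get(project, 0.0) + entitled - received
    return debts


def next_project_to_fetch_from(debts):
    """Pick the project the host 'owes' the most CPU time (first one on ties)."""
    return max(debts, key=debts.get)


if __name__ == "__main__":
    # Five projects with equal shares; one CPU for 24 hours, all of it
    # spent on CPDN running in high priority.
    shares = {"CPDN": 100, "SETI": 100, "Einstein": 100, "AQUA": 100, "LHC": 100}
    debts = update_long_term_debt({}, shares, {"CPDN": 24.0}, elapsed_cpu_hours=24.0)
    for project, debt in debts.items():
        print(f"{project}: {debt:+.1f} h")
```

With five equal shares and CPDN occupying the CPU for 24 hours, CPDN ends at -19.2 hours and every other project at +4.8 hours, so under this model CPDN is last in line until the others catch up, which matches John's description of the CPU time being made up later. The balances are plain numbers carried forward: if the user edits `shares`, the existing debts stay as they are, which is the kind of lag Gary describes when priorities change.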

John McLeod VII Volunteer developer Volunteer tester Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0	Message 926677 - Posted: 17 Aug 2009, 2:38:27 UTC - in response to Message 926667. "As soon as I got my most recent CPDN task most of the projects refused to send work, despite the CPDN deadline 1 year in the future and the crunch time being under a week." There is a bug in 6.6.36 that can cause CPDN to take over completely. It is fixed in 6.6.38. BOINC WIKI

tullio Volunteer tester Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1	Message 926702 - Posted: 17 Aug 2009, 4:50:25 UTC Last modified: 17 Aug 2009, 5:44:59 UTC My BOINC client is 6.6.29. All my projects have equal shares. Only LHC is set to NNT (No New Tasks), for obvious reasons. I see that Einstein is asking for new tasks after completing the last one. Unfortunately, Einstein is still off limits. Thanks for the help. Tullio Also, SETI is asking for new work, which is not available.

Sacaripasa Joined: 29 Dec 05 Posts: 13 Credit: 2,050,629 RAC: 1	Message 926712 - Posted: 17 Aug 2009, 5:33:12 UTC - in response to Message 926677. Last modified: 17 Aug 2009, 5:33:49 UTC Set CPDN to suspend, grab all the work you want from your other projects, then unsuspend it. With my quad core it's easy to accomplish this, but others may not have as much success as I have. I have four hadam3p units right now due July 2010, but three are suspended. Each day I suspend everything but MilkyWay, grab 10 days' worth from them, and then only allow one CPDN task to run while MilkyWay crunches and SETI gets fixed. My GPU is hungry!

Gary Charpentier Volunteer tester Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 926713 - Posted: 17 Aug 2009, 5:37:01 UTC - in response to Message 926677. "There is a bug in 6.6.36 that can cause CPDN to take over completely. It is fixed in 6.6.38." And 6.6.36 is the latest version offered on the BOINC home page...

tullio Volunteer tester Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1	Message 926714 - Posted: 17 Aug 2009, 5:49:51 UTC - in response to Message 926712. Last modified: 17 Aug 2009, 6:48:29 UTC "Set CPDN to suspend and grab all the work you want from your other projects, then unsuspend it. ..." Thanks, but I am leaving BOINC to do its business. So far it has worked well and I see no reason to interfere with it. Tullio Got an AQUA task; it's running in high priority because my TDCF is 40 in AQUA.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.