Back Again (Aug 13 2009)


Message boards : Technical News : Back Again (Aug 13 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 925872 - Posted: 13 Aug 2009, 20:22:28 UTC

I was actually out the past couple of days. Family stuff, including an adventure where we had to tow our Prius almost 100 miles back to Oakland (it freaked out and lost power on I-5). It's in the shop now - luckily these newfangled cars store debugging information, so they were able to locate the problem (a flaky potentiometer causing erratic accelerator information, and as a failsafe the Prius cut its own power).

Anyway, during the past couple of days Jeff and Bob handled the Tuesday outage, and Eric tackled a couple of general network issues as well. The upload server got misconfigured somehow and was dropping excess connections. Then the assimilators were dead in the water for a while, causing the queue to back up, the workunit disks to fill up, and finally the splitters to shut down - which is why we ran out of work to send out last night. All seems much better now, albeit jammed with traffic.

In better news we did finally get the first two data drives from Arecibo as recorded by the upgraded data recorder and new external drive docks under normal operations. So we're not going to run out of raw data after all, or at least not just yet. I'm copying those raw data files onto our local drives as I type.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4048
Credit: 32,693,129
RAC: 604
United Kingdom
Message 925880 - Posted: 13 Aug 2009, 20:50:23 UTC - in response to Message 925872.

Thanks for the update Matt,

Claggy

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,476,759
RAC: 3,192
United States
Message 925885 - Posted: 13 Aug 2009, 21:41:17 UTC - in response to Message 925872.

I'm copying those raw data files onto our local drives as I type.

- Matt


Any idea when these will go into crunch mode?

____________

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.

B-Man
Volunteer tester
Joined: 11 Feb 01
Posts: 253
Credit: 147,366
RAC: 0
United States
Message 925886 - Posted: 13 Aug 2009, 21:52:22 UTC

So the pipes will be full for a while and uploads will be jammed until we get all the AP WUs out of the way. Looking at the server page, it looks like 5-10 tapes, so a blockage for around 24h or so. Thanks for the update, Matt.
____________

DJStarfox
Joined: 23 May 01
Posts: 1040
Credit: 539,896
RAC: 577
United States
Message 925889 - Posted: 13 Aug 2009, 22:02:30 UTC - in response to Message 925872.

Hooray for fresh radio data!

A new, shiny, eco-friendly Prius had to be towed by a diesel-burning gas guzzler for 100 miles? Wow, I need to tell all the cap-and-trade-promoting congressmen about this. :)

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 925901 - Posted: 13 Aug 2009, 23:41:56 UTC - in response to Message 925885.

I'm copying those raw data files onto our local drives as I type.

- Matt


Any idea when these will go into crunch mode?

I see "tapes" dated July 2009 in the splitter queue.
____________

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 644
Credit: 5,366,437
RAC: 6,841
New Zealand
Message 925945 - Posted: 14 Aug 2009, 5:01:27 UTC - in response to Message 925872.

I'm copying those raw data files onto our local drives as I type.

Thanks for the update, nice to hear you took some time to spend with family. How long does it take to copy raw data to your local drives? Now that fresh data is here, is new AP work on the way? Server Status says four AP channels are in progress, but at the moment all work has been grabbed as of 14 Aug 2009 4:40:11 UTC (Ready to send, "As of" 29m).
____________

Live in NZ y not join Smile City?

rob smith
Volunteer tester
Send message
Joined: 7 Mar 03
Posts: 8134
Credit: 52,635,755
RAC: 75,190
United Kingdom
Message 926467 - Posted: 16 Aug 2009, 5:34:02 UTC

Wot? No jobs?
Since about 06:30 BST my PCs have been reporting "Message from server (Project has no jobs available)". Depending on which PC I look at, this message is being sent anywhere from every few minutes to about once an hour.
Are there no jobs on the servers, or is this a problem with a server?
Looking at the server status all looks well, but no jobs have been forthcoming at my end of the string for the last 24 hours......
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 644
Credit: 5,366,437
RAC: 6,841
New Zealand
Message 926470 - Posted: 16 Aug 2009, 5:56:29 UTC - in response to Message 926467.

I know that a bunch of short tasks passed through about two days ago; all of my shorties have been validated. Although everything except a few MB splitters is green on the Server Status page, I think a system is not working correctly somewhere. I expect this to be looked at on Monday Berkeley time (Tuesday New Zealand time). I'm out of SETI tasks, so I'm helping World Community Grid out.
____________

Live in NZ y not join Smile City?

Norwich Gadfly
Joined: 29 Dec 08
Posts: 100
Credit: 488,414
RAC: 0
United Kingdom
Message 926486 - Posted: 16 Aug 2009, 8:10:31 UTC - in response to Message 926470.

I know that a bunch of short tasks passed through about two days ago; all of my shorties have been validated. Although everything except a few MB splitters is green on the Server Status page, I think a system is not working correctly somewhere. I expect this to be looked at on Monday Berkeley time (Tuesday New Zealand time). I'm out of SETI tasks, so I'm helping World Community Grid out.

Another project which helps humanity is Malaria Control Net http://www.malariacontrol.net/

tullio
Joined: 9 Apr 04
Posts: 3581
Credit: 362,356
RAC: 194
Italy
Message 926489 - Posted: 16 Aug 2009, 8:22:29 UTC

I am running only CPDN models, one long range and one medium range. None of my other 5 BOINC projects are sending any work. Einstein had a filesystem crash.
Tullio
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24090
Credit: 517,737
RAC: 158
United States
Message 926566 - Posted: 16 Aug 2009, 19:10:20 UTC - in response to Message 926489.

I am running only CPDN models, one long range and one medium range. None of my other 5 BOINC projects are sending any work. Einstein had a filesystem crash.
Tullio

CPDN tasks have deadlines too. BOINC can decide that CPDN needs all of the CPU time now to meet its deadline, and it will then block downloads from every other project. The CPU time will be made up later by preventing CPDN from downloading work until the other projects have gotten their share of CPU time.
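
A minimal sketch of the kind of deadline check described here, in Python. This is not the BOINC client's actual code: the `Task` fields, the function names, and the simplification of pooling all CPUs into one are assumptions made purely for illustration. It runs the queued tasks in earliest-deadline-first order and reports which ones would finish late; a client following this logic would stop asking other projects for work while that list is non-empty.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str            # hypothetical label, e.g. "CPDN"
    remaining_hours: float  # estimated CPU hours still needed
    deadline_hours: float   # hours from now until the report deadline


def misses_deadline(tasks, ncpus=1, on_fraction=1.0):
    """Return the projects whose tasks would finish late.

    Tasks are processed in earliest-deadline-first order. All CPUs are
    treated as one pooled resource (an illustrative simplification), and
    on_fraction is the share of wall time the machine actually computes.
    """
    missed = []
    elapsed = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        elapsed += task.remaining_hours / (ncpus * on_fraction)
        if elapsed > task.deadline_hours:
            missed.append(task.project)
    return missed


def should_fetch_more_work(tasks, ncpus=1, on_fraction=1.0):
    """Only ask for more work if nothing already queued is at risk."""
    return not misses_deadline(tasks, ncpus, on_fraction)
```

In this toy model, a single long task whose deadline is far beyond its remaining run time never trips the check, so it would not by itself block downloads.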
____________


BOINC WIKI

tullio
Joined: 9 Apr 04
Posts: 3581
Credit: 362,356
RAC: 194
Italy
Message 926569 - Posted: 16 Aug 2009, 19:23:51 UTC
Last modified: 16 Aug 2009, 19:26:01 UTC

My CPDN deadlines are December 2011 for hadcm3 and July 2010 for hadam3p. All my TDCFs are less than 1 except for AQUA, where it is 40. I was running all 5 projects up to a few days ago, when I had hadcm3 already running; I then downloaded hadam3p, which should take 130 hours on my Linux box. I have completed 15% of hadcm3 and 25% of hadam3p.
Tullio
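
For reference, the arithmetic behind those numbers can be sketched as below. The helper names are hypothetical, and treating the task duration correction factor (TDCF) as a plain multiplier on the project's own runtime estimate is a simplifying assumption about how the client uses it.

```python
def remaining_hours(total_hours, fraction_done):
    """CPU hours still needed for a task that is partly complete."""
    return total_hours * (1.0 - fraction_done)


def corrected_estimate(project_estimate_hours, tdcf):
    """Runtime the client would expect after applying the correction factor.

    Assumption: the factor simply scales the project's estimate.
    """
    return project_estimate_hours * tdcf


# hadam3p: 130 hours total, 25% complete
print(remaining_hours(130, 0.25))    # 97.5

# A task the project rates at 1 hour, with a TDCF of 40
print(corrected_estimate(1.0, 40))   # 40.0
```

Under that assumption, a factor of 40 makes every task from that project look forty times longer than the project says, which is the sort of inflated estimate that can push a client toward running a task at high priority.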
____________

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12127
Credit: 6,410,138
RAC: 8,367
United States
Message 926667 - Posted: 17 Aug 2009, 1:31:16 UTC - in response to Message 926569.

My CPDN deadlines are December 2011 for hadcm3 and July 2010 for hadam3p. All my TDCFs are less than 1 except for AQUA, where it is 40. I was running all 5 projects up to a few days ago, when I had hadcm3 already running; I then downloaded hadam3p, which should take 130 hours on my Linux box. I have completed 15% of hadcm3 and 25% of hadam3p.
Tullio

I've noticed that issue with CPDN as well. As soon as I got my most recent CPDN task, most of the projects refused to send work, despite the CPDN deadline being a year in the future and the crunch time being under a week. I think something in the work fetch was changed and now it refuses to realize it has plenty of time to crunch the CPDN models. I'm sure this is a BOINC bug, likely a bug in the design and not the implementation. Reading some of the Wiki for it, I'm pretty sure the work fetch was designed to only work when projects are set with the same resource share and have work units that are similar in crunch time, with deadlines that are about the same. I don't think the designers ever expected, say, LHC, which 99% of the time won't have work but might be set to crunch 99% by a user so he gets a work unit if one ever becomes available. This screws up the assumptions behind the work fetch and it simply misbehaves. I suspect the fetch algorithm is doing a backoff in two places. The first is when it tries to calculate how much time will be available, but assumes that the queue will be filled by other projects without this check being made before they fill the queue. Of course they do fill it, hence the second backoff. The second problem is that the short/long term debt isn't being adjusted when a project doesn't send work because it thinks it won't finish in time.

Schedulers are a PITA and I think the work fetch needs a going over with a fine-tooth comb and some limit testing. I think the improvements to it over time introduced subtle bugs.

There is also a bug when a user changes his work priorities: the short and long term debt isn't also changed, which means that work won't be crunched in the ratio the user just set.
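
To make the debt complaint concrete, here is a toy model of per-project debt bookkeeping. It is an assumption-laden sketch, not BOINC's implementation: `update_debts`, its arguments, and the zero-mean normalisation are illustrative choices. Each project is credited with its resource-share fraction of an interval and debited the CPU time it actually received.

```python
def update_debts(debts, shares, cpu_used, interval):
    """Return new per-project debts after `interval` hours of CPU time.

    debts    -- current debt per project (hours owed to that project)
    shares   -- resource share per project (any positive weights)
    cpu_used -- CPU hours each project actually received this interval
    """
    total_share = sum(shares.values())
    updated = {}
    for project, debt in debts.items():
        entitled = interval * shares[project] / total_share
        updated[project] = debt + entitled - cpu_used.get(project, 0.0)
    # Keep debts centred on zero so they do not drift without bound.
    mean = sum(updated.values()) / len(updated)
    return {project: debt - mean for project, debt in updated.items()}


debts = {"SETI": 0.0, "CPDN": 0.0}
shares = {"SETI": 100, "CPDN": 100}
# CPDN took the whole 10-hour interval; SETI got nothing.
debts = update_debts(debts, shares, {"CPDN": 10.0}, interval=10.0)
print(debts)  # {'SETI': 5.0, 'CPDN': -5.0}
```

In this model, changing `shares` only affects debt accrued from then on. Whatever is already stored in `debts` carries over untouched, which matches the behaviour the post objects to.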


____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24090
Credit: 517,737
RAC: 158
United States
Message 926677 - Posted: 17 Aug 2009, 2:38:27 UTC - in response to Message 926667.

My CPDN deadlines are December 2011 for hadcm3 and July 2010 for hadam3p. All my TDCFs are less than 1 except for AQUA, where it is 40. I was running all 5 projects up to a few days ago, when I had hadcm3 already running; I then downloaded hadam3p, which should take 130 hours on my Linux box. I have completed 15% of hadcm3 and 25% of hadam3p.
Tullio

I've noticed that issue with CPDN as well. As soon as I got my most recent CPDN task, most of the projects refused to send work, despite the CPDN deadline being a year in the future and the crunch time being under a week. I think something in the work fetch was changed and now it refuses to realize it has plenty of time to crunch the CPDN models. I'm sure this is a BOINC bug, likely a bug in the design and not the implementation. Reading some of the Wiki for it, I'm pretty sure the work fetch was designed to only work when projects are set with the same resource share and have work units that are similar in crunch time, with deadlines that are about the same. I don't think the designers ever expected, say, LHC, which 99% of the time won't have work but might be set to crunch 99% by a user so he gets a work unit if one ever becomes available. This screws up the assumptions behind the work fetch and it simply misbehaves. I suspect the fetch algorithm is doing a backoff in two places. The first is when it tries to calculate how much time will be available, but assumes that the queue will be filled by other projects without this check being made before they fill the queue. Of course they do fill it, hence the second backoff. The second problem is that the short/long term debt isn't being adjusted when a project doesn't send work because it thinks it won't finish in time.

Schedulers are a PITA and I think the work fetch needs a going over with a fine-tooth comb and some limit testing. I think the improvements to it over time introduced subtle bugs.

There is also a bug when a user changes his work priorities: the short and long term debt isn't also changed, which means that work won't be crunched in the ratio the user just set.


There is a bug in 6.6.36 that can cause CPDN to take over completely. It is fixed in 6.6.38.
____________


BOINC WIKI

tullio
Joined: 9 Apr 04
Posts: 3581
Credit: 362,356
RAC: 194
Italy
Message 926702 - Posted: 17 Aug 2009, 4:50:25 UTC
Last modified: 17 Aug 2009, 5:44:59 UTC

My BOINC client is 6.6.29. All my projects have equal share. Only LHC is set to NNT, for obvious reasons. I see that Einstein is asking for new tasks, after completing the last one. Unfortunately, Einstein is still off limits. Thanks for the help.
Tullio
Also SETI is asking for new work, which is not available.
____________

Sacaripasa
Joined: 29 Dec 05
Posts: 13
Credit: 836,147
RAC: 0
United States
Message 926712 - Posted: 17 Aug 2009, 5:33:12 UTC - in response to Message 926677.
Last modified: 17 Aug 2009, 5:33:49 UTC

Set CPDN to suspend and grab all the work you want from your other projects, then unsuspend it. With my quad core it's easy to accomplish this, but others may not have as much success as I have. I have four hadam3p units right now due July 2010, but three are task-suspended. Each day I suspend everything but Milky Way, grab 10 days' worth from them, and then only allow one CPDN to run while Milky Way crunches and SETI is fixed. My GPU is hungry!

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12127
Credit: 6,410,138
RAC: 8,367
United States
Message 926713 - Posted: 17 Aug 2009, 5:37:01 UTC - in response to Message 926677.

There is a bug in 6.6.36 that can cause CPDN to take over completely. It is fixed in 6.6.38.


And 6.6.36 is the latest offered at the BOINC home page ...


____________

tullio
Joined: 9 Apr 04
Posts: 3581
Credit: 362,356
RAC: 194
Italy
Message 926714 - Posted: 17 Aug 2009, 5:49:51 UTC - in response to Message 926712.
Last modified: 17 Aug 2009, 6:48:29 UTC

Set CPDN to suspend and grab all the work you want from your other projects, then unsuspend it. With my quad core it's easy to accomplish this, but others may not have as much success as I have. I have four hadam3p units right now due July 2010, but three are task-suspended. Each day I suspend everything but Milky Way, grab 10 days' worth from them, and then only allow one CPDN to run while Milky Way crunches and SETI is fixed. My GPU is hungry!

Thanks, but I am leaving BOINC to do its business. So far it has worked well and I see no reason to interfere with it.
Tullio
Got an AQUA task, running high priority because my TDCF is 40 in AQUA.
____________


Copyright © 2014 University of California