Other projects offline

BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 821207 - Posted: 20 Oct 2008, 23:12:31 UTC

Malaria went offline over the weekend -- for them that includes their homepage so there is no announcement of what the problem is or when they hope to have it resolved -- still in vapor land as of 4PM Pacific Coast time.

Climate went offline early this morning. Their database is offline: 'The main database will be down for a short time whilst we carry out essential maintenance. Apologies for any inconvenience.' Short time is undefined there -- but at least this go-round the facts define it as greater than 8 hours (so far).

ID: 821207
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 821284 - Posted: 21 Oct 2008, 1:26:20 UTC - in response to Message 821207.  

Climate will probably be down until late Tuesday. They are installing a new file server with 10TB of space. :)

As for Malaria, looks like the web server may be in the process of restarting, but it appears they had some kind of filesystem crash.

Rosetta was also having issues but should be OK by now:
Oct 18, 2008 - We're having increasing problems with our project's fileserver, causing intermittent outages. We're in the process of building out the replacement system but it is going to take some time to iron out the issues. We apologize for the troubles.

Long live SETI!
ID: 821284
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 821414 - Posted: 21 Oct 2008, 5:32:41 UTC - in response to Message 821284.  

OK -- thanks for the extra detail - I guess with climate change 'short time' means something different <smile>.

I was aware of the Rosetta issues -- I saw the news update there -- but it looks like they are running fine now.

For Malaria, since whatever problem they have took their home page offline, we get to go into speculation mode. As they typically have tight due dates, I suspect they will push the deadlines out a bit to compensate. I 'caught' their outage early, so I simply placed Malaria on suspend so as to not generate a bunch of network access attempts while leaving my other projects active (I did the suspend thing with Climate for the same reason, though with their long due dates a couple-day outage shouldn't cause any work unit due date problems).

ID: 821414
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 821610 - Posted: 21 Oct 2008, 22:47:48 UTC - in response to Message 821284.  

Looks like the work over on Climate was (surprise, surprise) larger than hoped for. They operate out of the UK I believe, so it looks to be at least another day before they come back online.

On the good news front, Malaria is back online -- on their message board I found this report:

****************

It seems that the web-server was very unreliable over the last few days. As a consequence, the website and also the job scheduler here on the server could be reached only intermittently.

We have restarted the server and currently it seems to work fine. We have increased the grace period for overdue results to three days to minimize the impact of this problem on your credit accounts. In addition, we have set up an additional monitoring service that will help us respond faster in case this should happen again.

We apologize for the inconvenience!
Nick

*************

So that seems to be resolved (at least for now).

ID: 821610
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 821784 - Posted: 22 Oct 2008, 5:01:07 UTC - in response to Message 821610.  
Last modified: 22 Oct 2008, 5:01:36 UTC

Based on the last DB outage at climate, they could be down for a week! Too bad, as my PC just finished a big model. Can't crunch anything else because I have to keep Network disabled to prevent BOINC giving up too soon.
ID: 821784
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 821809 - Posted: 22 Oct 2008, 7:02:17 UTC - in response to Message 821784.  

Well, that would be disappointing -- not a big deal for me - I don't run their larger models and none that I have are anywhere close to having due dates -- I simply suspended Climate and let the other projects pick up the cycles.

ID: 821809
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 821819 - Posted: 22 Oct 2008, 7:47:37 UTC - in response to Message 821784.  

Based on the last DB outage at climate, they could be down for a week! Too bad, as my PC just finished a big model. Can't crunch anything else because I have to keep Network disabled to prevent BOINC giving up too soon.

Errrr - can you explain "giving up too soon"? I have an AM model that completed yesterday and has been trying to Report. I manage Resource Share for Climate at 25% on my rig by manually suspending all bar one of the models that download and then unsuspending each when its predecessor is completed (obviously I set NNT, i.e. No New Tasks, until a set of downloads is completed). So Climate is happily crunching away at the next model in my queue (an SM model). So - are you saying that the completed model will error out if it can't report soon?
ID: 821819
tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 821822 - Posted: 22 Oct 2008, 8:01:35 UTC

CPDN Beta is online and working.
Tullio
ID: 821822
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 821831 - Posted: 22 Oct 2008, 10:29:30 UTC - in response to Message 821784.  
Last modified: 22 Oct 2008, 11:22:19 UTC

Based on the last DB outage at climate, they could be down for a week! Too bad, as my PC just finished a big model. Can't crunch anything else because I have to keep Network disabled to prevent BOINC giving up too soon.

When a BOINC model finishes, whether CPDN or any other project I know of, two things happen:

1) a result data file is uploaded
2) the final state of the task is reported

So far as I know, it's only the 'file upload' (as seen in the transfers tab of BOINC Manager) which gives up - after, I believe, 14 days.

I have checked on one of the bits of CPDN which is still working (as Tullio says, their Beta is also up and running), and I have received an assurance that the file upload servers should be up and running (and currently have plenty of spare space - 12 TB has recently been moved to alternative storage). So there should be no problem about allowing network activity, to complete the stage (1) upload and allow your BOINC to collect work from other projects.

Obviously the second stage, reporting, isn't going to be possible during the outage, and nor will the intermediate trickles be accepted. But I don't think there's the same time pressure on those transactions. I'll keep digging for more information, and let you know what I find out.

Edit - I am advised that there is no time limit on the trickle uploads or task completion reports. So there is no need to disable network activity in response to this particular CPDN outage.
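
As a rough illustration of the two stages described above, here is a minimal Python sketch. The 14-day upload window is the figure given above with an "I believe"; the function names, the choice to count that window from the moment the model finished, and the finish date are all hypothetical. None of this is BOINC's actual code.

```python
from datetime import datetime, timedelta, timezone

# Assumption taken from the post: a pending file upload is abandoned
# after roughly 14 days. The real client behaviour may differ.
UPLOAD_GIVE_UP = timedelta(days=14)


def upload_deadline(finished_at):
    """Stage 1: latest moment the result file upload could still be retried."""
    return finished_at + UPLOAD_GIVE_UP


def can_still_upload(finished_at, now):
    """True while the (assumed) 14-day upload window is still open."""
    return now < upload_deadline(finished_at)


def can_still_report(finished_at, now):
    """Stage 2: per the edit above, reports and trickles have no time limit."""
    return True


# Hypothetical finish time for a model, used only for illustration.
finished = datetime(2008, 10, 21, 12, 0, tzinfo=timezone.utc)
week_later = finished + timedelta(days=7)

print("upload window closes:", upload_deadline(finished).isoformat())
print("upload still possible after a week:", can_still_upload(finished, week_later))
print("report still possible after a week:", can_still_report(finished, week_later))
```

On these assumptions, even a week-long database outage would leave both stages able to complete, which is why disabling network activity should not be needed.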
ID: 821831
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 821846 - Posted: 22 Oct 2008, 12:10:32 UTC

http://www.climateprediction.net/index.php is back. My latest trickle message just uploaded.
ID: 821846
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 821847 - Posted: 22 Oct 2008, 12:11:36 UTC - in response to Message 821846.  
Last modified: 22 Oct 2008, 12:15:59 UTC

http://www.climateprediction.net/index.php is back. My latest trickle message just uploaded.

Likewise. You beat me to the post - but only just!

Edit - they'll be going through a recovery phase for a while - the site is very slow to load just now.
ID: 821847

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.