Ups and Downs (Aug 05 2008)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 793379 - Posted: 5 Aug 2008, 23:15:08 UTC

Today was another one of them "outage days" where we shut everything down to do basic weekly maintenance (database backup and whatnot). We had a particularly large task list this time around. A lot of it was fairly mundane - like moving/compressing files to make more room on various storage systems.

The sidious crash the other day did in fact break the MySQL replica again. No big deal, but that meant recreating the database from the master - a seemingly weekly occurrence. It's easy to do; it just adds extra time to the whole operation.

Also, we tried to fix that broken index on the science database. We found the corruption was actually not on the RAID system we thought (the one that required a drive replacement). Huh. Anyway, the index repair on the whole table was taking too long. We might just go ahead and drop/rebuild the specific index later, now that we are more sure what's what.

We brought all our backend services (feeder, transitioner, validator, etc.) up to spec on current BOINC code for the first time in a long time, so we carefully turned these on one at a time to observe the logs/results and make sure nothing got all screwy with the updated code.

So we're back up, more or less. The current mystery is why we are using so much bandwidth. Too many factors at play to make a clear determination - lots of known network bottlenecks, lots of database bottlenecks, unknown Astropulse behavior, etc. We'll give this a closer look tomorrow after (hopefully) some of the traffic jams disappear.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 793379 · Report as offensive
Profile [KWSN]John Galt 007
Volunteer tester
Joined: 9 Nov 99
Posts: 2444
Credit: 25,086,197
RAC: 0
United States
Message 793388 - Posted: 5 Aug 2008, 23:27:18 UTC

Thanks for the update..."Good job" to all!!
Clk2HlpSetiCty:::PayIt4ward

ID: 793388 · Report as offensive
Profile Mr. Majestic
Volunteer tester
Joined: 26 Nov 07
Posts: 4752
Credit: 258,845
RAC: 0
United States
Message 793424 - Posted: 6 Aug 2008, 0:54:53 UTC

Thanks for keeping us up to date Matt.

ID: 793424 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 793429 - Posted: 6 Aug 2008, 1:00:39 UTC
Last modified: 6 Aug 2008, 1:11:19 UTC

@ Matt Lebofsky

I hope you noticed this thread in the Number_crunching area:

AstroPulse Ghost WUs !!!

I have now received two AP-WUs, even though I didn't change my app_info.xml to get AP-WUs.

#1 - 4 Aug 2008 22:47:48 UTC

#2 - 5 Aug 2008 23:13:30 UTC


When will the server be updated/fixed?

When will we be able to choose Enhanced and/or Astropulse WUs in the preferences?


EDIT:
It would be nice if we could choose Enhanced and/or AP WUs separately in the 'Computing preferences', for each of 'Primary (default) preferences', 'Home', 'School' and 'Work'.
ID: 793429 · Report as offensive
Profile DaBrat and DaBear

Joined: 13 Dec 00
Posts: 69
Credit: 191,564
RAC: 0
United States
Message 793458 - Posted: 6 Aug 2008, 1:43:49 UTC
Last modified: 6 Aug 2008, 1:45:34 UTC

Thanks for keeping us up to date. Being unable to upload is driving me mad... Maybe the bandwidth issue is caused by so many WUs trying to upload. I started having issues sometime before midnight EST on the 4th, before the Tuesday outage. Oh well, off to bed... maybe by morning the 70 or so uploads I have will have safely made it back to the arms of SETI.

BTW... hopefully they won't all end up in the pending pile... lol
ID: 793458 · Report as offensive
Profile muddocktor

Joined: 2 Aug 06
Posts: 12
Credit: 28,074,814
RAC: 0
United States
Message 793467 - Posted: 6 Aug 2008, 2:02:58 UTC

I sure hope you can clear up all the server problems soon, Matt. I have several machines with work results that won't upload, and I've also had problems with my machines getting new work. Good luck on getting the issues sorted tomorrow.
ID: 793467 · Report as offensive
Profile Rev. Tim Olivera

Joined: 15 Jan 06
Posts: 20
Credit: 1,717,714
RAC: 0
United States
Message 793789 - Posted: 6 Aug 2008, 17:01:27 UTC - in response to Message 793379.  

Simple question, why has my BOINC gone from SETI@home to ASTRO PULSE?? I didn't sign on to any ASTRO PULSE...

Tim Olivera
ID: 793789 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 793820 - Posted: 6 Aug 2008, 18:30:16 UTC - in response to Message 793789.  
Last modified: 6 Aug 2008, 18:33:26 UTC

Simple question, why has my BOINC gone from SETI@home to ASTRO PULSE?? I didn't sign on to any ASTRO PULSE...

Tim Olivera


Everything is O.K. .. :-)

You are a member of SETI@home, and your PC gets Enhanced and/or Astropulse WUs.

Have a look here:

Astropulse FAQ



BTW, have a look at my profile for information about optimized Enhanced applications. :-)
ID: 793820 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 793914 - Posted: 6 Aug 2008, 23:04:04 UTC - in response to Message 793789.  

Simple question, why has my BOINC gone from SETI@home to ASTRO PULSE?? I didn't sign on to any ASTRO PULSE...

Tim Olivera

Simple answer: the data recorded by the SETI@home project at Arecibo has much more potential information than we have been extracting with the original Spike and Gaussian searches, or the Pulse and Triplet finding added later. The AstroPulse application looks for microsecond pulses, either single or repeating, which are beyond the capability of setiathome_enhanced. Because the additional searching requires the full 2.5 MHz recorded spectrum but less duration, a separate application is more appropriate than trying to combine it into a setiathome_double_enhanced.

Joe
ID: 793914 · Report as offensive
Profile Robert Gammon
Volunteer tester

Joined: 29 Aug 01
Posts: 21
Credit: 1,573,250
RAC: 0
United States
Message 795643 - Posted: 10 Aug 2008, 12:05:08 UTC - in response to Message 793467.  

I sure hope you can clear up all the server problems soon, Matt. I have several machines with work results that won't upload, and I've also had problems with my machines getting new work. Good luck on getting the issues sorted tomorrow.


I will note, from watching the server status over the last two days, that the back-end processes db-purge and wu-purge (I may have the process names wrong) are working fine. In a few hours, certainly before Monday, all that back-end work will be complete.

Result analysis is clogged up and not making much progress.

The tapes are not spitting out new WUs, or more precisely, any WUs read from the tapes are not making it to the outgoing queue.

So there is more going on than simply "Can't upload completed WUs"


ID: 795643 · Report as offensive
Sumyunguyy
Volunteer tester

Joined: 14 Jul 04
Posts: 12
Credit: 1,173,168
RAC: 0
United States
Message 795652 - Posted: 10 Aug 2008, 12:32:42 UTC

I have about 25 WUs waiting to be uploaded. When do you think this issue might be fixed?
ID: 795652 · Report as offensive
Profile Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 795662 - Posted: 10 Aug 2008, 12:47:14 UTC - in response to Message 795652.  

I have about 25 WUs waiting to be uploaded. When do you think this issue might be fixed?


Project servers and staff are in CA, USA. Their local TZ is UTC-7. I don't think tomorrow is a public holiday in CA, so work will probably start flowing again by around 18:00 UTC tomorrow. Then you can expect a big flood as everyone else tries to upload, download, report tasks, etc.

Things may be back to "normal" by around 01:00 UTC on 12th August.
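
As a quick check of the time arithmetic above, here is a minimal Python sketch that converts the two UTC estimates to the project's local time. It assumes a fixed UTC-7 offset, as stated in this post; the names `PROJECT_TZ` and `to_project_local` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset, as stated in the post above.
PROJECT_TZ = timezone(timedelta(hours=-7), name="UTC-7")


def to_project_local(utc_dt: datetime) -> datetime:
    """Convert a timezone-aware UTC datetime to the project's local time."""
    return utc_dt.astimezone(PROJECT_TZ)


# The two estimates from the post, written as UTC datetimes.
work_resumes = datetime(2008, 8, 11, 18, 0, tzinfo=timezone.utc)
back_to_normal = datetime(2008, 8, 12, 1, 0, tzinfo=timezone.utc)

for label, moment in [("work resumes", work_resumes),
                      ("back to normal", back_to_normal)]:
    local = to_project_local(moment)
    print(f"{label}: {moment:%d %H:%M} UTC = {local:%d %H:%M} local")
```

Under that assumption, 18:00 UTC on the 11th is 11:00 in the morning for the staff, and 01:00 UTC on the 12th is 18:00 local time on the 11th, i.e. roughly the end of their Monday working day.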
Sir Arthur C Clarke 1917-2008
ID: 795662 · Report as offensive
Profile Walter Schmidt

Joined: 28 Aug 99
Posts: 1
Credit: 959,166
RAC: 0
United States
Message 795666 - Posted: 10 Aug 2008, 12:59:30 UTC


Can't U/L or D/L at 080810.0858-4

ID: 795666 · Report as offensive
Menno Vos

Joined: 4 Jul 99
Posts: 1
Credit: 328,705
RAC: 0
Netherlands
Message 795673 - Posted: 10 Aug 2008, 13:12:16 UTC - in response to Message 795666.  


Can't U/L or D/L at 080810.0858-4

Same here from the Netherlands: all uploads and downloads have been stuck since noon Saturday 07-08 (local time, which is GMT+2).
ID: 795673 · Report as offensive
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 795683 - Posted: 10 Aug 2008, 13:38:08 UTC

I personally think it will be late Monday or early Tuesday (Europe zone) before things are uploading/downloading.
ID: 795683 · Report as offensive
Profile Bob Mahoney Design
Joined: 4 Apr 04
Posts: 178
Credit: 9,205,632
RAC: 0
United States
Message 795730 - Posted: 10 Aug 2008, 15:18:55 UTC

A moment-to-moment online service such as SETI will lose members if the members are not informed. More members and visitors are likely to see the "News" column on the setiathome.berkeley.edu home page than will dig into the message boards.

I think SETI can enhance its excellent and generous reputation by performing much more frequent (and small) updates of the "News" column on the home page. This should take less staff time than posting multiple messages to the boards.

There is no need to feel embarrassment about a service outage - this is bleeding-edge stuff and no pain is no gain. We all know SETI is on the edge and is driven by passion as much as anything else.

Remember that when our crunching computers choke, we jump into action at our homes and offices and reload BOINC, reset our computers, do numerous Project Update commands, fiddle with our settings, Google for "SETI no new tasks" and read about an issue 5 years ago, etc. In other words, if I could have found out that my idle computers were not a result of some mistake on my part, I would be very sympathetic to all the fine scientists trying to keep SETI together with that magic baling wire and duct tape.

Keep up the good work! But please save us users from the angst that comes with not knowing.
ID: 795730 · Report as offensive
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 795736 - Posted: 10 Aug 2008, 15:42:02 UTC - in response to Message 795730.  

A moment-to-moment online service such as SETI will lose members if the members are not informed. More members and visitors are likely to see the "News" column on the setiathome.berkeley.edu home page than will dig into the message boards.

I think SETI can enhance its excellent and generous reputation by performing much more frequent (and small) updates of the "News" column on the home page. This should take less staff time than posting multiple messages to the boards.

There is no need to feel embarrassment about a service outage - this is bleeding-edge stuff and no pain is no gain. We all know SETI is on the edge and is driven by passion as much as anything else.

Remember that when our crunching computers choke, we jump into action at our homes and offices and reload BOINC, reset our computers, do numerous Project Update commands, fiddle with our settings, Google for "SETI no new tasks" and read about an issue 5 years ago, etc. In other words, if I could have found out that my idle computers were not a result of some mistake on my part, I would be very sympathetic to all the fine scientists trying to keep SETI together with that magic baling wire and duct tape.

Keep up the good work! But please save us users from the angst that comes with not knowing.


Or, instead of going into panic mode and doing all that, you can just give it time. Even without an update, which I agree would be most helpful, there is no need to panic because of a few workunits that can't upload or download for a couple days.

I think people tend to panic too much and turn to micromanaging BOINC when there is a perceived problem. There are some computers on my account which belong to family members who have no idea whether there are server problems or not. They simply go about their day, using their computer as usual, never even knowing if anything is wrong. They do not panic, they do not start pressing buttons and they don't check in on the website if there appears to be a problem. And BOINC always recovers on its own.
ID: 795736 · Report as offensive
gugi

Joined: 26 Mar 01
Posts: 1
Credit: 33,490
RAC: 0
Croatia
Message 795873 - Posted: 10 Aug 2008, 19:42:51 UTC - in response to Message 795736.  

Or, instead of going into panic mode and doing all that, you can just give it time. Even without an update, which I agree would be most helpful, there is no need to panic because of a few workunits that can't upload or download for a couple days.
I think people tend to panic too much and turn to micromanaging BOINC when there is a perceived problem. There are some computers on my account which belong to family members who have no idea whether there are server problems or not. They simply go about their day, using their computer as usual, never even knowing if anything is wrong. They do not panic, they do not start pressing buttons and they don't check in on the website if there appears to be a problem. And BOINC always recovers on its own.

If I see 10 or more workunits that won't upload for two days, and no news about server downtime, I assume it's an error on my side.
If it is on MY side, it won't just magically go away in a few days, so I do a lot of micromanaging. I just lost 2 hours checking my connections, settings, restarting, upgrading BOINC to the new version, restarting a few more times and just went crazy because the servers should be up and I cannot upload. All because the seti@home team was too lazy to post one sentence in the News saying "servers down".
I can see the latest News entry now, but as any journalist will tell you: news about something that happened yesterday is not really news.
ID: 795873 · Report as offensive
Profile Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 795885 - Posted: 10 Aug 2008, 20:05:48 UTC - in response to Message 795873.  

If I see 10 or more workunits that won't upload for two days, and no news about server downtime, I assume it's an error on my side.
If it is on MY side, it won't just magically go away in a few days, so I do a lot of micromanaging. I just lost 2 hours checking my connections, settings, restarting, upgrading BOINC to the new version, restarting a few more times and just went crazy because the servers should be up and I cannot upload. All because the seti@home team was too lazy to post one sentence in the News saying "servers down".
I can see the latest News entry now, but as any journalist will tell you: news about something that happened yesterday is not really news.

You expect the handful of part-time paid people to work 24x7 just to tell you there is an issue? Please read the Number Crunching forum, which will give you information well before the team posts about the issues. The team knows about them, but they have lives too. They do not babysit the systems, because they are not paid to. If they do work on weekends and such, it's on their own time.

This is a voluntary project for you, if you are doing work, good, if not, so what? It hurts nothing. The work will get done eventually.

Why do people think they must be so pushy with the project people? These guys go above and beyond the small amount they are paid to work. They monitor the project 24x7, but sometimes are not in a spot to fix it right away. That's why they even recommend you crunch other projects, so if you want your CPU to stay warm it will. If you only want to crunch SETI, so be it, but if you run out of work, or have problems sending and receiving, the people that work on the project have already told us it will happen, and may happen more often as we get more people, larger work, etc. There are no unlimited funds at this project, and they work on some substandard machines, etc.

I cannot understand why people who get nothing out of the project except a few credits to brag about are so rude to the admins on this project.
ID: 795885 · Report as offensive
Profile speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 795922 - Posted: 10 Aug 2008, 21:04:43 UTC

... I just lost 2 hours checking my connections, settings, restarting, upgrading BOINC to the new version, restarting a few more times and just went crazy because the servers should be up and I cannot upload. All because the seti@home team was too lazy to post one sentence in the News saying "servers down". ...


A quick look in the Number crunching forum could have saved you about 1 hour 58...

mic.


ID: 795922 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.