Gasping for Air (May 14 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 956,038
RAC: 1,098
United States
Message 567413 - Posted: 14 May 2007, 21:54:17 UTC
Last modified: 14 May 2007, 21:54:45 UTC

What a weekend. As noted by the others they successfully got the replacement science database server from Sun and brought it to the lab Friday afternoon. As we hoped it was basically plug n' play after putting the old thumper's drives in it. After some file system syncing and data checking Eric started the splitters on Saturday. All was well until bruno's httpd processes choked (more on that below). So we were not sending work for a whole day until Jeff kicked bruno this morning. The bright side is this allowed the splitters to create a whole pile of work in the meantime which we are sending out right now as fast as we can. The main bottleneck is NFS on the workunit file server which is (and always has been) choking at around 60 Mbps. It'll take a while for things to catch up.

We officially retired both koloth and kryten as of today - both are powered down, and in the case of koloth completely removed from the closet to make way for thumper, sidious, and then some. With the closet as empty as it has been in a long time I finally removed dozens of unused SCSI/ethernet/terminal/power cables that came with the rack, all tucked in various corners and secured with cable ties. The process of cutting the tightly wound ties in sharp metal cages left me with four bleeding wounds on my hands - nothing bad, only two required band aids - but I've wanted to get that particular clutter out of that rack for years.

With koloth and kryten gone bruno has been taking up most of the slack. I noticed last week it gets into these periods of malaise where httpd just stops working. I think this may be buggy restart logic when we rotate web logs, but it's a little weirder than that. Adding insult to injury one of its internal drives just up and died today. Luckily it was a RAID spare so nothing was harmed, and we had replacement drives already donated to us a while back. Eric replaced the drive, but we may need to reboot to fully pick it up. Probably during the usual outage tomorrow. Bruno is dropping lots of packets right now, resulting in all kinds of upload/download snags and showing up as "disabled" on the server status page. This should clear up over time.

The server situation will be in major flux, and generally in a positive direction, over the next week or so. I'll be trying to keep updating the server status page, but I make no guarantees about its accuracy.

Thanks again for your patience during the past couple of weeks. While I appreciate the kind words and sentiments I should point out that this past weekend for me wasn't exactly restful time off. I was working at my other job.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 567413
Mephist0
Volunteer tester
Joined: 4 Dec 99
Posts: 12
Credit: 1,372,175
RAC: 0
Sweden
Message 567418 - Posted: 14 May 2007, 22:07:43 UTC

Gr8! Then it should all be back to normal :) Good job, and I hope you get a little rest this week, Matt! :)

Take care!
ID: 567418
hugo_rune
Joined: 16 Apr 03
Posts: 1
Credit: 42,993
RAC: 0
United Kingdom
Message 567420 - Posted: 14 May 2007, 22:08:11 UTC
Last modified: 14 May 2007, 22:08:58 UTC

Well, tbh, I've just detached the project and will stay away for a few days, if not slightly longer. I've had no joy downloading anything. (Heh, I just got BOINC up and running again after a break of 6 months or so, only for the servers to go down >.<) I had downloaded a total of a meg in a week o.O

I'll just let Rosetta, World Community Grid and Folding@home crunch away until I have a half-decent chance of downloading anything ;)

My sympathies for all the headaches this must have caused, and I'm quite happy to be patient ;)
ID: 567420
Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 567424 - Posted: 14 May 2007, 22:10:43 UTC
Last modified: 14 May 2007, 22:11:11 UTC

Koloth and Kryten R.I.P. in computer scrap heaven or wherever you two are heading...

Thanks for the update, Matt. It's very much appreciated.

And we know you are working hard in both your jobs, hopefully having fun.

A huge thank you to all you people over there for your efforts. :-)




"I'm trying to maintain a shred of dignity in this world." - Me

ID: 567424
gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 567425 - Posted: 14 May 2007, 22:15:15 UTC

As usual you have all gone above and beyond the call of duty. Many thanks, it is appreciated.
ID: 567425
Nick
Volunteer tester
Joined: 30 Oct 99
Posts: 8
Credit: 2,881,357
RAC: 0
United Kingdom
Message 567427 - Posted: 14 May 2007, 22:17:03 UTC

Good man! I don't pretend to understand the science or the terminology. However I do appreciate all the work that you chaps have put in (inc. Sun) since the project fell over.
I've SETI'd for some years now and don't intend to finish for a while yet.
Updates have been, and continue to be, a great bonus.
Well done.
Nick

ID: 567427
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7104
Credit: 147,313,424
RAC: 0
Germany
Message 567433 - Posted: 14 May 2007, 22:33:53 UTC
Last modified: 14 May 2007, 22:39:12 UTC



I have a lot of WUs with a deadline of May 18, 04:40 UTC.

Will the server be fixed by that date?


ID: 567433
slayer4229
Joined: 12 Mar 03
Posts: 1
Credit: 13,253
RAC: 0
United States
Message 567442 - Posted: 14 May 2007, 22:42:21 UTC - in response to Message 567413.  

Could we get pics of the system power that is harnessed in the closet? Could make for some nice backgrounds.
-kris
ID: 567442
BigBrother
Joined: 27 Jul 99
Posts: 18
Credit: 3,451,359
RAC: 518
Sweden
Message 567448 - Posted: 14 May 2007, 22:52:27 UTC - in response to Message 567413.  

2007-05-15 00:49:43 [SETI@home] [file_xfer] Started upload of file 29ja04ab.27703.4690.467332.3.113_1_0
2007-05-15 00:49:43 [SETI@home] [file_xfer] Started upload of file 29ja04ab.27703.6272.1003390.3.57_2_0
2007-05-15 00:49:46 [---] Project communication failed: attempting access to reference site
2007-05-15 00:49:46 [SETI@home] [file_xfer] Temporarily failed upload of 29ja04ab.27703.4690.467332.3.113_1_0: system connect
2007-05-15 00:49:46 [SETI@home] Backing off 8 min 30 sec on upload of file 29ja04ab.27703.4690.467332.3.113_1_0
2007-05-15 00:49:46 [SETI@home] [file_xfer] Temporarily failed upload of 29ja04ab.27703.6272.1003390.3.57_2_0: system connect
2007-05-15 00:49:46 [SETI@home] Backing off 28 min 36 sec on upload of file 29ja04ab.27703.6272.1003390.3.57_2_0
2007-05-15 00:49:46 [SETI@home] [file_xfer] Started upload of file 29se04ab.22429.31842.317338.3.64_0_0
2007-05-15 00:49:46 [SETI@home] [file_xfer] Started upload of file 29se04ab.22429.31842.317338.3.62_0_0
2007-05-15 00:49:47 [---] Access to reference site succeeded - project servers may be temporarily down.

Is this a part of some of the recovery problems or what? It seems as if BOINC is able to access the servers, but none is responding or receiving?

Johan
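For anyone reading a client log like the one above, the retry delays can be pulled out with a short script. This is only a rough sketch: it assumes the "Backing off N min N sec on upload of file ..." wording seen in this excerpt, and the function name `summarize_backoffs` and the regular expression are illustrative, not part of BOINC. Other client versions may word these lines differently (for example, with an hours field).

```python
import re

# Pattern assumed from the log excerpt in this post, not from BOINC documentation.
BACKOFF_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<project>[^\]]+)\] "
    r"Backing off (?:(?P<min>\d+) min )?(?P<sec>\d+) sec "
    r"on upload of file (?P<file>\S+)$"
)


def summarize_backoffs(lines):
    """Return {file_name: backoff_in_seconds} for every 'Backing off' line."""
    result = {}
    for line in lines:
        match = BACKOFF_RE.match(line.strip())
        if match:
            minutes = int(match.group("min") or 0)
            seconds = int(match.group("sec"))
            result[match.group("file")] = minutes * 60 + seconds
    return result


# The two backoff lines copied from the excerpt above.
LOG = """\
2007-05-15 00:49:46 [SETI@home] Backing off 8 min 30 sec on upload of file 29ja04ab.27703.4690.467332.3.113_1_0
2007-05-15 00:49:46 [SETI@home] Backing off 28 min 36 sec on upload of file 29ja04ab.27703.6272.1003390.3.57_2_0
"""

if __name__ == "__main__":
    for name, delay in summarize_backoffs(LOG.splitlines()).items():
        print(f"{name}: retry in {delay} s")
```

The two delays differ (8 min 30 sec versus 28 min 36 sec) even though both uploads failed at the same second, which suggests the client spreads its retries out rather than retrying everything at once.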
ID: 567448
Alex Striker
Joined: 11 Jan 04
Posts: 39
Credit: 31,121,081
RAC: 11,346
Denmark
Message 567452 - Posted: 14 May 2007, 22:55:36 UTC

Hi

The update info you write for us is very much appreciated. Thanks for that, Matt, and keep up the good work.

Sincerely
Alex Striker

Team Striker - Denmark
Happy Crunching
ID: 567452
Marta Holt
Joined: 28 Mar 02
Posts: 11
Credit: 576,381
RAC: 0
United States
Message 567457 - Posted: 14 May 2007, 23:03:02 UTC - in response to Message 567413.  

Thanks to all who got things together again, well done!!! I have work!!! yay

thanks again



marta
ID: 567457
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,495,887
RAC: 0
United States
Message 567463 - Posted: 14 May 2007, 23:12:22 UTC

Would it be possible to skip the regular Tuesday down-time for this week? At this rate, it will take several days before my pending upload/download backlog has cleared. Additional down time will not help.
Dublin, California
Team: SETI.USA
ID: 567463
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 567492 - Posted: 14 May 2007, 23:44:53 UTC
Last modified: 14 May 2007, 23:45:37 UTC

Say... is that Kryten and Koloth I see listed on eBay???

LOL

Two slightly used servers
ID: 567492
dv5678
Joined: 9 Sep 01
Posts: 1
Credit: 24,807,522
RAC: 6,621
United States
Message 567505 - Posted: 15 May 2007, 0:01:02 UTC - in response to Message 567420.  

... I've just detached the project and will stay away for a few days ...


i don't think you want to detach -- if "Detach" is the opposite of "Attach to project", then you will lose your coveted low computer number.

better to click on "Suspend" if you want your computer to stop pestering the SETI@Home servers.

or just use "No new tasks", so that your computer at least tries to complete the download and execution of the tasks already assigned to it.
ID: 567505
Stealth Eagle*
Volunteer tester
Joined: 7 Sep 00
Posts: 5971
Credit: 367,640
RAC: 0
United States
Message 567511 - Posted: 15 May 2007, 0:11:11 UTC

Hi Matt, thanks for the update, and for all the hard work you guys are doing there. I know what it is like, being in the industry myself.

Keep up the good work, and keep us posted. It helps soothe the nerves of some of the newer crunchers.




What you do today you will have to live with tonight
ID: 567511
John-James-Connellan
Joined: 16 Jan 00
Posts: 4
Credit: 824,683
RAC: 0
Ireland
Message 567516 - Posted: 15 May 2007, 0:20:41 UTC - in response to Message 567511.  

upload/download server disabled may explain difficulties in uploading and downloading
ID: 567516
BigBrother
Joined: 27 Jul 99
Posts: 18
Credit: 3,451,359
RAC: 518
Sweden
Message 567518 - Posted: 15 May 2007, 0:23:08 UTC - in response to Message 567516.  

upload/download server disabled may explain difficulties in uploading and downloading


Naah, I don't think that's the problem. Even when Bruno indicated it was running there was no contact with it.

Johan
ID: 567518
KO6QK
Joined: 9 May 99
Posts: 4
Credit: 4,645,888
RAC: 3,165
United States
Message 567520 - Posted: 15 May 2007, 0:27:59 UTC - in response to Message 567413.  

Way to go team!! Keep up the good Work!!
John
ID: 567520
Brad
Joined: 1 Aug 99
Posts: 5
Credit: 148,044
RAC: 0
United States
Message 567568 - Posted: 15 May 2007, 2:21:13 UTC

I just downloaded 5 seti work units. My pc just looked up at me and said "ohhh, i so happy".
ID: 567568
MoonFire
Joined: 6 May 99
Posts: 9
Credit: 1,725,064
RAC: 0
United States
Message 567571 - Posted: 15 May 2007, 2:26:21 UTC - in response to Message 567413.  
Last modified: 15 May 2007, 2:39:23 UTC

A job well-done! On behalf of the many, thanks to the few of you!

++++++++++++++++++++++++++++++++++++++++++++++++++

Contribution to BOINC total credit 0.00069%
Accumulated more credit than % of all BOINC users 97.613%
Highest World position ever 23989 at 2007-05-05

Resident of United States
Position in Country stats 9,775
Contribution to own country total credit 0.00171%
Accumulated more credit than % of all fellow citizens 96.42371%
++++++++++++++++++++++++++++++++++++++++++++++++++++


ID: 567571


 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.