Rudy and Spider (Feb 28 2008)

Message boards : Technical News : Rudy and Spider (Feb 28 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 719610 - Posted: 28 Feb 2008, 21:25:13 UTC

Fully recovered from the long outages earlier this week. I also employed more assimilators (and even more just now) to try to capitalize on periods of low I/O to help catch up on the big assimilator queue backlog. Seems to be working, sort of. We also changed the mount flags on the database volume to include "noatime" - we'll see if this actually makes a difference in performance.
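A note for readers new to the flag: on Linux, `noatime` tells the kernel not to update a file's last-access timestamp on every read, so a read-heavy database volume stops dirtying inode metadata each time a file is read. Below is a minimal sketch, assuming a Linux host, that checks whether the option is actually in effect by parsing text in the `/proc/mounts` format (device, mount point, filesystem type, comma-separated options, dump, pass). The `/db` mount point and the sample lines are hypothetical.

```python
def mount_options(mounts_text: str, mount_point: str) -> set[str]:
    """Return the option set for mount_point from /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    raise KeyError(f"{mount_point} is not mounted")


def has_noatime(mounts_text: str, mount_point: str) -> bool:
    """True if the given mount point lists the noatime option."""
    return "noatime" in mount_options(mounts_text, mount_point)


if __name__ == "__main__":
    # Hypothetical sample; on a live system, read open("/proc/mounts").read()
    sample = (
        "/dev/sda1 / ext3 rw,relatime 0 0\n"
        "/dev/md0 /db ext3 rw,noatime 0 0\n"
    )
    print(has_noatime(sample, "/db"))  # True
    print(has_noatime(sample, "/"))    # False
```

Reading the live mount table, rather than `/etc/fstab`, confirms what the kernel is actually using after a remount.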

Jeff and I are still getting beyond the router config. One of our roadblocks was using cables that were gigabit-capable mixed with ones that were not (once again, cheap parts causing the headache). We might actually be ready to go, except we have to upgrade the super-long cable running from our closet to the main lab server closet, which is inaccessible to us. We're waiting on the appropriate parties to handle that.
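The mixed-cable roadblock is a bottleneck effect: traffic crossing several links in series can go no faster than the slowest one, and a link whose cable can't carry gigabit typically ends up negotiating a lower speed such as 100 Mb/s. A toy sketch of that arithmetic; the segment names and speeds are made up for illustration and are not a description of the actual lab network.

```python
def path_capacity_mbps(segments: dict[str, int]) -> int:
    """End-to-end capacity of links in series is the slowest segment's speed."""
    if not segments:
        raise ValueError("path has no segments")
    return min(segments.values())


def bottlenecks(segments: dict[str, int]) -> list[str]:
    """Names of the segments that limit the whole path."""
    cap = path_capacity_mbps(segments)
    return [name for name, speed in segments.items() if speed == cap]


if __name__ == "__main__":
    # Hypothetical path; speeds in Mb/s
    path = {"server-to-switch": 1000, "switch-to-router": 1000, "closet-to-lab": 100}
    print(path_capacity_mbps(path))  # 100
    print(bottlenecks(path))         # ['closet-to-lab']
```

Upgrading every segment except the slowest one leaves the end-to-end figure unchanged, which is why a single long cable can hold up the whole migration.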

Regarding hardware vs. software RAID: we tend to shy away from hardware RAID, as we've had many nightmares in the past with configuration and implementation. Namely, it takes forever to figure out, and then drives fail spuriously and/or silently. The performance hit from software RAID isn't enough to make us consider going hardware on our current systems any time soon.
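One common way to catch silent drive failures with Linux software RAID (md) is to watch `/proc/mdstat`, where each array's status line shows configured vs. active members, e.g. `[2/2] [UU]` when healthy and `[2/1] [U_]` when a member is missing. A minimal sketch of such a check follows; the sample text is illustrative, and nothing in the post says this is how these servers are monitored (`mdadm --monitor` is the standard tool for alerting).

```python
import re

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Member status, e.g. "[2/1] [U_]" (configured/active, then per-disk flags)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of md arrays with fewer active members than configured."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            total, active = int(status.group(1)), int(status.group(2))
            if active < total or "_" in status.group(3):
                degraded.append(current)
            current = None  # only the first status line belongs to this array
    return degraded


if __name__ == "__main__":
    sample = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks [2/2] [UU]

md1 : active raid1 sdd1[1] sdc1[0](F)
      976630336 blocks [2/1] [U_]

unused devices: <none>
"""
    print(degraded_arrays(sample))  # ['md1']
```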

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 719610
telluric
Joined: 12 Feb 06
Posts: 9
Credit: 102,871
RAC: 0
United States
Message 719672 - Posted: 29 Feb 2008, 1:21:36 UTC - in response to Message 719610.  

The catchup that you refer to: does it have to do with bringing the database statistics, and thus the related website statistics, up to date? [I was here only briefly two years ago, so I'm new to this whole process.]

ID: 719672
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 719681 - Posted: 29 Feb 2008, 1:44:05 UTC - in response to Message 719672.  

The catchup that you refer to, does it have to do with bringing the database statistics and thus the related website statistics up to date?

Nope.
Check out the previous Tech News posts for the nitty gritty.

Grant
Darwin NT
ID: 719681
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 719874 - Posted: 29 Feb 2008, 14:27:10 UTC


Great work being done by each of you @ Berkeley - Keep it up all . . .

Thanks for the post, Matt . . . it's always appreciated, Sir!!!


BOINC Wiki . . .

Science Status Page . . .
ID: 719874
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 719875 - Posted: 29 Feb 2008, 14:34:58 UTC

Umm, whatever else is going on, at this writing (0630 PST) uploads appear to be on the fritz... I've tried with 4 different WUs on two different computers (different client builds on each) and no joy... the upload never starts.

Hello, from Albany, CA!...
ID: 719875
William Roeder
Volunteer tester
Joined: 19 May 99
Posts: 69
Credit: 523,414
RAC: 0
United States
Message 719915 - Posted: 29 Feb 2008, 16:10:14 UTC - in response to Message 719875.  

Me too. System Connect
ID: 719915
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 719940 - Posted: 29 Feb 2008, 16:59:07 UTC

Just went for me when I pushed retry.

ID: 719940
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 719941 - Posted: 29 Feb 2008, 16:59:16 UTC

Back in business again...

F.
ID: 719941
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 719958 - Posted: 29 Feb 2008, 17:25:04 UTC

They probably have that server connected to the light switch now; when someone came to work today, the lights went on, and we all started to 'communicate' with the seti-borg.
ID: 719958
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 719985 - Posted: 29 Feb 2008, 18:46:07 UTC - in response to Message 719958.  

Good things:
1) BOINC tolerates some downtime with the project so no work is lost;
2) SETI team fixes everything every morning

ID: 719985
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 720883 - Posted: 2 Mar 2008, 16:40:27 UTC - in response to Message 719985.  

They don't fix everything every morning; I've got over 1000 more pending credit than usual today.
ID: 720883
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 720898 - Posted: 2 Mar 2008, 17:22:48 UTC - in response to Message 720883.  

C'mon... it's the weekend, and they gotta have a life as well!!

F.
ID: 720898
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 721148 - Posted: 3 Mar 2008, 2:10:59 UTC - in response to Message 720898.  

What? You don't check your machines daily to see that they are working properly? Are we more devoted to the project than they are? I can't buy that.

ID: 721148
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 721177 - Posted: 3 Mar 2008, 3:47:36 UTC - in response to Message 721148.  
Last modified: 3 Mar 2008, 3:49:22 UTC

They may check in on them, but most of them have lives outside of SETI. It's not a matter of devotion, it's a matter of having other things/obligations to do.

For instance, Matt is also part of a band. If he's got a gig on stage and he happens to get a text message via his cell from the server (assuming he could even hear the ring or vibration) saying there's a problem, do you expect him to ditch his band to come in and fix the servers? These people are not sitting around at home with their families doing nothing but waiting for servers to go down.

If one of the guys happens to be free and notices a problem, they come in if they can. If they can't, then nothing can be done about it. This is why BOINC is designed to hide server downtime through work caching or by switching to other projects. 99.99% uptime is just not feasible, and their manpower is very limited.
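To put the 99.99% figure in perspective, an availability target translates directly into a yearly downtime budget: (1 - availability) x minutes per year. A quick sketch of that arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year for an availability given as a fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%}: {downtime_budget_minutes(target):,.1f} min/year")
    # 99.00%: 5,256.0 min/year
    # 99.90%: 525.6 min/year
    # 99.99%: 52.6 min/year
```

By that arithmetic, 99.99% allows roughly 52.6 minutes of downtime per year, so a single hour-long outage would exceed the entire annual budget.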
ID: 721177
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 721413 - Posted: 3 Mar 2008, 13:36:52 UTC

All of this is true, but it certainly ruins the fun when the performance data is incorrect. I'm referring to scarecrow's graphs. At the very least, the statistics-reporting scripts should be disabled automatically when the servers are belly-up.

Just like I tell my divorce lawyer, I hate being lied to.
ID: 721413
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 721420 - Posted: 3 Mar 2008, 13:49:28 UTC
Last modified: 3 Mar 2008, 13:52:13 UTC

Matt, the validators, even though the status page shows them as "running", are down, hung, in infinite loops, or otherwise not doing their jobs; and have been this way for at least 24 hours...

[added] My pending credit, normally in the 1700-2900 range, is now at 3700 and growing! [/added]
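The observation here (daemons listed as "running" while pending credit piles up) is the classic gap between a liveness check and a progress check. A hedged sketch of a progress-based check: compare each daemon's cumulative work counter between two polls and flag it if the counter didn't move while work was waiting. The daemon names, counters, and backlog figures are hypothetical, and this is not taken from the SETI@home server status page.

```python
def stalled(
    prev: dict[str, int],
    curr: dict[str, int],
    backlog: dict[str, int],
    alive: set[str],
) -> list[str]:
    """Daemons reported alive whose work counter did not advance between two
    polls even though they had work waiting."""
    return sorted(
        name
        for name in alive
        if backlog.get(name, 0) > 0 and curr.get(name, 0) <= prev.get(name, 0)
    )


if __name__ == "__main__":
    # Hypothetical snapshots taken an hour apart
    prev = {"validator_a": 5_100, "validator_b": 9_804}
    curr = {"validator_a": 5_120, "validator_b": 9_804}
    backlog = {"validator_a": 40, "validator_b": 12_000}
    alive = {"validator_a", "validator_b"}  # both would show as "running"
    print(stalled(prev, curr, backlog, alive))  # ['validator_b']
```

Requiring a non-empty backlog keeps a daemon that is simply idle from being reported as hung.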

Hello, from Albany, CA!...
ID: 721420
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 721424 - Posted: 3 Mar 2008, 13:59:34 UTC - in response to Message 719985.  
Last modified: 3 Mar 2008, 14:00:10 UTC

Good things:
1) BOINC tolerates some downtime with the project so no work is lost;
2) SETI team fixes everything every morning

Make that: "every weekday morning" ;-)

Hello, from Albany, CA!...
ID: 721424
muddocktor
Joined: 2 Aug 06
Posts: 12
Credit: 28,074,814
RAC: 0
United States
Message 721481 - Posted: 3 Mar 2008, 15:32:33 UTC - in response to Message 721420.  

Matt, the validators, even though the status page shows them as "running", are down, hung, in infinite loops, or otherwise not doing their jobs; and have been this way for at least 24 hours...

[added] My pending, normally in the 1700-2900 range is now at 3700 and growing! [/add]


Yeah, I was just coming here to see if there was an update on this myself. I saw my pending results balloon up from around 20k to around 55k credits over the weekend, with the majority of the ballooning happening since yesterday.
ID: 721481
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 721482 - Posted: 3 Mar 2008, 15:38:31 UTC

My RAC is not moving at all, it has been at the same level for the last 2 days from what I have seen.

My pending just shot through 10k when it averages around 5k or so.

ID: 721482
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 721623 - Posted: 3 Mar 2008, 19:09:31 UTC

I noticed that my RAC nosedived about ten percent in the last 24 hours. I looked at my pendings and saw the level at over 9000, about 50% more than normal. Still, I'm not alarmed. This will fix itself, just like the many "Ready To Reports" that disappeared during the same time.
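A ten percent drop in a day is about what exponential decay predicts when little or no new credit is being granted. BOINC's Recent Average Credit is documented as decaying with a half-life of one week (treat that constant as an assumption in this sketch), so a day with no newly granted credit multiplies RAC by 2^(-1/7), roughly 0.906, a drop of about 9.4%. Because the decay is applied whenever RAC is recomputed, some displays may show it frozen while others show the decayed value. The starting RAC below is hypothetical.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumed: BOINC RAC half-life of one week


def rac_after_idle(rac: float, idle_days: float) -> float:
    """RAC after idle_days with no newly granted credit (pure exponential decay)."""
    return rac * math.pow(0.5, idle_days / HALF_LIFE_DAYS)


if __name__ == "__main__":
    start = 1000.0  # hypothetical RAC
    one_day = rac_after_idle(start, 1.0)
    print(f"{one_day:.1f}")                       # 905.7
    print(f"{100 * (1 - one_day / start):.1f}%")  # 9.4%
```

The credit itself is not lost: it stays pending and is granted once validation catches up, after which RAC climbs back.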
ID: 721623

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.