Fiber channel woes, Chicken App, etc. (May 21 2007)

Profile Xaak

Joined: 22 May 99
Posts: 32
Credit: 22,636,357
RAC: 0
United States
Message 573390 - Posted: 22 May 2007, 0:54:14 UTC

I gotta laugh at this whole mess here.

It's pretty obvious that the latest server version of BOINC should never have been installed and is broken.

How hard is it to recognize you need to roll back and get the problems with it fixed offline? Do you really need to be that stubborn as to keep a broken server system in production?

I've worked in IT for over 25 years, and almost every cardinal rule was broken by putting the latest server version into production. To repeat what I've said elsewhere:

1. You NEVER introduce another potential point of failure (new BOINC version) while other things are not completely stable (hardware outage, longest in your history).

2. You NEVER put anything into production without testing first and assessing the impact of the change on your installed user base.

Stop messing around and roll back the server version, then get the problems fixed offline.
XaaK


ID: 573390 · Report as offensive
Sixpack
Volunteer tester
Joined: 12 Sep 99
Posts: 38
Credit: 182,096
RAC: 0
Canada
Message 573394 - Posted: 22 May 2007, 1:05:29 UTC - in response to Message 573390.  

I gotta laugh at this whole mess here.

It's pretty obvious that the latest server version of BOINC should never have been installed and is broken.

How hard is it to recognize you need to roll back and get the problems with it fixed offline? Do you really need to be that stubborn as to keep a broken server system in production?

I've worked in IT for over 25 years, and almost every cardinal rule was broken by putting the latest server version into production. To repeat what I've said elsewhere:

1. You NEVER introduce another potential point of failure (new BOINC version) while other things are not completely stable (hardware outage, longest in your history).

2. You NEVER put anything into production without testing first and assessing the impact of the change on your installed user base.

Stop messing around and roll back the server version, then get the problems fixed offline.


If this was the case Microsoft would never release anything. lol

ID: 573394 · Report as offensive
Profile Xaak

Joined: 22 May 99
Posts: 32
Credit: 22,636,357
RAC: 0
United States
Message 573396 - Posted: 22 May 2007, 1:11:07 UTC - in response to Message 573394.  


If this was the case Microsoft would never release anything. lol


And you don't think companies test Microsoft releases before subjecting end users to them?

XaaK


ID: 573396 · Report as offensive
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 573399 - Posted: 22 May 2007, 1:17:26 UTC - in response to Message 573390.  
Last modified: 22 May 2007, 1:22:55 UTC

I gotta laugh at this whole mess here.

It's pretty obvious that the latest server version of BOINC should never have been installed and is broken.

How hard is it to recognize you need to roll back and get the problems with it fixed offline? Do you really need to be that stubborn as to keep a broken server system in production?

I've worked in IT for over 25 years, and almost every cardinal rule was broken by putting the latest server version into production. To repeat what I've said elsewhere:

1. You NEVER introduce another potential point of failure (new BOINC version) while other things are not completely stable (hardware outage, longest in your history).

2. You NEVER put anything into production without testing first and assessing the impact of the change on your installed user base.

Stop messing around and roll back the server version, then get the problems fixed offline.

Obviously you don't understand a basic fact of life on these projects. There is no use trying to do only one thing at a time: there are too many things to do, too few people, and too little time to stop and verify everything before moving on to the next thing.

Also, Seti IS the Boinc test bed. Everything is tried out here first. Whether it be the forum or the server software, it gets tested here before being rolled out for use by other projects. If it's broken we find out, and it gets fixed because of our feedback.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 573399 · Report as offensive
Profile QuietIce

Joined: 21 Jul 06
Posts: 5
Credit: 24,098,658
RAC: 0
United States
Message 573402 - Posted: 22 May 2007, 1:20:09 UTC
Last modified: 22 May 2007, 1:20:35 UTC

Thanks for the update, Eric - hopefully you guys can put your heads together and figure out the issue with the "Chicken" apps. I know it's not exactly "supported software" but a lot of us rely on it ... :)
ID: 573402 · Report as offensive
Rattledagger_v3

Joined: 1 Oct 05
Posts: 2
Credit: 138,603
RAC: 0
Message 573409 - Posted: 22 May 2007, 1:37:42 UTC - in response to Message 573390.  

2. You NEVER put anything into production without testing first and assessing the impact of the change on your installed user base.

Stop messing around and roll back the server version, then get the problems fixed offline.

...and how many days/weeks should Thumper have been tested before it was put into production...


ID: 573409 · Report as offensive
Joypipe

Joined: 29 Jun 99
Posts: 10
Credit: 1,337,470
RAC: 0
United States
Message 573414 - Posted: 22 May 2007, 1:48:04 UTC

I miss the old days where we could run our own proxy servers so we didn't have to tell each of our clients to get 10 days worth of work just to keep them running in a 2 day outage.

Seems like this project became someone's Master's degree study on software development (they should fail) and has become less about the science. I want to crunch data and not deal with connection issues. Protein folding is sounding more and more interesting every day.

-Joy
ID: 573414 · Report as offensive
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 573416 - Posted: 22 May 2007, 1:51:19 UTC
Last modified: 22 May 2007, 1:52:18 UTC

Boinc has an alpha site for the early development and testing of the Boinc software. There aren't enough users there to test all the possible system configurations. The Seti Beta site is for the development of the Seti applications themselves. Considering the rare confluence of events that has occurred in the past weeks, the current system has worked out well with the limited resources available.
ID: 573416 · Report as offensive
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 573419 - Posted: 22 May 2007, 2:03:01 UTC - in response to Message 573414.  

I miss the old days where we could run our own proxy servers so we didn't have to tell each of our clients to get 10 days worth of work just to keep them running in a 2 day outage.

Seems like this project became someone's Master's degree study on software development (they should fail) and has become less about the science. I want to crunch data and not deal with connection issues. Protein folding is sounding more and more interesting every day.

-Joy

Along with that also came a big negative: useless data being returned as people cloned fast-crunching WUs and returned them thousands of times just for the credits.

There may still be a proxy capability in the future. For now you might try the next version of the manager with its additional caching capabilities.
ID: 573419 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 573440 - Posted: 22 May 2007, 2:32:25 UTC

I see a couple of things that make no sense... Matt has posted many times they are taking old stuff offline (SUN) and migrating to Nix... not to mention issues with getting old Sun code to compile under NIX...

So while everyone is frustrated with the Server Issues, the Seti Staff has been very patient, listening to what people are saying and working as fast as they can... If anyone wants, I have a "Dead Mule" that you can "rent" to "Flog" to make it run faster (frustration)... Progress takes time, Seti Beta spent over a year getting you Seti Enhanced... MultiBeam and Astropulse (Beta and Alpha respectively) are in progress...

As Eric stated, he spent time on the phone with Blurf this morning (Thank You Blurf, for relaying information that is helping)... I spent over a half hour this evening talking to Eric about Seti Beta issues (including an Astropulse Result that Successfully completed)... What You Say is seen and relayed, pointing to specific things... A few people can call and do when they see things. There are more people watching and helping than most realize... They spend a lot of time doing that!

I guess the bottom line is that a "Lot of People Care!" They are not on the Seti payroll, they do provide feedback that makes it easier for the Seti Team... "WE" are here to Find ET!

Please, as things start calming down let's find ET!



Please consider a Donation to the Seti Project.

ID: 573440 · Report as offensive
Profile Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 573446 - Posted: 22 May 2007, 2:41:03 UTC - in response to Message 573438.  

Head over to the BOINC forums perhaps and explain there. For me that would be a logical place for you to start, yet I do not see a single post there from you.


Xaak can be rather blunt which is a good thing and bad thing, but his message is clear. The fix can be implemented by SETI staff, but hasn't, and there has been no indication from SETI staff that they even understand the problem or that they intend to fix it. As posted elsewhere the code has been updated, but it has not been implemented so why should he post on the BOINC forum?

A clear post from someone on staff would do a world of good here, especially if everyone would check their egos at the door. Perhaps that will be forthcoming tomorrow. For now, SETI is broken for a great number of its volunteers with a problem introduced by new BOINC server code, installed untested during the last outage.

I'm not an IT person, nor do I pretend to be, but I do understand what is going on here.

Kinguni
Join Team Starfire
BOINC Chat

ID: 573446 · Report as offensive
Equus1046

Joined: 18 Apr 00
Posts: 1
Credit: 1,462,987
RAC: 0
United States
Message 573508 - Posted: 22 May 2007, 3:52:48 UTC

As someone who has been with the project for many years (i.e. before BOINC) might I just say to all of you who are running all kinds of different apps . . . "Get a life!" SETI@Home and BOINC are revolutionary applications. They are the first of their kind and have defined an entire industry. And don't forget that all of us are "volunteering" our time. So, from someone who has been running just the plain vanilla apps since day 1, take a deep breath, count back from 10, and everything will be all right. To quote from another post, these guys are doing yeoman's work on a much too small budget, and we should all be grateful for what is working, and stop whining about what isn't. So if your chicken app or gopher app or command line app isn't doing what it should, maybe, just maybe, you can join us plebeians running the basic BOINC app. It works just fine, thank you.
ID: 573508 · Report as offensive
Paul Hodges

Joined: 8 Feb 07
Posts: 1
Credit: 96,157
RAC: 0
United States
Message 573524 - Posted: 22 May 2007, 4:26:09 UTC

All..

Mud slinging in these forums is really pointless.

XAAK -- Judging by your involvement in so many other projects, you know more than most that there are other projects to compile credit with. We each have the right to silently and non-apologetically reapportion our BOINC clients to those projects whose 'management styles' and 'scientific goals' better suit us. It's that simple. If mistakes were made and you are deeply and continuously disappointed in the goals, management and science of SETI@Home, then withdraw your support. Vent with actions and not with passive aggressive posturing and veiled threats. If you value the goals of these projects then tolerance of mistakes or random failures or outdated hardware or planned vacations or miscalculated updates may at times be required to fulfill your long-term desires.

We ALL want SETI@Home to work flawlessly; we have to use the same faith and patience we have for the project's goals as we have for the NATURE of the PROJECT itself...the search for intelligent life somewhere else.

It's about the science--right?

To the Forum Moderator that previously removed this post. I will be following my own advice now by withdrawing my support from SETI@Home. Censorship is not always necessary and is evil by nature.
ID: 573524 · Report as offensive
Profile boosted
Volunteer tester

Joined: 25 Jan 04
Posts: 5
Credit: 75,849
RAC: 0
United States
Message 573526 - Posted: 22 May 2007, 4:41:26 UTC
Last modified: 22 May 2007, 4:42:40 UTC

Well, while I have crunched nearly 72K now, these constant outages have made me decide to stop and do more worthwhile projects.

I may leave a machine or two still with minimal seti time but it will no longer be a main function in my crunching. As was said before, there are many more useful things that can be researched and the constant down time is not helping matters.

I have nothing but respect for the people that run this project, and know how hard it can be to do stuff like this.
ID: 573526 · Report as offensive
Profile Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 573527 - Posted: 22 May 2007, 4:43:34 UTC - in response to Message 573524.  

All..

Mud slinging in these forums is really pointless.

XAAK -- Judging by your involvement in so many other projects, you know more than most that there are other projects to compile credit with. We each have the right to silently and non-apologetically reapportion our BOINC clients to those projects whose 'management styles' and 'scientific goals' better suit us. It's that simple. If mistakes were made and you are deeply and continuously disappointed in the goals, management and science of SETI@Home, then withdraw your support. Vent with actions and not with passive aggressive posturing and veiled threats. If you value the goals of these projects then tolerance of mistakes or random failures or outdated hardware or planned vacations or miscalculated updates may at times be required to fulfill your long-term desires.

We ALL want SETI@Home to work flawlessly; we have to use the same faith and patience we have for the project's goals as we have for the NATURE of the PROJECT itself...the search for intelligent life somewhere else.

It's about the science--right?

To the Forum Moderator that previously removed this post. I will be following my own advice now by withdrawing my support from SETI@Home. Censorship is not always necessary and is evil by nature.

I suppose 8 years is enough, and I should take a well deserved break. Things seem to be broken here for a while, and there seems to be an atmosphere that wants to inhibit any form of discussion or alternate ideas other than those being pursued by the project leaders. This IMO is not the environment that produces good science. If this attitude is prevalent in one area, i.e. discussion of current events and procedures, then it will likely manifest in other areas as well. Censure in a university setting is for cowards.
When we finally figure it all out, all the rules will change and we can start all over again.
ID: 573527 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30649
Credit: 53,134,872
RAC: 32
United States
Message 573548 - Posted: 22 May 2007, 5:40:33 UTC

Any clues?

Mon May 21 21:19:52 2007|SETI@home|Message from server: Completed result 18mr05aa.11342.16960.53410.3.248_1 refused: result already reported as success
Mon May 21 21:19:52 2007|SETI@home|Message from server: Completed result 18mr05aa.11342.16960.53410.3.182_2 refused: result already reported as success


ID: 573548 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 573550 - Posted: 22 May 2007, 5:45:16 UTC - in response to Message 573548.  
Last modified: 22 May 2007, 5:45:44 UTC

Any clues?

Mon May 21 21:19:52 2007|SETI@home|Message from server: Completed result 18mr05aa.11342.16960.53410.3.248_1 refused: result already reported as success
Mon May 21 21:19:52 2007|SETI@home|Message from server: Completed result 18mr05aa.11342.16960.53410.3.182_2 refused: result already reported as success



Those are actually fine. It means the server got your result but your machine never got its acknowledgement, so it tried again. They will show up in your results as "Success" / "Done". It's an ambiguous error message.
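
For anyone who wants to see that sequence spelled out, here is a minimal sketch of a lost acknowledgement followed by a retry. It is not the BOINC scheduler code: `ToyScheduler`, `client_report`, and the result name are hypothetical names invented for illustration, and the only thing taken from the log above is the refusal text.

```python
class ToyScheduler:
    """Hypothetical stand-in for a server that accepts completed results."""

    def __init__(self):
        self.accepted = set()  # names of results already stored

    def report(self, result_name):
        # A second report of the same result is refused, not stored twice.
        if result_name in self.accepted:
            return "refused: result already reported as success"
        self.accepted.add(result_name)
        return "ack"


def client_report(server, result_name, first_ack_lost):
    """Report a result; resend once if the first acknowledgement is lost."""
    reply = server.report(result_name)
    if first_ack_lost:
        # The server stored the result, but the client never saw the reply,
        # so from the client's point of view the report failed and it retries.
        reply = server.report(result_name)
    return reply


server = ToyScheduler()
final_reply = client_report(server, "example_result_1", first_ack_lost=True)
print(final_reply)
```

The retry is refused, yet the result is already safely stored on the server side, which is why the message looks alarming but is harmless.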

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 573550 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 573574 - Posted: 22 May 2007, 6:25:33 UTC - in response to Message 573446.  


Xaak can be rather blunt which is a good thing and bad thing, but his message is clear. The fix can be implemented by SETI staff, but hasn't, and there has been no indication from SETI staff that they even understand the problem or that they intend to fix it. As posted elsewhere the code has been updated, but it has not been implemented so why should he post on the BOINC forum?


One of the largest problems with testing BOINC releases is a matter of scale: some problems that just don't show up in a project of a few thousand become much more apparent in a project of a few hundred thousand. The other problem is that BOINC releases are not designed to be incremental. An upgrade that fixes one bug often includes new ones elsewhere in the code. They are also not designed to be reversible. Database changes don't often go away quietly. At any rate, a code rollback wasn't going to work because it would negate the round-robin DNS scheme for our feeders and schedulers and we'd be back where we were on Friday, with most connection attempts failing. David checked in the final fix for that problem tonight, but I'm not going to change the server without getting in a few hours of sleep. My alarm clock is set for 5.5 hours from now. When I finish this message, I'm going to bed.
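
As a rough illustration of what a round-robin scheme buys, the sketch below hands out a list of hosts in rotation so that connections are spread evenly across them. Everything in it is an assumption made for the example: the host names are placeholders, there are only two of them, and real round-robin DNS is done by the name server rotating the address records it returns, not by application code like this.

```python
from collections import Counter
from itertools import cycle

# Hypothetical host names; the real scheduler machines are not named in this thread.
SCHEDULER_HOSTS = ["scheduler-a.example", "scheduler-b.example"]


def make_round_robin(hosts):
    """Return a function that yields the hosts in rotation, one per call."""
    rotation = cycle(hosts)

    def resolve():
        return next(rotation)

    return resolve


resolve = make_round_robin(SCHEDULER_HOSTS)

# Ten simulated connection attempts are split evenly across the two hosts.
load = Counter(resolve() for _ in range(10))
print(load)
```

With a single host in the list every attempt lands on the same machine, which is the situation a rollback would have restored.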

And with Matt gone, SETI's operations staff is essentially me and Jeff. Jeff has a real job, which means he doesn't work 24 hours a day. Lynn would also kill him if he tried. I'm a scientist, so I'm expected to work until I drop. After I drop I work in a reclining position. But I've got a proposal due on campus on Thursday, so I can't spend all my working hours watching the server logs. (I do, and have had two windows open on the feeder logs which I have been glancing at. Right now each system is handling about 10 results a second.)

Regarding censorship here. Please remember that most of the moderators are not university employees and they are human. Complain to the moderators list (setimods at ssl.berkeley.edu) or to me (korpela at ssl.berkeley.edu, warning: very aggressive spam filter) with a link to the posts in question and an explanation of what was meant. Under normal circumstances, moderation decisions can be overturned, or agreement can be reached about permissible language. Often times the problem can be including too much of a post which was deleted for a reason or withdrawn by the original poster with a request that quotes also be deleted.

Good night. 5h15 before the alarm goes off.

--

Eric
@SETIEric@qoto.org (Mastodon)

ID: 573574 · Report as offensive
Profile Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 573575 - Posted: 22 May 2007, 6:31:24 UTC - in response to Message 573574.  
Last modified: 22 May 2007, 6:37:00 UTC


Xaak can be rather blunt which is a good thing and bad thing, but his message is clear. The fix can be implemented by SETI staff, but hasn't, and there has been no indication from SETI staff that they even understand the problem or that they intend to fix it. As posted elsewhere the code has been updated, but it has not been implemented so why should he post on the BOINC forum?


One of the largest problems with testing BOINC releases is a matter of scale: some problems that just don't show up in a project of a few thousand become much more apparent in a project of a few hundred thousand. The other problem is that BOINC releases are not designed to be incremental. An upgrade that fixes one bug often includes new ones elsewhere in the code. They are also not designed to be reversible. Database changes don't often go away quietly. At any rate, a code rollback wasn't going to work because it would negate the round-robin DNS scheme for our feeders and schedulers and we'd be back where we were on Friday, with most connection attempts failing. David checked in the final fix for that problem tonight, but I'm not going to change the server without getting in a few hours of sleep. My alarm clock is set for 5.5 hours from now. When I finish this message, I'm going to bed.

And with Matt gone, SETI's operations staff is essentially me and Jeff. Jeff has a real job, which means he doesn't work 24 hours a day. Lynn would also kill him if he tried. I'm a scientist, so I'm expected to work until I drop. After I drop I work in a reclining position. But I've got a proposal due on campus on Thursday, so I can't spend all my working hours watching the server logs. (I do, and have had two windows open on the feeder logs which I have been glancing at. Right now each system is handling about 10 results a second.)

Regarding censorship here. Please remember that most of the moderators are not university employees and they are human. Complain to the moderators list (setimods at ssl.berkeley.edu) or to me (korpela at ssl.berkeley.edu, warning: very aggressive spam filter) with a link to the posts in question and an explanation of what was meant. Under normal circumstances, moderation decisions can be overturned, or agreement can be reached about permissible language. Often times the problem can be including too much of a post which was deleted for a reason or withdrawn by the original poster with a request that quotes also be deleted.

Good night. 5h15 before the alarm goes off.

--

Eric

Understood & Accepted
When we finally figure it all out, all the rules will change and we can start all over again.
ID: 573575 · Report as offensive
Profile Kinguni
Volunteer tester
Joined: 15 Feb 00
Posts: 239
Credit: 9,043,007
RAC: 0
Canada
Message 573576 - Posted: 22 May 2007, 6:36:00 UTC

Thanks for the late post Eric, and we look forward to the "fix" on Tuesday. Much better you play with it after some sleep. Acknowledgement is always appreciated.

As a former moderator for a very, very large set of forums I can be over-critical of moderating and moderators at times, but certainly have no problem with what was edited out of this thread.

Cheers and goodnight,

Kinguni
Join Team Starfire
BOINC Chat

ID: 573576 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.