Monolith (Jun 14 2011)



Message boards : Technical News : Monolith (Jun 14 2011)

Author Message
Profile Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1118145 - Posted: 17 Jun 2011, 1:34:45 UTC

OK... just noticed things weren't uploading. Wasn't sure if it was me or a problem at the lab. Just plain forgot about the weekly outage. No problem... just concerned ! :)

Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 6011
Credit: 634,195
RAC: 1,008
United States
Message 1118189 - Posted: 17 Jun 2011, 6:31:45 UTC - in response to Message 1118092.
Last modified: 17 Jun 2011, 7:29:41 UTC

Not just you (mine aren't going either)..... just checked, and the upload server has been labeled as "Disabled". SETI@Home is in the midst of the weekly backup/outage schedule, so I wouldn't be surprised if nothing goes back until sometime Friday.

No, the weekly outage ended Tuesday afternoon, Berkeley time.
See Matt's message, #1 in this thread.

We are having problems with some of the servers. See the Cricket Graphs.
Bookmark that link, and check it when you have trouble communicating with the Seti@Home servers.

When the Green (downloads) is above 90 Mb and/or the Blue (uploads & Scheduler Requests) is about 40 Mb, the data link is maxed out and you will have trouble connecting. This is the normal case for at least the first full day after an outage.

When either the Green or Blue is near the bottom of the graph (as the blue line is now), we are having server problems.
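
As a rough illustration of the rule of thumb above, the two readings could be turned into a status label like this. It is only a sketch: the function name, the labels, and the 1 Mb cutoff for "near the bottom of the graph" are assumptions made for the example, the readings are assumed to be in megabits per second, and nothing here reads the actual Cricket graphs.

```python
# Thresholds taken from the post; units assumed to be megabits per second.
DOWNLOAD_SATURATED_MBPS = 90.0  # Green line (downloads)
UPLOAD_SATURATED_MBPS = 40.0    # Blue line (uploads and scheduler requests)
NEAR_ZERO_MBPS = 1.0            # hypothetical cutoff for "near the bottom"


def classify_link(green_mbps: float, blue_mbps: float) -> str:
    """Return a rough status label for one pair of graph readings."""
    # A line hugging the bottom of the graph points at the servers, not the link.
    if green_mbps <= NEAR_ZERO_MBPS or blue_mbps <= NEAR_ZERO_MBPS:
        return "server problems"
    # Either line at its ceiling means the data link itself is maxed out.
    if green_mbps > DOWNLOAD_SATURATED_MBPS or blue_mbps >= UPLOAD_SATURATED_MBPS:
        return "link saturated"
    return "normal"
```

For example, `classify_link(95, 20)` gives "link saturated", while `classify_link(50, 0.2)` gives "server problems".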

Hope that is useful information.

Around here lately, patience is not just a virtue, it is a requirement.
____________
Donald
Infernal Optimist / Submariner, retired

Profile Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1118263 - Posted: 17 Jun 2011, 14:07:56 UTC - in response to Message 1118189.

Thanks for the information. I read that the servers were throwing fits, but thought that they had most of the problem figured out. I'm going to have to spend a little more time reading the news and updates. As to patience, it's not a problem with me !! ;) At my age, you just CAN'T hurry anymore ! HA HA HA ! Anyway, thanks for the information !!

Acrklor
Volunteer tester
Joined: 22 Oct 01
Posts: 14
Credit: 639,144
RAC: 0
Austria
Message 1118271 - Posted: 17 Jun 2011, 14:37:41 UTC - in response to Message 1117406.

Anybody have any theories about what is causing the ridiculously consistent heavy load?

Free to speculate? That may not be a good idea to say to me :P

The obvious theories:
- seems to me there are more and more Lunatics-enabled 'Anonymous platform' crunchers which would also lead to less crunch time and more load (however, I have no idea if they reached a percentage where this would have an impact)
- a configuration change (from a few weeks ago) coming back to haunt you (never happened to me of course :P)... even more cache can have a negative impact when nothing hits
- timeouts/events/semaphores... meaning the software part is often waiting on something (which wouldn't cause CPU/IO load in particular) because of a completely different issue

The creative but unlikely theories:
- outage of something cooling related, but too small to trigger an alarm (for example a single non-critical fan) -> higher temperature -> system tries to compensate by reducing CPU cycles
- installed BBU (Battery Backup Unit) went awry causing the RAID controller to disable Write Cache
- malfunction on network equipment causing a lot of lost packets -> higher latency

The fabricated theories:
- BIOS went nuts and changed the CPU/RAM multiplier on power cycle (<- no kidding, it happens... although I haven't seen it with server boards)
- I assume TCP/IP checksum offload is enabled. However, I read a few times that there are configurations out there (with a lot of small packets) where disabling it improves throughput (which would suggest the CPU is faster than the network interface itself for this particular setup). I know, I know, it's unlikely at best, but who knows and since we're speculating... ;)


I've got my fair share of experience with server systems, but of course still no clue how it looks backstage at seti@home/boinc, which means: just speculating like crazy. ^^
____________
"Judging people you don't know for things you don't understand is just really stupid." - Ellen Page

Profile S@NL - XP_Freak
Joined: 10 Jul 99
Posts: 99
Credit: 4,674,514
RAC: 2,114
Netherlands
Message 1118297 - Posted: 17 Jun 2011, 16:01:26 UTC

@Acrklor:
You can add the very high number of short-time WUs to the obvious theories.
____________

Goodbye Seti Classic

pdelgado
Joined: 2 Jun 99
Posts: 58
Credit: 15,601,223
RAC: 6,286
United States
Message 1118346 - Posted: 17 Jun 2011, 18:36:33 UTC

If there is a way, might it be productive to limit the number of concurrent connections? Since the pipe can only handle so much bandwidth, limiting the number of clients that can connect at any one time to something the pipe can reasonably handle should improve efficiency. If the excess clients get immediately rejected and go into a timeout delay, it should reduce the thrashing on the servers and eliminate a bunch of errors/lost packets/re-sends.

Don't know if it would make a noticeable difference, just a thought.
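
A minimal sketch of the idea in the post above, assuming a hypothetical server-side gate and a hypothetical client retry policy. None of the names or numbers below come from BOINC or the SETI@home servers; they only show the shape of "reject immediately, then back off".

```python
import random
import threading


class ConnectionGate:
    """Admit at most max_active connections; reject the rest immediately."""

    def __init__(self, max_active: int) -> None:
        self._slots = threading.BoundedSemaphore(max_active)

    def try_admit(self) -> bool:
        # Non-blocking acquire: a full server says "no" at once instead of queueing.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        # Called when an admitted connection finishes, freeing its slot.
        self._slots.release()


def backoff_delay(attempt: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Randomized exponential backoff, in seconds, for a rejected client."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

The random delay is a deliberate choice in this sketch: if every rejected client waited the same fixed time, they would all come back at the same moment and hit the limit again.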

Invisible Man
Joined: 24 Jun 01
Posts: 22
Credit: 1,129,336
RAC: 0
United Kingdom
Message 1118402 - Posted: 17 Jun 2011, 21:22:42 UTC

I apologize but expect things to get worse as the music career will temporarily consume me. You may see rather significant periods of silence from me for the next... I dunno... 6 to 12 months? I'm sure the others will chime in as needed if I'm not around.

Let’s face it folks, the writing is on the wall. It appears to me that Matt wants out, to further his music career. His last sentence is very telling; who exactly will “chime in”? Remember the vast number of Threads to Technical News only shows 12 pages, from back in Feb 7th 2007. How many years before that? Most have been written by Matt.

Even if his music career is of a temporary nature, we will surely miss him & his up to date writings.

Whatever you do Matt, we will all wish you the Very Best for the Future.

“A man must do, what a man has to do”

P.S. I really hope I am wrong!
____________

Invisible Man
Joined: 24 Jun 01
Posts: 22
Credit: 1,129,336
RAC: 0
United Kingdom
Message 1118405 - Posted: 17 Jun 2011, 21:27:06 UTC

Sorry. The first two lines of my previous message should have been in quotes.
____________

David J. Moritz
Joined: 15 Aug 99
Posts: 21
Credit: 1,621,590
RAC: 949
United States
Message 1119014 - Posted: 19 Jun 2011, 14:57:20 UTC

This is supposed to be the technical news forum; unfortunately, the SETI staff never posts any news. It seems it is up to the users to update the status of the system. Once again the upload server is not functioning (see the Cricket graph), yet the server status page on the site shows it as UP. Further, the server status page shows that the work units received figure has not updated for 56 hours.

A little bit of information posted on the site would allow volunteers to understand the status of the servers and the system. The staff needs to remember that people are supporting SETI.

____________
David Moritz

Profile John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 1119017 - Posted: 19 Jun 2011, 15:09:07 UTC

Wrong, David J. Moritz.

Who posted the first post of this thread? The admins/project scientists post as and when it is relevant or the need arises.
____________
It's good to be back amongst friends and colleagues



OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13541
Credit: 29,381,583
RAC: 15,499
United States
Message 1119046 - Posted: 19 Jun 2011, 15:57:00 UTC - in response to Message 1119014.

The staff needs to remember that people are supporting SETI.


I'll give Eric a call right away. I'm sure he'll be delighted to hear that there are in fact intelligent lifeforms installing BOINC and crunching data and not the robotic pirate super squirrels he previously theorized.

... and the next time I see Matt, Jeff or Eric outside the lab actually living their lives, I'll be sure to lure them back into the lab and lock the doors. A trail of candy and a few pieces conveniently placed under a cardboard box held up at one end by a stick and triggered by a string are usually enough to catch all three of those Project Admins.

... and I'll be sure to train all three of them to give obvious status updates every 10 minutes, including detailed bathroom breaks. They will be appropriately rewarded for good behavior and properly shocked with a jolt of electricity for not following orders.

I might even put up a web cam and start a pay-per-view service.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3334
Credit: 19,046,776
RAC: 20,247
Sweden
Message 1119055 - Posted: 19 Jun 2011, 16:07:29 UTC - in response to Message 1119046.

The staff needs to remember that people are supporting SETI.


I'll give Eric a call right away. I'm sure he'll be delighted to hear that there are in fact intelligent lifeforms installing BOINC and crunching data and not the robotic pirate super squirrels he previously theorized.

... and the next time I see Matt, Jeff or Eric outside the lab actually living their lives, I'll be sure to lure them back into the lab and lock the doors. A trail of candy and a few pieces conveniently placed under a cardboard box held up at one end by a stick and triggered by a string are usually enough to catch all three of those Project Admins.

... and I'll be sure to train all three of them to give obvious status updates every 10 minutes, including detailed bathroom breaks. They will be appropriately rewarded for good behavior and properly shocked with a jolt of electricity for not following orders.

I might even put up a web cam and start a pay-per-view service.


Don't you think that's a bit too subtle to sink in with the "whining crew"?

LOL
____________

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1899
Credit: 9,191,220
RAC: 12,995
United States
Message 1119063 - Posted: 19 Jun 2011, 16:12:14 UTC

Thing(s) for the staff to look at when they get in Monday:

on the "Server Status" page:

"Current Result creation rate", "Result Turnaround Time", "Results Received in the last hour" and "Transitioner backlog"

all show as "AS OF" 53 hours ago (currently...), suggesting that the same software bug is affecting all those stats (and that whatever broke, broke on Friday around 4 AM Berkeley time)...
____________
.

Profile AllenIN
Joined: 5 Dec 00
Posts: 159
Credit: 12,580,875
RAC: 14,166
United States
Message 1119065 - Posted: 19 Jun 2011, 16:12:43 UTC - in response to Message 1119046.

Great sarcasm Oz, but it does seem to me, and I'm sure to many other supporters, that this project has more problems with its systems than most other projects.

Why do you think that is, especially since they have had probably the most time as a project?

Allen
____________

Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2463
Credit: 85,118,346
RAC: 14,126
United States
Message 1119069 - Posted: 19 Jun 2011, 16:25:37 UTC - in response to Message 1119065.
Last modified: 19 Jun 2011, 16:26:11 UTC


Why do you think that is, especially since they have had probably the most time as a project?

Allen


Perhaps the sheer number of users, CPUs and GPUs, all of which are far above any other project.
____________
Boinc....Boinc....Boinc....Boinc....

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 14,919,253
RAC: 12,194
United States
Message 1119070 - Posted: 19 Jun 2011, 16:30:45 UTC - in response to Message 1119046.

I might even put up a web cam and start a pay-per-view service.




Oh boy, a webcam, where do I subscribe???
____________


PROUD MEMBER OF Team Starfire World BOINC

Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2463
Credit: 85,118,346
RAC: 14,126
United States
Message 1119074 - Posted: 19 Jun 2011, 16:41:18 UTC - in response to Message 1119046.


I might even put up a web cam and start a pay-per-view service.


Would that include a new fibre cable to the SSL? If so where do I sign up?
____________
Boinc....Boinc....Boinc....Boinc....

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4203
Credit: 1,030,492
RAC: 268
United States
Message 1119089 - Posted: 19 Jun 2011, 17:01:29 UTC - in response to Message 1119069.


Why do you think that is, especially since they have had probably the most time as a project?

Allen

Perhaps the sheer number of users, CPUs and GPUs, all of which are far above any other project.

And that "most time as a project" combined with severely limited funds ensures that much of the hardware is just barely able to handle the load. A commercial data center trying to do what this project is doing would dedicate much more than three racks of equipment in a repurposed closet.
Joe

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13541
Credit: 29,381,583
RAC: 15,499
United States
Message 1119099 - Posted: 19 Jun 2011, 17:30:40 UTC - in response to Message 1119065.
Last modified: 19 Jun 2011, 17:55:57 UTC

Great sarcasm Oz, but it does seem to me, and I'm sure to many other supporters, that this project has more problems with its systems than most other projects.

Why do you think that is, especially since they have had probably the most time as a project?

Allen


My other response was too wordy, so I have edited to this response.


SETI@Home had over 500,000 user accounts the last time I checked. Most of these users have more than one system, and a good portion have a farm running.

On an intranet, to avoid so much network activity, you would divide the users up into subnets and you'd have powerful enough servers to handle the load.

But since this is the internet running off of a WAN connection, go look at some of the largest internet sites that service 1,000,000+ connections like SETI@Home does. Look at Amazon.com and the number of transactions they process. Look at Youtube.com and the compressed video they serve to their viewers. Go look at their business models and how they earn the money to afford the large datacenters they have and the staff required to keep things running smoothly and without a glitch.

Then go look at what SETI@Home is running. Look at SETI@Home's business model and how they earn their income to afford their "datacenter" servicing over 500,000 users, and the whole 5 part-time staff members employed to keep us happy.

Then you tell me why SETI@Home is running as "poorly" as some users seem to think it is and how they can improve it, and you can explain to me why longevity has anything at all to do with reality in an ever-changing world.

Profile AllenIN
Joined: 5 Dec 00
Posts: 159
Credit: 12,580,875
RAC: 14,166
United States
Message 1119227 - Posted: 20 Jun 2011, 1:52:57 UTC - in response to Message 1119099.


Thanks for taking the time to put some of this in the right perspective for me.

I never quite thought of it as you put it; many users with many, many machines certainly does make for a lot of connections. I don't know that it would be quite as massive as Amazon, but it is certainly massive on any scale.

However, since you had a very good answer for me this time, might I ask why there is so much more trouble at this point in time and not so much say 7 years ago....before Boinc? I really don't remember having as much downtime back then, but I could have just forgotten the good old days.

Thanks, Allen
____________


Copyright © 2014 University of California