Panic Mode On (48) Server problems?

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1119693 - Posted: 21 Jun 2011, 13:45:37 UTC
Last modified: 21 Jun 2011, 14:17:50 UTC

If you mean the latest BOINC version, be advised that it has greatly increased the retry and backoff escalations and made uploads & downloads more trying. That's what made me go back to 6.10.xx. I'm not recommending that others revert (I had another problem), but I'd hesitate to upgrade when you're having file transfer problems.
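For anyone wondering why escalating backoff makes transfers feel "more trying" during an outage, here is a rough sketch of a generic exponential backoff with jitter. This is not BOINC's actual algorithm; the function name `backoff_delays` and the `base` and `cap` values are made up purely for illustration.

```python
import random

def backoff_delays(attempts, base=60.0, cap=4 * 3600.0, seed=None):
    """Return illustrative retry delays in seconds (hypothetical parameters)."""
    rng = random.Random(seed)
    delays = []
    for n in range(attempts):
        # The ceiling doubles with every failed attempt until it hits the cap
        ceiling = min(cap, base * 2 ** n)
        # Jitter: pick a delay between half the ceiling and the ceiling
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays

if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(10, seed=1), start=1):
        print(f"attempt {attempt}: wait about {delay / 60:.1f} min")
```

With these assumed values, a handful of failed attempts is enough to push the wait between retries into hours, which would explain why pressing the retry button by hand can get a file through sooner.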
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1119693
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1119767 - Posted: 21 Jun 2011, 15:36:42 UTC

I'm getting many HTTP errors now, yet when a connection succeeds the download speed is quite good.
That's unusual: when the network bandwidth is saturated and connections drop this often, I usually see low download speeds.
Is something wrong with the server setup now?
ID: 1119767
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1119775 - Posted: 21 Jun 2011, 15:57:01 UTC - in response to Message 1119767.  

My uploads and downloads seem to be going through okay. I haven't been watching too closely, but there's not much of a problem. Maybe they will get everything organized during the weekly outage coming soon and we can enjoy some really good uptime.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1119775
justsomeguy
Joined: 27 May 99
Posts: 84
Credit: 6,084,595
RAC: 11
United States
Message 1119782 - Posted: 21 Jun 2011, 16:15:11 UTC

Okay, update on my end... three Linux boxes uploading and downloading just fine from work (crappy connection, but working). Two Windows boxes, XP and 7, not uploading at all from home on a fast connection. Upgraded the Win7 box to the latest BOINC, still no go.
"Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein

ID: 1119782
Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 385,664,915
RAC: 0
Norway
Message 1119829 - Posted: 21 Jun 2011, 22:43:34 UTC - in response to Message 1119782.  

Outage over, so let's introduce a new one:

22/06/2011 00:33:44 SETI@home Message from server: Resent lost task 08ap11ac.12708.16836.16.10.130_0
22/06/2011 00:33:44 SETI@home [error] Already have task 08ap11ac.12708.16836.16.10.130_0

What's next :-) ?

Morten Ross
ID: 1119829
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1119888 - Posted: 22 Jun 2011, 4:17:44 UTC

Yesterday (Monday) I hammered my two uploads through. I had let them try on their own for nearly 24 hours and after 14 minutes of elapsed time trying to upload, hammering the retry button a few times seemed to work.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1119888
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13371
Credit: 208,696,464
RAC: 304
Australia
Message 1119898 - Posted: 22 Jun 2011, 5:08:17 UTC - in response to Message 1119888.  


Whatever they did during the outage seems to have sorted the upload problem. After the outage there was a big surge in uploads, then it tapered off to normal levels, instead of gradually rising & falling & meandering about while the uploads kept timing out.
Grant
Darwin NT
ID: 1119898
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51464
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1119917 - Posted: 22 Jun 2011, 7:43:17 UTC

Oh, well.
Here we go again.
The server status page quit updating about 45 minutes ago, and uploads get no connect errors.
Time for the kittyman to go to bed anyway.

Meowsigh.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1119917
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13371
Credit: 208,696,464
RAC: 304
Australia
Message 1119918 - Posted: 22 Jun 2011, 7:53:58 UTC - in response to Message 1119917.  
Last modified: 22 Jun 2011, 7:54:34 UTC

What is it about 24:00hrs Berkeley time?
For a while there, uploads & downloads dropped to nothing, and the Seti site was unreachable. Downloads & this site are back up, but uploads once again are no go.
Grant
Darwin NT
ID: 1119918
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 21018
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1119923 - Posted: 22 Jun 2011, 8:24:44 UTC

It's the midnight blues. Which I could sing, but you know what happened in NZ a few weeks ago, minutes after my last rendition......

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1119923
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1119931 - Posted: 22 Jun 2011, 9:13:50 UTC

Uploads don't work.


Ohh.. - what.. - what's with our beloved project??

A server outage every ~2 days..

Is it a hardware or software problem in the server lab?

Is it time for new equipment again?


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1119931
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 33449
Credit: 79,922,639
RAC: 80
Germany
Message 1119945 - Posted: 22 Jun 2011, 9:59:36 UTC
Last modified: 22 Jun 2011, 10:00:07 UTC

Yes, something is going on.
Uploads are slow even though there's no heavy load on Bruno.
Nothing since 7 AM UTC.

The monthly cricket graph doesn't look good.

But I'm sure the staff is doing their best to resolve it.
With each crime and every kindness we birth our future.
ID: 1119945
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13371
Credit: 208,696,464
RAC: 304
Australia
Message 1119952 - Posted: 22 Jun 2011, 10:23:09 UTC - in response to Message 1119935.  

Confirmed no uploads from UK, but cricket graph looks OK.

Cricket graph looks broken to me.
At 24:00hrs Berkeley time everything died briefly. Downloads & this site came back up, but uploads didn't. The upload traffic you see in the graphs at the moment is just the acknowledgements from the download traffic & Scheduler requests/reporting. No completed results are getting through.
Grant
Darwin NT
ID: 1119952
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 64942
Credit: 55,293,173
RAC: 49
United States
Message 1119979 - Posted: 22 Jun 2011, 13:24:42 UTC

I think We did a DDOS attack on Seti, by accident.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1119979
justsomeguy
Joined: 27 May 99
Posts: 84
Credit: 6,084,595
RAC: 11
United States
Message 1119997 - Posted: 22 Jun 2011, 13:48:42 UTC - in response to Message 1119991.  
Last modified: 22 Jun 2011, 13:49:15 UTC

I think We did a DDOS attack on Seti, by accident.


Not by accident, by design, (of the new x38g, and ATI crunching)

LOL

Edit: I think we've reached road's end, so to speak, when it comes to the current server setup. Crunching computers are getting faster and faster, and now with x38g, as well as thousands of computers being able to run SETI on their ATI cards, we're literally killing the current server setup.



Hmmm.... Makes me think that if the splitters were reworked to make bigger chunks that take a little longer to process, this might alleviate a little of the network traffic as well. Fewer WUs to report means fewer connections and less traffic. Should be a quick and easy "fix" for at least part of the issue. It would mean slightly larger/longer transmits, but far fewer of them.
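To put rough numbers on the "bigger chunks" idea, here is a back-of-the-envelope sketch. It assumes every transfer carries a fixed per-connection overhead on top of the payload; the function name `transfer_cost` and all the byte figures are hypothetical, not measured SETI@home values.

```python
def transfer_cost(work_units, payload_bytes, overhead_bytes, chunk_factor=1):
    """Transfers and total bytes when each work unit is chunk_factor times larger.

    All sizes are assumed values for illustration only.
    """
    # Merging chunk_factor units into one means proportionally fewer transfers
    transfers = work_units / chunk_factor
    # Each transfer pays the fixed overhead once, plus the larger payload
    total_bytes = transfers * (overhead_bytes + payload_bytes * chunk_factor)
    return transfers, total_bytes

if __name__ == "__main__":
    for factor in (1, 2, 4):
        n, total = transfer_cost(1000, 350_000, 5_000, factor)
        print(f"x{factor}: {n:.0f} transfers, {total / 1e6:.1f} MB in total")
```

Under this simple model the payload volume stays the same, so the saving in bytes is small. The real gain is in the number of connections and scheduler transactions, which drops in proportion to the chunk factor.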

Fire reTARDant suit is neatly pressed and zipped up - feel free :)

Kevin - The Pirate robotic Super (Sq)uirrel :)
"Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein

ID: 1119997
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1120033 - Posted: 22 Jun 2011, 15:40:22 UTC - in response to Message 1120003.  
Last modified: 22 Jun 2011, 15:41:35 UTC

And yes, maybe it is time to leave the oldest computers back in the dust, by increasing the analysis time even further, and not extending the deadline.


First, I don't think that old computers returning maybe 1 WU per week are causing the current problems.

Second, I hope the project won't increase the analysis time just to keep our CPUs and GPUs warm. You always reach a point where a further increase in accuracy becomes pointless for the science (unlike implementing new search methods, like the correlation thing in v7). I don't know if SETI has already reached this point, but if so, then it would be better to slow down the splitters and send out less work; anything else would just waste resources that other projects could use. So yes, increase it, but only if it actually helps us find ET.
ID: 1120033
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51464
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1120040 - Posted: 22 Jun 2011, 16:01:24 UTC

Aw, come on now.
It was just that dang midnight cleaning crew plugging their vac into the wrong outlet again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1120040
BetelgeuseFive (Project Donor)
Volunteer tester
Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1120051 - Posted: 22 Jun 2011, 16:26:36 UTC - in response to Message 1120047.  




By the looks of things, we will never again get back to the situation we had before, when the cricket graph would fall to sustainable levels 1-2 days after the weekly outage. As of now, it's maxed out for as long as the servers can sustain the beating, and that is followed time after time by some server failure, RAID failure, or other issue.



I'm not so sure this is the case. What I have seen lately is an unusual number of 'shorties'. When the ratio of 'shorties' drops back to the levels I noticed before, a lot of the problems will be gone. On my GPU I can do at least 4 shorties in the time it takes to process a single 'long' workunit. There will always be some shorties, so you can't expect the number of workunits processed (each needing both bandwidth and transaction resources) to go down by a factor of four, but I think a factor of two should be possible. Maybe the servers and the internet connection can keep up again when this happens ...
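The factor-of-two estimate can be sanity-checked with a small sketch. It assumes a shorty takes exactly a quarter of the time of a long workunit (the "at least 4" above) and that `shorty_fraction` is the share of shorties by count; the function name and the 90% and 50% mixes are made-up examples, not observed figures.

```python
def relative_wu_rate(shorty_fraction, speedup=4.0):
    """Workunits finished per unit time, relative to a host doing only long ones."""
    # Mean time per workunit, measured in units of one long workunit
    mean_time = shorty_fraction / speedup + (1.0 - shorty_fraction)
    return 1.0 / mean_time

if __name__ == "__main__":
    heavy = relative_wu_rate(0.9)   # assumed mix during a shorty storm
    normal = relative_wu_rate(0.5)  # assumed more typical mix
    print(f"90% shorties: {heavy:.2f}x the workunit rate of long-only work")
    print(f"50% shorties: {normal:.2f}x")
    print(f"reduction factor: {heavy / normal:.2f}")
```

Under these assumptions, going from a 90% to a 50% shorty mix cuts the number of workunits a host reports per hour by a bit less than half, which is in line with the "factor of two" guess.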

Tom
ID: 1120051
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1120058 - Posted: 22 Jun 2011, 16:52:30 UTC - in response to Message 1120047.  

Well, of course there has to be some gain in the science too, we can't increase crunch times just for the sake of slowing things down. If we can't do that, then maybe other ways can be found to ease the burden on the servers. Maybe as you say send out less work, or maybe run the system 100% full bandwidth for 2 hours, and then shut down all outside contact (perhaps allow only uploads) for 1 hour, so the servers can catch up with the work they received during the 2 hours they were working 100%, and then repeat that cycle....

Judging by how the system was performing until not so long ago, I think slowing down the feeder just a bit would be enough.

I also don't think the recent "crashes" are caused by hardware failures, as those usually happen at any time and not preferentially around 7:00 UTC or whenever the other 1 or 2 scheduled crashes used to happen.
ID: 1120058
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51464
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1120066 - Posted: 22 Jun 2011, 17:07:17 UTC - in response to Message 1120063.  
Last modified: 22 Jun 2011, 17:11:13 UTC


I also don't think the recent "crashes" are caused by hardware failures, as those usually happen at any time and not preferentially around 7:00 UTC or whenever the other 1 or 2 scheduled crashes used to happen.


Well, then they'd better try to find out what is happening in the lab at 7:00 UTC. I agree with you that it is indeed very strange that many of the recent "crashes" have occurred at more or less exactly 7:00 UTC; it is not normal for a hardware problem to always happen at the same time.

I think the last time they were having a problem with 'regularly scheduled' crashes it was tracked down to a UPS with failing batteries...
Might there be another one in the loop somewhere?

EDIT...
It might also be a good idea to borrow a power quality monitor (I'd be surprised if there was not one available somewhere on campus) and monitor the incoming power for any unusual spikes, sags, etc. that could be causing the grief.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1120066


 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.