Panic Mode On (82) Server Problems?


Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3590
Credit: 47,338,134
RAC: 397
United States
Message 1340632 - Posted: 25 Feb 2013, 4:45:44 UTC

Now that we are mostly recovered from the weekend outage, let's start a new thread.
____________

Profile Floyd
Joined: 19 May 11
Posts: 524
Credit: 1,870,625
RAC: 0
United States
Message 1340640 - Posted: 25 Feb 2013, 5:51:08 UTC

Sounds good to me.
Are the validators having problems, or just struggling with the sudden mass of WUs being thrown at them after the 2-day outage?
____________

Profile Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,135,417
RAC: 4,306
United States
Message 1340665 - Posted: 25 Feb 2013, 9:52:44 UTC

Are the validators having problems, or just struggling with the sudden mass of WUs being thrown at them after the 2-day outage?

Maybe. I checked my first 20 pendings and found 4 where both results are in but validation has not happened. Here's one:

http://setiathome.berkeley.edu/workunit.php?wuid=1175653331
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Profile Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,135,417
RAC: 4,306
United States
Message 1340669 - Posted: 25 Feb 2013, 10:43:50 UTC

Are the validators having problems, or just struggling with the sudden mass of WUs being thrown at them after the 2-day outage?

Maybe. I checked my first 20 pendings and found 4 where both results are in but validation has not happened. Here's one:

Disregard, those workunits have cleared now. Probably just another nap.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Profile TRuEQ & TuVaLu
Volunteer tester
Joined: 4 Oct 99
Posts: 440
Credit: 17,727,604
RAC: 401
Sweden
Message 1340729 - Posted: 25 Feb 2013, 16:00:27 UTC

I have had 5 WUs in transfer since yesterday.

Profile James Sotherden
Joined: 16 May 99
Posts: 8525
Credit: 31,092,836
RAC: 53,945
United States
Message 1340742 - Posted: 25 Feb 2013, 16:44:40 UTC
Last modified: 25 Feb 2013, 16:45:18 UTC

I just looked at my computers. Oh my. One has 138 timeouts. The other one won't download anything. I see 'em, but they just sit there blinking at me, stuck at various % of downloadedness :)
____________

Old James

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8124
Credit: 52,399,831
RAC: 77,991
United Kingdom
Message 1340766 - Posted: 25 Feb 2013, 17:21:23 UTC

The validators will be struggling with the number of reported tasks, which will be far worse than after a normal weekly outage.
Downloads have really taken a hit; it's taking hours to get an MB down, and as for APs....
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 1572
Credit: 9,188,369
RAC: 13,020
Message 1340770 - Posted: 25 Feb 2013, 17:24:46 UTC - in response to Message 1340742.

I just looked at my computers. Oh my. One has 138 timeouts. The other one won't download anything. I see 'em, but they just sit there blinking at me, stuck at various % of downloadedness :)


When you run a BOINC version above 6.10, you need something to periodically reset the backoffs...
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)
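
One way to do the periodic reset mentioned above is a small script that asks the client to retry deferred network activity on a timer. This is only a sketch: it assumes `boinccmd` is on the PATH and that its `--network_available` option (documented as retrying deferred network communication) is enough to kick stalled transfers on your client version. The function names and the interval are hypothetical.

```python
import subprocess
import time


def retry_command(boinccmd="boinccmd"):
    """Build the command that asks the BOINC client to retry deferred
    network communication. The executable name is an assumption."""
    return [boinccmd, "--network_available"]


def reset_backoffs(runs, interval_s, run=subprocess.run, sleep=time.sleep):
    """Issue the retry command `runs` times, pausing `interval_s` seconds
    between attempts. `run` and `sleep` are injectable so the loop can be
    exercised without a BOINC client installed."""
    for attempt in range(runs):
        # check=False: a failed retry should not stop the remaining attempts.
        run(retry_command(), check=False)
        if attempt < runs - 1:
            sleep(interval_s)
```

Calling `reset_backoffs(6, 600)` would nudge the client every ten minutes for roughly an hour. Hammering the servers more often than that is unlikely to help anyone during a recovery.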

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38173
Credit: 556,764,104
RAC: 601,162
United States
Message 1340771 - Posted: 25 Feb 2013, 17:26:31 UTC - in response to Message 1340766.
Last modified: 25 Feb 2013, 17:29:57 UTC

The validators will be struggling with the number of reported tasks, which will be far worse than after a normal weekly outage.
Downloads have really taken a hit; it's taking hours to get an MB down, and as for APs....

And I don't see things easing up much until at least the end of the week, even IF they decide to forgo the usual maintenance outage tomorrow, seeing as how everything was just rebooted. They may still need it if they did not do their usual db cleanups and compression, however.

And unfortunately, the shorties being sent out are not enhancing the recovery.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 41,079,565
RAC: 114,453
United States
Message 1340781 - Posted: 25 Feb 2013, 18:04:19 UTC
Last modified: 25 Feb 2013, 18:37:50 UTC

I gave up on connecting my XP machine to Ubuntu via Firewire. I finally broke down and installed Squid in Ubuntu. I actually managed to get Squid to handle the SETI downloads on my XP machine. No More Download Stalls with XP. I do need to look into configuring Squid to handle the SETI uploads and Scheduler requests though. Just like with many of the HTTP proxies, Squid doesn't do Uploads & Scheduler requests. Kinda makes you think many of those HTTP proxies are running Squid. My new HTTP Proxy is 192.168.1.4:5555.

Oh, no more of those 'Permanent HTTP Download Errors' either...

2/25/2013 1:21:52 PM | | Suspending network activity - user request
2/25/2013 1:22:00 PM | | Using proxy info from GUI
2/25/2013 1:22:00 PM | | Using HTTP proxy 192.168.1.4:5555
2/25/2013 1:22:05 PM | | Resuming network activity
2/25/2013 1:22:05 PM | SETI@home | Started download of ap_16my12aa_B2_P1_00172_20130225_23241.wu
2/25/2013 1:22:05 PM | SETI@home | Started download of ap_16my12aa_B2_P0_00181_20130225_21583.wu
2/25/2013 1:22:05 PM | SETI@home | Started download of ap_16my12aa_B1_P1_00214_20130225_21107.wu
2/25/2013 1:22:05 PM | SETI@home | Started download of ap_16my12aa_B2_P0_00206_20130225_21583.wu
2/25/2013 1:22:05 PM | SETI@home | Started download of ap_16my12aa_B2_P1_00198_20130225_23241.wu
2/25/2013 1:26:40 PM | SETI@home | Sending scheduler request: To fetch work.
2/25/2013 1:26:40 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
2/25/2013 1:26:41 PM | SETI@home | Scheduler request failed: Error 417
2/25/2013 1:26:49 PM | SETI@home | Computation for task 18se12ab.27382.24607.8.10.49_2 finished
2/25/2013 1:26:49 PM | SETI@home | Starting task 18se12ab.27382.24607.8.10.108_2 using setiathome_enhanced version 609 (cuda23) in slot 3
2/25/2013 1:26:51 PM | SETI@home | Started upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:26:53 PM | SETI@home | Project file upload handler is missing
2/25/2013 1:26:53 PM | SETI@home | Backing off 3 min 39 sec on upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:28:01 PM | SETI@home | Started upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:28:04 PM | SETI@home | Project file upload handler is missing
2/25/2013 1:28:04 PM | SETI@home | Backing off 5 min 19 sec on upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:30:31 PM | SETI@home | Started upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:30:33 PM | SETI@home | Project file upload handler is missing
2/25/2013 1:30:33 PM | SETI@home | Backing off 14 min 58 sec on upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:32:53 PM | SETI@home | Finished download of ap_16my12aa_B2_P1_00172_20130225_23241.wu
2/25/2013 1:32:53 PM | SETI@home | Finished download of ap_16my12aa_B2_P0_00181_20130225_21583.wu
2/25/2013 1:33:01 PM | SETI@home | Started upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:33:04 PM | SETI@home | Finished upload of 18se12ab.27382.24607.8.10.49_2_0
2/25/2013 1:33:04 PM | SETI@home | Sending scheduler request: To fetch work.
2/25/2013 1:33:04 PM | SETI@home | Reporting 1 completed tasks
2/25/2013 1:33:04 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
2/25/2013 1:33:05 PM | SETI@home | Scheduler request failed: Error 417
2/25/2013 1:34:02 PM | SETI@home | Finished download of ap_16my12aa_B2_P1_00198_20130225_23241.wu
2/25/2013 1:34:04 PM | SETI@home | Finished download of ap_16my12aa_B1_P1_00214_20130225_21107.wu
2/25/2013 1:34:46 PM | SETI@home | Finished download of ap_16my12aa_B2_P0_00206_20130225_21583.wu
2/25/2013 1:35:01 PM | | Using proxy info from GUI
2/25/2013 1:35:01 PM | | Not using a proxy
...
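
For anyone who wants to put numbers on those transfers, here is a minimal Python sketch that pairs the "Started download" and "Finished download" lines from a log like the one above and reports how long each file took. It assumes the event log format shown in this post (timestamp | project | message). The function name `download_durations` and the `sample` list are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Timestamp layout as it appears in the log above, e.g. "2/25/2013 1:22:05 PM".
LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"
STARTED = "Started download of "
FINISHED = "Finished download of "


def download_durations(log_lines):
    """Return {file name: seconds between its start and finish lines}."""
    started = {}
    durations = {}
    for line in log_lines:
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a "time | project | message" line
        stamp, _project, message = parts
        try:
            when = datetime.strptime(stamp, LOG_TIME_FORMAT)
        except ValueError:
            continue  # skip lines whose timestamp does not parse
        if message.startswith(STARTED):
            started[message[len(STARTED):]] = when
        elif message.startswith(FINISHED):
            name = message[len(FINISHED):]
            if name in started:
                durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# Two lines copied from the log above.
sample = [
    "2/25/2013 1:22:05 PM | SETI@home | Started download of ap_16my12aa_B2_P1_00172_20130225_23241.wu",
    "2/25/2013 1:32:53 PM | SETI@home | Finished download of ap_16my12aa_B2_P1_00172_20130225_23241.wu",
]
print(download_durations(sample))
```

On those two sample lines the result is 648 seconds, i.e. 10 min 48 sec for that one AP file through the proxy.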

Profile James Sotherden
Joined: 16 May 99
Posts: 8525
Credit: 31,092,836
RAC: 53,945
United States
Message 1340785 - Posted: 25 Feb 2013, 18:20:57 UTC - in response to Message 1340770.

I just looked at my computers. Oh my. One has 138 timeouts. The other one won't download anything. I see 'em, but they just sit there blinking at me, stuck at various % of downloadedness :)


When you run a BOINC version above 6.10, you need something to periodically reset the backoffs...

I have one. It's called moving my mouse over the retry button and clicking it till my finger cramps :)
____________

Old James

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38173
Credit: 556,764,104
RAC: 601,162
United States
Message 1340787 - Posted: 25 Feb 2013, 18:34:06 UTC
Last modified: 25 Feb 2013, 18:57:14 UTC

Somethin' funny going on....
The server status page has not updated for an hour.
That's usually not a good sign.
Mebbe just generating the daily stats dump?

EDIT...
Finally updated...hopefully that's the end of that particular panic.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8124
Credit: 52,399,831
RAC: 77,991
United Kingdom
Message 1340792 - Posted: 25 Feb 2013, 19:05:22 UTC

Blame the yellow fluffy thing, that wot I say, blame the yellow fluffy thing....
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1340799 - Posted: 25 Feb 2013, 19:38:16 UTC

This morning's panic wasn't a panic - it was me upgrading the OS on the master science database server. The whole successful operation took about 90 minutes, during which some machines/processes hung for a bit. Expect a few more of those, which I can't always schedule during the normal Tuesday outages if I want to finish these upgrades already.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile Floyd
Joined: 19 May 11
Posts: 524
Credit: 1,870,625
RAC: 0
United States
Message 1340804 - Posted: 25 Feb 2013, 20:01:49 UTC - in response to Message 1340799.

Upgrades are GOOD. Keep on keeping on, Matt!
Does your statement mean that there is still going to be a Tuesday outage after last weekend's downtime?
____________

Profile Michael W.F. Miles
Joined: 24 Mar 07
Posts: 234
Credit: 27,279,165
RAC: 20,038
Canada
Message 1340826 - Posted: 25 Feb 2013, 20:50:33 UTC
Last modified: 25 Feb 2013, 20:50:58 UTC

Question: why did the server problems (81) thread get locked? The last post showed an awesome machine, so I will comment here. WOW, WOW, WOW, I want one. I was wondering how he had 32 threads going on an 8-core. I did not expect two CPUs. It just looks so clean.
Now back to server problems.
All I have been doing for 2 days is babysitting this machine. Here is how it goes: download 2 k and wait five minutes, stall, shut down the connection, restart the connection. Download 2 k, stall. Repeat.
I have tried flushing the DNS, and no joy there.
What will probably fix this is a proxy. I have tried several, but I can't seem to get one going. Does anyone have a good proxy address that I can use?
Siv has given me some help, but it's the stalls that are beginning to give me more gray hair.

Thanks in advance.
Michael Miles

Mike Davis
Volunteer tester
Joined: 17 May 99
Posts: 232
Credit: 5,305,392
RAC: 15
Isle of Man
Message 1340830 - Posted: 25 Feb 2013, 21:06:29 UTC

http://www.hidemyass.com/proxy-list/

Select USA and search. Currently 147.31.182.137 Port 80 is working quite well, but is not a secure proxy


____________

Kevin Benfield
Joined: 29 Dec 03
Posts: 39
Credit: 15,299,056
RAC: 12,410
United Kingdom
Message 1340867 - Posted: 25 Feb 2013, 22:26:52 UTC

I have a bunch of GPU units trying to download; so far it has taken a couple of hours to download about 6, and they are only short ones.

Looks like it's going to be a long time before things are back to the so-called normal level.
____________

Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6796
Credit: 24,479,241
RAC: 26,928
United Kingdom
Message 1340880 - Posted: 25 Feb 2013, 23:05:35 UTC - in response to Message 1340867.
Last modified: 25 Feb 2013, 23:06:36 UTC

I have a bunch of GPU units trying to download; so far it has taken a couple of hours to download about 6, and they are only short ones.

Looks like it's going to be a long time before things are back to the so-called normal level.

Strange, I have 3 machines crunching S@H, and when I got up this morning (08:00 UTC) all three had downloaded their maximum WUs.

I have been out most of the day and all 3 have maintained maximum WUs!!

So as far as I am concerned there are no problems. In fact, I am surprised at how fast it has recovered from the weekend.
____________


Today is life, the only life we're sure of. Make the most of today.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2235
Credit: 8,429,771
RAC: 4,079
United States
Message 1340881 - Posted: 25 Feb 2013, 23:06:26 UTC

@ pending vs. validated:

I've noticed that the tasks are getting validated fairly quickly, but it can be a few minutes, at least for APs anyway. I reported 3 of them this morning and went and checked the tasks page within a minute and they were pending, and I waited 30 seconds and hit refresh and they were validated.

Same thing just happened again for reporting two more APs a few minutes ago, but it took about 4 minutes instead.

Before the weekend's downtime, it was nearly instant. It may work itself out soon.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California