Panic Mode On (79) Server Problems?


Message boards : Number crunching : Panic Mode On (79) Server Problems?

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3542
Credit: 46,063,169
RAC: 30,282
United States
Message 1307257 - Posted: 18 Nov 2012, 2:59:22 UTC

Timeouts, Timeouts, Timeouts!!!!!!!
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5560
Credit: 51,209,678
RAC: 37,837
Australia
Message 1307260 - Posted: 18 Nov 2012, 3:19:28 UTC


And still more timeouts.
____________
Grant
Darwin NT.

Jim_S
Joined: 23 Feb 00
Posts: 4456
Credit: 17,470,273
RAC: 6,631
United States
Message 1307261 - Posted: 18 Nov 2012, 3:22:19 UTC
Last modified: 18 Nov 2012, 3:23:43 UTC

And still Very Boring!
Somebody KICK Something...Please.
____________

I Desire Peace and Justice, Jim Scott

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1077
Credit: 30,380,717
RAC: 84,353
United States
Message 1307275 - Posted: 18 Nov 2012, 4:14:43 UTC - in response to Message 1307257.
Last modified: 18 Nov 2012, 4:50:02 UTC

Timeouts, Timeouts, Timeouts!!!!!!!

I snagged a couple more Time Outs. I thought the Virus Scanner Crash had something to do with the last ones, but there was no crash this time. I ran down my nVidia cache so I could switch back to CUDA23s on my old card and was trying to refill it. Here are the times:
11/17/2012 10:23:17 PM | | Using proxy info from GUI
11/17/2012 10:23:17 PM | | Using HTTP proxy 173.224.215.173:3128
11/17/2012 10:23:24 PM | SETI@home | update requested by user
11/17/2012 10:23:30 PM | SETI@home | Sending scheduler request: Requested by user.
11/17/2012 10:23:30 PM | SETI@home | Reporting 7 completed tasks
11/17/2012 10:23:30 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
11/17/2012 10:24:08 PM | SETI@home | Scheduler request completed: got 20 new tasks
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 20au12ad.15117.7020.140733193388035.10.204_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 08se12ac.14602.22153.140733193388044.10.79_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 08se12ac.14602.22153.140733193388044.10.137_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 08se12ac.14602.22153.140733193388044.10.158_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 08se12ac.15079.21335.140733193388045.10.209_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 09se12ab.18967.315158.140733193388048.10.17_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 09se12ab.18967.315158.140733193388048.10.20_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 09se12ab.18967.315158.140733193388048.10.23_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 10se12aa.21403.16018.140733193388040.10.29_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 09se12ab.18967.315158.140733193388048.10.32_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 26fe12ab.13174.480.140733193388045.10.226_4
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 30se12ad.8531.113216.140733193388035.10.209_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 30se12ad.8531.116488.140733193388035.10.29_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 29se12ab.27923.13155.140733193388044.10.10_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 30se12ad.696.120987.140733193388036.10.153_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 30se12aa.13934.25016.6.10.174_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 10se12aa.21403.16018.140733193388040.10.25_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 09se12ab.18967.315158.140733193388048.10.70_2
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 10se12aa.21403.16018.140733193388040.10.28_3
11/17/2012 10:24:08 PM | SETI@home | Resent lost task 09se12ab.18967.315158.140733193388048.10.73_2
11/17/2012 10:24:10 PM | SETI@home | Started download of 20au12ad.15117.7020.140733193388035.10.204
11/17/2012 10:24:10 PM | SETI@home | Started download of 08se12ac.14602.22153.140733193388044.10.79
11/17/2012 10:24:23 PM | SETI@home | Finished download of 20au12ad.15117.7020.140733193388035.10.204
11/17/2012 10:24:23 PM | SETI@home | Started download of 08se12ac.14602.22153.140733193388044.10.137
11/17/2012 10:24:24 PM | SETI@home | Finished download of 08se12ac.14602.22153.140733193388044.10.79
11/17/2012 10:24:24 PM | SETI@home | Started download of 08se12ac.14602.22153.140733193388044.10.158
11/17/2012 10:24:30 PM | SETI@home | Finished download of 08se12ac.14602.22153.140733193388044.10.137
11/17/2012 10:24:30 PM | SETI@home | Started download of 08se12ac.15079.21335.140733193388045.10.209
11/17/2012 10:24:38 PM | SETI@home | Finished download of 08se12ac.15079.21335.140733193388045.10.209
11/17/2012 10:24:38 PM | SETI@home | Started download of 09se12ab.18967.315158.140733193388048.10.17
11/17/2012 10:24:57 PM | | Suspending network activity - user request
11/17/2012 10:25:06 PM | | Using proxy info from GUI
11/17/2012 10:25:06 PM | | Not using a proxy
11/17/2012 10:25:11 PM | | Resuming network activity
11/17/2012 10:25:11 PM | SETI@home | Started download of 08se12ac.14602.22153.140733193388044.10.158...

Then I got:
2718613438 1088129501 18 Nov 2012 | 2:20:30 UTC 18 Nov 2012 | 2:30:53 UTC Timed out - no response
2718613436 1088129393 18 Nov 2012 | 2:20:30 UTC 18 Nov 2012 | 2:30:53 UTC Timed out - no response
2718613434 1088129387 18 Nov 2012 | 2:20:30 UTC 18 Nov 2012 | 2:30:53 UTC Timed out - no response...
http://setiathome.berkeley.edu/results.php?hostid=6797524&offset=0&show_names=0&state=6&appid=

I never got these without the proxy. At least I'm back to my limit...

Lee Gresham
Joined: 12 Aug 03
Posts: 129
Credit: 91,143,751
RAC: 63,550
United States
Message 1307295 - Posted: 18 Nov 2012, 6:51:37 UTC

And just to make things even better how about mini work units when you do get some work. I bet that helps the scheduler overload!
____________
Delta-V

Donald L. Johnson
Joined: 5 Aug 02
Posts: 5674
Credit: 561,428
RAC: 629
United States
Message 1307299 - Posted: 18 Nov 2012, 7:49:46 UTC

In the previous iteration of this thread, someone was talking about how Macs and other "weak sister" systems seem to be faring much better than Windows systems. Not wishing to add insult to injury, but all 4 of my systems, both WinXP and Mac G4s, are having little or no problem communicating with the Scheduler and servers. Timeouts happen about 1 in 10 attempts, and usually go through on the next try, depending on time of day (PST/UTC-8). Downloads are slow (1-5 KBps), but they get through.

All systems go through the same ADSL modem to AT&T PacBell. The G4 is direct-connect, the iBook has an AirPort Wifi card, and the XP boxes are on USB Wifi modems.

The G4s are running BOINC 6.10.58 and 6.10.60, caches set to 0 + 3 days; and both the XP boxes are running 7.0.28, caches are set at 2 + 0.5 days.

Hope this is useful information.
____________
Donald
Infernal Optimist / Submariner, retired

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 155
Credit: 12,742,501
RAC: 25,229
Netherlands
Message 1307333 - Posted: 18 Nov 2012, 10:36:55 UTC

Some people have bashed the project so many times I wonder what they are in it for. Is it a competition for who has the highest RAC, or are there still some people here who know the core purpose of this project:

Finding E.T. some day in the future

If the project goes cold for a while due to circumstances beyond the control of the volunteers working at Berkeley ($$$$!) and you don't like it why don't you just move on... No hard feelings ;-)

I remember last year around this time we had a mega outage of over a month, didn't we, and most of you still came back...

I also have to work with limits, proxies and a lot of "manual care" for the project, and my RAC has also dropped about 4000, but it's the end result that counts.

Have a wonderful Sunday anyway, wherever you are on our tiny planet!
____________

Uli
Volunteer tester
Joined: 6 Feb 00
Posts: 9357
Credit: 4,918,126
RAC: 3,652
Germany
Message 1307352 - Posted: 18 Nov 2012, 13:20:22 UTC

+1
____________
Pluto will always be a planet to me.

Cash Donation Specialist

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1307360 - Posted: 18 Nov 2012, 14:06:15 UTC
Last modified: 18 Nov 2012, 14:07:56 UTC

If admins deserve "bashing" then it's this: they could literally take 3 minutes of their time and write a front page news post mentioning the timeout/network problems, so volunteers would know it's not a problem on their side.
Right now, we have people posting their problems all over the place in every subforum, reinstalling BOINC, rebooting, (un)plugging network cables, looking for proxies, etc...
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,065,482
RAC: 328,817
Brazil
Message 1307367 - Posted: 18 Nov 2012, 14:22:31 UTC - in response to Message 1307333.
Last modified: 18 Nov 2012, 14:23:03 UTC

Some people have bashed the project so many times I wonder what they are in it for. Is it a competition for who has the highest RAC, or are there still some people here who know the core purpose of this project:

Finding E.T. some day in the future

If the project goes cold for a while due to circumstances beyond the control of the volunteers working at Berkeley ($$$$!) and you don't like it why don't you just move on... No hard feelings ;-)

I remember last year around this time we had a mega outage of over a month, didn't we, and most of you still came back...

I also have to work with limits, proxies and a lot of "manual care" for the project, and my RAC has also dropped about 4000, but it's the end result that counts.

Have a wonderful Sunday anyway, wherever you are on our tiny planet!


You forget something: the origin of the problem is that they use the resources of SETI and try to run AP on them, so if they kept the focus on the main project, finding E.T., nothing like this would be happening.

It is crystal clear: the current configuration (funds, staff, bandwidth, server, router, whatever else, it makes no difference) can't handle the two projects running at full capacity at the same time. Something must be done quickly.
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,908,027
RAC: 13,535
United Kingdom
Message 1307369 - Posted: 18 Nov 2012, 14:33:32 UTC - in response to Message 1307367.

Some people have bashed the project so many times I wonder what they are in it for. Is it a competition for who has the highest RAC, or are there still some people here who know the core purpose of this project:

Finding E.T. some day in the future

If the project goes cold for a while due to circumstances beyond the control of the volunteers working at Berkeley ($$$$!) and you don't like it why don't you just move on... No hard feelings ;-)

I remember last year around this time we had a mega outage of over a month, didn't we, and most of you still came back...

I also have to work with limits, proxies and a lot of "manual care" for the project, and my RAC has also dropped about 4000, but it's the end result that counts.

Have a wonderful Sunday anyway, wherever you are on our tiny planet!

You forget something: the origin of the problem is that they use the resources of SETI and try to run AP on them, so if they kept the focus on the main project, finding E.T., nothing like this would be happening.

It is crystal clear: the current configuration (funds, staff, bandwidth, server, router, whatever else, it makes no difference) can't handle the two projects running at full capacity at the same time. Something must be done quickly.

From Astropulse FAQ:

Astropulse is a new type of SETI. It expands on the original SETI@home, but does not replace it. The original SETI@home searches for narrowband signals, as does a conventional AM or FM radio. Astropulse, on the other hand, listens for broader-band, short-time pulses.

In addition to ET, Astropulse might detect other sources, such as...

That doesn't detract from your main point, which is that the unmanaged and unregulated attempt to run both halves (of the same project) on the same inadequate infrastructure at the same time is counter-productive. We have passed the point of diminishing returns.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8217
Credit: 21,762,563
RAC: 12,800
United Kingdom
Message 1307371 - Posted: 18 Nov 2012, 14:42:04 UTC - in response to Message 1307360.

If admins deserve "bashing" then it's this: they could literally take 3 minutes of their time and write a front page news post mentioning the timeout/network problems, so volunteers would know it's not a problem on their side.
Right now, we have people posting their problems all over the place in every subforum, reinstalling BOINC, rebooting, (un)plugging network cables, looking for proxies, etc...

Do you realise how many people do NOT read the front page?

It is probably in the same ballpark as the number of people who read the manual, and that is in single percentage figures. The fact that RTFM appears so often backs up that claim.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4134
Credit: 1,003,215
RAC: 231
United States
Message 1307376 - Posted: 18 Nov 2012, 14:48:19 UTC - in response to Message 1307295.

And just to make things even better how about mini work units when you do get some work. I bet that helps the scheduler overload!

The ghosts created November 4 which were shorties have now missed their deadline. I've just been looking at the task list for a host which yesterday showed 1091 tasks "sent" Nov. 4 and now has 792 errors for missed deadlines.
Joe

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,908,027
RAC: 13,535
United Kingdom
Message 1307379 - Posted: 18 Nov 2012, 14:52:48 UTC - in response to Message 1307376.

And just to make things even better how about mini work units when you do get some work. I bet that helps the scheduler overload!

The ghosts created November 4 which were shorties have now missed their deadline. I've just been looking at the task list for a host which yesterday showed 1091 tasks "sent" Nov. 4 and now has 792 errors for missed deadlines.
Joe

And I'm sure I've got at least some of the resends. A very high proportion of this morning's shorties have been replication _2, _3, _4

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 83,810,986
RAC: 117,507
Finland
Message 1307383 - Posted: 18 Nov 2012, 15:00:20 UTC

Could it be that an old hub is connected to the network and someone accidentally pushed the crossover/straight connection button on it, causing Rx/Tx to loop indefinitely and clog up all available bandwidth?

Someone at SETI could run Wireshark to see what is going on there.

____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,065,482
RAC: 328,817
Brazil
Message 1307384 - Posted: 18 Nov 2012, 15:01:44 UTC - in response to Message 1307369.
Last modified: 18 Nov 2012, 15:05:21 UTC


That doesn't detract from your main point, which is that the unmanaged and unregulated attempt to run both halves (of the same project) on the same inadequate infrastructure at the same time is counter-productive. We have passed the point of diminishing returns.


I agree 100% with you, and one thing must be clear: I have nothing against AP (I even crunch it) and I'm not asking for a complete AP or MB stop. I just think that before we go to Mars we need to develop the technology to stay in space for almost a year, or the Mars project will end in disaster.

You point to a plausible, high-confidence hypothesis for the origin of the problem, so why not make a "pit stop" and check? It is clear the current situation goes from bad to worse as the days pass (did you notice the increase in the number of pendings? Soon they will need to stop to clear them, but that is another problem for another time). Or, from another perspective, if they know they can't keep both projects running at 100%, why not slow both to a comfortable speed (one that makes it less painful for everybody to obtain their WUs), at least until Matt returns and really finds and fixes the problem?

That's what I call right management.

Until then I'm taking the extreme option: activating another project and starting to crunch Einstein for a while, but I really don't feel comfortable with that. My real quest is not for RAC; I really want to help find our little green ET friend (or enemy, who knows?).
____________

fkjgsklfjgisojfg
Joined: 12 Jan 12
Posts: 1
Credit: 632,351
RAC: 199
Message 1307404 - Posted: 18 Nov 2012, 15:40:43 UTC

THE ALIENS HAVE CRASHED MY COMPUTER!!!!!!!!!!!!!!!!!!!

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 78
Credit: 2,873,069
RAC: 11,121
Finland
Message 1307429 - Posted: 18 Nov 2012, 16:25:30 UTC

Thunder is out of work (again)...

18/11/2012 04:26:21 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 07:18:15 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:39:04 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:46:17 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:53:06 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:58:21 | SETI@home | Scheduler request completed: got 20 new tasks
18/11/2012 11:09:07 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 11:37:09 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 11:46:46 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 11:59:50 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 12:23:44 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 12:52:56 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 13:54:27 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 15:11:15 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 17:55:34 | SETI@home | Scheduler request failed: Timeout was reached


In 13 hours, only 1 request out of 15 completed. In that completed request 51 completed tasks were reported and those 20 lost tasks were sent.
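
The tally above can be reproduced from the log lines themselves. What follows is only a rough Python sketch, assuming the `date time | project | message` layout shown in the excerpt; the function name `tally_scheduler_requests` is made up for illustration and is not part of BOINC.

```python
from datetime import datetime

# The 15 scheduler lines quoted in the post above, copied as-is.
LOG = """\
18/11/2012 04:26:21 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 07:18:15 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:39:04 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:46:17 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:53:06 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 10:58:21 | SETI@home | Scheduler request completed: got 20 new tasks
18/11/2012 11:09:07 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 11:37:09 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 11:46:46 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 11:59:50 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 12:23:44 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 12:52:56 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 13:54:27 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 15:11:15 | SETI@home | Scheduler request failed: Timeout was reached
18/11/2012 17:55:34 | SETI@home | Scheduler request failed: Timeout was reached
"""


def tally_scheduler_requests(log_text):
    """Return (completed, failed, span_in_hours) for scheduler request lines."""
    completed = 0
    failed = 0
    stamps = []
    for line in log_text.splitlines():
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a "time | project | message" line
        when, _project, message = parts
        if message.startswith("Scheduler request completed"):
            completed += 1
        elif message.startswith("Scheduler request failed"):
            failed += 1
        else:
            continue  # some other kind of log message
        # Day-first timestamps, as in the excerpt above.
        stamps.append(datetime.strptime(when, "%d/%m/%Y %H:%M:%S"))
    if not stamps:
        return completed, failed, 0.0
    span_hours = (max(stamps) - min(stamps)).total_seconds() / 3600
    return completed, failed, span_hours


completed, failed, hours = tally_scheduler_requests(LOG)
print(f"{completed} of {completed + failed} requests completed in {hours:.1f} hours")
```

Run against the excerpt, this counts 1 completed and 14 failed requests over roughly 13.5 hours, which matches the summary. Logs with month-first dates (like the earlier excerpt in this thread) would need a different `strptime` format.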


____________

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 3960
Credit: 31,817,790
RAC: 10,146
United Kingdom
Message 1307434 - Posted: 18 Nov 2012, 16:42:28 UTC - in response to Message 1307429.
Last modified: 18 Nov 2012, 16:45:32 UTC

Thunder is out of work (again)...

In 13 hours, only 1 request out of 15 completed. In that completed request 51 completed tasks were reported and those 20 lost tasks were sent.

You can increase your chances of getting work by setting No New Tasks, getting your completed tasks reported, then setting a small enough cache that only one or two tasks are resent at once, so scheduler requests are more likely to get through.
Then increase the cache level in 0.1 day increments as work is resent.

Or use a proxy.

Claggy
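
For anyone who would rather script the routine above than click through BOINC Manager, here is a rough Python sketch. It is not something described in the post, and it rests on several assumptions: that the `boinccmd` command-line tool is available, that the cache size is set through `global_prefs_override.xml` with the `work_buf_min_days` and `work_buf_additional_days` tags (check this against your client version), and that new work is allowed again once the cache is small. The helper names and the 0.5 day target are made up for illustration. Nothing is executed; the code only builds the steps.

```python
# Sketch only: builds the command lists and override file text, runs nothing.

PROJECT_URL = "http://setiathome.berkeley.edu/"  # project URL as it appears in this thread


def boinccmd(*args):
    """Return a boinccmd argument list, e.g. for subprocess.run()."""
    return ["boinccmd", *args]


def override_xml(min_days, extra_days=0.0):
    """Text for global_prefs_override.xml with the given cache size (assumed tags)."""
    return (
        "<global_preferences>\n"
        f"   <work_buf_min_days>{min_days:.1f}</work_buf_min_days>\n"
        f"   <work_buf_additional_days>{extra_days:.1f}</work_buf_additional_days>\n"
        "</global_preferences>\n"
    )


def cache_steps(start, target, step=0.1):
    """Cache sizes in days from start to target, in step-day increments."""
    count = round((target - start) / step)
    return [round(start + i * step, 1) for i in range(count + 1)]


def recovery_plan(start=0.1, target=0.5):
    """Ordered (action, payload) steps; start and target are hypothetical values."""
    plan = [
        # 1. No New Tasks, then contact the scheduler so finished work gets reported.
        ("run", boinccmd("--project", PROJECT_URL, "nomorework")),
        ("run", boinccmd("--project", PROJECT_URL, "update")),
    ]
    for index, days in enumerate(cache_steps(start, target)):
        # 2. Set a small cache, then grow it by 0.1 day per round.
        plan.append(("write global_prefs_override.xml", override_xml(days)))
        plan.append(("run", boinccmd("--read_global_prefs_override")))
        if index == 0:
            # Assumption: work has to be allowed again for resends to arrive.
            plan.append(("run", boinccmd("--project", PROJECT_URL, "allowmorework")))
        plan.append(("run", boinccmd("--project", PROJECT_URL, "update")))
    return plan


for action, payload in recovery_plan():
    print(action, payload)
```

In practice each round would wait until the resent tasks have actually downloaded before the cache is raised again; the sketch leaves that pacing to the user.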

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 78
Credit: 2,873,069
RAC: 11,121
Finland
Message 1307436 - Posted: 18 Nov 2012, 16:50:57 UTC - in response to Message 1307434.

Thunder is out of work (again)...

In 13 hours, only 1 request out of 15 completed. In that completed request 51 completed tasks were reported and those 20 lost tasks were sent.

You can increase your chances of getting work by setting No New Tasks, getting your completed tasks reported, then setting a small enough cache that only one or two tasks are resent at once, so scheduler requests are more likely to get through.
Then increase the cache level in 0.1 day increments as work is resent.

Claggy


Yep, I know that, but as I said:

All right, all my active crunchers are now in "Set & Forget" mode.

No "No New Tasks", no proxies, no manual updates. No user activity of any kind.

"Install and forget"....


This is just a test to see what happens without user action, and we are seeing results.


Copyright © 2014 University of California