Panic Mode On (73) Server problems?

Message boards : Number crunching : Panic Mode On (73) Server problems?

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1216057 - Posted: 9 Apr 2012, 9:15:06 UTC

every holiday the servers seem to take their own vacation. And who says machines don't have feelings.... pffff

on topic : agreed with previous posters, time-out reached on every attempt.
ID: 1216057 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1216060 - Posted: 9 Apr 2012, 9:20:30 UTC - in response to Message 1216033.  

To borrow a line from Wiggo, I'm bouncing off the limits

So am I, now.
But for several hours there I was getting further & further from the limits with each Scheduler timeout.

Far too much bouncing going on here now, I'm getting a sore neck.

Cheers.
ID: 1216060 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1216135 - Posted: 9 Apr 2012, 14:59:14 UTC - in response to Message 1216001.  

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

My tasks in progress continues to drop, and the server has been saying I've reached my limit for about 7 hours now. Is this anything to do with the switch to Lunatics? Boinc is running GPU work on high priority, even though it's not due for a week or more. (If it matters, I set app_info to do 2 GPUs at a time.)


I guess the limit was reached only on CPU tasks. If the GPUs are in panic mode, BOINC won't ask for more GPU work, and if you are not using the flops tags then the scheduler is still learning what the speed of the new apps is and meanwhile is using a (slower) default value, which is forcing the panic mode...
(or something like that...)

In addition, the tasks being crunched were sent to be done by the stock applications, so the servers won't yet be applying the run times to anonymous platform averages.

Without flops in app_info.xml the core client will be using the ~3.34e09 Whetstone value as <flops> for CPU work, and ~1.8e09 (0.54*Whetstones) as <flops> for GPU. The CPU value is low by a factor of 6 or more, the GPU value is low by about a factor of 48 (rough estimates based on APRs for stock). DCF will have been driven down so the estimated run times are not longer by that much, in fact CPU tasks are likely to have short estimates. But the GPU task estimates will still be long, hence the high priority processing.
Joe
ID: 1216135 · Report as offensive
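
Joe's arithmetic can be worked through in a few lines. This is a rough sketch, not BOINC's actual code: it assumes the client's estimate is the task's operation count divided by `<flops>` and then multiplied by a single project-wide DCF, and the DCF value of 0.1 is purely hypothetical. The benchmark figure and the factors of 6 and 48 are the ones quoted in the post.

```python
# Figures quoted in Joe's post.
WHETSTONE_FLOPS = 3.34e9                      # default <flops> for CPU work
GPU_DEFAULT_FLOPS = 0.54 * WHETSTONE_FLOPS    # default <flops> for GPU work, ~1.8e9
CPU_LOW_BY = 6.0                              # real CPU speed / default value
GPU_LOW_BY = 48.0                             # real GPU speed / default value


def estimate_over_actual(low_by: float, dcf: float) -> float:
    """Ratio of estimated to actual run time.

    Assumption: estimate = fpops_est / flops * DCF, so a <flops> value that is
    low by `low_by` inflates the estimate by that factor before DCF scales it.
    """
    return low_by * dcf


# Hypothetical DCF after the client has seen tasks finish early.
HYPOTHETICAL_DCF = 0.1

cpu_ratio = estimate_over_actual(CPU_LOW_BY, HYPOTHETICAL_DCF)
gpu_ratio = estimate_over_actual(GPU_LOW_BY, HYPOTHETICAL_DCF)

print(f"GPU default flops: {GPU_DEFAULT_FLOPS:.3g}")
print(f"CPU estimate / actual: {cpu_ratio:.1f}")
print(f"GPU estimate / actual: {gpu_ratio:.1f}")
```

Under these assumptions, any shared DCF small enough to make the CPU estimates short still leaves the GPU estimates several times too long, which is consistent with the high priority GPU processing described in the post.
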
Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1216152 - Posted: 9 Apr 2012, 15:22:38 UTC - in response to Message 1216135.  

My whole project is acting a bit weird. Scheduler timeouts, downloads that won't move but suddenly scream through the pipes (100kbs on occasion).


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1216152 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1216153 - Posted: 9 Apr 2012, 15:28:53 UTC - in response to Message 1216135.  

I don't know if this is related to server problems or what, but my i7's tasks in progress that was at its maximum of 800 earlier today is now 699.

This may or may not have anything to do with the fact that I finally pulled the trigger and installed Lunatics around noon today.

My tasks in progress continues to drop, and the server has been saying I've reached my limit for about 7 hours now. Is this anything to do with the switch to Lunatics? Boinc is running GPU work on high priority, even though it's not due for a week or more. (If it matters, I set app_info to do 2 GPUs at a time.)

I guess the limit was reached only on CPU tasks. If the GPUs are in panic mode, BOINC won't ask for more GPU work, and if you are not using the flops tags then the scheduler is still learning what the speed of the new apps is and meanwhile is using a (slower) default value, which is forcing the panic mode...
(or something like that...)

I suspected something like that. However, it has downloaded new work for both CPU and GPU under Anonymous Platform. (Hasn't returned any of it yet; still working on the previous stuff, but returning it in a day instead of the usual 5-6 days.)

In addition, the tasks being crunched were sent to be done by the stock applications, so the servers won't yet be applying the run times to anonymous platform averages.

Without flops in app_info.xml the core client will be using the ~3.34e09 Whetstone value as <flops> for CPU work, and ~1.8e09 (0.54*Whetstones) as <flops> for GPU. The CPU value is low by a factor of 6 or more, the GPU value is low by about a factor of 48 (rough estimates based on APRs for stock). DCF will have been driven down so the estimated run times are not longer by that much, in fact CPU tasks are likely to have short estimates. But the GPU task estimates will still be long, hence the high priority processing.
Joe

I suspected that too. (The basics; my eyes glazed over on the details.)

Tasks in progress is currently sitting at 659. Oddly, it hasn't made contact in almost 4 hours. I'll have to remote in and see what's up when I switch to my work laptop (right now I can't because I'm on my work desktop and IE crashes when I use logmein on it).

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1216153 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1216186 - Posted: 9 Apr 2012, 16:36:47 UTC - in response to Message 1216152.  

My whole project is acting a bit weird. Scheduler timeouts, downloads that won't move but suddenly scream through the pipes (100kbs on occasion).


+1


Kevin


ID: 1216186 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1216208 - Posted: 9 Apr 2012, 17:11:51 UTC - in response to Message 1216186.  
Last modified: 9 Apr 2012, 18:05:30 UTC

My whole project is acting a bit weird. Scheduler timeouts, downloads that won't move but suddenly scream through the pipes (100kbs on occasion).

+1

+2

If I press [ No New Tasks ] then [ Update ] I can report completed tasks, otherwise things fail all the time. Asking for WUs seems to work more often if tasks are not also being reported. Using <report_results_immediately>1</report_results_immediately> seems to help things along.
ID: 1216208 · Report as offensive
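
For readers who have not used the option red-ray mentions, here is a minimal sketch that generates a `cc_config.xml` setting it. The `<cc_config>`/`<options>` wrapper is how the BOINC client configuration file is normally laid out; the helper name `build_cc_config` is made up for illustration, and the resulting file would still have to be placed in the BOINC data directory and re-read by the client.

```python
import xml.etree.ElementTree as ET


def build_cc_config(report_immediately: bool = True) -> str:
    """Return cc_config.xml text that sets <report_results_immediately>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "report_results_immediately")
    flag.text = "1" if report_immediately else "0"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config()
print(config_text)
```

Whether the option actually helps during an overloaded period is red-ray's observation, not something this sketch demonstrates.
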
BWX

Joined: 31 May 03
Posts: 36
Credit: 156,754,993
RAC: 24
United States
Message 1216220 - Posted: 9 Apr 2012, 18:19:51 UTC

Something significantly changed Friday afternoon PDT. Suddenly it became almost impossible to get downloads to complete, and when they did the 'speed' was unusually high.

Seems somebody made an 'improvement' that has caused my machines to starve - I repeatedly have downloads ALWAYS time out, no matter how many times I re-try. I leave it for a day, re-try and they all finish in record time. Then the next batch I get are stuck all over again.

Cripes!
ID: 1216220 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1216222 - Posted: 9 Apr 2012, 18:24:08 UTC
Last modified: 9 Apr 2012, 18:25:21 UTC

Yeah, something got whacked after that last outage.
My top rig had about 10 downloads that took almost an hour of retrying, and then they all screamed through.
Now it has about 30 downloads that again will not budge an inch.

And when I checked it this morning, it had gotten so many failed scheduler requests that the whole project was backed off for a couple of hours.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1216222 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1216230 - Posted: 9 Apr 2012, 18:49:09 UTC - in response to Message 1216224.  
Last modified: 9 Apr 2012, 18:50:47 UTC

We've been pegged at our bandwidth limit since we fixed the problem on Friday.

Been pegged for most of last year & all of this year.


No problem with downloads here (other than after the outage, but before Eric tweaked things, downloads would time out as soon as they started).
But still getting heaps of Scheduler request timeouts.
Grant
Darwin NT
ID: 1216230 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1216233 - Posted: 9 Apr 2012, 18:50:52 UTC - in response to Message 1216224.  

Yeah, something got whacked after that last outage.
My top rig had about 10 downloads that took almost an hour of retrying, and then they all screamed through.
Now it has about 30 downloads that again will not budge an inch.

And when I checked it this morning, it had gotten so many failed scheduler requests that the whole project was backed off for a couple of hours.



Eric posted in "News":
Posted: 7 Apr 2012 | 18:39:40 UTC
"We had network problems Thursday night and since then we've been flooded with connection attempts. I just altered some server parameters to prevent clients from holding on to a connection for too long in hope that will help some downloads complete. I was able to get my home downloads done after that.

We've been pegged at our bandwidth limit since we fixed the problem on Friday."

____________________________________________________________________________
I guess altering those server parameters didn't make things better, though; on the contrary, things got really bad...

Normally reporting and such would fail pretty quickly. Looking over the logs from the weekend, it looks like they all timed out on the client side at 5 minutes.

I'm not sure if that is the intended effect, but if they are/were just dropping the connection on the server side I would have thought the client would have seen it occur.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1216233 · Report as offensive
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1216585 - Posted: 10 Apr 2012, 19:48:57 UTC - in response to Message 1216583.  

MegaPanc:-)


Cheers,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1216585 · Report as offensive
Rolf

Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 0
Switzerland
Message 1216587 - Posted: 10 Apr 2012, 19:57:24 UTC

panic^2: It works, but I don't know why!
ID: 1216587 · Report as offensive
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1216638 - Posted: 10 Apr 2012, 22:59:25 UTC

Pessimists
Accept
Nothing
In
Cache

And so,
don`t post about it.
which in poop.
:¬)
ID: 1216638 · Report as offensive
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1216643 - Posted: 10 Apr 2012, 23:08:02 UTC

The outage seemed pretty quick, so I'll panic about that :)

Member of the People Encouraging Niceness In Society club.

ID: 1216643 · Report as offensive
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1217000 - Posted: 11 Apr 2012, 20:57:24 UTC

I had better post something before 24hrs is up, or Sten will have nothing to panic about.
But there was a nasty little notch in the cricket graph earlier today that nobody else bothered to mention.
I wunderz why :¬)
ID: 1217000 · Report as offensive
Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1217005 - Posted: 11 Apr 2012, 21:11:16 UTC
Last modified: 11 Apr 2012, 21:43:16 UTC

SETI@home	Project is temporarily shut down for maintenance

Quack?

Edit: Sorry, never mind, it fixed itself pretty soon. Someone must be fiddling with something.
Edit 2: Not sorry and do mind. It is dead after all. :( Now there's your panic.
ID: 1217005 · Report as offensive
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1217015 - Posted: 11 Apr 2012, 21:17:38 UTC

11/04/2012 22:09:58 | SETI@home | update requested by user
11/04/2012 22:10:00 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:10:00 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:10:08 | SETI@home | Scheduler request completed
11/04/2012 22:10:08 | SETI@home | Project is temporarily shut down for maintenance
11/04/2012 22:10:28 | SETI@home | update requested by user
11/04/2012 22:10:29 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:10:29 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:10:34 | SETI@home | Scheduler request completed
11/04/2012 22:10:34 | SETI@home | Project is temporarily shut down for maintenance
11/04/2012 22:11:15 | SETI@home | update requested by user
11/04/2012 22:11:16 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:11:16 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:11:22 | SETI@home | Scheduler request completed
11/04/2012 22:11:22 | SETI@home | Project is temporarily shut down for maintenance
11/04/2012 22:11:32 | SETI@home | update requested by user
11/04/2012 22:11:38 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:11:38 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:11:42 | SETI@home | Scheduler request completed
11/04/2012 22:11:42 | SETI@home | Project is temporarily shut down for maintenance
11/04/2012 22:12:02 | SETI@home | update requested by user
11/04/2012 22:12:08 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:12:08 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:12:14 | SETI@home | Scheduler request completed


I presume there is a minor bug in new version 7.0.25.
Whereas it used to report simple failure it now reports shut down instead. Cricket is at max so I again presume there is extremely high traffic.


ID: 1217015 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1217027 - Posted: 11 Apr 2012, 21:42:36 UTC - in response to Message 1217015.  

11/04/2012 22:09:58 | SETI@home | update requested by user
11/04/2012 22:10:00 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:10:00 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:10:08 | SETI@home | Scheduler request completed
11/04/2012 22:10:08 | SETI@home | Project is temporarily shut down for maintenance

I presume there is a minor bug in new version 7.0.25.
Whereas it used to report simple failure it now reports shut down instead. Cricket is at max so I again presume there is extremely high traffic.

Nope, it just means the project is shut down for maintenance. Later, when every host under the sun is trying to reach the scheduler, you might get failures.

Claggy
ID: 1217027 · Report as offensive
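
Claggy's distinction can be checked mechanically against a log like the one quoted. The sketch below assumes the `date time | project | message` layout seen in that excerpt; the function names are made up for illustration, and only message strings that appear in the excerpt are matched.

```python
from collections import Counter

SAMPLE_LOG = """\
11/04/2012 22:10:00 | SETI@home | Sending scheduler request: Requested by user.
11/04/2012 22:10:00 | SETI@home | Reporting 5 completed tasks, not requesting new tasks
11/04/2012 22:10:08 | SETI@home | Scheduler request completed
11/04/2012 22:10:08 | SETI@home | Project is temporarily shut down for maintenance
"""


def parse_line(line: str):
    """Split one log line into (timestamp, project, message), or None if malformed."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return tuple(parts)


def summarize(log_text: str) -> Counter:
    """Count completed scheduler requests and maintenance replies."""
    counts = Counter()
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if message == "Scheduler request completed":
            counts["completed"] += 1
        elif message.startswith("Project is temporarily shut down"):
            counts["maintenance"] += 1
    return counts


summary = summarize(SAMPLE_LOG)
print(dict(summary))
```

A request that completes and is then followed by the maintenance message suggests the scheduler was reached and answered, which fits Claggy's reading that this is a maintenance notice rather than a connection failure.
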
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1217052 - Posted: 11 Apr 2012, 23:06:29 UTC

Saw this in my message log and laughed.

2012-04-11 18:57:16|SETI@home|Sending scheduler request: To fetch work. Requesting 1 seconds of work, reporting 1 completed tasks

It is inevitable that there would be a 1-second work request, but I never thought I would see it. Lowest I've seen before that was 12 seconds.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1217052 · Report as offensive

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.