Panic Mode On (80) Server Problems?




Message boards : Number crunching : Panic Mode On (80) Server Problems?

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 11,596,435
RAC: 12,735
United Kingdom
Message 1326877 - Posted: 11 Jan 2013, 23:47:46 UTC

Cricket is still showing "average bits out" declining slowly but steadily, so there must be a blockage in the pipe somewhere.
Call for DynaRod!
____________

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 100,313,370
RAC: 99,793
Finland
Message 1326962 - Posted: 12 Jan 2013, 5:14:46 UTC

Problems connecting to scheduler:
12/01/2013 07:11:16 | SETI@home | [sched_op] Fetching master file
12/01/2013 07:11:16 | SETI@home | Fetching scheduler list
12/01/2013 07:11:19 | SETI@home | [sched_op] Got master file; parsing
12/01/2013 07:11:19 | SETI@home | [sched_op] Found 1 scheduler URLs in master file
12/01/2013 07:11:19 | SETI@home | Master file download succeeded
12/01/2013 07:11:24 | SETI@home | [sched_op] Starting scheduler request
12/01/2013 07:11:24 | SETI@home | Sending scheduler request: To report completed tasks.
12/01/2013 07:11:24 | SETI@home | Reporting 28 completed tasks, not requesting new tasks
12/01/2013 07:11:24 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
12/01/2013 07:11:24 | SETI@home | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
12/01/2013 07:11:47 | SETI@home | Scheduler request failed: Couldn't connect to server
12/01/2013 07:11:47 | SETI@home | [sched_op] Deferring communication for 1 min 57 sec
12/01/2013 07:11:47 | SETI@home | [sched_op] Reason: Scheduler request failed

Still the cricket graph shows green.....
____________

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 11,596,435
RAC: 12,735
United Kingdom
Message 1326975 - Posted: 12 Jan 2013, 5:54:16 UTC - in response to Message 1326962.

Still the cricket graph shows green.....


The green is not the problem. We crunchers reporting work are represented by the blue line. "Bits out" means out to Eric and pals. The green is "bits in" from Berkeley on their way to the world.
I know this is counter-intuitive but that's the way it works!

____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2266
Credit: 8,671,746
RAC: 4,122
United States
Message 1327024 - Posted: 12 Jan 2013, 7:04:07 UTC

My scheduler contacts are slow, but they go through about 90% of the time. The most recent one was over 60 seconds to acknowledge 3 completed tasks.

I'm still not really a fan of the project back-off mechanism, but last night, when I had 11 APs all trying to download at the same time, I was getting way too sleepy to help them along. I decided to just leave it be and let BOINC make it work. Woke up 7 hours later and they had all finished just fine, in addition to four more.

But it is mainly the scheduler contacts that could use some tweaking to make them a little more reliable and speedy. Or a bigger download pipe. Or maybe an off-site download mirror with a gigabit (or faster) link, so the existing 100 Mbit pipe could be used solely for sending new data to the mirror. Just some ideas. I know the off-site idea has been mentioned before and the staff isn't too keen on it, but unless we can get our 100 Mbit limit raised to 150 or 200, it's just going to be painful headaches from here on out.
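For anyone wondering what a "back-off mechanism" does in general, here is a minimal sketch of randomized exponential back-off, the common technique behind retry deferrals that grow with repeated failures, like the "1 min 57 sec" and "3 hr 3 min 49 sec" deferrals in the logs in this thread. This is not BOINC's actual client code: the function name `backoff_delay`, the 60 s base, and the 4-hour cap are hypothetical illustration values.

```python
import random
from typing import Optional


def backoff_delay(failures: int, base: float = 60.0, cap: float = 4 * 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next retry after `failures` consecutive failures.

    Hypothetical sketch: `base` and `cap` are illustration values, not BOINC's settings.
    """
    if failures < 1:
        return 0.0
    rng = rng or random.Random()
    # Exponential growth (base, 2*base, 4*base, ...). The exponent is clamped
    # so a very long outage cannot overflow the float before the cap applies.
    ceiling = min(cap, base * 2 ** min(failures - 1, 32))
    # "Equal jitter": pick a random delay in [ceiling/2, ceiling] so thousands
    # of hosts that failed together don't all retry in the same second.
    return rng.uniform(ceiling / 2, ceiling)


rng = random.Random(2013)
for n in range(1, 10):
    print(f"failure {n}: retry in {backoff_delay(n, rng=rng) / 60:.1f} min")
```

The jitter is the part that matters for a server that is already overloaded: without it, every client that failed during the same outage would come back at the same instant and knock the scheduler over again.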
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 100,313,370
RAC: 99,793
Finland
Message 1327048 - Posted: 12 Jan 2013, 8:20:32 UTC

Pressing Update with No New Tasks set increases the retry timer... :( I have not been able to report WUs for the last 12 hours. Not getting any new WUs either...
____________

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2777
Credit: 209,703,295
RAC: 127,889
United States
Message 1327054 - Posted: 12 Jan 2013, 8:36:06 UTC - in response to Message 1326191.
Last modified: 12 Jan 2013, 8:55:35 UTC


I don't believe they have the time and resources to do what they want but to say the guys in the lab "don't care" is to me and insult.


Bernie, I don't know what to suggest except that you either read what I wrote and accept that I used my words correctly, or that you just be insulted or offended all over the place.

I said that "I fear," meaning that I am afraid of that, not that I have evidence of that.

I fear one of my children may die before me, because I would find that a traumatic event.

In a similar, but far, far less important way, I would hate to learn that the guys in the lab don't care.

I'm hardly a coward. My word choice was not a clever way of saying something without saying it.

If I meant, "The guys in the lab at SETI don't care about this project anymore since they are busy with their high-speed spectrometer and other data they've recently acquired from Green Bank," I would have no problem, at all, saying that, on the record, right here.

If I held that opinion, it certainly wouldn't be pointed at, or meant to offend, or insult, you.

That isn't even what I said. Read it again with a different inflection.

Perhaps we are having a U.S. vs. U.K. usage of words problem?

"and insult" to whom, Bernie?
____________

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2777
Credit: 209,703,295
RAC: 127,889
United States
Message 1327056 - Posted: 12 Jan 2013, 8:41:54 UTC - in response to Message 1326414.

That they've gone to the effort to inform the fund raisers of their requirements would indicate to most people that they do care.



Hey Grant,

I may know more about that than you do, and it's entirely possible that I was the very first person to contribute to that fundraiser and made the first public plea for others to do likewise --- before it was even announced.

BUT, I don't need, or appreciate, you or anyone else putting words in my mouth.

The word is "fear" and denotes anxiety.

If I need someone to tell me what I meant, I'll be sure to call for help.
____________

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 100,313,370
RAC: 99,793
Finland
Message 1327058 - Posted: 12 Jan 2013, 8:44:42 UTC

Very close to the 3-minute TCP timeout for the scheduler. Scheduler transactions seem to be taking a long time again:

12/01/2013 10:17:20 | SETI@home | update requested by user
12/01/2013 10:17:24 | SETI@home | [sched_op] Starting scheduler request
12/01/2013 10:17:24 | SETI@home | Sending scheduler request: Requested by user.
12/01/2013 10:17:24 | SETI@home | Reporting 36 completed tasks, not requesting new tasks
12/01/2013 10:17:24 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
12/01/2013 10:17:24 | SETI@home | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
12/01/2013 10:17:47 | SETI@home | Scheduler request failed: Couldn't connect to server
12/01/2013 10:17:47 | SETI@home | [sched_op] Deferring communication for 3 hr 3 min 49 sec
12/01/2013 10:17:47 | SETI@home | [sched_op] Reason: Scheduler request failed
12/01/2013 10:17:57 | SETI@home | work fetch resumed by user
12/01/2013 10:36:51 | SETI@home | update requested by user
12/01/2013 10:36:55 | SETI@home | [sched_op] Starting scheduler request
12/01/2013 10:36:55 | SETI@home | Sending scheduler request: Requested by user.
12/01/2013 10:36:55 | SETI@home | Reporting 36 completed tasks, requesting new tasks for CPU and NVIDIA
12/01/2013 10:36:55 | SETI@home | [sched_op] CPU work request: 202266.66 seconds; 0.00 devices
12/01/2013 10:36:55 | SETI@home | [sched_op] NVIDIA work request: 224640.00 seconds; 2.00 devices
12/01/2013 10:39:50 | SETI@home | Scheduler request completed: got 128 new tasks
12/01/2013 10:39:50 | SETI@home | [sched_op] Server version 701
12/01/2013 10:39:50 | SETI@home | Project requested delay of 303 seconds
12/01/2013 10:39:50 | SETI@home | [sched_op] estimated total CPU task duration: 206174 seconds
12/01/2013 10:39:50 | SETI@home | [sched_op] estimated total NVIDIA task duration: 88822 seconds
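The log above shows how close that request came: it started at 10:36:55 and completed at 10:39:50, 175 seconds against a roughly 180-second limit. A small sketch for pulling that figure out of BOINC event-log lines; the line layout is taken from the excerpts in this thread, the function name `scheduler_durations` is hypothetical, and the 180 s threshold is the poster's "3 minute" figure rather than a value confirmed from BOINC's source.

```python
from datetime import datetime

# Layout assumed from the excerpts: "DD/MM/YYYY HH:MM:SS | project | message"
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def scheduler_durations(lines):
    """Yield (seconds, outcome) for each scheduler request found in `lines`."""
    start = None
    for line in lines:
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not an event-log line
        stamp, _project, message = parts
        when = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        if "Starting scheduler request" in message:
            start = when
        elif start is not None and (
            message.startswith("Scheduler request completed")
            or message.startswith("Scheduler request failed")
        ):
            yield (when - start).total_seconds(), message
            start = None


LOG = """\
12/01/2013 10:17:24 | SETI@home | [sched_op] Starting scheduler request
12/01/2013 10:17:47 | SETI@home | Scheduler request failed: Couldn't connect to server
12/01/2013 10:36:55 | SETI@home | [sched_op] Starting scheduler request
12/01/2013 10:39:50 | SETI@home | Scheduler request completed: got 128 new tasks
"""
TIMEOUT = 180.0  # the poster's "3 minute" figure, assumed

for secs, outcome in scheduler_durations(LOG.splitlines()):
    print(f"{secs:5.0f} s ({secs / TIMEOUT:.0%} of timeout): {outcome}")
```

Run over the other excerpts in this thread, the "Couldn't connect to server" failures all land at about 22 to 23 seconds after the request starts, which suggests those are a separate connect-phase failure rather than the 3-minute limit.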

____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5813
Credit: 58,902,264
RAC: 48,108
Australia
Message 1327084 - Posted: 12 Jan 2013, 10:52:00 UTC - in response to Message 1327056.
Last modified: 12 Jan 2013, 10:54:20 UTC

BUT, I don't need, or appreciate, you or anyone else putting words in my mouth.

I didn't put any words there; I used the exact words you posted.


I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues.

right from the post you made- http://setiathome.berkeley.edu/forum_thread.php?id=70431&postid=1326112


I said that "I fear," meaning that I am afraid of that, not that I have evidence of that.

I fear one of my children may die before me, because I would find that a traumatic event.

In a similar, but far, far less important way, I would hate to learn that the guys in the lab don't care.

The only reason to fear something as you describe it there, is because you consider it possible.
No, you didn't come out & say outright that they don't care; you just implied it.


If I need someone to tell me what I meant, I'll be sure to call for help.

I have no idea what you meant; I can only go by what you post, & what you posted implies that those running the project don't care about the issues they are having.
____________
Grant
Darwin NT.

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8495
Credit: 49,834,032
RAC: 50,961
United Kingdom
Message 1327099 - Posted: 12 Jan 2013, 11:56:51 UTC - in response to Message 1327084.

The only reason to fear something as you describe it there, is because you consider it possible.

... what you posted implies that those running the project don't care about the issues they are having.

Non sequitur. Saying something is possible doesn't turn it into a certainty.

"When you have eliminated the impossible, whatever remains, however improbable, must be the truth" (Sherlock Holmes). But that doesn't apply here - there are a lot more possibilities left than the highly unlikely one that the project staff don't care, as evidenced by Matt's New Year post.

[seti.international] Dirk Sadowski (Project donor)
Volunteer tester
Joined: 6 Apr 07
Posts: 7075
Credit: 60,301,667
RAC: 15,176
Germany
Message 1327116 - Posted: 12 Jan 2013, 14:15:35 UTC
Last modified: 12 Jan 2013, 14:24:58 UTC

[panic on]
Last successful scheduler contact at 12 Jan 2013, 12:34:52 UTC.
Since then:

Scheduler request failed: Couldn't connect to server
Scheduler request failed: Failure when receiving data from the peer

[/panic on]

[panic off]
Just now... OK, contact again at 14:14 UTC. ;-)
[/panic off]

[EDIT:
[panic on]
It was just one successful contact; since then, again:
Scheduler request failed: Couldn't connect to server

[/panic on]]


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 30
United States
Message 1327125 - Posted: 12 Jan 2013, 15:32:46 UTC

No work or scheduler contacts for over 24 hours now, per BOINCTasks. I have been out of GPU work for about 18 hours, crunching Einstein on zero resource share to stay busy.

I'm puzzled by the Cricket graph. It still shows downloads maxed out, suggesting that many are getting work. Is this another issue like the old HE connection issue, where only certain computers or IP addresses are blocked?
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Tom*
Joined: 12 Aug 11
Posts: 114
Credit: 4,811,361
RAC: 2,835
United States
Message 1327155 - Posted: 12 Jan 2013, 18:48:40 UTC

Finally got through. The logjam is slightly abating, as the incoming traffic has finally made it above 10 (Cur: 10.12 Mbits/sec). First contact in over 24 hours, with or without a proxy. The lost tasks were mostly shorties. Funny thing was, I got one of those red BOINC messages saying I had lost contact with the internet during the request that finally made it through??

James Sotherden (Project donor)
Joined: 16 May 99
Posts: 8800
Credit: 34,272,942
RAC: 61,025
United States
Message 1327156 - Posted: 12 Jan 2013, 18:58:32 UTC

I'm just going to let my machines go to other projects. Two already are; one still has work for maybe 8 hours. They are going to shut down for maintenance anyway, so I'm not going to fight an uphill battle trying to get work.
____________

Old James

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 1327163 - Posted: 12 Jan 2013, 19:37:09 UTC

If I could download as many work units as I get `scheduler list` and `master file` fetches, I would be crunching fine :((

All I am getting is:

12/01/2013 17:01:47 | SETI@home | Fetching scheduler list
12/01/2013 17:01:49 | SETI@home | Master file download succeeded
12/01/2013 17:01:55 | SETI@home | Sending scheduler request: To report completed tasks.
12/01/2013 17:01:55 | SETI@home | Reporting 100 completed tasks, not requesting new tasks
12/01/2013 17:02:17 | SETI@home | Scheduler request failed: Couldn't connect to server
12/01/2013 17:02:20 | | Project communication failed: attempting access to reference site
12/01/2013 17:02:21 | | Internet access OK - project servers may be temporarily down.

Haz someone stolen the server?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5813
Credit: 58,902,264
RAC: 48,108
Australia
Message 1327169 - Posted: 12 Jan 2013, 19:44:02 UTC - in response to Message 1327099.
Last modified: 12 Jan 2013, 19:46:00 UTC

Non sequitur. Saying something is possible doesn't turn it into a certainty.

I agree.
Just because someone believes something doesn't make it so. But believing something to be possible indicates that you consider it likely, and implies that you consider it to be the case.

But that doesn't apply here - there are a lot more possibilities left than the highly unlikely one that the project staff don't care, as evidenced by Matt's New Year post.

Yet another indication that they do care.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5813
Credit: 58,902,264
RAC: 48,108
Australia
Message 1327173 - Posted: 12 Jan 2013, 20:04:23 UTC


Back to the panic-

If you do manage to get some work it won't last long, they appear to be all shorties.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5813
Credit: 58,902,264
RAC: 48,108
Australia
Message 1327182 - Posted: 12 Jan 2013, 21:12:58 UTC - in response to Message 1327173.


And for some time now the splitters have been unable to produce more than 15/s. The Ready to Send buffer, which had already been declining steadily because of the limited splitter output, is now falling like a stone thanks to the shorties...
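Grant's observation is a simple rate balance: the Ready to Send buffer changes by the split rate minus the send rate. A back-of-the-envelope sketch; only the 15/s split rate comes from the post, while the buffer size, the two send rates, and the function name `seconds_until_empty` are hypothetical, not values from the server status page.

```python
def seconds_until_empty(buffer: int, split_rate: float, send_rate: float) -> float:
    """Time for a Ready to Send buffer to drain, or inf if it isn't draining.

    All rates are in results per second.
    """
    net_drain = send_rate - split_rate
    if net_drain <= 0:
        return float("inf")  # splitters keep up, so the buffer holds or grows
    return buffer / net_drain


SPLIT_RATE = 15.0  # from the post
BUFFER = 100_000   # hypothetical Ready to Send count

# Hypothetical send rates: a modest deficit vs. a shorty-driven surge.
for label, send_rate in [("slow decline", 20.0), ("falling like a stone", 100.0)]:
    hours = seconds_until_empty(BUFFER, SPLIT_RATE, send_rate) / 3600
    print(f"{label}: empty in about {hours:.1f} h")
```

The point of the sketch is that the drain time depends on the gap between the two rates, not on either one alone: once demand jumps well past what the splitters can supply, even a large buffer empties quickly.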
____________
Grant
Darwin NT.

Tom*
Joined: 12 Aug 11
Posts: 114
Credit: 4,811,361
RAC: 2,835
United States
Message 1327187 - Posted: 12 Jan 2013, 21:20:29 UTC

More ammo for having shorties handled like VLARs, especially when APs are being split!!!

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8385
Credit: 56,672,589
RAC: 77,788
United Kingdom
Message 1327191 - Posted: 12 Jan 2013, 21:49:02 UTC

Wouldn't really help much. The problem with VLARs is a computational one: many GPUs just freak out and produce wrong answers, or get stalled in near-infinite loops. The problem with shorties is that everyone, be they a slow CPU cruncher or a mega-fast GPU cruncher, gets through them about 5 times faster than a normal WU, so the demand for WUs goes up by a factor of five. Perhaps the "cure" is actually the reverse: only distribute shorties when the fastest crunchers are stuffed to the gills with APs.....
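Rob's arithmetic can be made concrete: a host that keeps a fixed number of tasks running must fetch work at a rate inversely proportional to task runtime, so if every runtime shrinks by the same factor, every host's demand grows by that factor, whether it is slow or fast. A minimal sketch; the host slot counts, runtimes, and the function name `wu_per_hour` are hypothetical, and the factor of 5 is Rob's rough estimate.

```python
def wu_per_hour(concurrent_tasks: int, runtime_s: float) -> float:
    """Steady-state work units one host must fetch per hour to stay busy."""
    return concurrent_tasks * 3600.0 / runtime_s


# Hypothetical hosts: (tasks running at once, normal WU runtime in seconds).
hosts = {
    "slow CPU cruncher": (2, 3 * 3600.0),
    "mega-fast GPU cruncher": (4, 15 * 60.0),
}
SHORTY_SPEEDUP = 5.0  # Rob's "about 5 times faster"

for name, (slots, normal_runtime) in hosts.items():
    normal = wu_per_hour(slots, normal_runtime)
    shorty = wu_per_hour(slots, normal_runtime / SHORTY_SPEEDUP)
    print(f"{name}: {normal:.2f} -> {shorty:.2f} WU/h ({shorty / normal:.0f}x)")
```

Because the multiplier is the same for every host, throttling shorties the way VLARs are kept off GPUs would only shift the load between hosts; it would not reduce the total demand on the download pipe.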
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?



Copyright © 2014 University of California