Panic Mode On (108) Server Problems?

Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898754 - Posted: 2 Nov 2017, 20:44:31 UTC - in response to Message 1898753.  

Darksider

908	SETI@home	11/2/2017 13:41:56	update requested by user	
909	SETI@home	11/2/2017 13:41:57	sched RPC pending: Requested by user	
910	SETI@home	11/2/2017 13:41:57	[sched_op] Starting scheduler request	
911	SETI@home	11/2/2017 13:41:57	Sending scheduler request: Requested by user.	
912	SETI@home	11/2/2017 13:41:57	Reporting 66 completed tasks	
913	SETI@home	11/2/2017 13:41:57	Requesting new tasks for CPU and NVIDIA GPU	
914	SETI@home	11/2/2017 13:41:57	[sched_op] CPU work request: 1196610.72 seconds; 0.00 devices	
915	SETI@home	11/2/2017 13:41:57	[sched_op] NVIDIA GPU work request: 542319.00 seconds; 0.00 devices	
916	SETI@home	11/2/2017 13:42:41	Scheduler request failed: HTTP internal server error	
917	SETI@home	11/2/2017 13:42:41	[sched_op] Deferring communication for 03:22:53	
918	SETI@home	11/2/2017 13:42:41	[sched_op] Reason: Scheduler request failed	

Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898754 · Report as offensive     Reply Quote
Profile Jeff Buck
Volunteer tester

Joined: 11 Feb 00
Posts: 1274
Credit: 133,910,725
RAC: 244,327
United States
Message 1898755 - Posted: 2 Nov 2017, 20:45:14 UTC - in response to Message 1898751.  

There's a thread over on Q&A, completed tasks, with a similar issue. Jord forwarded that info to Eric, who's apparently trying to look into it. Perhaps you can piggyback on that one to press the issue.
ID: 1898755 · Report as offensive     Reply Quote
Richard HaselgroveProject Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11516
Credit: 106,200,476
RAC: 70,389
United Kingdom
Message 1898756 - Posted: 2 Nov 2017, 20:51:54 UTC - in response to Message 1898754.  

I'd try one cycle with 'no new tasks' selected, to get rid of those completed tasks: then try requesting work again (waiting 303 seconds first, of course), but with a smaller cache setting. I don't think you're ever going to need 13.85 CPU-days of work in one go, when there's a limit of 100 CPU tasks at a time.

What's the Host ID of Darksider?

OK, out now - leave some tasks for me, please ;)
ID: 1898756 · Report as offensive     Reply Quote
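For reference, the 13.85 CPU-days figure can be reproduced from the `[sched_op]` lines in the log above. This is only a quick sketch; the helper name `seconds_to_days` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def seconds_to_days(seconds: float) -> float:
    """Convert a work request given in device-seconds to device-days."""
    return seconds / SECONDS_PER_DAY

# Values copied from the [sched_op] lines in the posted log.
cpu_request_s = 1196610.72
gpu_request_s = 542319.00

cpu_days = seconds_to_days(cpu_request_s)
gpu_days = seconds_to_days(gpu_request_s)

print(f"CPU request: {cpu_days:.2f} days")
print(f"GPU request: {gpu_days:.2f} days")
```

The same conversion applied to the GPU request gives a little over six GPU-days.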
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898762 - Posted: 2 Nov 2017, 21:09:56 UTC - in response to Message 1898756.  
Last modified: 2 Nov 2017, 21:12:21 UTC

Already tried that. Part of the 'kick the servers' process is setting NNT. Then interrupt the network communication. Wait out 305 seconds. Shut down BOINC and wait 1 minute and restart BOINC. Set tasks back to receive and then restart network communications. That process is what usually gets the servers to wake up and send you work.

My work cache settings are global and set for 2.0 days + 0.1 days additional. As you said, there is no point in asking for more work when the server limits you to 100 tasks per CPU and 100 tasks per GPU at any time. I should have 400 tasks on board at any time if the servers are working correctly and there is work. I request work often enough every day that 2 days is more than sufficient. I crunch through my 300 GPU task allotment every 2 1/2 hours.

I have no clue why the servers are calculating that I am requesting that much work. It should be 2 days worth. I have 92 CPU tasks on board now. Zero GPU tasks.

The Host ID is 8306366
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898762 · Report as offensive     Reply Quote
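A rough back-of-the-envelope check of the numbers in the post above shows why the cache setting hardly matters once the per-device limit applies. This is a sketch using only the figures quoted in the post; the 300-task GPU allotment is taken as stated, and the variable names are illustrative.

```python
GPU_TASK_LIMIT = 300          # GPU allotment quoted in the post (100 per GPU)
HOURS_TO_CLEAR_LIMIT = 2.5    # "every 2 1/2 hours"
CACHE_DAYS = 2.0 + 0.1        # cache setting plus the additional buffer

# Throughput implied by clearing the whole allotment in 2.5 hours.
tasks_per_hour = GPU_TASK_LIMIT / HOURS_TO_CLEAR_LIMIT

# How many GPU tasks a full 2.1-day cache would hold at that rate.
tasks_for_full_cache = tasks_per_hour * CACHE_DAYS * 24

# How much of a day the server-side limit actually covers.
limit_as_days = HOURS_TO_CLEAR_LIMIT / 24

print(f"GPU throughput: {tasks_per_hour:.0f} tasks/hour")
print(f"Tasks needed to fill the cache: {tasks_for_full_cache:.0f}")
print(f"Task limit covers about {limit_as_days:.3f} days of GPU work")
```

At that rate the 300-task limit holds only about a tenth of a day of GPU work, far below a 2-day cache target.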
Profile Brent Norman
Volunteer tester

Joined: 1 Dec 99
Posts: 1823
Credit: 106,416,255
RAC: 451,991
Canada
Message 1898763 - Posted: 2 Nov 2017, 21:19:40 UTC - in response to Message 1898762.  

You are chasing goblins. It is a server problem affecting everyone; look at the SSP and Haveland.
ID: 1898763 · Report as offensive     Reply Quote
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898765 - Posted: 2 Nov 2017, 21:24:14 UTC - in response to Message 1898763.  

Thanks for the comment and clue, Brent. I hadn't looked there yet since all the replies to me this morning were that everything was working fine for everyone else but me.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898765 · Report as offensive     Reply Quote
Stephen "Heretic"Project Donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2634
Credit: 48,211,862
RAC: 132,321
Australia
Message 1898767 - Posted: 2 Nov 2017, 21:36:24 UTC - in response to Message 1898765.  
Last modified: 2 Nov 2017, 21:40:11 UTC

Thanks for the comment and clue, Brent. I hadn't looked there yet since all the replies to me this morning were that everything was working fine for everyone else but me.


. . I think that returns of only 99,000 in last hour seems low ... maybe not, but the creation rate of 0.5 tasks per sec definitely does.

. . And as usual 610K tasks in the hopper and none being sent out ???

. . And I am getting nothing as well :( (Down to 60 tasks on the big rig and dropping, no 300 cache there)

Stephen

:(
ID: 1898767 · Report as offensive     Reply Quote
Richard HaselgroveProject Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11516
Credit: 106,200,476
RAC: 70,389
United Kingdom
Message 1898773 - Posted: 2 Nov 2017, 22:33:10 UTC - in response to Message 1898762.  

I have no clue why the servers are calculating that I am requesting that much work. It should be 2 days worth. I have 92 CPU tasks on board now. Zero GPU tasks.

The Host ID is 8306366
It's not the servers that calculate that - it's your own client doing the requesting. Asking for two days of work for each of 8 CPUs - that would be 16 days. You must have had some left.
ID: 1898773 · Report as offensive     Reply Quote
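The "you must have had some left" reasoning can be put into numbers. The sketch below assumes a deliberately simplified model in which the client asks for (cache days × CPU count) minus the work already on hand; the real BOINC work-fetch logic takes more factors into account, and `implied_work_on_hand` is a hypothetical helper, not part of BOINC.

```python
SECONDS_PER_DAY = 86400

def implied_work_on_hand(cache_days: float, n_cpus: int,
                         requested_seconds: float) -> float:
    """Estimate CPU-days already queued, under the simplified model
    request = cache_days * n_cpus - work_on_hand."""
    target_cpu_days = cache_days * n_cpus
    requested_cpu_days = requested_seconds / SECONDS_PER_DAY
    return target_cpu_days - requested_cpu_days

# 2 days of cache, 8 CPUs, and the CPU request from the posted log.
on_hand = implied_work_on_hand(2.0, 8, 1196610.72)
print(f"Implied work still on hand: {on_hand:.2f} CPU-days")
```

Under that assumption the request of 13.85 CPU-days against a 16 CPU-day target leaves a bit over two CPU-days of queued work.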
Stephen "Heretic"Project Donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2634
Credit: 48,211,862
RAC: 132,321
Australia
Message 1898774 - Posted: 2 Nov 2017, 22:51:04 UTC - in response to Message 1898767.  
Last modified: 2 Nov 2017, 22:53:41 UTC

Thanks for the comment and clue, Brent. I hadn't looked there yet since all the replies to me this morning were that everything was working fine for everyone else but me.


. . I think that returns of only 99,000 in last hour seems low ... maybe not, but the creation rate of 0.5 tasks per sec definitely does.

. . And as usual 610K tasks in the hopper and none being sent out ???

. . And I am getting nothing as well :( (Down to 60 tasks on the big rig and dropping, no 300 cache there)

Stephen

:(


. . Update:

. . Down to 4 tasks, no work coming in, shutting down for the interim. Wake me when the work starts flowing again <joke>
ID: 1898774 · Report as offensive     Reply Quote
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6466
Credit: 175,834,377
RAC: 55,261
United States
Message 1898775 - Posted: 2 Nov 2017, 22:52:55 UTC

From my end the servers seem to be in good order.

Project details for: SETI@home including all dates
Scheduler Requests: 4411
Scheduler Success: 99 %, Count: 4404
Scheduler Failure: 0 %, Count: 7 (Total)
Scheduler Failure: 0 % of total, Count: 0 (Couldn't connect to server)
Scheduler Failure: 0 % of total, Count: 4 (HTTP service unavailable)
Scheduler Failure: 0 % of total, Count: 0 (HTTP internal server error)
Scheduler Failure: 0 % of total, Count: 3 (Couldn't resolve host name)
Scheduler Failure: 0 % of total, Count: 0 (Failure when receiving data from the peer)
Scheduler Failure: 0 % of total, Count: 0 (Timeout was reached)
Scheduler Timeout: 0 % of failures

Project details for: SETI@home on 02-Nov-2017
Scheduler Requests: 114
Scheduler Success: 100 %, Count: 114

Project details for: SETI@home on 01-Nov-2017
Scheduler Requests: 173
Scheduler Success: 100 %, Count: 173

Project details for: SETI@home on 31-Oct-2017
Scheduler Requests: 65
Scheduler Success: 100 %, Count: 65


Project details for: SETI@home including all dates
Total number of work requests: 4404
Number of requests gaining work: 1306
Number of requests not gaining work: 3098
Number of requests not gaining work: 2662 (project task limit)
Number of requests not gaining work: 42 (project down for maintenance)
Number of requests not gaining work: 0 (request too recent)
Number of requests not gaining work: 52 (Project has no tasks available)
Number of times no work was requested: 0
Number of tasks gained: 1590

Project details for: SETI@home on 02-Nov-2017
Total number of work requests: 114
Number of requests gaining work: 30
Number of requests not gaining work: 84
Number of requests not gaining work: 84 (project task limit)
Number of tasks gained: 33

Project details for: SETI@home on 01-Nov-2017
Total number of work requests: 173
Number of requests gaining work: 63
Number of requests not gaining work: 110
Number of requests not gaining work: 107 (project task limit)
Number of requests not gaining work: 3 (Project has no tasks available)
Number of tasks gained: 67

Project details for: SETI@home on 31-Oct-2017
Total number of work requests: 65
Number of requests gaining work: 17
Number of requests not gaining work: 48
Number of requests not gaining work: 39 (project task limit)
Number of requests not gaining work: 9 (project down for maintenance)
Number of tasks gained: 46

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1898775 · Report as offensive     Reply Quote
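The whole-number percentages in the summary above hide some detail (the 99 % success figure appears to be rounded down). A small sketch recomputing the ratios from the all-dates counts in the post; the `pct` helper is illustrative only.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, or 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0

# Counts copied from the "including all dates" sections of the post.
scheduler_requests = 4411
scheduler_success = 4404
work_requests = 4404
gained_work = 1306
no_work = 3098
no_work_task_limit = 2662

success_rate = pct(scheduler_success, scheduler_requests)
gain_rate = pct(gained_work, work_requests)
task_limit_share = pct(no_work_task_limit, no_work)

print(f"Scheduler success: {success_rate:.1f} %")
print(f"Requests gaining work: {gain_rate:.1f} %")
print(f"No-work replies due to task limit: {task_limit_share:.1f} %")
```

Most of the requests that returned no work were refused because the host was already at the project task limit, not because of a server failure.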
Stephen "Heretic"Project Donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2634
Credit: 48,211,862
RAC: 132,321
Australia
Message 1898777 - Posted: 2 Nov 2017, 22:55:18 UTC - in response to Message 1898775.  

From my end the servers seem to be in good order.


. . Aaahh! teacher's pet! :)

Stephen

:)
ID: 1898777 · Report as offensive     Reply Quote
Richard HaselgroveProject Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11516
Credit: 106,200,476
RAC: 70,389
United Kingdom
Message 1898778 - Posted: 2 Nov 2017, 22:57:53 UTC - in response to Message 1898775.  

That's why I suspect there's something odd about the way Keith's Linux client is doing the requesting, which causes the server to fall over with an error. And of course if the server daemon falls over, it has to restart and re-cache whatever it held in memory - that'll slow things down.

Keith's log contained:

690			11/2/2017 13:31:02	[http] [ID#0] Sent header to server: ÿ	
702	SETI@home	11/2/2017 13:31:02	[http] [ID#1] Sent header to server: t (x86_64-pc-linux-gnu 7.8.3)
704	SETI@home	11/2/2017 13:31:02	[http] [ID#1] Sent header to server: Ac
ID: 1898778 · Report as offensive     Reply Quote
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6466
Credit: 175,834,377
RAC: 55,261
United States
Message 1898782 - Posted: 2 Nov 2017, 23:54:48 UTC - in response to Message 1898778.  

That's why I suspect there's something odd about the way Keith's Linux client is doing the requesting, which causes the server to fall over with an error. And of course if the server daemon falls over, it has to restart and re-cache whatever it held in memory - that'll slow things down.

Keith's log contained:

690			11/2/2017 13:31:02	[http] [ID#0] Sent header to server: ÿ	
702	SETI@home	11/2/2017 13:31:02	[http] [ID#1] Sent header to server: t (x86_64-pc-linux-gnu 7.8.3)
704	SETI@home	11/2/2017 13:31:02	[http] [ID#1] Sent header to server: Ac

I wonder whether the feeder queue reports being empty while the daemon is recovering, and whether that is related to the high rate of "Project has no tasks available" responses when requesting work.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1898782 · Report as offensive     Reply Quote
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898784 - Posted: 3 Nov 2017, 0:11:32 UTC - in response to Message 1898778.  

That was caused by the copy/paste from the remote BT server. It wasn't showing those characters in the machine log itself.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898784 · Report as offensive     Reply Quote
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898785 - Posted: 3 Nov 2017, 0:12:18 UTC

I have shut down the machine and restarted it a couple times now. It hasn't changed the symptom at all.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898785 · Report as offensive     Reply Quote
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898787 - Posted: 3 Nov 2017, 0:17:03 UTC - in response to Message 1898782.  

That's why I suspect there's something odd about the way Keith's Linux client is doing the requesting, which causes the server to fall over with an error. And of course if the server daemon falls over, it has to restart and re-cache whatever it held in memory - that'll slow things down.

Keith's log contained:

690			11/2/2017 13:31:02	[http] [ID#0] Sent header to server: ÿ	
702	SETI@home	11/2/2017 13:31:02	[http] [ID#1] Sent header to server: t (x86_64-pc-linux-gnu 7.8.3)
704	SETI@home	11/2/2017 13:31:02	[http] [ID#1] Sent header to server: Ac

I wonder whether the feeder queue reports being empty while the daemon is recovering, and whether that is related to the high rate of "Project has no tasks available" responses when requesting work.

The other machines haven't received the internal server error message today. They are Windows machines. I have received the error message on all machines in the past month, the Win10 machine more so. It is a high-production machine too, processing a lot of work fast each day. Not as fast as the Linux machine, of course.

All machines are down on work, with everyone getting the "no work is available" message from the servers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898787 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 3793
Credit: 186,441,280
RAC: 237,398
United States
Message 1898791 - Posted: 3 Nov 2017, 0:39:03 UTC - in response to Message 1898787.  

I suppose when our machines run out of work in a short while we can all sit around and pretend there's nothing wrong with the server. All my machines are low, with 2 getting very low. Increasing the cache didn't work this time. You can see it on the SSP as well. Both the Results out in the field and Results received in last hour have dropped well below the recent normal levels. All I'm getting is:

Thu Nov 2 20:36:48 2017 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Thu Nov 2 20:36:48 2017 | SETI@home | [sched_op] CPU work request: 201653.26 seconds; 0.00 devices
Thu Nov 2 20:36:48 2017 | SETI@home | [sched_op] NVIDIA GPU work request: 516992.53 seconds; 0.00 devices
Thu Nov 2 20:36:51 2017 | SETI@home | Scheduler request completed: got 0 new tasks
Thu Nov 2 20:36:51 2017 | SETI@home | [sched_op] Server version 707
Thu Nov 2 20:36:51 2017 | SETI@home | Project has no tasks available
Thu Nov 2 20:36:51 2017 | SETI@home | Project requested delay of 303 seconds
Thu Nov 2 20:36:51 2017 | SETI@home | [sched_op] Deferring communication for 00:05:03
Thu Nov 2 20:36:51 2017 | SETI@home | [sched_op] Reason: requested by project

Over and over again...
ID: 1898791 · Report as offensive     Reply Quote
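The two backoff lengths seen in this thread are very different: the normal "requested by project" delay and the much longer one after the internal server error. A small sketch that pulls the `Deferring communication for HH:MM:SS` value out of a log line and converts it to seconds; the function name and regular expression are illustrative and are not part of BOINC.

```python
import re
from typing import Optional

# Matches the deferral duration in a [sched_op] event-log line.
DEFERRAL_RE = re.compile(r"Deferring communication for (\d+):(\d{2}):(\d{2})")

def deferral_seconds(line: str) -> Optional[int]:
    """Return the deferral in seconds, or None if the line has none."""
    match = DEFERRAL_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds

# One line from each of the two logs posted in this thread.
normal = "Thu Nov 2 20:36:51 2017 | SETI@home | [sched_op] Deferring communication for 00:05:03"
after_error = "917\tSETI@home\t11/2/2017 13:42:41\t[sched_op] Deferring communication for 03:22:53"

print(deferral_seconds(normal))       # project-requested delay
print(deferral_seconds(after_error))  # backoff after the failed request
```

The first value matches the 303-second delay the project asks for; the second is the multi-hour backoff the client imposed after the failed scheduler request.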
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898792 - Posted: 3 Nov 2017, 0:39:45 UTC

I just tried an update on the Linux machine to override the backoff caused by the server error message. Looks like they might have straightened out the servers a bit. I am getting a proper response now. Just the normal "no work is available" message that everyone's been getting today when requesting work.
Darksider

2680	SETI@home	11/2/2017 17:35:49	Sending scheduler request: To fetch work.	
2681	SETI@home	11/2/2017 17:35:49	Requesting new tasks for CPU and NVIDIA GPU	
2682	SETI@home	11/2/2017 17:35:51	Scheduler request completed: got 0 new tasks	
2683	SETI@home	11/2/2017 17:35:51	Project has no tasks available	

Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898792 · Report as offensive     Reply Quote
Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2431
Credit: 184,216,578
RAC: 358,676
United States
Message 1898795 - Posted: 3 Nov 2017, 0:50:25 UTC

Sheesh! The RTS buffer is up over 800K tasks! And nobody is getting any of them. The splitters have run amok. You would think they have a process that tells the splitters to back off and stop once you reach a prescribed buffer threshold.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1898795 · Report as offensive     Reply Quote
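The kind of back-off described in the post above is often done with high and low water marks, so the producer stops at one threshold and only resumes at a lower one. The following is purely a hypothetical sketch of that idea: the class name and both thresholds are invented for illustration and say nothing about how the SETI@home splitters are actually controlled.

```python
class SplitterThrottle:
    """Hypothetical high/low water-mark switch for a task producer."""

    def __init__(self, high_water: int, low_water: int) -> None:
        if low_water >= high_water:
            raise ValueError("low_water must be below high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.running = True

    def update(self, ready_to_send: int) -> bool:
        """Update the running state from the current buffer size."""
        if self.running and ready_to_send >= self.high_water:
            self.running = False   # buffer is full enough: stop splitting
        elif not self.running and ready_to_send <= self.low_water:
            self.running = True    # buffer has drained: resume splitting
        return self.running

# Made-up thresholds; buffer sizes loosely follow the figures in this thread.
throttle = SplitterThrottle(high_water=600_000, low_water=400_000)
states = [throttle.update(n)
          for n in (300_000, 610_000, 800_000, 500_000, 390_000)]
print(states)
```

The gap between the two thresholds keeps the producer from flapping on and off when the buffer hovers around a single limit.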
Profile RueiKeProject Donor
Volunteer tester
Joined: 14 Feb 16
Posts: 270
Credit: 104,132,713
RAC: 235,441
Taiwan
Message 1898796 - Posted: 3 Nov 2017, 1:00:33 UTC - in response to Message 1898785.  

I have shut down the machine and restarted it a couple times now. It hasn't changed the symptom at all.


My Linux machine is currently without work also. https://setiathome.berkeley.edu/results.php?hostid=8365846
The 437 in progress are ghosts from when I was fumbling around to get the machine up. My other systems are also low on work.
ID: 1898796 · Report as offensive     Reply Quote
 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.