Panic Mode On (57) Server problems?

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1158874 - Posted: 4 Oct 2011, 22:36:54 UTC - in response to Message 1158846.  

The scheduler is really funny.
Seeing only 5.4 CPUs.

04.10.2011 23:05:30 SETI@home [sched_op] Starting scheduler request
04.10.2011 23:05:30 SETI@home Sending scheduler request: Requested by user.
04.10.2011 23:05:30 SETI@home Reporting 43 completed tasks, requesting new tasks for CPU and ATI GPU
04.10.2011 23:05:30 SETI@home [sched_op] CPU work request: 8902321.21 seconds; 5.40 CPUs
04.10.2011 23:05:30 SETI@home [sched_op] ATI GPU work request: 9069.50 seconds; 0.00 GPUs


That's 0.6 reserved for the ATI GPU (0.3 for each of 2 tasks it's hoping to get).

AFAICT the servers don't do anything with that count to ensure you get at least that many tasks for CPU; maybe they will sometime in the future. Having a non-zero instance count does condition some logic to at least consider sending some work, though.
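
The 5.40 figure in the log can be reproduced with a little arithmetic. This is only a sketch: the thread never states the core count, so the 6 logical CPUs below are an assumption inferred from 5.40 + 0.60, and the function name is made up for illustration.

```python
# Sketch: reproduce the "5.40 CPUs" figure from the work request log.
# ASSUMPTION: the host has 6 logical CPUs (not stated in the thread; inferred
# from 5.40 + 0.60) and each ATI GPU task reserves 0.3 of a CPU.

def cpus_left_for_cpu_work(total_cpus, gpu_tasks, cpu_per_gpu_task):
    """CPUs left for CPU work after each GPU task reserves its CPU fraction."""
    reserved = gpu_tasks * cpu_per_gpu_task
    return total_cpus - reserved

available = cpus_left_for_cpu_work(total_cpus=6, gpu_tasks=2, cpu_per_gpu_task=0.3)
print(f"{available:.2f} CPUs")
```

With those assumed inputs the result formats as 5.40, matching the `[sched_op] CPU work request` line.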
                                                                  Joe
ID: 1158874 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1158875 - Posted: 4 Oct 2011, 22:37:27 UTC

Well, thanks to something stuffing up my computer for the second time in as many weeks, a bunch of tasks errored out.
Couple that with "04/10/2011 23:16:35 | SETI@home | Project has no tasks available", and I am now down to my final four tasks. It's OK though, they should keep me going until I can get my next fix, I mean, some more tasks :)
ID: 1158875 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65766
Credit: 55,293,173
RAC: 49
United States
Message 1158881 - Posted: 4 Oct 2011, 23:06:49 UTC

I can't even report WUs; HTTP errors at the HE router, I guess. I wish the nuts doing this would just get lost.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1158881 · Report as offensive
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1158890 - Posted: 5 Oct 2011, 0:32:03 UTC - in response to Message 1158712.  

I'm sure all of you just read the latest news post.
http://setiathome.berkeley.edu/forum_thread.php?id=65678

She's down for the night, I'm sure. Hopefully, things will be fixed by the time the usual outage is done tomorrow.


Perhaps it is not only SETI. I came across this on Twitter about HE and a DDoS attack. I do not use Twitter but it looks current.

https://twitter.com/#!/henet


From the HE twitter feed
4 hours ago
Experiencing large DDos attack on our network. Network admins doing everything possible. No formal ETA at the moment. Will give update ASAP.

3 hours ago
Our network admins are still working as fast as possible to restore full connectivity and services. We'll continue to give updates as they come.

2 hours ago
Network still experiencing incidences of attack, though network admins are doing their best to keep the attack contained and mitigated.

It's not a Seti server conking out this time, woohoo! :)



I wonder if Bank of America uses HE and SETI was caught in the crossfire.

http://abcnews.go.com/blogs/business/2011/10/bank-of-america-under-hacking-attack/
ID: 1158890 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1158908 - Posted: 5 Oct 2011, 1:24:31 UTC - in response to Message 1158890.  

I wonder if Bank of America uses HE and SETI was caught in the crossfire.

http://abcnews.go.com/blogs/business/2011/10/bank-of-america-under-hacking-attack/


Just did a traceroute to see. Goes through a handful of hops on Level3 in Dallas and then directly to BoA's own allocated network addresses.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1158908 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65766
Credit: 55,293,173
RAC: 49
United States
Message 1158909 - Posted: 5 Oct 2011, 1:28:16 UTC - in response to Message 1158890.  

I'm sure all of you just read the latest news post.
http://setiathome.berkeley.edu/forum_thread.php?id=65678

She's down for the night, I'm sure. Hopefully, things will be fixed by the time the usual outage is done tomorrow.


Perhaps it is not only SETI. I came across this on Twitter about HE and a DDoS attack. I do not use Twitter but it looks current.

https://twitter.com/#!/henet


From the HE twitter feed
4 hours ago
Experiencing large DDos attack on our network. Network admins doing everything possible. No formal ETA at the moment. Will give update ASAP.

3 hours ago
Our network admins are still working as fast as possible to restore full connectivity and services. We'll continue to give updates as they come.

2 hours ago
Network still experiencing incidences of attack, though network admins are doing their best to keep the attack contained and mitigated.

It's not a Seti server conking out this time, woohoo! :)



I wonder if Bank of America uses HE and SETI was caught in the crossfire.

http://abcnews.go.com/blogs/business/2011/10/bank-of-america-under-hacking-attack/

Yep, they do, as seen here: http://bgp.he.net/AS10794#_whois
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1158909 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1158947 - Posted: 5 Oct 2011, 4:51:50 UTC - in response to Message 1158909.  


Has anyone heard anything on the Scheduler issue? I didn't see anything in the News or Tech news threads.
Even after the outage I still can't contact the Scheduler.
Grant
Darwin NT
ID: 1158947 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19080
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1158949 - Posted: 5 Oct 2011, 5:24:53 UTC - in response to Message 1158947.  


Has anyone heard anything on the Scheduler issue? I didn't see anything in the News or Tech news threads.
Even after the outage I still can't contact the Scheduler.

I've had a few contacts over the last few hours, but only about one in five attempts has been successful. That said, I've received no new tasks.

Also, I noted about 3 hours ago that there were 798,200 tasks available; that has now decreased to 783,867, and according to Scarecrow's graphs the creation rate is as near 0.00 as you can get.

I think we need a full database cleanup and compaction ASAP, more than we need contact with the servers.
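
For a rough sense of scale, the two "ready to send" counts above can be turned into a drain rate. This is a back-of-the-envelope sketch only: it assumes the gap really was 3 hours, that no new tasks are being created, and that the rate stays constant, none of which is guaranteed.

```python
# Sketch: how fast is the "ready to send" queue draining?
# The two counts come from the post; the 3-hour gap is approximate.
earlier, later = 798_200, 783_867
hours_elapsed = 3.0  # "about 3 hours ago"

def drain_rate(before, after, hours):
    """Average number of tasks leaving the queue per hour."""
    return (before - after) / hours

rate = drain_rate(earlier, later, hours_elapsed)
hours_left = later / rate  # ASSUMPTION: zero task creation, constant rate

print(f"drained {earlier - later} tasks, about {rate:.0f} per hour")
print(f"queue would last roughly {hours_left:.0f} hours at that rate")
```

The point is less the exact number than that the queue is shrinking slowly, which fits the observation that only a fraction of scheduler contacts are getting through.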
ID: 1158949 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1158953 - Posted: 5 Oct 2011, 5:33:06 UTC - in response to Message 1158949.  


Think we need a full database clean up and compaction asap, rather than contact with the servers.

I was under the impression that this was taken care of this morning while the servers were offline.

ID: 1158953 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1158955 - Posted: 5 Oct 2011, 6:03:47 UTC
Last modified: 5 Oct 2011, 6:05:00 UTC

Well, normally the Tuesday outage does two things. First, it defragments and compresses the database; then a copy of it is made and deployed on the replica (as well as sent to off-site storage).

So one would think that after defragging and compressing the database, it should be running smoothly now, right? Well, there's only so much of the database that will fit in RAM, and unfortunately, with 13M and climbing waiting to be purged, I'm guessing that is taking up RAM needed for task page queries or something else.

The issues we are experiencing contacting the scheduler, and the internal DB operations in the server closet, are likely intermittent/slow because of the bloat.

I think turning the scheduler off and turning db_purge on until the backlog is cleared is about the only thing that can rectify this situation.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1158955 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19080
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1158957 - Posted: 5 Oct 2011, 6:07:41 UTC

Well, 40 mins later the MBs waiting to be sent still remain at 783,867, so something is definitely causing problems at the server end.
ID: 1158957 · Report as offensive
BetelgeuseFive Project Donor
Volunteer tester

Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1158960 - Posted: 5 Oct 2011, 6:29:35 UTC - in response to Message 1158955.  


I agree that things are getting out of control and something should be done about it. However, I think that turning off the scheduler is a bit drastic. Turning off the splitters seems more appropriate. This way people will still be able to report completed tasks (before the deadline).

Tom


Well, normally the Tuesday outage does two things. First, it defragments and compresses the database; then a copy of it is made and deployed on the replica (as well as sent to off-site storage).

So one would think that after defragging and compressing the database, it should be running smoothly now, right? Well, there's only so much of the database that will fit in RAM, and unfortunately, with 13M and climbing waiting to be purged, I'm guessing that is taking up RAM needed for task page queries or something else.

The issues we are experiencing contacting the scheduler, and the internal DB operations in the server closet, are likely intermittent/slow because of the bloat.

I think turning the scheduler off and turning db_purge on until the backlog is cleared is about the only thing that can rectify this situation.


ID: 1158960 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1158961 - Posted: 5 Oct 2011, 6:37:13 UTC
Last modified: 5 Oct 2011, 6:37:41 UTC

This is not pretty.
The kitties are getting rather depressed.
My top rig has been doing MW since yesterday just to keep it warm because it runs 24/7.
Two other rigs are out of Seti work, and can't even connect to report the last of what they've completed.
The other 5 slower rigs are still crunching up their last.

The main router is on the fritz, and the servers are tied up in knots when you can connect. Even the forums are doggy at times.

Come on back, Seti.

The kitties will leave the light on for ya.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1158961 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1158966 - Posted: 5 Oct 2011, 7:05:56 UTC - in response to Message 1158961.  

This is not pretty.
The kitties are getting rather depressed.
My top rig has been doing MW since yesterday just to keep it warm because it runs 24/7.
Two other rigs are out of Seti work, and can't even connect to report the last of what they've completed.
The other 5 slower rigs are still crunching up their last.

The main router is on the fritz, and the servers are tied up in knots when you can connect. Even the forums are doggy at times.

Come on back, Seti.

The kitties will leave the light on for ya.


Set NNT (No New Tasks) and then update; the completed tasks will then report.

Don't forget to unset NNT after.



Kevin


ID: 1158966 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1158971 - Posted: 5 Oct 2011, 7:44:16 UTC - in response to Message 1158966.  

This is not pretty.
The kitties are getting rather depressed.
My top rig has been doing MW since yesterday just to keep it warm because it runs 24/7.
Two other rigs are out of Seti work, and can't even connect to report the last of what they've completed.
The other 5 slower rigs are still crunching up their last.

The main router is on the fritz, and the servers are tied up in knots when you can connect. Even the forums are doggy at times.

Come on back, Seti.

The kitties will leave the light on for ya.


Set NNT and then update, they will then report.

Don't forget to unset NNT after.




Is that a bug work-around to be able to contact the scheduler again?
Janice
ID: 1158971 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19080
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1158972 - Posted: 5 Oct 2011, 7:50:52 UTC

Just to let you know I rec'd one task, GPU VHAR, at 07:49:46.
ID: 1158972 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1158973 - Posted: 5 Oct 2011, 7:53:09 UTC - in response to Message 1158971.  
Last modified: 5 Oct 2011, 7:53:41 UTC


Set NNT and then update, they will then report.

Don't forget to unset NNT after.




Is that a bug work-around to be able to contact the scheduler again?


On a 6.12.x client, setting NNT forces 'report straight after upload' behaviour.
That would help if the problem was getting the client to report, but not if the problem is getting the report through...
ID: 1158973 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1158974 - Posted: 5 Oct 2011, 7:57:22 UTC - in response to Message 1158973.  

Ahh, gotcha. So it's no help when you just cannot connect to the scheduler. I have had 251 completed tasks sitting here for hours on 6.10.58, and simply cannot get there from here.
Janice
ID: 1158974 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19080
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1158975 - Posted: 5 Oct 2011, 7:58:00 UTC

The time in my last post was UTC.
But checking back, I had received another task about 20 mins earlier.

The strange thing is that "Results ready to send" is still at 783,867, and that's over an hour after I last posted that number. So either the page is lying, or "Results ready to send" cannot be greater than that, for some reason. Unpurged WU/tasks maybe?
ID: 1158975 · Report as offensive
S@NL - John van Gorsel
Volunteer tester
Joined: 5 Jul 99
Posts: 193
Credit: 139,673,078
RAC: 0
Netherlands
Message 1158976 - Posted: 5 Oct 2011, 8:00:59 UTC - in response to Message 1158966.  


Set NNT and then update, they will then report.

Don't forget to unset NNT after.


It could be a coincidence but it worked for me:

5-10-2011 9:51:41 | SETI@home | work fetch suspended by user
5-10-2011 9:51:43 | SETI@home | update requested by user
5-10-2011 9:51:46 | SETI@home | Sending scheduler request: Requested by user.
5-10-2011 9:51:46 | SETI@home | Reporting 361 completed tasks, not requesting new tasks
5-10-2011 9:51:56 | SETI@home | Scheduler request completed
5-10-2011 9:52:13 | SETI@home | work fetch resumed by user


The scheduler is still unreachable when asking for new work though...
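
For anyone who would rather script the same NNT / update / un-NNT sequence than click through the Manager, here is a minimal sketch. It assumes `boinccmd` is on the PATH, the client is running locally, and your client version supports the `nomorework`, `update` and `allowmorework` project operations; the helper names and the 30 second pause are my own choices, not anything from BOINC itself.

```python
import subprocess
import time

# Project URL as it appears in the links in this thread.
PROJECT_URL = "http://setiathome.berkeley.edu/"

def nnt_report_commands(project_url):
    """Build the boinccmd invocations: set NNT, update, then unset NNT."""
    return [["boinccmd", "--project", project_url, op]
            for op in ("nomorework", "update", "allowmorework")]

def run(commands, dry_run=True, pause_seconds=30):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        if cmd[-1] == "allowmorework":
            # ASSUMPTION: give the scheduler request time to finish
            # before work fetch is switched back on.
            time.sleep(pause_seconds)
        subprocess.run(cmd, check=True)

# Dry run by default: only shows what would be executed.
run(nnt_report_commands(PROJECT_URL))
```

Calling `run(..., dry_run=False)` would actually issue the commands. As the log above suggests, this only helps with reporting; it does nothing for requests for new work while the scheduler is unreachable.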


Seti@Netherlands website
ID: 1158976 · Report as offensive


 