Panic Mode On (108) Server Problems?

Message boards : Number crunching : Panic Mode On (108) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 10518
Credit: 142,939,985
RAC: 79,530
Australia
Message 1905837 - Posted: 9 Dec 2017, 5:20:34 UTC - in response to Message 1905832.  
Last modified: 9 Dec 2017, 5:28:21 UTC

> Scheduler is still messed up. Getting this on my most stable and consistent machine.

Does that machine do AP work?
I found with my system that not having AP installed & selected made it harder to get MB work.

Even now, generally when there is plenty of AP, Arecibo & GBT work I don't have an issue getting MB work.
When the AP work stops flowing, or most of the MB work is one type or the other, that's when I have problems getting work again. And it's mostly on my i7, the C2D generally chugs along (although it too occasionally has issues).

EDIT- having said all that, something is screwier than usual with the Scheduler allocating work as I've had to do the triple update 4 times so far today to keep the work coming.
And I wonder how long till we run out of disk space with all the WUs awaiting validation & deletion piling up? At least the rate of return for work has dropped down to 110,000/hr.
Grant
Darwin NT
ID: 1905837
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 6131
Credit: 436,651,030
RAC: 966,617
United States
Message 1905840 - Posted: 9 Dec 2017, 5:29:45 UTC
Last modified: 9 Dec 2017, 5:41:24 UTC

Actually getting the "reached a limit of tasks in progress" across all machines now. Caches falling ....... falllingg ...... fallliiinnnggg .... boom!

I haven't had to resort to toggling off the AP project for a long time now. Guess I'll try that next.

[Edit] Nope, that didn't work either.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1905840
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 10518
Credit: 142,939,985
RAC: 79,530
Australia
Message 1905845 - Posted: 9 Dec 2017, 5:44:11 UTC - in response to Message 1905840.  
Last modified: 9 Dec 2017, 5:49:34 UTC

> I haven't had to resort to toggling off the AP project for a long time now. Guess I'll try that next.

These days I just leave the "Use only selected..." and "if no work for selected" all on Yes.

EDIT- noticed the cache running down again, triple update & 14 WUs on the next automatic request.
The way this issue manifests itself really is rather peculiar.
Grant
Darwin NT
ID: 1905845
Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2529
Credit: 307,222,116
RAC: 661,206
Canada
Message 1905851 - Posted: 9 Dec 2017, 6:05:10 UTC - in response to Message 1905827.  

> Thanks for the explanation. Do you have to set NNT before hitting Update?
It only takes 2 updates, but it must be 2 that are NOT returning tasks in a row. The third is if you need to report something. It can be a bit tricky at times with fast GPUs to find the right time to make the 'nudge'.

I generally don't have to nudge them often. When I saw your messages here I checked and was down >10% on them all. They have all recovered on their own since, though some are back down now. I very rarely see them drop below 66% without them recovering on their own.
ID: 1905851
Grant (SSSF)
Volunteer tester
Posts: 10518
Credit: 142,939,985
RAC: 79,530
Australia
Message 1905854 - Posted: 9 Dec 2017, 6:12:03 UTC

I seem to be getting all the noisy GBT WUs that were mentioned earlier. Reporting 10+ results at a time at the moment.
Grant
Darwin NT
ID: 1905854
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 6131
Credit: 436,651,030
RAC: 966,617
United States
Message 1905857 - Posted: 9 Dec 2017, 6:22:29 UTC - in response to Message 1905851.  

> > Thanks for the explanation. Do you have to set NNT before hitting Update?
> It only takes 2 updates, but it must be 2 that are NOT returning tasks in a row. The third is if you need to report something. It can be a bit tricky at times with fast GPUs to find the right time to make the 'nudge'.
>
> I generally don't have to nudge them often. When I saw your messages here I checked and was down >10% on them all. They have all recovered on their own since, though some are back down now. I very rarely see them drop below 66% without them recovering on their own.

When this type of issue develops on my machines, it is always dire. I rarely see them recover on their own without some intervention. Judging from the Event Logs, I can go several hours in the wee hours while I'm asleep with no machine running tasks. My first task of the morning is to open up BoincTasks and see what state the machines are in, then go round-robin kicking each one over to get tasks again.

I had a long run with no issues up until the last server hiccup last week, the longest I can remember since the 'ole days when the servers and machines just ran without intervention on my part.

The Triple Update would never work on the Linux machine I guess since it is always reporting tasks every five minutes... that is, when it has work. The Windows machines have been making a slow recovery back to full. The Linux machine continues to fall.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1905857
Grant (SSSF)
Volunteer tester
Posts: 10518
Credit: 142,939,985
RAC: 79,530
Australia
Message 1905862 - Posted: 9 Dec 2017, 6:29:40 UTC - in response to Message 1905857.  

> The Triple Update would never work on the Linux machine I guess since it is always reporting tasks every five minutes

My i7 with 2*GTX1070s is the one that has the most issues, even though on every Scheduler request it is reporting work. The triple update is what gets it downloading work again.
Grant
Darwin NT
ID: 1905862
Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Posts: 2529
Credit: 307,222,116
RAC: 661,206
Canada
Message 1905863 - Posted: 9 Dec 2017, 6:35:00 UTC - in response to Message 1905857.  

> The Triple Update would never work on the Linux machine I guess since it is always reporting tasks every five minutes
That doesn't matter. Just pick any time that you can ... Report tasks, then make 2 more updates before reporting any more.
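The timing rule above (report, then two updates in a row that report nothing) can be pictured as a search over task finish times. This is a minimal sketch under stated assumptions, not BOINC or SETI@home code: the helper `find_nudge_window` is hypothetical, the client is assumed to report a finished task on the very next update, and the 300 s spacing between requests in the example is an illustrative value, not one quoted in this thread. It looks for the first finish time followed by a gap longer than two request intervals, which is what the reporting update plus two empty updates would need.

```python
from typing import Optional, Sequence


def find_nudge_window(finish_times: Sequence[float], delay: float) -> Optional[float]:
    """Return the earliest finish time at which a 'triple update' could start.

    Hypothetical model, not BOINC code:
    - finish_times: task completion times in seconds, sorted ascending;
    - the first update is made right as a task finishes and reports it;
    - each following update waits `delay` seconds (assumed minimum
      interval between scheduler requests).
    The second and third updates land at t + delay and t + 2 * delay, so no
    other task may finish in (t, t + 2 * delay]; otherwise that update would
    report a task and break the sequence.
    Returns None if no gap between consecutive finishes is wide enough.
    """
    for earlier, later in zip(finish_times, finish_times[1:]):
        # Strict '>' so a task finishing exactly at the third update counts as a clash.
        if later - earlier > 2 * delay:
            return earlier
    return None


# Example: tasks finishing at these times (seconds), assumed 300 s between requests.
finishes = [0, 100, 250, 900, 1000]
print(find_nudge_window(finishes, 300))  # 250
print(find_nudge_window(finishes, 400))  # None
```

Under this model, the more often a machine finishes tasks, the rarer a wide enough gap becomes, which is one way to read why fast GPUs make the timing tricky.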
ID: 1905863
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 6131
Credit: 436,651,030
RAC: 966,617
United States
Message 1905865 - Posted: 9 Dec 2017, 6:39:32 UTC - in response to Message 1905863.  

OK, got it. Was confused as to timing. Just tried the technique. We'll see.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1905865
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 15122
Credit: 196,282,618
RAC: 53,072
Australia
Message 1905866 - Posted: 9 Dec 2017, 6:50:42 UTC

According to my logs there's been no problems getting work here.

Cheers.
ID: 1905866
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 6131
Credit: 436,651,030
RAC: 966,617
United States
Message 1905867 - Posted: 9 Dec 2017, 6:57:03 UTC - in response to Message 1905865.  

I just tried through 3 cycles. I must not have done it right the first 2 times, because I still got the "no tasks are available" message. The third try was either correct or a consequence of the first two, but I got 79 tasks after the technique. I was about to do the ghost recovery protocol since I was down over 120 tasks. I'll see if the next request pulls me back flush. It did. Yay!
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1905867
Brent Norman Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Posts: 2529
Credit: 307,222,116
RAC: 661,206
Canada
Message 1905868 - Posted: 9 Dec 2017, 7:00:31 UTC - in response to Message 1905867.  

Cool.
ID: 1905868
Grant (SSSF)
Volunteer tester
Posts: 10518
Credit: 142,939,985
RAC: 79,530
Australia
Message 1905871 - Posted: 9 Dec 2017, 7:14:03 UTC - in response to Message 1905866.  

> According to my logs there's been no problems getting work here.

As I said, it's a particularly weird issue.
Grant
Darwin NT
ID: 1905871
rob smith Special Project $250 donor
Volunteer tester
Joined: 7 Mar 03
Posts: 16672
Credit: 350,516,779
RAC: 195,254
United Kingdom
Message 1905881 - Posted: 9 Dec 2017, 8:42:50 UTC - in response to Message 1905748.  

David, you are correct: Beta data does not get into the main database. Runs on Beta can be for all sorts of reasons, but they are all about testing something in a near-production environment.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1905881
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 3817
Credit: 92,530,825
RAC: 152,782
Australia
Message 1905898 - Posted: 9 Dec 2017, 11:55:19 UTC - in response to Message 1905854.  
Last modified: 9 Dec 2017, 12:02:03 UTC

> I seem to be getting all the noisy GBT WUs that were mentioned earlier. Reporting 10+ results at a time at the moment.


. . I have never seen so many noisy GBT WUs as I have in the past few weeks. It's a bit of a worry for a "radio noise free" location such as Green Bank.

Stephen

??
ID: 1905898
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Posts: 3817
Credit: 92,530,825
RAC: 152,782
Australia
Message 1905899 - Posted: 9 Dec 2017, 12:00:30 UTC - in response to Message 1905866.  

> According to my logs there's been no problems getting work here.
>
> Cheers.


. . Oh you always say that :) ... except for the big outage the other week. But I must admit, apart from a few "no tasks available" responses, this weekend I am getting work pretty consistently myself.

Stephen

<touching wood>
ID: 1905899
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 12342
Credit: 126,917,883
RAC: 36,558
United Kingdom
Message 1905903 - Posted: 9 Dec 2017, 12:16:24 UTC - in response to Message 1905898.  

> I have never seen so many noisy GBT WUs as I have in the past few weeks. It's a bit of a worry for a "radio noise free" location such as Green Bank.
Maybe they've just tuned in to ET's equivalent of 'I Love Lucy', and our clients are being snooty about it?
ID: 1905903
TBar
Volunteer tester
Joined: 22 May 99
Posts: 4470
Credit: 296,449,858
RAC: 512,754
United States
Message 1905962 - Posted: 9 Dec 2017, 18:00:30 UTC

For me the Results pages are slow to respond, and I'm having trouble getting tasks for two machines. One of the machines has yet to respond and is getting very low on tasks. I may have to try the Bunker option for tonight, the coldest night of the season.
ID: 1905962
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 6131
Credit: 436,651,030
RAC: 966,617
United States
Message 1905967 - Posted: 9 Dec 2017, 18:22:05 UTC - in response to Message 1905962.  
Last modified: 9 Dec 2017, 18:24:20 UTC

I am seeing that too, because the assimilators and purgers have been offline for the past day. The Valid tasks are building up rapidly and aren't being trimmed from the database. Things are getting slow. The Linux cruncher is continuing to have trouble getting work. For some reason the scheduler keeps throwing 10-20 minute GPU backoffs at it.
[Edit] Just looked and the assimilators have made an appearance finally. That should improve things shortly. Someone must be in the lab this morning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1905967
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 3817
Credit: 92,530,825
RAC: 152,782
Australia
Message 1905976 - Posted: 9 Dec 2017, 19:21:54 UTC - in response to Message 1905903.  

> > I have never seen so many noisy GBT WUs as I have in the past few weeks. It's a bit of a worry for a "radio noise free" location such as Green Bank.
> Maybe they've just tuned in to ET's equivalent of 'I Love Lucy', and our clients are being snooty about it?


. . Being picky ? :)

Stephen

:)
ID: 1905976


 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.