Panic Mode On (107) Server Problems?

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2433
Credit: 184,394,869
RAC: 359,601
United States
Message 1889423 - Posted: 12 Sep 2017, 22:40:32 UTC

Project is back relatively early today. Hope that is the beginning of a trend.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1889423
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2636
Credit: 48,283,033
RAC: 133,182
Australia
Message 1889433 - Posted: 12 Sep 2017, 23:23:13 UTC - in response to Message 1889175.  
Last modified: 12 Sep 2017, 23:23:34 UTC

Web site went AWOL for a few minutes there, and the last Scheduler request timed out.


. . Also, the servers went AWOL last night during the outage. All uploads failed to find the servers for quite a while, and since one of my rigs still reaches the servers by their direct address (not DNS), it was not just a DNS problem: the servers themselves were unavailable.

Stephen

:(
ID: 1889433
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 12605
Credit: 169,334,082
RAC: 86,739
Australia
Message 1889436 - Posted: 12 Sep 2017, 23:30:36 UTC

Another early outrage start and quick recovery even with all them big w/u's. :-)

Cheers.
ID: 1889436
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8887
Credit: 115,111,170
RAC: 70,211
Australia
Message 1889465 - Posted: 13 Sep 2017, 3:44:28 UTC - in response to Message 1889436.  

Another early outrage start and quick recovery even with all them big w/u's. :-)

It took a couple of hours for the splitters to really get going; the ready-to-send buffer actually ran dry for a while until the splitters shifted up a few gears.
Grant
Darwin NT
ID: 1889465
rob smith (Project Donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 15199
Credit: 251,936,905
RAC: 326,516
United Kingdom
Message 1889716 - Posted: 14 Sep 2017, 11:30:03 UTC
Last modified: 14 Sep 2017, 11:30:30 UTC

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000, now it is at ~6.5k (tasks actually in hand are still ~1300). Don't take the "tasks in hand" figure as valid, though, as I'm suffering a massive pile of ghosts just now. Own fault: I upset my front-end firewall and didn't notice for a few hours :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1889716
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49054
Credit: 878,206,179
RAC: 200,283
United States
Message 1889731 - Posted: 14 Sep 2017, 13:33:07 UTC - in response to Message 1889716.  
Last modified: 14 Sep 2017, 13:38:55 UTC

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000, now it is at ~6.5k (tasks actually in hand are still ~1300). Don't take the "tasks in hand" figure as valid, though, as I'm suffering a massive pile of ghosts just now. Own fault: I upset my front-end firewall and didn't notice for a few hours :-(

Dunno what's going on there.
My pendings have stayed pretty even, give or take, with 7 rigs running. But no major jump.
State: All (7342) · In progress (2548) · Validation pending (2568) · Validation inconclusive (99) · Valid (2112)
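
For comparing numbers like these between hosts, here is a minimal Python sketch that splits a task-state summary line into counts. The function name `parse_state_line` is made up for illustration, and the sketch assumes the line is pasted exactly as the tasks page shows it, with `·` between the entries.

```python
import re
from typing import Dict

# The summary line exactly as posted above.
STATE_LINE = (
    "State: All (7342) · In progress (2548) · Validation pending (2568) · "
    "Validation inconclusive (99) · Valid (2112)"
)

def parse_state_line(line: str) -> Dict[str, int]:
    """Turn 'State: Name (N) · Name (N) ...' into a {name: count} dict."""
    body = line.split(":", 1)[1]  # drop the leading 'State' label
    counts = {}
    for part in body.split("·"):
        match = re.fullmatch(r"\s*(.+?)\s*\((\d+)\)\s*", part)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

counts = parse_state_line(STATE_LINE)
listed = sum(n for name, n in counts.items() if name != "All")
unlisted = counts["All"] - listed  # tasks in states the line does not show
pending_share = counts["Validation pending"] / counts["All"]

print(f"listed states: {listed}, not listed: {unlisted}")
print(f"pending share of all tasks: {pending_share:.1%}")
```

On this line the listed states add up to 7327, which is 15 short of the "All" total; presumably the rest sit in states the pasted line does not include (that part is a guess, the post does not say).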
A kitty keeps loneliness away.
More meowing, less hissing. I speak meow, do you?

Have made friends in this life.
Most were cats.
ID: 1889731
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11516
Credit: 106,231,870
RAC: 70,214
United Kingdom
Message 1889780 - Posted: 14 Sep 2017, 18:57:03 UTC - in response to Message 1889716.  

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000, now it is at ~6.5k (tasks actually in hand are still ~1300). Don't take the "tasks in hand" figure as valid, though, as I'm suffering a massive pile of ghosts just now. Own fault: I upset my front-end firewall and didn't notice for a few hours :-(

If you have 'ghosts', then your wingmates have pendings - and vice-versa. As you do, so will you be done to!
ID: 1889780
petri33 (Project Donor)
Volunteer tester
Joined: 6 Jun 02
Posts: 1465
Credit: 269,487,306
RAC: 298,357
Finland
Message 1889783 - Posted: 14 Sep 2017, 19:20:29 UTC

The WoW event ended. Some users may have shut down their rigs without emptying the cache.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1889783
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2636
Credit: 48,283,033
RAC: 133,182
Australia
Message 1889795 - Posted: 14 Sep 2017, 20:26:04 UTC - in response to Message 1889780.  

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000, now it is at ~6.5k (tasks actually in hand are still ~1300). Don't take the "tasks in hand" figure as valid, though, as I'm suffering a massive pile of ghosts just now. Own fault: I upset my front-end firewall and didn't notice for a few hours :-(

If you have 'ghosts', then your wingmates have pendings - and vice-versa. As you do, so will you be done to!


. . There is a relatively simple, if tedious, way to recover your ghosted tasks. The really tedious part is that you only get 20 at a time, so if there are lots of them you have to repeat it over and over. But it will clear up the numbers for everybody involved.

. . The actual process is in a file on another rig so I cannot repeat it at the moment.

Stephen

.
ID: 1889795
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2636
Credit: 48,283,033
RAC: 133,182
Australia
Message 1889797 - Posted: 14 Sep 2017, 20:30:26 UTC - in response to Message 1889783.  

The WoW event ended. Some users may have shut down their rigs without emptying the cache.


. . That is true, and a very annoying and unnecessary thing to do. Good etiquette is to empty a machine's cache before shutting it down. Of course this is not possible in the event of hardware failure, but how often is that the case?

Stephen

:(
ID: 1889797
rob smith (Project Donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 15199
Credit: 251,936,905
RAC: 326,516
United Kingdom
Message 1889880 - Posted: 15 Sep 2017, 6:06:34 UTC

I too thought it was fall-out from WOW, but on a computer with ~1700 pendings only about 250 date back that far, with the majority being from the last 7 days.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1889880
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2433
Credit: 184,394,869
RAC: 359,601
United States
Message 1889884 - Posted: 15 Sep 2017, 6:47:51 UTC

My oldest pendings go back to the last week of July, which I hope will clear out this week if the wingmen report on time. And don't forget the validation servers had a hiccup on Aug 8-9 and missed all validations on those days. We have to wait for the original deadlines to pass to clear the validators on the second pass.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1889884
JohnDK (Crowdfunding Project Donor)
Volunteer tester
Joined: 28 May 00
Posts: 974
Credit: 123,516,385
RAC: 132,742
Denmark
Message 1889927 - Posted: 15 Sep 2017, 15:02:11 UTC - in response to Message 1889795.  

There is a relatively simple, if tedious, way to recover your ghosted tasks. The really tedious part is that you only get 20 at a time, so if there are lots of them you have to repeat it over and over. But it will clear up the numbers for everybody involved.

. . The actual process is in a file on another rig so I cannot repeat it at the moment.

Stephen

.

This is what I do to recover ghost WUs. The text is copied from someone else; I don't remember who...

As an alternative to the "ghost" recovery process that I had previously posted quite a while back (involving client_state backup and restore, etc.), I have another one to offer that I think is simpler and can be entirely controlled within BOINC Manager. It just requires a fast finger on your mouse button, since the key here is to interrupt a scheduler request before it completes. I just used this quite successfully over the weekend to recover 127 "ghosts" that I had created on Friday when I started trying to run the "Special" app on Linux. I just went back to that machine occasionally when I had a few minutes and ran it when I knew I had room in the queue for at least 20 tasks to be recovered, so 7 times in all. I figured that, since it was my fault for creating the "ghosts", the least I could do was try to recover them as a courtesy to my wingmen.

1) Set "No New Tasks"

2) Make sure you have enough room in your work buffer to accommodate your "ghosts" (up to a maximum of 20 per request). If you're not one of those who typically have a queue which reaches the task limits, simply increasing the size of your work buffer should be sufficient. Otherwise, you'll have to wait until you've reported enough completed tasks to free up the necessary space in your queue.

3) Wait for, or initiate (using Update), a scheduler request that reports at least one completed task.

4) As soon as you see the scheduler request commence, interrupt it by IMMEDIATELY clicking "Suspend network activity". I find it easiest to first open the "Activity" menu drop-down and, while keeping my mouse pointer poised over "Suspend network activity", keep a close eye on the Event Log awaiting the start of the scheduler request. Then, as soon as the scheduler request commences, just CLICK.

If successful, the Event Log will show lines like this, and stop:
-------------------------------------------------
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Not requesting tasks: "no new tasks" requested via Manager
Suspending network activity - user request
-------------------------------------------------

If you get a "Scheduler request completed" line before the "Suspending network activity" line, you weren't quick enough. For me, at least, that hasn't been a problem.

5) To be on the safe side, at this point, I usually Exit BOINC completely, shutting down all running tasks, wait a minute or so, then restart BOINC. Note that network activity should still be suspended when BOINC resumes. You should also still see your task(s) "Ready to report".

6) "Allow New Tasks"

7) Resume network activity (always, or based on preferences, whichever is normal for you). If a scheduler request isn't triggered automatically, click "Update". The Event Log should now show something such as:
-------------------------------------------------
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Scheduler request completed: got 4 new tasks
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0
Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0
Resent lost task 01au09aa.11976.21340.7.34.21_1
-------------------------------------------------

followed by the usual task download messages.
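
For anyone who would rather check a pasted Event Log excerpt with a script than by eye, the sketch below (Python, not part of BOINC) looks for the two things described above: whether the "Suspending network activity" line arrived before any "Scheduler request completed" line, and how many "Resent lost task" lines came with a completed request. The function names are hypothetical, and the matching relies only on the message text quoted in this post; other BOINC versions may word these lines differently.

```python
import re
from typing import List, Optional, Tuple

RESENT_PREFIX = "Resent lost task "

# Event Log excerpts copied from steps 4 and 7 above.
STEP4_LOG = """\
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Not requesting tasks: "no new tasks" requested via Manager
Suspending network activity - user request
"""

STEP7_LOG = """\
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Scheduler request completed: got 4 new tasks
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0
Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0
Resent lost task 01au09aa.11976.21340.7.34.21_1
"""

def was_interrupted_in_time(log_text: str) -> bool:
    """True if the suspend line shows up with no completed request before it."""
    for line in log_text.splitlines():
        if "Scheduler request completed" in line:
            return False  # the request finished first: not quick enough
        if "Suspending network activity" in line:
            return True
    return False  # neither line found, so nothing was interrupted

def summarize_resends(log_text: str) -> Tuple[Optional[int], List[str]]:
    """Return (tasks reported by 'got N new tasks', names of resent lost tasks)."""
    got = None
    resent = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = re.search(r"Scheduler request completed: got (\d+) new tasks", line)
        if match:
            got = int(match.group(1))
        elif line.startswith(RESENT_PREFIX):
            resent.append(line[len(RESENT_PREFIX):])
    return got, resent

print("step 4 interrupted in time:", was_interrupted_in_time(STEP4_LOG))
got, resent = summarize_resends(STEP7_LOG)
print(f"step 7: got {got} new tasks, {len(resent)} of them resent ghosts")
```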

NOTE: Since 20 "ghosts" seem to be the maximum that can be retrieved in one request, those with more than 20 "ghosts" will need to repeat the process multiple times, at least 5 minutes apart.

NOTE 2: If any of the "ghost" tasks are Arecibo VLARs, the scheduler may try to send them to an NVIDIA GPU (if you have one), which will fail, marking the task as "Abandoned". At least it's no longer a ghost.
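
To get a feel for how long a big pile of ghosts will take, the arithmetic from the first note can be sketched in a few lines of Python. The 20-per-request cap and the 5-minute spacing are taken from the note above; the function name `recovery_plan` is only for illustration, and the wait is a lower bound that ignores the time needed to free up room in the queue.

```python
import math
from typing import Tuple

MAX_GHOSTS_PER_REQUEST = 20       # cap observed in the note above
MIN_MINUTES_BETWEEN_REQUESTS = 5  # minimum spacing from the note above

def recovery_plan(ghost_count: int) -> Tuple[int, int]:
    """Return (scheduler requests needed, minimum minutes spent waiting between them)."""
    if ghost_count < 0:
        raise ValueError("ghost_count must not be negative")
    passes = math.ceil(ghost_count / MAX_GHOSTS_PER_REQUEST)
    wait_minutes = max(passes - 1, 0) * MIN_MINUTES_BETWEEN_REQUESTS
    return passes, wait_minutes

passes, wait_minutes = recovery_plan(127)
print(f"{passes} requests, at least {wait_minutes} minutes of waiting in between")
```

For the 127 ghosts mentioned at the top of this write-up that gives 7 requests, which matches the "7 times in all", with at least 30 minutes of waiting between them in total.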
ID: 1889927
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 954
Credit: 8,637,359
RAC: 837
New Zealand
Message 1890274 - Posted: 16 Sep 2017, 22:20:21 UTC

blc04_2bit_blc04_guppi_57898_16542_DIAG_KIC8462852_0017 (52.39 GB) has been sitting at (108) for a number of days, but it does not seem to be slowing splitter progress.
ID: 1890274
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8887
Credit: 115,111,170
RAC: 70,211
Australia
Message 1890284 - Posted: 16 Sep 2017, 23:21:40 UTC - in response to Message 1890274.  

blc04_2bit_blc04_guppi_57898_16542_DIAG_KIC8462852_0017 (52.39 GB) has been sitting at (108) for a number of days, but it does not seem to be slowing splitter progress.

Same for all the other blc_04 data.
Once the blc_05 data was loaded, the splitters moved to those files; the only blc_04 WUs I've received since the blc_05 files were loaded have been re-sends.
Grant
Darwin NT
ID: 1890284
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 954
Credit: 8,637,359
RAC: 837
New Zealand
Message 1890307 - Posted: 17 Sep 2017, 1:08:37 UTC - in response to Message 1890284.  

blc04_2bit_blc04_guppi_57898_16542_DIAG_KIC8462852_0017 (52.39 GB) has been sitting at (108) for a number of days, but it does not seem to be slowing splitter progress.

Same for all the other blc_04 data.
Once the blc_05 data was loaded, the splitters moved to those files; the only blc_04 WUs I've received since the blc_05 files were loaded have been re-sends.

That is a factor that I hadn't even considered, thanks Grant. At least with only BLC 04 resends going out, it will help clean the database up a little bit by removing outstanding results.
ID: 1890307
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2433
Credit: 184,394,869
RAC: 359,601
United States
Message 1890482 - Posted: 17 Sep 2017, 23:41:52 UTC

I'm assuming we are in an Arecibo VLAR storm from the splitters, since all machines have been receiving the "no work is available" message for the past couple of hours.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1890482
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2636
Credit: 48,283,033
RAC: 133,182
Australia
Message 1890484 - Posted: 17 Sep 2017, 23:46:19 UTC - in response to Message 1890482.  

I'm assuming we are in an Arecibo VLAR storm from the splitters, since all machines have been receiving the "no work is available" message for the past couple of hours.


. . Have you done the "kick the servers in the pants" thing yet??

. . I find I have to do that regularly and it gets work pretty consistently even when the previous response has been "no work available".

Stephen

??
ID: 1890484
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 12605
Credit: 169,334,082
RAC: 86,739
Australia
Message 1890485 - Posted: 18 Sep 2017, 0:01:06 UTC

I just had a look at the logs on my rigs and everything has been very fine here Keith.

Cheers.
ID: 1890485
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2433
Credit: 184,394,869
RAC: 359,601
United States
Message 1890488 - Posted: 18 Sep 2017, 0:14:37 UTC - in response to Message 1890484.  

No, I haven't. Been watching NASCAR and football. It is not usual for ALL machines to need the "kick in the pants" at the same time.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1890488
 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.