Panic Mode On (107) Server Problems?

Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 36774
Credit: 261,360,520
RAC: 489
Australia
Message 1889436 - Posted: 12 Sep 2017, 23:30:36 UTC

Another early outage start and quick recovery even with all them big w/u's. :-)

Cheers.
ID: 1889436 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1889465 - Posted: 13 Sep 2017, 3:44:28 UTC - in response to Message 1889436.  

Another early outage start and quick recovery even with all them big w/u's. :-)

Took a couple of hours for the splitters to really get going - the ready-to-send buffer actually ran dry for a while until the splitters switched up a few gears.
Grant
Darwin NT
ID: 1889465 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22526
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1889716 - Posted: 14 Sep 2017, 11:30:03 UTC
Last modified: 14 Sep 2017, 11:30:30 UTC

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000; now it's at ~6.5k (tasks actually in hand is still ~1300). Don't take the "tasks in hand" figure as valid, as I'm suffering a massive pile of ghosts just now - my own fault, I upset my front-end firewall and didn't notice for a few hours :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1889716 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1889731 - Posted: 14 Sep 2017, 13:33:07 UTC - in response to Message 1889716.  
Last modified: 14 Sep 2017, 13:38:55 UTC

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000; now it's at ~6.5k (tasks actually in hand is still ~1300). Don't take the "tasks in hand" figure as valid, as I'm suffering a massive pile of ghosts just now - my own fault, I upset my front-end firewall and didn't notice for a few hours :-(

Dunno what's going on there.
My pendings have stayed pretty even, give or take, with 7 rigs running. But no major jump.
State: All (7342) · In progress (2548) · Validation pending (2568) · Validation inconclusive (99) · Valid (2112)
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1889731 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1889780 - Posted: 14 Sep 2017, 18:57:03 UTC - in response to Message 1889716.  

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000; now it's at ~6.5k (tasks actually in hand is still ~1300). Don't take the "tasks in hand" figure as valid, as I'm suffering a massive pile of ghosts just now - my own fault, I upset my front-end firewall and didn't notice for a few hours :-(
If you have 'ghosts', then your wingmates have pendings - and vice-versa. As you do, so will you be done to!
ID: 1889780 · Report as offensive
Profile petri33
Volunteer tester

Send message
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1889783 - Posted: 14 Sep 2017, 19:20:29 UTC

The WoW event ended. Some users may have shut down their rigs without emptying the cache.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1889783 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1889795 - Posted: 14 Sep 2017, 20:26:04 UTC - in response to Message 1889780.  

Anyone else suffering from an ever-growing "pendings" list?
A few weeks ago it was ~3000; now it's at ~6.5k (tasks actually in hand is still ~1300). Don't take the "tasks in hand" figure as valid, as I'm suffering a massive pile of ghosts just now - my own fault, I upset my front-end firewall and didn't notice for a few hours :-(
If you have 'ghosts', then your wingmates have pendings - and vice-versa. As you do, so will you be done to!


. . There is a relatively simple, if tedious, way to recover your ghosted tasks. The really tedious part is that you only get 20 at a time, and if there are lots of them you have to repeat it over and over. But it will clear up the numbers for everybody involved.

. . The actual process is in a file on another rig so I cannot repeat it at the moment.

Stephen

.
ID: 1889795 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1889797 - Posted: 14 Sep 2017, 20:30:26 UTC - in response to Message 1889783.  

The WoW event ended. Some users may have shut down their rigs without emptying the cache.


. . That is true, and a very annoying and unnecessary thing to do. Good etiquette is to empty a machine's cache before shutting it down. Of course this is not possible in the event of hardware failure, but how often is that the case?

Stephen

:(
ID: 1889797 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22526
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1889880 - Posted: 15 Sep 2017, 6:06:34 UTC

I too thought it was fallout from WoW, but on a computer with ~1700 pendings only about 250 date back that far, with the majority being from the last 7 days.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1889880 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1889884 - Posted: 15 Sep 2017, 6:47:51 UTC

My oldest pendings go back to the last week of July, which I hope will clear out this week if the wingmen report on time. And don't forget the validation servers had a hiccup on Aug 8-9 and missed all validations on those days. We have to wait for the original deadlines to pass for the validators to clear them on the second pass.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1889884 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1889927 - Posted: 15 Sep 2017, 15:02:11 UTC - in response to Message 1889795.  

There is a relatively simple, if tedious, way to recover your ghosted tasks. The really tedious part is that you only get 20 at a time, and if there are lots of them you have to repeat it over and over. But it will clear up the numbers for everybody involved.

. . The actual process is in a file on another rig so I cannot repeat it at the moment.

Stephen

.

This is what I do to recover ghost WUs; the text is copied from someone else, don't remember who...

As an alternative to the "ghost" recovery process that I had previously posted quite a while back (involving client_state backup and restore, etc.), I have another one to offer that I think is simpler and can be entirely controlled within BOINC Manager. It just requires a fast finger on your mouse button, since the key here is to interrupt a scheduler request before it completes. I just used this quite successfully over the weekend to recover 127 "ghosts" that I had created on Friday when I started trying to run the "Special" app on Linux. I just went back to that machine occasionally when I had a few minutes and ran it when I knew I had room in the queue for at least 20 tasks to be recovered, so 7 times in all. I figured that, since it was my fault for creating the "ghosts", the least I could do was try to recover them as a courtesy to my wingmen.

1) Set "No New Tasks"

2) Make sure you have enough room in your work buffer to accommodate your "ghosts" (up to a maximum of 20 per request). If you're not one of those who typically have a queue which reaches the task limits, simply increasing the size of your work buffer should be sufficient. Otherwise, you'll have to wait until you've reported enough completed tasks to free up the necessary space in your queue.

3) Wait for, or initiate (using Update), a scheduler request that reports at least one completed task.

4) As soon as you see the scheduler request commence, interrupt it by IMMEDIATELY clicking "Suspend network activity". I find it easiest to first open the "Activity" menu drop-down and, while keeping my mouse pointer poised over "Suspend network activity", keep a close eye on the Event Log awaiting the start of the scheduler request. Then, as soon as the scheduler request commences, just CLICK.

If successful, the Event Log will show lines like this, and stop:
-------------------------------------------------
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Not requesting tasks: "no new tasks" requested via Manager
Suspending network activity - user request
-------------------------------------------------

If you get a "Scheduler request completed" line before the "Suspending network activity" line, you weren't quick enough. For me, at least, that hasn't been a problem.

5) To be on the safe side, at this point, I usually Exit BOINC completely, shutting down all running tasks, wait a minute or so, then restart BOINC. Note that network activity should still be suspended when BOINC resumes. You should also still see your task(s) "Ready to report".

6) "Allow New Tasks"

7) Resume network activity (always, or based on preferences, whichever is normal for you). If a scheduler request isn't triggered automatically, click "Update". The Event Log should now show something such as:
-------------------------------------------------
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Scheduler request completed: got 4 new tasks
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0
Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0
Resent lost task 01au09aa.11976.21340.7.34.21_1
-------------------------------------------------

followed by the usual task download messages.

NOTE: Since 20 "ghosts" seem to be the maximum that can be retrieved in one request, those with more than 20 "ghosts" will need to repeat the process multiple times, at least 5 minutes apart.

NOTE 2: If any of the "ghost" tasks are Arecibo VLARs, the scheduler may try to send them to an NVIDIA GPU (if you have one), which will fail, marking the task as "Abandoned". At least it's no longer a ghost.
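
For anyone who would rather script the sequence above than race the mouse, here is a minimal, untested sketch in Python that drives the same steps through the boinccmd CLI. Everything configurable in it is a placeholder or an assumption - the project URL, the round count, the delays - and step 2 (making room in the work buffer) still has to be sorted out beforehand. Whether the network suspension wins the race against the scheduler request is the same timing gamble as the manual click, so treat it as a starting point, not a guaranteed recipe.
-------------------------------------------------
import subprocess
import time

# Placeholders - adjust for your own setup.
PROJECT_URL = "http://setiathome.berkeley.edu/"  # project master URL
ROUNDS = 7            # e.g. ~127 ghosts at up to 20 per request
ROUND_SPACING = 330   # seconds between rounds (a bit over the 5-minute limit)

def boinccmd(*args):
    # Thin wrapper around the boinccmd CLI; assumes it is on PATH and can
    # reach the local client (the usual GUI RPC password setup).
    return subprocess.run(["boinccmd", *args],
                          capture_output=True, text=True, check=True).stdout

for i in range(ROUNDS):
    # Step 1: set "No New Tasks" for the project.
    boinccmd("--project", PROJECT_URL, "nomorework")

    # Step 3: kick off a scheduler request (it should report completed tasks).
    boinccmd("--project", PROJECT_URL, "update")

    # Step 4: immediately suspend network activity to cut the request short.
    # Same race as the manual click, just without the mouse.
    boinccmd("--set_network_mode", "never")

    # Step 5: let things settle for a minute with the request aborted
    # (the original recipe restarts BOINC here; this is the lazy version).
    time.sleep(60)

    # Steps 6 and 7: allow new tasks, resume networking (use "always" if that
    # is your normal setting), and trigger the follow-up request that should
    # resend up to 20 lost tasks.
    boinccmd("--project", PROJECT_URL, "allowmorework")
    boinccmd("--set_network_mode", "auto")
    boinccmd("--project", PROJECT_URL, "update")

    if i < ROUNDS - 1:
        time.sleep(ROUND_SPACING)
-------------------------------------------------
After each round, check the Event Log (or boinccmd --get_messages) for the "Resent lost task" lines before letting the next round run.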
ID: 1889927 · Report as offensive
Speedy
Volunteer tester
Avatar

Send message
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1890274 - Posted: 16 Sep 2017, 22:20:21 UTC

blc04_2bit_blc04_guppi_57898_16542_DIAG_KIC8462852_0017 52.39 GB has been sitting at (108) for a number of days but it does not seem to be slowing splitter progress
ID: 1890274 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1890284 - Posted: 16 Sep 2017, 23:21:40 UTC - in response to Message 1890274.  

blc04_2bit_blc04_guppi_57898_16542_DIAG_KIC8462852_0017 52.39 GB has been sitting at (108) for a number of days but it does not seem to be slowing splitter progress

Same for all the other blc_04 data.
Once the blc_05 data was loaded the splitters moved to those files; the only blc_04 WUs I've received since the blc_05 files were loaded have been re-sends.
Grant
Darwin NT
ID: 1890284 · Report as offensive
Speedy
Volunteer tester
Avatar

Send message
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1890307 - Posted: 17 Sep 2017, 1:08:37 UTC - in response to Message 1890284.  

blc04_2bit_blc04_guppi_57898_16542_DIAG_KIC8462852_0017 52.39 GB has been sitting at (108) for a number of days but it does not seem to be slowing splitter progress

Same for all the other blc_04 data.
Once the blc_05 data was loaded the splitters moved to those files; the only blc_04 WUs I've received since the blc_05 files were loaded have been re-sends.

That is a factor that I hadn't even considered - thanks, Grant. At least with only blc_04 resends going out it will help clean up the database a little bit by removing outstanding results.
ID: 1890307 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1890482 - Posted: 17 Sep 2017, 23:41:52 UTC

I'm assuming we are in an Arecibo VLAR storm from the splitters, since all machines have been receiving the "no work is available" message for the past couple of hours.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1890482 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1890484 - Posted: 17 Sep 2017, 23:46:19 UTC - in response to Message 1890482.  

I'm assuming we are in an Arecibo VLAR storm from the splitters, since all machines have been receiving the "no work is available" message for the past couple of hours.


. . Have you done the "kick the servers in the pants" thing yet??

. . I find I have to do that regularly and it gets work pretty consistently even when the previous response has been "no work available".

Stephen

??
ID: 1890484 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 36774
Credit: 261,360,520
RAC: 489
Australia
Message 1890485 - Posted: 18 Sep 2017, 0:01:06 UTC

I just had a look at the logs on my rigs and everything has been very fine here, Keith.

Cheers.
ID: 1890485 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1890488 - Posted: 18 Sep 2017, 0:14:37 UTC - in response to Message 1890484.  

No, I haven't. Been watching NASCAR and football. It is not usual for ALL machines to need the "kick in the pants" at the same time.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1890488 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1890491 - Posted: 18 Sep 2017, 0:46:08 UTC - in response to Message 1890488.  

Well, that was a chore. Unusual that all machines were synched up on their network communication schedules. Had to go through the "kick in the pants" routine on each one sequentially. Still ended up with two machines synched up, which doesn't help when they hit the buffer at the same time for the limited 100 tasks available.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1890491 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1890492 - Posted: 18 Sep 2017, 0:48:17 UTC

Still way down on the Linux machine. I got all of 3 tasks after the kick routine. Down about 200 now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1890492 · Report as offensive