Panic Mode On (112) Server Problems?

Message boards : Number crunching : Panic Mode On (112) Server Problems?

Brent Norman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1934182 - Posted: 7 May 2018, 13:15:42 UTC - in response to Message 1934178.  

Methinks you are screwed (unless there is another way to do it): you need to interrupt the scheduler request while tasks are being reported to get the resends to come in.
juan BFP · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1934184 - Posted: 7 May 2018, 13:22:01 UTC - in response to Message 1934182.  

Methinks you are screwed (unless there is another way to do it): you need to interrupt the scheduler request while tasks are being reported to get the resends to come in.

OK. I'll give E@H (Einstein@Home) a little help while waiting for the problem here to be fixed. Thanks.
JohnDK · Crowdfunding Project Donor* · Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1934185 - Posted: 7 May 2018, 13:27:14 UTC - in response to Message 1934178.  

In the meantime, all work is done, but my host's task list still shows "In progress (51)".
Surely they are ghosts. Does anyone remember the post where the ghost download sequence is explained?

I did save the explanation; sorry, I don't remember who wrote it.

As an alternative to the "ghost" recovery process that I had previously posted quite a while back (involving client_state backup and restore, etc.), I have another one to offer that I think is simpler and can be entirely controlled within BOINC Manager. It just requires a fast finger on your mouse button, since the key here is to interrupt a scheduler request before it completes. I just used this quite successfully over the weekend to recover 127 "ghosts" that I had created on Friday when I started trying to run the "Special" app on Linux. I just went back to that machine occasionally when I had a few minutes and ran it when I knew I had room in the queue for at least 20 tasks to be recovered, so 7 times in all. I figured that, since it was my fault for creating the "ghosts", the least I could do was try to recover them as a courtesy to my wingmen.
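The "7 times in all" figure follows from the per-request cap given in the NOTE at the end of this post. A minimal Python sketch of that arithmetic, assuming the observed cap of 20 resends per request and the 5-minute spacing between rounds (neither is a documented server setting, and the function names are hypothetical):

```python
import math

# Assumption: the "at most 20 resends per request" cap and the "at least
# 5 minutes apart" spacing are the values observed in this thread, not
# documented server settings.
MAX_RESENDS_PER_REQUEST = 20
MIN_MINUTES_BETWEEN_ROUNDS = 5


def rounds_needed(ghosts: int, per_request: int = MAX_RESENDS_PER_REQUEST) -> int:
    """How many interrupt-and-retry rounds it takes to recover every ghost."""
    if ghosts < 0 or per_request <= 0:
        raise ValueError("ghosts must be >= 0 and per_request must be > 0")
    return math.ceil(ghosts / per_request)


def batch_sizes(ghosts: int, per_request: int = MAX_RESENDS_PER_REQUEST) -> list:
    """Ghosts recovered per round, assuming the queue has room for a full batch each time."""
    full, rest = divmod(ghosts, per_request)
    return [per_request] * full + ([rest] if rest else [])


def min_elapsed_minutes(ghosts: int) -> int:
    """Lower bound on wall-clock time: the first round starts at t = 0, the rest are spaced out."""
    return max(0, rounds_needed(ghosts) - 1) * MIN_MINUTES_BETWEEN_ROUNDS


if __name__ == "__main__":
    print(rounds_needed(127))        # 7, matching "7 times in all"
    print(batch_sizes(127))          # [20, 20, 20, 20, 20, 20, 7]
    print(min_elapsed_minutes(127))  # 30
```

In practice the author spread the rounds over a weekend, so the 30-minute figure is only a lower bound.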

1) Set "No New Tasks"

2) Make sure you have enough room in your work buffer to accommodate your "ghosts" (up to a maximum of 20 per request). If you're not one of those who typically have a queue which reaches the task limits, simply increasing the size of your work buffer should be sufficient. Otherwise, you'll have to wait until you've reported enough completed tasks to free up the necessary space in your queue.

3) Wait for, or initiate (using Update), a scheduler request that reports at least one completed task.

4) As soon as you see the scheduler request commence, interrupt it by IMMEDIATELY clicking "Suspend network activity". I find it easiest to first open the "Activity" menu drop-down and, while keeping my mouse pointer poised over "Suspend network activity", keep a close eye on the Event Log awaiting the start of the scheduler request. Then, as soon as the scheduler request commences, just CLICK.

If successful, the Event Log will show lines like this, and stop:
-------------------------------------------------
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Not requesting tasks: "no new tasks" requested via Manager
Suspending network activity - user request
-------------------------------------------------

If you get a "Scheduler request completed" line before the "Suspending network activity" line, you weren't quick enough. For me, at least, that hasn't been a problem.

5) To be on the safe side, at this point, I usually Exit BOINC completely, shutting down all running tasks, wait a minute or so, then restart BOINC. Note that network activity should still be suspended when BOINC resumes. You should also still see your task(s) "Ready to report".

6) "Allow New Tasks"

7) Resume network activity (always, or based on preferences, whichever is normal for you). If a scheduler request isn't triggered automatically, click "Update". The Event Log should now show something such as:
-------------------------------------------------
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Scheduler request completed: got 4 new tasks
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0
Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0
Resent lost task 01au09aa.11976.21340.7.34.21_1
-------------------------------------------------

followed by the usual task download messages.

NOTE: Since 20 "ghosts" seem to be the maximum that can be retrieved in one request, those with more than 20 "ghosts" will need to repeat the process multiple times, at least 5 minutes apart.

NOTE 2: If any of the "ghost" tasks are Arecibo VLARs, the scheduler may try to send them to an NVIDIA GPU (if you have one), which will fail, marking the task as "Abandoned". At least it's no longer a ghost.
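Steps 4 and 7 both amount to reading the Event Log. A minimal sketch, assuming the log has been copied out of BOINC Manager as plain text lines: it checks the step 4 condition (network suspension logged before any "Scheduler request completed" line) and lists task names from "Resent lost task" lines. The sample lines are the ones quoted above; the function names are hypothetical, and real log lines also carry a timestamp and project name, which the substring matching simply ignores.

```python
# Sample lines copied from the Event Log excerpts quoted in this post.
STEP4_LOG = [
    "Sending scheduler request: To fetch work.",
    "Reporting 1 completed tasks",
    "Requesting new tasks for CPU and NVIDIA GPU",
    'Not requesting tasks: "no new tasks" requested via Manager',
    "Suspending network activity - user request",
]

STEP7_LOG = [
    "Sending scheduler request: To fetch work.",
    "Reporting 1 completed tasks",
    "Requesting new tasks for CPU and NVIDIA GPU",
    "Scheduler request completed: got 4 new tasks",
    "Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1",
    "Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0",
    "Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0",
    "Resent lost task 01au09aa.11976.21340.7.34.21_1",
]


def first_index(lines, marker):
    """Index of the first line containing marker, or None if it never appears."""
    for i, line in enumerate(lines):
        if marker in line:
            return i
    return None


def interrupt_succeeded(lines):
    """Step 4: the suspension must be logged before the request completes."""
    suspended = first_index(lines, "Suspending network activity")
    completed = first_index(lines, "Scheduler request completed")
    if suspended is None:
        return False
    return completed is None or suspended < completed


def resent_tasks(lines):
    """Step 7: task names taken from 'Resent lost task <name>' lines."""
    prefix = "Resent lost task "
    return [line.split(prefix, 1)[1].strip() for line in lines if prefix in line]


if __name__ == "__main__":
    print(interrupt_succeeded(STEP4_LOG))  # True
    print(len(resent_tasks(STEP7_LOG)))    # 4
```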
juan BFP · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1934186 - Posted: 7 May 2018, 13:49:56 UTC - in response to Message 1934185.  


3) Wait for, or initiate (using Update), a scheduler request that reports at least one completed task.


Thanks for your step-by-step explanation, but I believe I'm screwed: I have no more WUs in my cache. All were crunched and sent.
rob smith · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22227
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1934193 - Posted: 7 May 2018, 14:11:15 UTC

We are all heading into "Sit back and wait" mode until someone who can gets into the lab and gives the servers a prod to get them moving again.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Stephen "Heretic" · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1934194 - Posted: 7 May 2018, 14:12:55 UTC - in response to Message 1934087.  
Last modified: 7 May 2018, 14:29:08 UTC

. . Probably just this machine then ...

Your Windows system has 1 GPU and is doing CPU crunching, so that would mean a total of 200 WUs.
According to your task list you've got 206.
So since you've got 6 more than the limit, and you're getting CPU work, I guess it won't let you have any more GPU work till those GPU ghosts time out, more than 6 GPU WUs have been reported, or you reclaim them.

EDIT-
WTF? That system now shows 313 tasks in progress. 113 more than it should have.
Are you playing with the app_info.xml again?
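The limit arithmetic above can be written out as a small sketch. It assumes limits of 100 in-progress tasks for the CPU plus 100 per GPU, which is what the "1 GPU ... CPU crunching ... 200" figure implies; these were project-side settings at the time, so the constants and function names here are illustrative assumptions, not an official specification.

```python
# Assumed per-host limits, inferred from the post ("1 GPU" + CPU crunching
# = 200 tasks): 100 for the CPU plus 100 per GPU. Not an official figure.
CPU_TASK_LIMIT = 100
PER_GPU_TASK_LIMIT = 100


def task_limit(gpus: int, crunches_on_cpu: bool = True) -> int:
    """Maximum tasks in progress the scheduler would allow this host."""
    cpu_part = CPU_TASK_LIMIT if crunches_on_cpu else 0
    return cpu_part + PER_GPU_TASK_LIMIT * gpus


def tasks_over_limit(in_progress: int, gpus: int, crunches_on_cpu: bool = True) -> int:
    """How many tasks (e.g. ghosts) sit above the limit; 0 if within it."""
    return max(0, in_progress - task_limit(gpus, crunches_on_cpu))


if __name__ == "__main__":
    print(task_limit(1))             # 200
    print(tasks_over_limit(206, 1))  # 6
    print(tasks_over_limit(313, 1))  # 113
```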


. . Nope. I have 113 ghosts left from the 1st attempt, but after I wrote those messages I got a load of work. I should have posted that, but I went out. So that was a full cache plus the 113 ghosts.

. . Now I am getting no work at all on the Linux boxes. They have been getting "no tasks available" since 2:30 pm AEST, though occasionally they have received a small batch of 2 or 3 tasks at long, irregular intervals. Bertie has been out of work completely on CPU and GPU for over 4 hours. Not a good sign, since there are 600K tasks in the RTS (ready to send) hopper ...

Stephen

PS: I have since seen the messages about the SSP (Server Status Page) being frozen ...

:(
Stephen "Heretic" · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1934196 - Posted: 7 May 2018, 14:20:22 UTC - in response to Message 1934178.  
Last modified: 7 May 2018, 14:22:09 UTC

Hope Eric or Jeff get up early today.

+ 1
In the meantime, all work is done, but my host's task list still shows "In progress (51)".
Surely they are ghosts. Does anyone remember the post where the ghost download sequence is explained?


. . Set 'No new tasks'

. . Make sure there are enough uploads to give you time to go through the sequence, and that it has been 5 mins since the last update.

. . (I keep the event window open to monitor the progress, the transfer window open to know when the last task is uploading, and the options window open, ready to shut down the internet access.)

. . Start the uploads. When the last transfer is running, watch the event window for the 1st sign of a response, then shut down the network.

. . Close BOINC (the client) down, give it a little while, then open BOINC again. Set it to fetch new work and re-enable the network; you should get 20 resends.

Stephen

. . I have to do it from time to time.

. . And as Brent said, if you have no work to complete and upload then you are done like a duck dinner. It won't work.
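Both recipes rely on leaving about 5 minutes between scheduler requests. A minimal sketch of that check, assuming Event Log lines in the "time | project | message" layout quoted later in this thread (5/7/2018 11:22:49 AM | SETI@home | ...). BOINC Manager may format dates according to the system locale, so the format string is an assumption, as are the illustrative log line and the function names.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, taken from an Event Log line quoted in this
# thread ("5/7/2018 11:22:49 AM"); other locales may print dates differently.
EVENT_LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"
MIN_GAP = timedelta(minutes=5)

# Illustrative (hypothetical) line in the "time | project | message" layout.
LAST_REQUEST = "5/7/2018 11:22:49 AM | SETI@home | Sending scheduler request: To fetch work."


def parse_event_time(line: str) -> datetime:
    """Parse the timestamp field of a 'time | project | message' line."""
    stamp = line.split(" | ", 1)[0].strip()
    return datetime.strptime(stamp, EVENT_LOG_TIME_FORMAT)


def ready_for_next_round(last_request_line: str, now: datetime) -> bool:
    """True once at least MIN_GAP has passed since the last scheduler request."""
    return now - parse_event_time(last_request_line) >= MIN_GAP


if __name__ == "__main__":
    print(ready_for_next_round(LAST_REQUEST, datetime(2018, 5, 7, 11, 26, 0)))   # False
    print(ready_for_next_round(LAST_REQUEST, datetime(2018, 5, 7, 11, 27, 49)))  # True
```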

kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1934198 - Posted: 7 May 2018, 14:21:24 UTC - in response to Message 1934194.  

There is probably nothing in RTS.
The SSP page is 32 hours stale.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Stephen "Heretic" · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1934200 - Posted: 7 May 2018, 14:26:42 UTC - in response to Message 1934185.  
Last modified: 7 May 2018, 14:29:53 UTC

In the meantime, all work is done, but my host's task list still shows "In progress (51)".
Surely they are ghosts. Does anyone remember the post where the ghost download sequence is explained?

I did save the explanation; sorry, I don't remember who wrote it.


NOTE 2: If any of the "ghost" tasks are Arecibo VLARs, the scheduler may try to send them to an NVIDIA GPU (if you have one), which will fail, marking the task as "Abandoned". At least it's no longer a ghost.


. . This part no longer applies unless you use very old hardware and apps. Nvidia GPUs are now getting Arecibo VLAR tasks as normal.

Stephen

:)
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1934227 - Posted: 7 May 2018, 15:56:21 UTC

The kitties are still patiently waiting for that big kick to the servers...........................
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Richard Haselgrove · Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1934234 - Posted: 7 May 2018, 16:34:09 UTC

SSP is dry, but current.
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1934235 - Posted: 7 May 2018, 16:37:15 UTC - in response to Message 1934234.  

SSP is dry, but current.

Ahhh...well, that should mean that somebody is working on it.
Good news.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Unixchick · Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1934246 - Posted: 7 May 2018, 18:06:55 UTC

Anyone getting any WUs? There are over a million in "Workunits waiting for assimilation". Looks like they added some AP (Astropulse) splitting to the mix too.
betreger · Project Donor
Joined: 29 Jun 99
Posts: 11362
Credit: 29,581,041
RAC: 66
United States
Message 1934247 - Posted: 7 May 2018, 18:12:51 UTC - in response to Message 1934246.  

None here in a long time.
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1934249 - Posted: 7 May 2018, 18:20:14 UTC

Hello? Has the "prod" happened? Still not getting any CPU work, here...
.

Hello, from Albany, CA!...
JohnDK · Crowdfunding Project Donor* · Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1934250 - Posted: 7 May 2018, 18:20:36 UTC

Got 1 new AP on 2 different hosts, but that's all.
betreger · Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11362
Credit: 29,581,041
RAC: 66
United States
Message 1934253 - Posted: 7 May 2018, 18:37:42 UTC

5/7/2018 11:22:49 AM | SETI@home | Project has no tasks available

Unixchick · Project Donor
Avatar

Send message
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1934261 - Posted: 7 May 2018, 19:54:00 UTC

Results ready to send 0 0 231 0m

Hopefully this is good news... I hope it means that the huge backlog (1.1 million+) waiting to be assimilated is finally being assimilated.
Richard Haselgrove · Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1934288 - Posted: 7 May 2018, 21:55:35 UTC

Ah, we have some result creation, and I'm starting to fill up. Just in time to go to bed ;-)

I'll sort out the mess in the morning.
Stephen "Heretic" · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1934292 - Posted: 7 May 2018, 22:30:40 UTC - in response to Message 1934288.  
Last modified: 7 May 2018, 22:53:37 UTC

Ah, we have some result creation, and I'm starting to fill up. Just in time to go to bed ;-)

I'll sort out the mess in the morning.


. . Half your luck, getting nothing here ...

. . [edit] OK, I changed the host 'location' to home and got a FULL cache ...

Stephen

:(


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.