Panic Mode On (105) Server Problems?

Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1864865 - Posted: 30 Apr 2017, 21:40:08 UTC - in response to Message 1864727.  


Yeah, going through that right now with one. 8 month old 250g SSD died, no backup, of course.

On the bright side it should still be under warranty. Best of luck with your data recovery

Thx
ID: 1864865 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1864876 - Posted: 30 Apr 2017, 22:23:34 UTC

Yeah it's a bugger when a hard drive dies, but at 7yrs old mine is out of warranty by 2yrs. :-(

I was also interrupted when merging computers so I forgot to reset the project, so I'll have some ghosts around for the next 6-7 weeks (yes I looked at the process of retrieving them, but I don't have the time to go through that process).

Cheers.
ID: 1864876 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1864877 - Posted: 30 Apr 2017, 22:26:53 UTC - in response to Message 1864876.  

Wiggo, if you wait about a week then try 1 attempt at recovery it should say "can't resend" (too late) and will release them all.
ID: 1864877 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864903 - Posted: 30 Apr 2017, 23:30:38 UTC - in response to Message 1864877.  

Good information. I wasn't aware there was an easy way out of the ghosts situation.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864903 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864905 - Posted: 30 Apr 2017, 23:37:57 UTC - in response to Message 1864703.  

Jimbocous, I will take you up on your offer of your easy Ghost Recovery Process. I assume it works on BOINC in general and not just SETI? I have two machines with MilkyWay work with ghosts from a corrupted BOINC directory caused by a memory overclock gone bad. It only corrupted two files, the master file and the stats file. The master was easy to recover but losing 6 months of stats was the one that really hurt. I only have two ghosts left here at SETI on Numbskull. I'm not concerned about them. The MW ghosts are not clearing and I thought they would by now since the turnover rate at MW is a matter of days and not months like here.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864905 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864913 - Posted: 1 May 2017, 0:22:04 UTC - in response to Message 1864876.  

Yeah it's a bugger when a hard drive dies, but at 7yrs old mine is out of warranty by 2yrs. :-(

I was also interrupted when merging computers so I forgot to reset the project, so I'll have some ghosts around for the next 6-7 weeks (yes I looked at the process of retrieving them, but I don't have the time to go through that process).

Cheers.


. . Fair cop, it is a little fiddly. It can take a couple of hours if there are a great number (like more than 100). But when I managed to trash the cache on Mi-Burrito there were about 150 ghosts so I just ran the process once or twice a day and in less than a week I had recovered them all. It seemed better than leaving them in limbo for 2 months and only took 10 to 20 mins per day.

Happy crunching :)

Stephen

:)
ID: 1864913 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864915 - Posted: 1 May 2017, 0:30:14 UTC - in response to Message 1864841.  
Last modified: 1 May 2017, 0:40:33 UTC

Hi Stephen, I should try and find the answer myself with a forum search, but since I have your attention, what is the method you use to move an Arecibo VLAR onto an Nvidia GPU? The scheduler won't do it because of the way the move rules are written. I too have been in trouble, especially on outage days where I run out quickly on GPU tasks and it would be nice to move some CPU tasks to GPU. At least for the slow FX systems. I've learned that is counterproductive on the Ryzen system since it would run out just as fast for both task types on that system. Best to let it run un-optimized without rescheduling on outage days.


. . I use Stubbles' script. It is not compiled but does the task well. I have tried to get it to work under Linux as well but the file format for client_state.xml is different there and I cannot fathom how to make it work properly. I would suggest talking to Stubbles but he seems to have disappeared, I cannot even get an answer on email. But there are lots of messages about it in the early stages of the re-scheduling thread. He may still have a file server running.

. . It is a little involved to set up as you will need to install Sublime Text 3 and edit the script to tailor it to your drive setup for Seti. If you are using the default drive/directories then the script should pretty much run as is (I don't so I had to fiddle a bit). But you will need to edit the reference text file to match the version of app that you are running. I think the version was 8.12 when the script was done.

Stephen

..
ID: 1864915 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864916 - Posted: 1 May 2017, 0:32:36 UTC - in response to Message 1864877.  
Last modified: 1 May 2017, 0:38:48 UTC

Wiggo, if you wait about a week then try 1 attempt at recovery it should say "can't resend" (too late) and will release them all.


. . Yep, what he said... but I am not sure about the time frame, it could be 10 days or more, or maybe only 5, I cannot say how old the ghosts were when I got that result.

Stephen

:)
ID: 1864916 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864936 - Posted: 1 May 2017, 1:50:18 UTC - in response to Message 1864913.  


. . Fair cop, it is a little fiddly. It can take a couple of hours if there are a great number (like more than 100). But when I managed to trash the cache on Mi-Burrito there were about 150 ghosts so I just ran the process once or twice a day and in less than a week I had recovered them all. It seemed better than leaving them in limbo for 2 months and only took 10 to 20 mins per day.

Happy crunching :)

Stephen

:)

I don't want to recover the MW tasks. I just want to put them out of their misery like Brent suggested.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864936 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1864938 - Posted: 1 May 2017, 1:54:27 UTC - in response to Message 1864915.  

Hi Stephen, I should try and find the answer myself with a forum search, but since I have your attention, what is the method you use to move an Arecibo VLAR onto an Nvidia GPU? The scheduler won't do it because of the way the move rules are written. I too have been in trouble, especially on outage days where I run out quickly on GPU tasks and it would be nice to move some CPU tasks to GPU. At least for the slow FX systems. I've learned that is counterproductive on the Ryzen system since it would run out just as fast for both task types on that system. Best to let it run un-optimized without rescheduling on outage days.


. . I use Stubbles' script. It is not compiled but does the task well. I have tried to get it to work under Linux as well but the file format for client_state.xml is different there and I cannot fathom how to make it work properly. I would suggest talking to Stubbles but he seems to have disappeared, I cannot even get an answer on email. But there are lots of messages about it in the early stages of the re-scheduling thread. He may still have a file server running.

. . It is a little involved to set up as you will need to install Sublime Text 3 and edit the script to tailor it to your drive setup for Seti. If you are using the default drive/directories then the script should pretty much run as is (I don't so I had to fiddle a bit). But you will need to edit the reference text file to match the version of app that you are running. I think the version was 8.12 when the script was done.

Stephen

..

I got a PM from Laurent offering to send me the Linux script but it hasn't shown up yet. I do have a Perl interpreter on one system so think it could work if I massage the script for Windows. I at least want to look at it to get an idea of what is required and how it works. I hope I can trace through the script to get the gist of it. Will be good to work the gray cells a bit and dust off some forgotten skills.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1864938 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864958 - Posted: 1 May 2017, 3:40:49 UTC - in response to Message 1864938.  
Last modified: 1 May 2017, 3:43:31 UTC


I got a PM from Laurent offering to send me the Linux script but it hasn't shown up yet. I do have a Perl interpreter on one system so think it could work if I massage the script for Windows. I at least want to look at it to get an idea of what is required and how it works. I hope I can trace through the script to get the gist of it. Will be good to work the gray cells a bit and dust off some forgotten skills.


. . I have used DropBox in the past so maybe I could zip up what I am using and you could have a play with that. I just have to remember how to use dropbox :( And if Laurent has a working Linux version of the script I would love to have a copy. It is a moot point on the Pentium-D and the Core2 Duo now but I think I will need it for Bertie when I switch him over to Linux.

Stephen

..
ID: 1864958 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1865033 - Posted: 1 May 2017, 15:42:36 UTC - in response to Message 1864876.  

...., so I'll have some ghosts around for the next 6-7 weeks (yes I looked at the process of retrieving them, but I don't have the time to go through that process).

Cheers.
As an alternative to the "ghost" recovery process that I had previously posted quite a while back (involving client_state backup and restore, etc.), I have another one to offer that I think is simpler and can be entirely controlled within BOINC Manager. It just requires a fast finger on your mouse button, since the key here is to interrupt a scheduler request before it completes. I just used this quite successfully over the weekend to recover 127 "ghosts" that I had created on Friday when I started trying to run the "Special" app on Linux. I just went back to that machine occasionally when I had a few minutes and ran it when I knew I had room in the queue for at least 20 tasks to be recovered, so 7 times in all. I figured that, since it was my fault for creating the "ghosts", the least I could do was try to recover them as a courtesy to my wingmen.
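A quick sanity check on those numbers, as a minimal Python sketch. The only input taken from this post is the limit of 20 ghosts per scheduler request; the constant and function names are made up for illustration and are not part of BOINC.

```python
import math

# Assumption taken from the post: at most 20 ghosts come back per request.
MAX_GHOSTS_PER_REQUEST = 20


def requests_needed(ghost_count: int, per_request: int = MAX_GHOSTS_PER_REQUEST) -> int:
    """Smallest number of scheduler requests that can return every ghost."""
    if ghost_count < 0 or per_request <= 0:
        raise ValueError("ghost_count must be >= 0 and per_request must be > 0")
    return math.ceil(ghost_count / per_request)


if __name__ == "__main__":
    print(requests_needed(127))
```

For 127 ghosts this works out to 7 requests, which matches the 7 runs mentioned above.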

1) Set "No New Tasks"

2) Make sure you have enough room in your work buffer to accommodate your "ghosts" (up to a maximum of 20 per request). If you're not one of those who typically have a queue which reaches the task limits, simply increasing the size of your work buffer should be sufficient. Otherwise, you'll have to wait until you've reported enough completed tasks to free up the necessary space in your queue.

3) Wait for, or initiate (using Update), a scheduler request that reports at least one completed task.

4) As soon as you see the scheduler request commence, interrupt it by IMMEDIATELY clicking "Suspend network activity". I find it easiest to first open the "Activity" menu drop-down and, while keeping my mouse pointer poised over "Suspend network activity", keep a close eye on the Event Log awaiting the start of the scheduler request. Then, as soon as the scheduler request commences, just CLICK.

If successful, the Event Log will show lines like this, and stop:
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Not requesting tasks: "no new tasks" requested via Manager
Suspending network activity - user request

If you get a "Scheduler request completed" line before the "Suspending network activity" line, you weren't quick enough. For me, at least, that hasn't been a problem.

5) To be on the safe side, at this point, I usually Exit BOINC completely, shutting down all running tasks, wait a minute or so, then restart BOINC. Note that network activity should still be suspended when BOINC resumes. You should also still see your task(s) "Ready to report".

6) "Allow New Tasks"

7) Resume network activity (always, or based on preferences, whichever is normal for you). If a scheduler request isn't triggered automatically, click "Update". The Event Log should now show something such as:
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Scheduler request completed: got 4 new tasks
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0
Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0
Resent lost task 01au09aa.11976.21340.7.34.21_1

followed by the usual task download messages.

NOTE: Since 20 "ghosts" seem to be the maximum that can be retrieved in one request, those with more than 20 "ghosts" will need to repeat the process multiple times, at least 5 minutes apart.

NOTE 2: If any of the "ghost" tasks are Arecibo VLARs, the scheduler may try to send them to an NVIDIA GPU (if you have one), which will fail, marking the task as "Abandoned". At least it's no longer a ghost.
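
The two Event Log checks described in steps 4 and 7 (was the interrupt quick enough, and how many ghosts came back) can be sketched in Python. This is only an illustration: it assumes you have copied the relevant Event Log lines into a list of strings, and the function names are hypothetical. Real log lines normally carry a timestamp and project prefix, which is why the matching is done on substrings rather than whole lines.

```python
def interrupt_succeeded(log_lines):
    """True if network activity was suspended before the request completed."""
    for line in log_lines:
        if "Scheduler request completed" in line:
            return False  # the request finished first: not quick enough
        if "Suspending network activity" in line:
            return True   # the suspend landed before any completion line
    return False          # neither line seen, so nothing was interrupted


def resent_tasks(log_lines):
    """Names of the tasks the scheduler reported as resent."""
    marker = "Resent lost task "
    return [line.split(marker, 1)[1].strip() for line in log_lines if marker in line]


# Sample lines copied from the step 4 and step 7 excerpts in this post.
step4_log = [
    "Sending scheduler request: To fetch work.",
    "Reporting 1 completed tasks",
    "Requesting new tasks for CPU and NVIDIA GPU",
    'Not requesting tasks: "no new tasks" requested via Manager',
    "Suspending network activity - user request",
]
step7_log = [
    "Sending scheduler request: To fetch work.",
    "Reporting 1 completed tasks",
    "Requesting new tasks for CPU and NVIDIA GPU",
    "Scheduler request completed: got 4 new tasks",
    "Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1",
    "Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0",
    "Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0",
    "Resent lost task 01au09aa.11976.21340.7.34.21_1",
]

if __name__ == "__main__":
    print(interrupt_succeeded(step4_log), len(resent_tasks(step7_log)))
```

Counting the resent tasks after each pass gives a running total to compare against the number of ghosts you started with.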
ID: 1865033 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1865048 - Posted: 1 May 2017, 16:30:36 UTC - in response to Message 1865033.  

KUDOS....

Nice Hack.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1865048 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1865051 - Posted: 1 May 2017, 16:39:57 UTC - in response to Message 1864958.  


I got a PM from Laurent offering to send me the Linux script but it hasn't shown up yet. I do have a Perl interpreter on one system so think it could work if I massage the script for Windows. I at least want to look at it to get an idea of what is required and how it works. I hope I can trace through the script to get the gist of it. Will be good to work the gray cells a bit and dust off some forgotten skills.


. . I have used DropBox in the past so maybe I could zip up what I am using and you could have a play with that. I just have to remember how to use dropbox :( And if Laurent has a working Linux version of the script I would love to have a copy. It is a moot point on the Pentium-D and the Core2 Duo now but I think I will need it for Bertie when I switch him over to Linux.

Stephen

..

Laurent sent me the Linux Perl script last night. I have had a look at it and have done the necessary editing to change it to the Windows platform. It is a beta script and he has it set to not alter the actual client_state and app_info files, just make backups and the trial output right now. He wants the trial output first to see if it is running correctly. I am going to run it today and send him the files. I can ask him if he wants to send you the trial script as another test subject. You do have to have Perl installed to use the script and have to add a module but that is very easy to do with Perl. I had no issues since I already had Perl installed for my finance program and added the required module last night.
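
The "make backups and trial output, leave the real files alone" approach can be sketched roughly as follows. This is a hedged Python illustration of the general dry-run pattern, not Laurent's Perl script: the function name, the .bak and .trial suffixes, and the sample file content are all made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def trial_run(state_file: Path, transform, out_dir: Path) -> Path:
    """Back up state_file and write the transformed text to a separate trial file.

    The original file is only read, never written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    backup = out_dir / (state_file.name + ".bak")
    shutil.copy2(state_file, backup)             # keep an untouched copy
    trial = out_dir / (state_file.name + ".trial")
    trial.write_text(transform(state_file.read_text()))
    return trial


# Tiny demonstration in a throwaway directory with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "client_state.xml"
    state.write_text("<demo>cpu</demo>")
    trial = trial_run(state, lambda text: text.replace("cpu", "gpu"), Path(tmp) / "trial")
    original_after = state.read_text()   # unchanged original
    trial_text = trial.read_text()       # proposed result for review
```

The trial file can then be compared against the original before anything is allowed to touch the live files.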
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1865051 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1865080 - Posted: 1 May 2017, 20:05:37 UTC

Jeff Buck's "ghost fix" worked. I first tried it without stopping/starting BOINC, but the scheduler request just went on to complete without requesting new work. Only after restarting did it request new work, including getting the ghost WUs :)
ID: 1865080 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1865085 - Posted: 1 May 2017, 20:38:40 UTC - in response to Message 1865080.  

Jeff Buck's "ghost fix" worked. I first tried it without stopping/starting BOINC, but the scheduler request just went on to complete without requesting new work. Only after restarting did it request new work, including getting the ghost WUs :)
Glad to hear that it worked okay for you!

I think it can work without stopping/starting BOINC, but you would have to wait long enough for the original scheduler request to time out on the server. I don't know how long that might be and, frankly, I don't have the patience for it. That's why I recommend getting out of BOINC and stopping the running tasks. Saves a whole bunch of time. :^)
ID: 1865085 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1865094 - Posted: 1 May 2017, 20:48:01 UTC - in response to Message 1864905.  
Last modified: 1 May 2017, 20:48:31 UTC

...
ID: 1865094 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1865098 - Posted: 1 May 2017, 20:54:15 UTC - in response to Message 1865085.  

I haven't tried seeing if the request times out either.

What I do after interrupting the upload is just leave it for anywhere from 5 minutes to hours; it doesn't seem to matter how long you leave it.
Then a 30-second BOINC restart, and enable new tasks/network.

Then repeat 10 times, because it's usually a Re-scheduler issue that toasted over 200 tasks :((
ID: 1865098 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1865104 - Posted: 1 May 2017, 22:05:42 UTC - in response to Message 1864915.  

Hi Stephen, I should try and find the answer myself with a forum search, but since I have your attention, what is the method you use to move an Arecibo VLAR onto an Nvidia GPU? The scheduler won't do it because of the way the move rules are written. I too have been in trouble, especially on outage days where I run out quickly on GPU tasks and it would be nice to move some CPU tasks to GPU. At least for the slow FX systems. I've learned that is counterproductive on the Ryzen system since it would run out just as fast for both task types on that system. Best to let it run un-optimized without rescheduling on outage days.


. . I use Stubbles' script. It is not compiled but does the task well. I have tried to get it to work under Linux as well but the file format for client_state.xml is different there and I cannot fathom how to make it work properly. I would suggest talking to Stubbles but he seems to have disappeared, I cannot even get an answer on email. But there are lots of messages about it in the early stages of the re-scheduling thread. He may still have a file server running.

. . It is a little involved to set up as you will need to install Sublime Text 3 and edit the script to tailor it to your drive setup for Seti. If you are using the default drive/directories then the script should pretty much run as is (I don't so I had to fiddle a bit). But you will need to edit the reference text file to match the version of app that you are running. I think the version was 8.12 when the script was done.

Stephen

..

Offer's still open to you or anyone (Keith?) if you want the uncompiled QOpt script. The whole compile concept was an epic fail.
ID: 1865104 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1865105 - Posted: 1 May 2017, 22:08:48 UTC - in response to Message 1865104.  

Hi, I'm game. Feeling confident now that I managed to get Laurent's script working. Of course he did all the work, I just tweaked a bit for Windows.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1865105 · Report as offensive


 