Panic Mode On (103) Server Problems?

Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1826236 - Posted: 23 Oct 2016, 3:43:02 UTC - in response to Message 1826234.  

Can you please explain how you get the day and year from "57628"? I can possibly understand how you get the year. Is that from the "6"?


I use this site to convert the MJD (Modified Julian Date) to normal dates that everyone can understand.

The Modified Julian Date is the Julian Date minus 2400000.5. The Julian Date is a continuous count of days starting from noon on January 1, 4713 B.C.

e.g. the Julian Date for noon UTC on 23 Oct 2016 is 2457685, and the Modified Julian Date is 57684.5
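
For anyone who would rather not rely on a converter site, the arithmetic above can be sketched in a few lines of Python. This is only a minimal sketch: the function names `jd_to_mjd` and `mjd_to_datetime` are made up for illustration, and it relies on the standard definition that MJD 0 falls at midnight UTC on 17 November 1858.

```python
from datetime import datetime, timedelta, timezone

# MJD 0.0 is 1858-11-17 00:00 UTC, which is JD 2400000.5.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def jd_to_mjd(jd: float) -> float:
    """Convert a Julian Date to a Modified Julian Date."""
    return jd - 2400000.5


def mjd_to_datetime(mjd: float) -> datetime:
    """Convert a Modified Julian Date to a UTC calendar date and time."""
    return MJD_EPOCH + timedelta(days=mjd)


print(jd_to_mjd(2457685.0))      # 57684.5
print(mjd_to_datetime(57684.5))  # 2016-10-23 12:00:00+00:00
```

The half day in the offset is why a whole-number Julian Date (which starts at noon) gives an MJD ending in .5.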
ID: 1826236
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1826241 - Posted: 23 Oct 2016, 4:40:09 UTC - in response to Message 1826236.  

Can you please explain how you get the day and year from "57628"? I can possibly understand how you get the year. Is that from the "6"?


I use this site to convert the MJD (Modified Julian Date) to normal dates that everyone can understand.

The Modified Julian Date is the Julian Date minus 2400000.5. The Julian Date is a continuous count of days starting from noon on January 1, 4713 B.C.

e.g. the Julian Date for noon UTC on 23 Oct 2016 is 2457685, and the Modified Julian Date is 57684.5

Brilliant thanks
ID: 1826241
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1826259 - Posted: 23 Oct 2016, 7:31:45 UTC - in response to Message 1826236.  

Can you please explain how you get the day and year from "57628"? I can possibly understand how you get the year. Is that from the "6"?


I use this site to convert the MJD (Modified Julian Date) to normal dates that everyone can understand.

The Modified Julian Date is the Julian Date minus 2400000.5. The Julian Date is a continuous count of days starting from noon on January 1, 4713 B.C.

e.g. the Julian Date for noon UTC on 23 Oct 2016 is 2457685, and the Modified Julian Date is 57684.5


. . Hi Kiska,

. . You say that like it is so easy :) I understand the way it works but it still eludes me, in that I need to run it through the converter to have any idea what the actual date is :) So looking at the Green Bank file names leaves me with no idea of their age except that it will be less than 12 months because they have been online for less than that :)

Stephen

.
ID: 1826259
Kiska
Volunteer tester
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1826269 - Posted: 23 Oct 2016, 9:34:36 UTC - in response to Message 1826259.  
Last modified: 23 Oct 2016, 9:37:23 UTC

Can you please explain how you get the day and year from "57628"? I can possibly understand how you get the year. Is that from the "6"?


I use this site to convert the MJD (Modified Julian Date) to normal dates that everyone can understand.

The Modified Julian Date is the Julian Date minus 2400000.5. The Julian Date is a continuous count of days starting from noon on January 1, 4713 B.C.

e.g. the Julian Date for noon UTC on 23 Oct 2016 is 2457685, and the Modified Julian Date is 57684.5


. . Hi Kiska,

. . You say that like it is so easy :) I understand the way it works but it still eludes me, in that I need to run it through the converter to have any idea what the actual date is :) So looking at the Green Bank file names leaves me with no idea of their age except that it will be less than 12 months because they have been online for less than that :)

Stephen

.


Astronomers will find MJD conversion quite easy. And since next year I am (perhaps) going to be transferring to a bachelor of science, I should know.

e.g.
Let's take some off the SSP and convert them to a time and date

blc2_2bit_guppi_57451_67970_HIP117473_0019

MJD: 57451 which translates to 4th of March 2016
Time: 18:52:50

blc3_2bit_guppi_57432_26201_HIP57328_OFF_0006

MJD: 57432 which translates to 14th of February 2016
Time: 7:16:41

blc3_2bit_guppi_57432_26529_HIP57328_0007

MJD: 57432 which translates to 14th of February 2016
Time: 7:22:09

The two recorded on MJD 57432 started 328 seconds apart (26529 - 26201), which tells me that each "tape" holds approx. 330 seconds of data. At 51.31 GB, that is pretty large.
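
The three conversions above can be reproduced with a short Python sketch. It assumes, as the examples do, that the first number after "guppi" is the MJD and the second is the number of seconds since midnight UTC on that day. The regular expression and the function name `guppi_start_time` are hypothetical and only cover names shaped like the ones shown here.

```python
import re
from datetime import datetime, timedelta, timezone

# MJD 0.0 is 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)

# Assumed layout: ..._guppi_<MJD>_<seconds since midnight>_<target>...
NAME_RE = re.compile(r"_guppi_(\d+)_(\d+)_")


def guppi_start_time(name: str) -> datetime:
    """Return the UTC start time encoded in a guppi file or task name."""
    match = NAME_RE.search(name)
    if match is None:
        raise ValueError(f"not a recognised guppi name: {name!r}")
    mjd = int(match.group(1))
    seconds = int(match.group(2))
    return MJD_EPOCH + timedelta(days=mjd, seconds=seconds)


first = guppi_start_time("blc3_2bit_guppi_57432_26201_HIP57328_OFF_0006")
second = guppi_start_time("blc3_2bit_guppi_57432_26529_HIP57328_0007")

print(guppi_start_time("blc2_2bit_guppi_57451_67970_HIP117473_0019"))
# 2016-03-04 18:52:50+00:00
print((second - first).total_seconds())  # 328.0
```

Subtracting the result from the current UTC time gives the age of a file directly, without going through a converter.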
ID: 1826269
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1826482 - Posted: 24 Oct 2016, 13:24:48 UTC
Last modified: 24 Oct 2016, 13:25:19 UTC

. . @ Jeff Buck

. . Hi,

. . I don't know if it was in this thread but thanks for the advice on how to recover Ghosted Tasks. Although it is only a small portion of the tasks that were ghosted when I reverted to CUDA50 after trying out SoG r3500 on my C2D, I did manage to recover 21 tasks which will not be timing out now.

Stephen

.
ID: 1826482
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1826521 - Posted: 24 Oct 2016, 16:40:19 UTC - in response to Message 1826482.  

. . @ Jeff Buck

. . Hi,

. . I don't know if it was in this thread but thanks for the advice on how to recover Ghosted Tasks. Although it is only a small portion of the tasks that were ghosted when I reverted to CUDA50 after trying out SoG r3500 on my C2D, I did manage to recover 21 tasks which will not be timing out now.

Stephen

.

Good to hear! The DB and your wingmates will certainly benefit from clearing those tasks out sooner. If you repeat the process, you should eventually be able to get all your lost tasks back, 20 at a time, as long as you have room in your queue. The easiest way to do that is just to set No New Tasks for a while.

I assume you used the technique I detailed in Message 1798083. More recently, I discovered another approach that I think is a bit simpler but relies on timing and a fast mouse click.

Now, this second technique, as with the first one, requires that you have at least one task which has completed, uploaded, and is ready to report. Then you have to interrupt the scheduler request after it reports the task but before your host gets a response. The simplest way I've found to do that is to "Suspend network activity" the instant a scheduler request commences. How you achieve that may take a little trial and error on your part.

For my big boxes, which basically initiate scheduler requests just about every time the 5 minute timer expires, it's simply a matter of hovering my mouse cursor over the "Suspend network activity" menu item, watching the timer count down on the Projects tab, and then, once it gets to zero, watching the Event Log. As soon as the scheduler request commences, CLICK. If successful, the Event Log will show lines like this, and stop:
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Suspending network activity - user request

If you get a "Scheduler request completed" line before the "Suspending network activity" line, you weren't quick enough. For me, at least, that hasn't been a problem.

Now, to be on the safe side, at this point, I usually Exit BOINC completely, shutting down all running tasks, wait a minute or so, then restart BOINC. Note that network activity should still be suspended when BOINC resumes. You should also still see your task(s) "Ready to report". Assuming that you still had NNT set before you suspended network activity (highly recommended), this is now the time to Allow New Tasks.

Then, simply resume network activity (always, or based on preferences, whichever is normal for you). The Event Log should now show something such as:
Sending scheduler request: To fetch work.
Reporting 1 completed tasks
Requesting new tasks for CPU and NVIDIA GPU
Scheduler request completed: got 4 new tasks
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.49.vlar_1
Resent lost task blc4_2bit_guppi_57432_24865_PSR_J1136+1551_0002.22874.831.18.27.189.vlar_0
Resent lost task blc4_2bit_guppi_57432_25217_HIP57328_0003.22901.831.18.27.241.vlar_0
Resent lost task 01au09aa.11976.21340.7.34.21_1

followed by the usual task download messages.

It may seem complicated, but it's actually quite simple, as long as you're quick with the mouse click.

One caveat. If you have any Arecibo VLARs among your lost tasks, and you have an NVIDIA GPU, the scheduler may sometimes attempt to send those tasks to the GPU, which is a no-no. Those resends will fail as timeouts, but at least they're now freed up to send to another host.
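
Whether the click landed in time can also be checked mechanically from a copy of the Event Log. The Python below is only a rough sketch: it assumes the log has been saved as a list of text lines, it matches on the message texts quoted above (so any timestamp or project prefix on a line is ignored), and the function names `suspend_was_in_time` and `resent_tasks` are made up for illustration.

```python
def suspend_was_in_time(log_lines):
    """True if the suspend line shows up before any 'Scheduler request completed' line."""
    for line in log_lines:
        if "Suspending network activity" in line:
            return True
        if "Scheduler request completed" in line:
            return False
    return False  # neither message was found


def resent_tasks(log_lines):
    """Collect the task names from 'Resent lost task' lines."""
    marker = "Resent lost task "
    return [line.split(marker, 1)[1].strip() for line in log_lines if marker in line]


interrupted = [
    "Sending scheduler request: To fetch work.",
    "Reporting 1 completed tasks",
    "Requesting new tasks for CPU and NVIDIA GPU",
    "Suspending network activity - user request",
]
resumed = [
    "Sending scheduler request: To fetch work.",
    "Scheduler request completed: got 4 new tasks",
    "Resent lost task 01au09aa.11976.21340.7.34.21_1",
]

print(suspend_was_in_time(interrupted))  # True
print(suspend_was_in_time(resumed))      # False
print(resent_tasks(resumed))             # ['01au09aa.11976.21340.7.34.21_1']
```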
ID: 1826521
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827107 - Posted: 28 Oct 2016, 4:43:46 UTC

I see we have some more of the guppi MESSIER031 WUs coming down the pipe. The 57397 date is the same as the batch we got back in April, but those were "blc0" while the new ones are "blc6". Those April WUs were at least 70% overflows as I recall. I wonder if these will do any better.

BTW, for those doing VLAR rescheduling, the MESSIER031 WUs are not VLARs, even though they're guppis. The ARs for the April batch tended to be around 0.30 or thereabouts, just slightly lower than normal ARs.
ID: 1827107
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1827127 - Posted: 28 Oct 2016, 7:10:00 UTC - in response to Message 1827107.  

BTW, for those doing VLAR rescheduling, the MESSIER031 WUs are not VLARs, even though they're guppis.

I can't speak to how Kevvy is handling that ... guess I should do some research.
ID: 1827127
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1827795 - Posted: 1 Nov 2016, 0:11:29 UTC - in response to Message 1827107.  

I see we have some more of the guppi MESSIER031 WUs coming down the pipe. The 57397 date is the same as the batch we got back in April, but those were "blc0" while the new ones are "blc6". Those April WUs were at least 70% overflows as I recall. I wonder if these will do any better.

BTW, for those doing VLAR rescheduling, the MESSIER031 WUs are not VLARs, even though they're guppis. The ARs for the April batch tended to be around 0.30 or thereabouts, just slightly lower than normal ARs.



. . Hi Jeff

. . I have only just started to receive these tasks but the few I have run on my GPU (GTX950) so far show runtimes as I would expect from an Arecibo task of around the same AR value. Arecibo tasks take between 30 and 42 mins, the longer times being lower AR values. The Blc6 Messiers are taking mid 40's which puts them in proportion on runtimes. I have a couple of dozen queued on the GPU so I will monitor their results. But this will make work harder because they will get moved by the rescheduler and I will then have to manually move them back. Whether I bother will depend on how consistently they maintain runtimes. But thanks for pointing that out. As for the CPU, I only have them hitting the CPU on my Core2 Duo so far and the times there seem very similar to Blc3s.

Stephen

.
ID: 1827795
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1827821 - Posted: 1 Nov 2016, 3:07:07 UTC - in response to Message 1827795.  
Last modified: 1 Nov 2016, 3:08:09 UTC

I see we have some more of the guppi MESSIER031 WUs coming down the pipe. The 57397 date is the same as the batch we got back in April, but those were "blc0" while the new ones are "blc6". Those April WUs were at least 70% overflows as I recall. I wonder if these will do any better.

BTW, for those doing VLAR rescheduling, the MESSIER031 WUs are not VLARs, even though they're guppis. The ARs for the April batch tended to be around 0.30 or thereabouts, just slightly lower than normal ARs.



. . Hi Jeff

. . I have only just started to receive these tasks but the few I have run on my GPU (GTX950) so far show runtimes as I would expect from an Arecibo task of around the same AR value. Arecibo tasks take between 30 and 42 mins, the longer times being lower AR values. The Blc6 Messiers are taking mid 40's which puts them in proportion on runtimes. I have a couple of dozen queued on the GPU so I will monitor their results. But this will make work harder because they will get moved by the rescheduler and I will then have to manually move them back. Whether I bother will depend on how consistently they maintain runtimes. But thanks for pointing that out. As for the CPU, I only have them hitting the CPU on my Core2 Duo so far and the times there seem very similar to Blc3s.

Stephen

.

I've suspended my VLAR rescheduling efforts since Friday evening, inasmuch as the only guppi VLARs coming along are the occasional resend. I figure that, even though the MESSIER031 files do have a slightly lower AR than "normal" Arecibo tasks, rescheduling is unlikely to provide much benefit. The Arecibo tasks they'd be swapped with could themselves have ARs that are even lower, just above the VLAR threshold. There's no real way of knowing in advance. I'll resume rescheduling once guppi VLARs start flowing again.

It's nice, though, to see that this batch of MESSIER031 files seems to be much more useful than the previous ones, which were mostly overflows.
ID: 1827821
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1827874 - Posted: 1 Nov 2016, 12:07:25 UTC - in response to Message 1827821.  


I've suspended my VLAR rescheduling efforts since Friday evening, inasmuch as the only guppi VLARs coming along are the occasional resend. I figure that, even though the MESSIER031 files do have a slightly lower AR than "normal" Arecibo tasks, rescheduling is unlikely to provide much benefit. The Arecibo tasks they'd be swapped with could themselves have ARs that are even lower, just above the VLAR threshold. There's no real way of knowing in advance. I'll resume rescheduling once guppi VLARs start flowing again.

It's nice, though, to see that this batch of MESSIER031 files seems to be much more useful than the previous ones, which were mostly overflows.



. . Yes I remember the run of what I call "noise bombs".

Stephen

.
ID: 1827874
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1827878 - Posted: 1 Nov 2016, 12:44:50 UTC

Hmmm. All data services (except upload/download) 'down for maintenance' at 05:30 am, but the message boards are up as normal?

I sniff trouble in the wind...
ID: 1827878
JLDun
Volunteer tester
Joined: 21 Apr 06
Posts: 573
Credit: 196,101
RAC: 0
United States
Message 1827884 - Posted: 2 Nov 2016, 0:13:55 UTC

I just looked at the SSP at 7:09 CDT: most processes are back up, but no GBT tapes are listed, and the replica is listed as 0 behind master, which we all know is unlikely so soon after the weekly outage. (The confirmation is that I have three tasks returned, but my task page listed them as still 'out in the field'.)
ID: 1827884
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1827885 - Posted: 2 Nov 2016, 0:14:04 UTC
Last modified: 2 Nov 2016, 0:15:47 UTC

Finally came up about 8pm EDT, got a big batch (~70 + 1 GBT) of

20ja16ab.26335.16018.4.31.nn


All ran only a few seconds (SoG). Only the GBT WU is running normally.
No compute errors.

What is going on???
ID: 1827885
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1827889 - Posted: 2 Nov 2016, 0:37:45 UTC - in response to Message 1827885.  

Was able to report all my completed work. Was completely out of GPU work from about noon PDT today and fell back on backup projects. Finally getting GPU tasks once I filled up my CPU allocations on all machines. Don't know where the work is coming from since I don't see any splitter output on the SSP even though it shows splitters running. Guess the new work is coming out of the already split buffer.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1827889
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1827897 - Posted: 2 Nov 2016, 1:01:08 UTC - in response to Message 1827878.  

Hmmm. All data services (except upload/download) 'down for maintenance' at 05:30 am, but the message boards are up as normal?

I sniff trouble in the wind...


. . Hi Richard,

. . At this end the message boards were unavailable as well as WU servers, only able to upload completed tasks. Lasted from about 11pm last night until 11am this morning, that would be noon till midnight UTC.

. . Is it just me or are the outages getting longer ??

Stephen

.
ID: 1827897
JLDun
Volunteer tester
Joined: 21 Apr 06
Posts: 573
Credit: 196,101
RAC: 0
United States
Message 1827901 - Posted: 2 Nov 2016, 1:10:49 UTC - in response to Message 1827897.  

Is it just me or are the outages getting longer ??


I tend to plan the outages to end around 4-4:30 local; today's was around 7:00.... so very much longer this time.
And remember having the replica off for an entire week recently? That makes a long outage. :-)
ID: 1827901
betreger
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1827905 - Posted: 2 Nov 2016, 1:29:16 UTC - in response to Message 1827901.  

Is it just me or are the outages getting longer ??


I tend to plan the outages to end around 4-4:30 local; today's was around 7:00.... so very much longer this time.
And remember having the replica off for an entire week recently? That makes a long outage. :-)

Yes and this one started prior to 7:30 AM PDT rather than the usual 9:00 AM
ID: 1827905
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1827933 - Posted: 2 Nov 2016, 4:36:58 UTC - in response to Message 1827905.  

Got back from work to find both of my systems in extreme Scheduler request backoff mode, one had run out of GPU work.
Did a manual Update & all the backed up work reported & new work started downloading.
Grant
Darwin NT
ID: 1827933
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1827957 - Posted: 2 Nov 2016, 10:37:53 UTC

Got a number of GPU shorties on my Linux box, all validated. I think they are Arecibo tasks, not guppies, which instead take 16 minutes on the Windows 10 PC with its GTX 750 OC.
Tullio
ID: 1827957


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.