Panic Mode On (112) Server Problems?

Author	Message
Speedy Volunteer tester Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89	Message 1938179 - Posted: 3 Jun 2018, 23:36:33 UTC blc13_2bit_blc13_guppi_58166_59941_DIAG_PSR_J1909-3744_0006 looks as if it's full of errors, judging by the SSP (server status page). However, it doesn't appear to be affecting splitter output.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1938331 - Posted: 5 Jun 2018, 19:23:15 UTC . . OK that is officially the shortest outage I have ever seen. 3 hours and change. . . Don't I feel silly ... :) Stephen :)

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1938335 - Posted: 5 Jun 2018, 20:02:51 UTC - in response to Message 1938331. [quote]. . OK that is officially the shortest outage I have ever seen. 3 hours and change. . . Don't I feel silly ... :) Stephen :)[/quote] Me too. I wish we could count on one or the other, long or short. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1938560 - Posted: 7 Jun 2018, 8:19:52 UTC Looks like the splitters are getting tired. Going along OK, then taking a break for a couple of hours. Getting going again for a few more hours, then taking another 1-2 hr break. Grant Darwin NT

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1938861 - Posted: 9 Jun 2018, 22:26:30 UTC What is it about the weekend? Splitter output is back to being less than the level of demand. Grant Darwin NT

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1938866 - Posted: 9 Jun 2018, 22:31:44 UTC - in response to Message 1938861. There is definitely something wrong with the splitters. The output has dropped.

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1938943 - Posted: 10 Jun 2018, 15:02:17 UTC Looking at the results-to-send numbers, I'm guessing that the throttle range has been reset from the 500s to the 300s. I think that this is the new "normal" place for the results-to-send value. I'm concerned about how this will affect recovery after Tuesday's planned outage. Might be OK if we have an outage and not an outrage. Guess we will find out Tuesday.

kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1938944 - Posted: 10 Jun 2018, 15:25:44 UTC - in response to Message 1938943. [quote]looking at the results to send numbers I'm guessing that the throttle range has been reset from in the 500s to in the 300s. I think that this is the new "normal" place for the results to send value. I'm concerned how this will affect recovery after Tuesday's planned outage. Might be ok if we have an outage and not an outrage. Guess we will find out Tuesday.[/quote] I don't think the limiter on the splitters has been changed. They seem to get in a tangle and output doesn't keep up with demand, so RTS drops off. It was boosted by the arrival of a multibeam dataset, which increased splitter output while it was running. With that done, RTS will probably drop off again, unless somebody steps in to realign the DB again, or some more multibeam work is added to the cache to split. "Freedom is just Chaos, with better lighting." Alan Dean Foster

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1938957 - Posted: 10 Jun 2018, 16:15:27 UTC Last modified: 10 Jun 2018, 16:16:16 UTC Hmmmm, I can see both comments having validity. If, as Unixchick says, we no longer have outrages and the new normal outage is 3-4 hours, I think an RTS buffer in the 300K range would suffice. I can see a benefit to the database size if you don't have to allocate space for those 300K extra tasks that would not need to be split to pump up the RTS buffer to its traditional size of 600K. Will have to wait and monitor to see where we're headed. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1939140 - Posted: 11 Jun 2018, 15:42:06 UTC RTS is now back in the 500k range with the splitting of some AP/MP data sets. Here is my new theory. The throttle is only on the gbt data, as the AP/MP data is so minimal at this point. As AP/MP data is added to the queue, the RTS builds to the 500k to 600k range in total, but when the gbt portion of the RTS reaches 400k the throttle kicks in, and it is all MP data being added until some of the gbt data gets taken out of the queue. As long as there is some MP data in the RTS, the numbers stay up in the 500k range. This effect lingers, as errors and timeouts cause resends, so it doesn't fall the moment all MP data is split. It looks like they are splitting 10jn18aa right now. Hopefully they have one or two AP/MP files held back to split on Tuesday after the outage (I'm an optimist, so no outrage). My theories are just playful thoughts, as I'm just happy I am getting a good supply of data and that the newbies who have come to join the project (from seeing the HBO special??) are seeing a nice stable system that has handled the added load. I admit I was worried when the RTS numbers fell to the 300k range, but all is good.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1939141 - Posted: 11 Jun 2018, 15:45:22 UTC - in response to Message 1939140. [quote]It looks like they are splitting 10jn18aa right now. Hopefully they have one or two AP/MP files held back to split on Tuesday after the outage (I'm an optimist, so no outrage).[/quote] 10jn18aa is yesterday's recording at Arecibo, so I doubt they have anything 'held back'. But tomorrow, they may have today's recording - fingers crossed.

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1939150 - Posted: 11 Jun 2018, 16:58:30 UTC - in response to Message 1939140. OK, the splitter for MP went to 0 at around 650k RTS, so there is a throttle on MP... but I think there are different throttles for gbt and MP, and maybe a secondary total throttle... hmm.
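The theory in the two posts above (a per-source throttle on gbt work plus a possible overall cap) can be written down as a toy model. This is only a sketch of the poster's guess, not how the SETI@home splitters are actually configured: the function name `may_split`, the source labels `"gbt"` and `"arecibo"`, and both threshold values (taken from the 400k and 650k figures mentioned in the posts) are assumptions made for illustration.

```python
# Toy model of the throttle theory from the thread. Everything here is
# hypothetical: the real splitter limits are not known from the posts.

GBT_LIMIT = 400_000    # assumed cap on gbt results ready to send
TOTAL_LIMIT = 650_000  # assumed cap on the whole ready-to-send (RTS) buffer


def may_split(source: str, rts_gbt: int, rts_arecibo: int) -> bool:
    """Return True if a splitter for `source` would be allowed to add work."""
    if rts_gbt + rts_arecibo >= TOTAL_LIMIT:
        return False  # overall buffer is full, so every splitter pauses
    if source == "gbt" and rts_gbt >= GBT_LIMIT:
        return False  # gbt share is full, so only other data keeps flowing
    return True


if __name__ == "__main__":
    # gbt share full, total below the cap: only Arecibo data is split
    print(may_split("gbt", 400_000, 100_000))      # False
    print(may_split("arecibo", 400_000, 100_000))  # True
    # total at the cap: Arecibo splitting stops as well
    print(may_split("arecibo", 400_000, 250_000))  # False
```

Under these assumed numbers the model reproduces both observations: RTS sits near 400k when only gbt data is available, and splitting of the other data stops once the total reaches about 650k.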

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1939232 - Posted: 12 Jun 2018, 6:30:48 UTC - in response to Message 1939150. [quote]ok the splitter for MP went to 0 around 650k rts . so there is a throttle on MP... but I think there are different throttles for gbt and mp and maybe a secondary total throttle... hmm.[/quote] Where you're typing MP I think you mean MB. AP = AstroPulse, MB = Multibeam. I think it's just a case of the splitters having had a good rest & now being back to putting out enough work to keep the Ready-to-send buffer full. At least until the next time they revert to go-slow mode. Grant Darwin NT

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1939235 - Posted: 12 Jun 2018, 6:57:45 UTC Well that sucks. Looks like there's some dodgy WUs out there. 09jn18aa.15655.365782.9.36.243_1: Outcome: Computation error; Client state: Compute error; Exit status: -112 (0xFFFFFF90) ERR_XML_PARSE. As did 05jn18aa.24200.1472803.10.37.225_1: application: SETI@home v8; created: 10 Jun 2018, 10:53:59 UTC; minimum quorum: 2; initial replication: 2; max # of error/total/success tasks: 5, 10, 5; errors: Too many errors (may have bug). And 05jn18aa.24200.1472803.10.37.231_0 has crashed & burnt 3 times so far. Grant Darwin NT

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1939238 - Posted: 12 Jun 2018, 7:10:16 UTC Yes, there seem to be a few dodgy workunits out there. Task 3008834112. We all seem to have been hit with: </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>05jn18aa.18403.1473928.7.34.147_2_r1038861265_0</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> errors. Task names don't match. I assume that is why the file isn't found. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1939239 - Posted: 12 Jun 2018, 7:27:31 UTC - in response to Message 1939235. [quote]Well that sucks. Looks like there's some dodgy WUs out there.[/quote] I've had a couple, too. 05jn18aa.18372.1473928.6.33.130_2: stopped at 0.0 and 0.0 on the clock, Exit status -6 (0xFFFFFFFA) Unknown error code. Crashed 5 times so far. 05jn18aa.18403.1473928.7.34.113_2: stopped at 0.0 and 0.0 on the clock, Exit status -6 (0xFFFFFFFA) Unknown error code. Crashed 4 times so far. Humans may rule the world...but bacteria run it...
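The exit statuses quoted in this thread are shown twice: as a negative number and as a hex value in parentheses. The hex form is simply the same number written as an unsigned 32-bit two's-complement value. A minimal Python sketch of the conversion in both directions (the helper names `exit_status_hex` and `signed_from_hex` are made up for this illustration and are not part of BOINC):

```python
def exit_status_hex(code: int) -> str:
    """Format a signed exit status as its 32-bit two's-complement hex form."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def signed_from_hex(text: str) -> int:
    """Convert a 32-bit hex string back to the signed exit status."""
    value = int(text, 16)
    # If the top bit is set, the value represents a negative number.
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    print(exit_status_hex(-112))          # 0xFFFFFF90
    print(exit_status_hex(-6))            # 0xFFFFFFFA
    print(signed_from_hex("0xFFFFFF90"))  # -112
```

So -112 and 0xFFFFFF90 are the same status, as are -6 and 0xFFFFFFFA; the hex form carries no extra information.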

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1939250 - Posted: 12 Jun 2018, 11:27:31 UTC - in response to Message 1939235. Last modified: 12 Jun 2018, 11:28:09 UTC [quote]Well that sucks. Looks like there's some dodgy WUs out there. 05jn18aa.24200.1472803.10.37.231_0 Has crashed & burnt 3 times so far.[/quote] . . Yep I've been seeing them too. Had a few on a couple of the rigs. Stephen :(

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1939256 - Posted: 12 Jun 2018, 12:50:22 UTC SETI@home error -6: Bad workunit header. Along with those, I'm also getting a few Download Errors. <error_code>-119 (md5 checksum failed for file)</error_code> I'm not seeing any Download Errors anywhere else, and why do I never see any Upload Errors? Is Uploading to SETI different from Downloading? I Dunno...
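For readers unfamiliar with the "md5 checksum failed for file" message: a downloaded file is hashed and the digest is compared with the one the server advertised; a mismatch means the bytes on disk are not the bytes that were expected. The sketch below shows the general technique with Python's standard `hashlib`. It is not the BOINC client's code, and the function names `file_md5` and `download_is_intact` are assumptions made for the example.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path: str, expected_md5: str) -> bool:
    """Compare the file's digest with the expected one (case-insensitive)."""
    return file_md5(path) == expected_md5.strip().lower()
```

A check like this fails whenever the file is truncated or altered in transit or on disk, regardless of how small the file is.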

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1939257 - Posted: 12 Jun 2018, 13:50:53 UTC Upload is subtly different, and the files are smaller... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1939258 - Posted: 12 Jun 2018, 14:03:33 UTC - in response to Message 1939257. That's nice. I can Download 5+ gigabyte files from Apple, 1+ gigabyte from nVidia and Ubuntu, but it fails on less than a megabyte from SETI. The machines Upload just as many files to SETI as they Download, and I've never seen an Upload failure.
