The Server Issues / Outages Thread - Panic Mode On! (117)

Author	Message
Darrell Wilcox Volunteer tester Joined: 11 Nov 99 Posts: 303 Credit: 180,954,940 RAC: 118	Message 2022933 - Posted: 13 Dec 2019, 2:45:52 UTC - in response to Message 2022904.
@ Wiggo "Democratic Socialist" and several others
> Yep, those sticking downloads are getting very annoying here this morning. :-(
I know you are running Linux, but one of the Linux people could convert my Windows CMD file into a cron job to do much the same thing.
This CMD file looks into BOINC to see if anything is in the "Transfer" queue. If there is, it asks BOINC to retry each stalled transfer against each of my four projects, then sleeps for a minute before retrying again. When the Transfer queue is empty, it sleeps for 20 minutes. Once an hour, it requests BOINC to UPDATE.
I only start this when we are having transfer problems, and stop it when they are cleared up.
=========================================================================================================
@echo off
prompt $T$G
Setlocal EnableDelayedExpansion
SET /A UpdateTime=0
cd /d S:\Program Files\BOINC
:again
rem Default to the long (20 minute) wait; shortened below if any transfer is listed.
set /A WaitTime=1200
for /F "tokens=1,2*" %%I in ('boinccmd.exe --get_file_transfers') do (
  rem Remember the file name of the transfer currently being listed.
  if /I "%%I"=="name:" (
    set FN=%%J
    set /A WaitTime = 60
  )
  rem A transfer that is not active gets a retry against every project.
  if /I "%%J"=="active:" if /I "%%K"=="no" (
    boinccmd.exe --file_transfer http://setiathome.berkeley.edu !FN! retry 2> NUL
    boinccmd.exe --file_transfer http://einstein.phys.uwm.edu/ !FN! retry 2> NUL
    boinccmd.exe --file_transfer https://lhcathome.cern.ch/lhcathome/ !FN! retry 2> NUL
    boinccmd.exe --file_transfer http://boinc.bakerlab.org/rosetta/ !FN! retry 2> NUL
  )
)
rem Count idle time; after an hour of it, ask the project for an update.
if %WaitTime% EQU 1200 set /A UpdateTime=%UpdateTime%+1200
if %UpdateTime% GEQ 3600 (
  boinccmd.exe --project http://setiathome.berkeley.edu update
  set /A UpdateTime=0
)
Choice /C YQ /D Y /T %WaitTime% /M "%Date% %Time% Waiting for %WaitTime% seconds. Do it again now? Press Y for Yes, Q quit"
if %ERRORLEVEL%==1 goto :again
=========================================================================================================
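For Linux readers, the parsing and retry steps of the CMD file above could be sketched in Python roughly as follows. This is a minimal sketch, not Darrell's script: it assumes boinccmd --get_file_transfers prints "name:" and "xfer active:" lines in the layout that the CMD file's token tests imply, it assumes boinccmd is on the PATH, and the helper names (stalled_transfers, retry_commands, run_once) are made up for illustration. The project URLs are copied from the CMD file.

```python
import subprocess

# Project URLs copied from the CMD file; adjust to the projects you run.
PROJECT_URLS = [
    "http://setiathome.berkeley.edu",
    "http://einstein.phys.uwm.edu/",
    "https://lhcathome.cern.ch/lhcathome/",
    "http://boinc.bakerlab.org/rosetta/",
]


def stalled_transfers(listing):
    """Return file names whose transfer is reported as not active.

    Assumes each transfer is listed as a 'name: <file>' line followed
    later by a '<word> active: yes|no' line, as the CMD file expects.
    """
    stalled = []
    current = None
    for line in listing.splitlines():
        fields = line.split(None, 2)  # same idea as tokens=1,2*
        if len(fields) >= 2 and fields[0].lower() == "name:":
            current = fields[1]
        elif (len(fields) == 3 and fields[1].lower() == "active:"
              and fields[2].strip().lower() == "no" and current):
            stalled.append(current)
    return stalled


def retry_commands(names, urls=PROJECT_URLS, boinccmd="boinccmd"):
    """Build one 'retry' command per stalled file and project URL."""
    return [[boinccmd, "--file_transfer", url, name, "retry"]
            for name in names for url in urls]


def run_once(boinccmd="boinccmd"):
    """List transfers, retry the stalled ones, return how many were stalled."""
    listing = subprocess.run([boinccmd, "--get_file_transfers"],
                             capture_output=True, text=True,
                             check=False).stdout
    names = stalled_transfers(listing)
    for cmd in retry_commands(names, boinccmd=boinccmd):
        # Errors from projects that do not own the file are ignored,
        # mirroring the '2> NUL' in the CMD file.
        subprocess.run(cmd, stderr=subprocess.DEVNULL, check=False)
    return len(names)
```

Calling run_once() from a small script scheduled by cron (for example once a minute while transfers are misbehaving) would approximate the retry loop. The hourly project update and the interactive prompt are left out of this sketch.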

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2022934 - Posted: 13 Dec 2019, 2:53:40 UTC - in response to Message 2022932.
> Fwiw, I set cc_config.xml <max_file_xfers>5</max_file_xfers>, down from 8, and I'm not seeing further stuck transfers. Could just be a coincidence.
Just tried this. Didn't make any difference. Downloads at max 5 still went to instant backoffs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

Jimbocous Volunteer tester Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349	Message 2022936 - Posted: 13 Dec 2019, 3:04:18 UTC - in response to Message 2022933.
> I know you are running Linux, but one of the Linux people could convert my Windows CMD file into a cron job to do much the same thing.
Cute!

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022950 - Posted: 13 Dec 2019, 5:14:37 UTC - in response to Message 2022877.
> Not a panic, but an observation. I've noticed all of my systems experiencing a handful of stuck downloads. It's fixed easily by just hitting the "Retry Now" button.
Been occurring for a while, along with the occasional upload taking a second attempt to get through, although it has been occurring more often over the last couple of days.
Grant
Darwin NT

Jimbocous Volunteer tester Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349	Message 2022960 - Posted: 13 Dec 2019, 7:13:38 UTC
Was clear for a while but starting to get sticky again ...

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 2022988 - Posted: 13 Dec 2019, 17:14:50 UTC
Still seeing download issues persisting. Someone want to shoot off a bat signal?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2023008 - Posted: 13 Dec 2019, 19:57:54 UTC
I see half of each download request go to immediate backoff at the start of the download. I assume this is caused by the added new strain to the servers from the cache limit adjustment. Any server fine tuning that can ameliorate the issue?
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2023026 - Posted: 13 Dec 2019, 22:08:59 UTC - in response to Message 2023008. Last modified: 13 Dec 2019, 22:32:09 UTC
> I see half of each download request go to immediate backoff at the start of the download. I assume this is caused by the added new strain to the servers from the cache limit adjustment.
Same here, occurs on my Linux system (no Hosts file setting). No issues on Windows system (Hosts file set to Georgem).
I can't see it having anything to do with the cache limit adjustment unless somehow the larger Work-in-progress has an impact on Vader. While it does do transition work, I can't see the increase in Work-in-progress affecting that unless it's resulting in data no longer being cached. Likewise for its Assimilator work.
When it comes to downloads, Vader has always had issues over the years.
Edit: just as I was about to go hunting for my Linux Hosts file and edit it, the downloads started downloading without assistance again.
Grant
Darwin NT

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2023029 - Posted: 13 Dec 2019, 22:52:50 UTC
Yes, I just brought the daily driver back online and had to download over a hundred tasks to try and refill the cache depleted by the backed-off downloads. They came down with no issues.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2023068 - Posted: 14 Dec 2019, 9:00:41 UTC
Splitters struggling again. After a large overshoot the Ready-to-send has fallen by over 300k in under 2 hours, and still heading south.
Grant
Darwin NT

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 2023116 - Posted: 14 Dec 2019, 16:27:57 UTC
We have between 1 and 2 hours of work (262k) in the RTS queue, plus whatever it can split, but it isn't splitting fast enough. There are over 7.2 million out in the field, so hopefully everyone has enough WUs for a bit while the system is slow to split.
I can only assume it is busy validating, assimilating, or deleting, and that is why splitting has slowed down. Hopefully splitting will pick up again when whatever is keeping it busy now is done.

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 2023267 - Posted: 16 Dec 2019, 1:06:36 UTC - in response to Message 2023068.
Hi Grant,
I see we're still experiencing server problems, and I've not heard anything about a shortage of CPU tasks, but I have not been able to get any on one of my systems. Getting plenty of GPU tasks, but no CPU tasks. Anyone else having this problem? I did install the new BOINC update, but I doubt that that is the problem.
Allen

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2023269 - Posted: 16 Dec 2019, 1:16:03 UTC - in response to Message 2023267.
Which host is giving you issues? I see CPU tasks received today on your hosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 2023271 - Posted: 16 Dec 2019, 1:23:12 UTC - in response to Message 2023269.
Yes, all others are doing okay for now. The one in question is ID: 8048221. Last server response says that it is out of work, but I have no CPU work.
Allen

Wiggo Joined: 24 Jan 00 Posts: 34754 Credit: 261,360,520 RAC: 489	Message 2023273 - Posted: 16 Dec 2019, 1:30:58 UTC
Your CPU requests were likely in "back off request mode", and that can last for as long as 4-odd days.
There was a simple way to check and rectify that with the ancient BOINC version that I used to use, but I can't seem to find out how to do that yet with the much newer version that I'm using now.
Cheers.

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 2023276 - Posted: 16 Dec 2019, 1:35:06 UTC - in response to Message 2023273. Last modified: 16 Dec 2019, 1:36:32 UTC
I guess I should explain my whole situation, just so there is nothing left to chance. For the last two days, the response from the servers to my requests for work has been that the servers may be down. I had had about 25 tasks to report, but it would not report them. Not knowing what else to do, I did a reset, as much as I hated to, and loaded the newest version of BOINC. That got me connected again, but only able to receive GPU tasks.
Well, that's the whole sad story. I was using version 7.6.2, I think.
Allen

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2023277 - Posted: 16 Dec 2019, 1:36:47 UTC - in response to Message 2023273.
> Your CPU requests were likely in "back off request mode", and that can last for as long as 4-odd days.
> There was a simple way to check and rectify that with the ancient BOINC version that I used to use, but I can't seem to find out how to do that yet with the much newer version that I'm using now.
> Cheers.
If you have the Manager running, highlight Seti in the Projects tab and then select Properties. That shows you the backoff timeouts for both the CPU and GPU.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 2023278 - Posted: 16 Dec 2019, 1:38:56 UTC - in response to Message 2023277.
General
  URL: http://setiathome.berkeley.edu/
  User name: AllenIN
  Team name:
  Resource share: 100
  Disk usage: 308.82 MB
  Computer ID: 8708442
  Suspended via GUI: no
  Don't request tasks: no
  Host location: home
  Tasks completed: 5,683
  Tasks failed: 1
Credit
  User: 54,237,194 total, 30,464.61 average
  Host: 384,969 total, 1,542.58 average
Scheduling
  Scheduling priority: -1.04
  Last scheduler reply: 12/15/2019 8:17:36 PM
This is all I see there.
Allen

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2023279 - Posted: 16 Dec 2019, 1:39:47 UTC - in response to Message 2023276.
> I guess I should explain my whole situation, just so there is nothing left to chance. For the last two days, the response from the servers to my requests for work has been that the servers may be down. I had had about 25 tasks to report, but it would not report them. Not knowing what else to do, I did a reset, as much as I hated to, and loaded the newest version of BOINC. That got me connected again, but only able to receive GPU tasks.
> Well, that's the whole sad story. I was using version 7.6.2, I think.
> Allen
You are running anonymous platform. Do you still have CPU applications defined in your app_info? If running the Manager, use the Event Log flags to set cpu_sched_debug and see if you are even requesting CPU work. For a complete check, set the work_fetch_debug and cpu_sched_status flags in the Event Log.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
A proud member of the OFA (Old Farts Association)

AllenIN Volunteer tester Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311	Message 2023280 - Posted: 16 Dec 2019, 1:42:18 UTC - in response to Message 2023279.
I'm sorry, but the log I sent you is from the wrong system. I'll get on the other system.
