The Server Issues / Outages Thread - Panic Mode On! (117)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (117)

Darrell Wilcox Project Donor
Volunteer tester

Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2022933 - Posted: 13 Dec 2019, 2:45:52 UTC - in response to Message 2022904.  

@ Wiggo "Democratic Socialist" and several others
Yep, those sticking downloads are getting very annoying here this morning. :-(

I know you are running Linux, but one of the Linux people could convert my Windows CMD file
into a cron job to do much the same thing.

This CMD file asks BOINC whether anything is in the "Transfer" queue. If there is, it asks BOINC
to retry each inactive transfer against each of my four projects, then sleeps for a minute before checking again.
When the Transfer queue is empty, it sleeps for 20 minutes.
After an hour of those idle waits, it requests BOINC to UPDATE the SETI@home project.

I only start this when we are having transfer problems, and stop it when they are cleared up.

=========================================================================================================
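@echo off
prompt $T$G
Setlocal EnableDelayedExpansion
rem Seconds of idle waiting accumulated since the last project update
SET /A UpdateTime=0

cd /d S:\Program Files\BOINC

:again
rem Default wait is 20 minutes, shortened to 1 minute if any transfer is queued
set /A WaitTime=1200

for /F "tokens=1,2*" %%I in ('boinccmd.exe --get_file_transfers') do (
    rem A name: line starts a transfer entry, so remember the file name
    if /I "%%I"=="name:" (
        set FN=%%J
        set /A WaitTime=60
    )
    rem An inactive transfer is retried against every project and errors are discarded
    if /I "%%J"=="active:" if /I "%%K"=="no" (
        boinccmd.exe --file_transfer http://setiathome.berkeley.edu !FN! retry 2> NUL
        boinccmd.exe --file_transfer http://einstein.phys.uwm.edu/ !FN! retry 2> NUL
        boinccmd.exe --file_transfer https://lhcathome.cern.ch/lhcathome/ !FN! retry 2> NUL
        boinccmd.exe --file_transfer http://boinc.bakerlab.org/rosetta/ !FN! retry 2> NUL
    )
)
rem Count idle time only when the transfer queue was empty
if %WaitTime% EQU 1200 set /A UpdateTime=%UpdateTime%+1200

rem After an hour of idle waits, ask the SETI project to update
if %UpdateTime% GEQ 3600 (
    boinccmd.exe --project http://setiathome.berkeley.edu update
    set /A UpdateTime=0
)

Choice /C YQ /D Y /T %WaitTime% /M "%Date% %Time% Waiting for %WaitTime% seconds. Do it again now? Press Y for Yes, Q quit"

rem Y or a timeout loops again, Q falls through and ends the script
if %ERRORLEVEL%==1 goto :again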
=========================================================================================================
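For the Linux side, here is a minimal sketch of one pass of the same logic in Python. It is not a tested port. It assumes boinccmd is on the PATH and that its --get_file_transfers output carries the same "name:" and "xfer active:" lines the CMD file tests for. The function names parse_transfers and retry_stalled are made up for illustration.

```python
import subprocess

# Project URLs copied from the CMD file above.
PROJECT_URLS = [
    "http://setiathome.berkeley.edu",
    "http://einstein.phys.uwm.edu/",
    "https://lhcathome.cern.ch/lhcathome/",
    "http://boinc.bakerlab.org/rosetta/",
]


def parse_transfers(output):
    """Return (names, stalled) from the text of `boinccmd --get_file_transfers`.

    Mirrors the token tests in the CMD file: a line whose first token is
    "name:" starts an entry, and a line whose second token is "active:"
    with third token "no" marks the current entry as stalled.
    """
    names, stalled = [], []
    current = None
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0].lower() == "name:":
            current = tokens[1]
            names.append(current)
        elif (len(tokens) >= 3 and tokens[1].lower() == "active:"
              and tokens[2].lower() == "no" and current is not None):
            stalled.append(current)
    return names, stalled


def retry_stalled(boinccmd="boinccmd"):
    """Run one pass and return the seconds to wait before the next pass."""
    result = subprocess.run([boinccmd, "--get_file_transfers"],
                            capture_output=True, text=True, check=False)
    names, stalled = parse_transfers(result.stdout)
    for name in stalled:
        for url in PROJECT_URLS:
            # As in the CMD file, try every project and discard error output.
            subprocess.run([boinccmd, "--file_transfer", url, name, "retry"],
                           stderr=subprocess.DEVNULL, check=False)
    # 60 seconds if anything is queued, otherwise 20 minutes.
    return 60 if names else 1200
```

Under cron the sleep is not needed: a script that simply calls retry_stalled() once could be scheduled every minute while the transfer problems last, and removed from the crontab afterwards.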
ID: 2022933 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022934 - Posted: 13 Dec 2019, 2:53:40 UTC - in response to Message 2022932.  

Fwiw, I set cc_config.xml <max_file_xfers>5</max_file_xfers>, down from 8, and I'm not seeing further stuck transfers. Could just be a coincidence.

Just tried this. Didn't make any difference. Downloads at max 5 still went to instant backoffs.
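For anyone who wants to repeat the experiment, this is a rough sketch of changing the setting from a script rather than by hand. It assumes the usual cc_config.xml layout with max_file_xfers inside the options section. The helper name set_max_file_xfers is made up for illustration, and it works on the XML text only, so reading and writing the actual file is left out.

```python
import xml.etree.ElementTree as ET


def set_max_file_xfers(xml_text, limit):
    """Return cc_config.xml text with <options><max_file_xfers> set to limit."""
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    # Create the <options> section and the setting if they are missing.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    node = options.find("max_file_xfers")
    if node is None:
        node = ET.SubElement(options, "max_file_xfers")
    node.text = str(limit)
    return ET.tostring(root, encoding="unicode")
```

The client only picks up the change after it rereads its config files, for example with boinccmd --read_cc_config or the matching option in the Manager.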
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022934 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2022936 - Posted: 13 Dec 2019, 3:04:18 UTC - in response to Message 2022933.  

I know you are running Linux, but one of the Linux people could convert my Windows CMD file
into a cron job to do much the same thing.
Cute!
ID: 2022936 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2022950 - Posted: 13 Dec 2019, 5:14:37 UTC - in response to Message 2022877.  

not a panic, but an observation.

I've noticed all of my systems experiencing a handful of stuck downloads. it's fixed easily just hitting the "Retry Now" button.
Been occurring for a while, along with the occasional upload taking a second attempt to get through, although it has been occurring more often over the last couple of days.
Grant
Darwin NT
ID: 2022950 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2022960 - Posted: 13 Dec 2019, 7:13:38 UTC

Was clear for a while but starting to get sticky again ...
ID: 2022960 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022988 - Posted: 13 Dec 2019, 17:14:50 UTC

still seeing download issues persisting.

someone want to shoot off a bat signal?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022988 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023008 - Posted: 13 Dec 2019, 19:57:54 UTC

I see half of each download request go to immediate backoff at the start of the download. I assume this is caused by the new added strain on the servers from the cache limit adjustment.

Any server fine tuning that can ameliorate the issue?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023008 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2023026 - Posted: 13 Dec 2019, 22:08:59 UTC - in response to Message 2023008.  
Last modified: 13 Dec 2019, 22:32:09 UTC

I see half of each download request go to immediate backoff at the start of the download. I assume this is caused by the new added strain on the servers from the cache limit adjustment.
Same here, occurs on my Linux system (no Hosts file setting). No issues on Windows system (Hosts file set to Georgem).
I can't see it having anything to do with the cache limit adjustment, unless the larger Work in progress somehow has an impact on Vader. While it does do transition work, I can't see the increase in Work in progress affecting that unless it results in data no longer being cached. Likewise for its Assimilator work.
When it comes to downloads, Vader has always had issues over the years.


Edit: just as I was about to go hunting for my Linux hosts file to edit it, the downloads started coming through without assistance again.
Grant
Darwin NT
ID: 2023026 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023029 - Posted: 13 Dec 2019, 22:52:50 UTC

Yes, I just brought the daily driver back online and had to download over a hundred tasks to try to refill the cache depleted by the backed-off downloads. They came down with no issues.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023029 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2023068 - Posted: 14 Dec 2019, 9:00:41 UTC

Splitters struggling again.
After a large overshoot, the Ready-to-send has fallen by over 300k in under 2 hours and is still heading south.
Grant
Darwin NT
ID: 2023068 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2023116 - Posted: 14 Dec 2019, 16:27:57 UTC

We have between 1 and 2 hours (262k) in the RTS queue, plus whatever it can split, but it isn't splitting fast enough. There are over 7.2 million out in the field, so hopefully everyone has enough WUs for a bit while the system is slow to split. I can only assume it is busy validating, assimilating, or deleting, and that is why splitting has slowed down. Hopefully splitting will pick up again when whatever is keeping it busy now is done.
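The estimate can be checked with a quick back-of-the-envelope calculation. This sketch borrows the drain rate from Grant's earlier report (a drop of over 300k in under 2 hours, so at least 150k per hour) and assumes that rate still holds and stays constant. The function name hours_until_empty is made up for illustration.

```python
def hours_until_empty(ready_to_send, net_drain_per_hour):
    """Hours until the ready-to-send queue empties at a constant net drain rate."""
    if net_drain_per_hour <= 0:
        raise ValueError("queue is not draining")
    return ready_to_send / net_drain_per_hour


# 262k is the queue size quoted above. 150k per hour is an assumed rate,
# derived from the earlier report of a 300k drop in 2 hours.
estimate = hours_until_empty(262_000, 300_000 / 2)
print(f"{estimate:.2f} hours")
```

At that assumed rate the result is about 1.75 hours, which sits inside the 1 to 2 hour range. Since the reported drop was "over 300k in under 2 hours", the real figure would be somewhat lower.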
ID: 2023116 · Report as offensive
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 2023267 - Posted: 16 Dec 2019, 1:06:36 UTC - in response to Message 2023068.  

Hi Grant,

I see we're still experiencing server problems, and I've not heard anything about a shortage of CPU tasks, but I have not been able to get any on one of my systems. Getting plenty of GPU tasks, but no CPU tasks. Anyone else having this problem? I did install the new BOINC update, but I doubt that is the problem.

Allen
ID: 2023267 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023269 - Posted: 16 Dec 2019, 1:16:03 UTC - in response to Message 2023267.  

Which host is giving you issues? I see CPU tasks received today on your hosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023269 · Report as offensive
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 2023271 - Posted: 16 Dec 2019, 1:23:12 UTC - in response to Message 2023269.  

Yes, all others are doing okay for now. The one in question is ID: 8048221. Last server response says that it is out of work, but I have no CPU work.

Allen
ID: 2023271 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34754
Credit: 261,360,520
RAC: 489
Australia
Message 2023273 - Posted: 16 Dec 2019, 1:30:58 UTC

Your CPU requests were likely in "back off request mode", and that can last for as long as 4 odd days.

There was a simple way to check and rectify that with the ancient BOINC version that I used to use, but I can't seem to find out how to do that yet with the much newer version that I'm using now.

Cheers.
ID: 2023273 · Report as offensive
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 2023276 - Posted: 16 Dec 2019, 1:35:06 UTC - in response to Message 2023273.  
Last modified: 16 Dec 2019, 1:36:32 UTC

I guess I should explain my whole situation, just so there is nothing left to chance. For the last two days, every request for work has come back with a response that the servers may be down. I had about 25 tasks to report, but it would not report them. Not knowing what else to do, I did a reset, as much as I hated to, and loaded the newest version of BOINC. That got me connected again, but only able to receive GPU tasks.
Well, that's the whole sad story. I was using version 7.6.2, I think.

Allen
ID: 2023276 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023277 - Posted: 16 Dec 2019, 1:36:47 UTC - in response to Message 2023273.  

Your CPU requests were likely in "back off request mode", and that can last for as long as 4 odd days.

There was a simple way to check and rectify that with the ancient BOINC version that I used to use, but I can't seem to find out how to do that yet with the much newer version that I'm using now.

Cheers.

If you have the Manager running, highlight Seti in the Projects tab and then select Properties. That shows you the backoff timeouts for both the CPU and GPU.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023277 · Report as offensive
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 2023278 - Posted: 16 Dec 2019, 1:38:56 UTC - in response to Message 2023277.  

General
URL: http://setiathome.berkeley.edu/
User name: AllenIN
Team name:
Resource share: 100
Disk usage: 308.82 MB
Computer ID: 8708442
Suspended via GUI: no
Don't request tasks: no
Host location: home
Tasks completed: 5,683
Tasks failed: 1

Credit
User: 54,237,194 total, 30,464.61 average
Host: 384,969 total, 1,542.58 average

Scheduling
Scheduling priority: -1.04
Last scheduler reply: 12/15/2019 8:17:36 PM

This is all I see there.

Allen
ID: 2023278 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023279 - Posted: 16 Dec 2019, 1:39:47 UTC - in response to Message 2023276.  

I guess I should explain my whole situation, just so there is nothing left to chance. For the last two days, every request for work has come back with a response that the servers may be down. I had about 25 tasks to report, but it would not report them. Not knowing what else to do, I did a reset, as much as I hated to, and loaded the newest version of BOINC. That got me connected again, but only able to receive GPU tasks.
Well, that's the whole sad story. I was using version 7.6.2, I think.

Allen

You are running the anonymous platform. Do you still have CPU applications defined in your app_info?

If running the Manager, use the Event Logging flags to set cpu_sched_debug and see if you are even requesting CPU work. For a complete check, set the work_fetch_debug and cpu_sched_status flags in the Event Log.
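Outside the Manager, the same flags can be switched on in the log_flags section of cc_config.xml. The following is only a sketch of that edit, assuming the standard cc_config layout. The helper name enable_log_flags is made up for illustration, and it works on the XML text rather than touching the file.

```python
import xml.etree.ElementTree as ET


def enable_log_flags(xml_text, flags):
    """Return cc_config.xml text with each named flag set to 1 under <log_flags>."""
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    # Create the <log_flags> section if the file does not have one yet.
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        node = log_flags.find(flag)
        if node is None:
            node = ET.SubElement(log_flags, flag)
        node.text = "1"
    return ET.tostring(root, encoding="unicode")


# The three flags named in the post above.
DEBUG_FLAGS = ["cpu_sched_debug", "work_fetch_debug", "cpu_sched_status"]
```

Calling enable_log_flags(text, DEBUG_FLAGS) on the current file contents gives the updated XML. The client still has to reread its config files before the extra lines show up in the Event Log.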
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023279 · Report as offensive
Profile AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 2023280 - Posted: 16 Dec 2019, 1:42:18 UTC - in response to Message 2023279.  

I'm sorry, but the log I sent you is from the wrong system. I'll get on the other system.
ID: 2023280 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.