The Server Issues / Outages Thread - Panic Mode On! (118)

Niteryder
Volunteer tester

Joined: 1 Mar 99
Posts: 64
Credit: 22,663,988
RAC: 18
United States
Message 2025517 - Posted: 30 Dec 2019, 1:44:26 UTC

My 100+ downloads just went through.
ID: 2025517 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2025521 - Posted: 30 Dec 2019, 2:08:21 UTC - in response to Message 2025496.  

IMHO the SETI powers need to rethink the rise of the WU limits; rolling back to a lower level (something like 150 CPU + 200 GPU) could help the servers' stability.

Thanks for voicing your opinion
ID: 2025521 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2025536 - Posted: 30 Dec 2019, 7:41:51 UTC - in response to Message 2025505.  

IMHO the SETI powers need to rethink the rise of the WU limits; rolling back to a lower level (something like 150 CPU + 200 GPU) could help the servers' stability.

. . If they can confirm that is the problem. It would be a shame though, I am liking the 300 limit :(
Stephen
:(

I agree, but after the change the system never returned to normal; something is always happening from time to time.
If you look at the SSP, each time the total WU count gets close to 23 MM something weird happens: downloads stall, no new work, slow web pages or splitting, etc. It could be a hell of a coincidence, of course.
Even posting this msg takes a while.


. . Yes, things are certainly 'not well', and some of the symptoms existed before the unfortunate change of the server software to 7.15, but that is what really brought things into disarray. But I guess when things are not going well, being conservative makes sense, so reducing the limits to what you suggested might at least give an indication of whether that is at all part of the problem.

. . Time to bite the bullet ... :(

Stephen

:(
ID: 2025536 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2025538 - Posted: 30 Dec 2019, 8:15:09 UTC

Well, looking at the graphs the Scheduler likes to take a bit of time off most days, but it's not a fixed time. And the length of time it takes off varies too.


Grant
Darwin NT
ID: 2025538 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2025539 - Posted: 30 Dec 2019, 8:27:39 UTC
Last modified: 30 Dec 2019, 8:27:50 UTC

Auto resend is on it seems

Mon 30 Dec 2019 09:25:36 CET | SETI@home | Sending scheduler request: To report completed tasks.
Mon 30 Dec 2019 09:25:36 CET | SETI@home | Reporting 183 completed tasks
Mon 30 Dec 2019 09:25:36 CET | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Scheduler request completed: got 212 new tasks
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task 22dc19ab.14994.13762.10.37.157_0 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_69621_NGC3379_0055.14940.409.22.45.150.vlar_0 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_68985_NGC3379_0053.15006.409.22.45.145.vlar_0 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_69621_NGC3379_0055.14940.409.22.45.174.vlar_0 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_68985_NGC3379_0053.14913.409.21.44.177.vlar_1 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_68985_NGC3379_0053.14924.409.22.45.176.vlar_1 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_68985_NGC3379_0053.14913.409.21.44.180.vlar_0 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_68985_NGC3379_0053.15006.409.22.45.177.vlar_0 (expired)
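For anyone who wants to count these messages instead of eyeballing them, here is a minimal sketch. It assumes the event log has been saved as plain text in the same `time | project | message` layout shown above; the function name and the sample text are made up for illustration and are not part of BOINC.

```python
import re

# A few lines copied from the log above; in practice you would read your
# own saved event log (the layout is assumed to match these lines).
SAMPLE_LOG = """\
Mon 30 Dec 2019 09:25:36 CET | SETI@home | Reporting 183 completed tasks
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Scheduler request completed: got 212 new tasks
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task 22dc19ab.14994.13762.10.37.157_0 (expired)
Mon 30 Dec 2019 09:25:44 CET | SETI@home | Didn't resend lost task blc46_2bit_guppi_58692_69621_NGC3379_0055.14940.409.22.45.150.vlar_0 (expired)
"""

# Matches the message text only: task name, then the reason in parentheses.
LOST_TASK = re.compile(r"Didn't resend lost task (\S+) \((\w+)\)")


def summarize_lost_tasks(log_text):
    """Group the names of lost tasks that were not resent by the stated reason."""
    by_reason = {}
    for line in log_text.splitlines():
        # Split into timestamp, project and message; skip anything else.
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue
        match = LOST_TASK.search(parts[2])
        if match:
            task_name, reason = match.groups()
            by_reason.setdefault(reason, []).append(task_name)
    return by_reason


summary = summarize_lost_tasks(SAMPLE_LOG)
for reason, tasks in summary.items():
    print(f"{reason}: {len(tasks)} task(s) not resent")
```

Run against a full log, this gives a quick tally of how many ghosts the server refused to resend and why.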
ID: 2025539 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2025541 - Posted: 30 Dec 2019, 8:29:52 UTC - in response to Message 2025539.  

Auto resend is on it seems
Surprised the servers are still functioning if that is the case.
Grant
Darwin NT
ID: 2025541 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2025544 - Posted: 30 Dec 2019, 8:38:49 UTC

Yeh, it would appear that resends got turned on when the abortive update took place, and didn't get turned off when the reversion took place. It's a well-known fact that resends put such a high load on the database that everything else suffers.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2025544 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2025547 - Posted: 30 Dec 2019, 8:47:33 UTC

Well, it certainly wasn't on for me a couple of days ago. I waited a couple of days to make sure, then proceeded to recover them the Hard way. I was down to the last machine, about halfway finished, and suddenly a normal attempt decided to 'Expire' All the remaining Ghosts. I hadn't done anything different from the previous successful procedures. I think the Server just decides to arbitrarily do whatever it wants at certain times. Anyway, all my Ghosts are gone, and I had to do it manually.
ID: 2025547 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2025548 - Posted: 30 Dec 2019, 8:49:43 UTC - in response to Message 2025544.  

Yeh, it would appear that resends got turned on when the abortive update took place, and didn't get turned off when the reversion took place. It's a well-known fact that resends put such a high load on the database that everything else suffers.
And it might explain why the Scheduler has been taking time off each day recently.
Maybe disabling Resend lost tasks before changing the server side limits again & seeing how things go for a while would be the way to go.
Grant
Darwin NT
ID: 2025548 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2025550 - Posted: 30 Dec 2019, 8:56:52 UTC

I have a problem getting CPU work on my main host for days now.

BOINC does ask for both GPU and CPU work on each request, but no CPU work gets sent. Tried lowering the cache to 0.1 days, but then BOINC says the cache is full, even with 0 CPU tasks in cache. Tried getting WCG work, but BOINC says it doesn't need any work.

Looking at the BOINC SETI properties, it says

CPU task request deferred for 00:04:28
CPU task request deferral interval 00:10:00
So any idea what to do??
ID: 2025550 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2025553 - Posted: 30 Dec 2019, 9:12:19 UTC

Sit back and join the rest of us waiting.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2025553 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2025554 - Posted: 30 Dec 2019, 9:12:50 UTC - in response to Message 2025550.  

I have a problem getting CPU work on my main host for days now.
The one with the spoofed client?
You won't get any CPU work until the GPU reaches its cache limit, or the server side limit.
Lowering the cache setting with the spoofed GPUs should allow you to get some CPU work - if the spoofed cache is full/has reached the server side limit.
Is it actually asking for any CPU work? If not, I'd reduce the spoofing level & see if that lets it pick up some CPU work.

I'd suggest posting the Logs for work requests for people to take a look at to see what is going on.
Grant
Darwin NT
ID: 2025554 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2025557 - Posted: 30 Dec 2019, 9:48:25 UTC - in response to Message 2025554.  
Last modified: 30 Dec 2019, 9:48:54 UTC

With a cache of 1 day it asks for both GPU and CPU work but doesn't get any CPU work. With a cache of 0.1 day, it says the cache is full even though I have 0 CPU tasks in cache.

I have used the spoofed client from the beginning, I think; it has not been a problem before.
ID: 2025557 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2025559 - Posted: 30 Dec 2019, 9:54:22 UTC
Last modified: 30 Dec 2019, 10:47:01 UTC

Another strange thing. I tried deselecting GPU work in prefs; BOINC then only asked for CPU work BUT GPU work was still sent.

Edit: tried going from 33 to 25 spoofed GPUs, then I got 35 CPU WUs. Hope it's working OK now :)
ID: 2025559 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2025567 - Posted: 30 Dec 2019, 12:23:07 UTC
Last modified: 30 Dec 2019, 12:37:31 UTC

If you run the old spoofed client, it has a 10000 WU cache limit. With the new SETI limit and 33 spoofed GPUs you get (33 x 300 + 200), or 10100, which is right at the edge (just over it); that could be the cause of your problem.

You have 2 options

- Reduce the number of spoofed GPUs, like you have done
or
- Download the 20000 WU limit version from my repository.
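
The arithmetic behind both options can be checked with a rough sketch. It assumes the figures quoted in this post (300 tasks per spoofed GPU plus a flat 200, and a client cap of 10000 or 20000 WUs); the names below are hypothetical and are not taken from the BOINC source or the spoofed client.

```python
# Figures quoted in the post above; treated here as assumptions,
# not values read from the SETI@home servers or the client code.
PER_GPU_LIMIT = 300   # tasks allowed per (spoofed) GPU
FLAT_EXTRA = 200      # the "+ 200" term from the post


def max_tasks(spoofed_gpus):
    """Total tasks the server-side limits would allow for this host."""
    return spoofed_gpus * PER_GPU_LIMIT + FLAT_EXTRA


def fits_client_cap(spoofed_gpus, client_cap):
    """True if the server-side total stays within the client's WU cap."""
    return max_tasks(spoofed_gpus) <= client_cap


# Old client with 33 GPUs, old client with 25 GPUs, 20000-limit client with 33 GPUs.
for gpus, cap in [(33, 10_000), (25, 10_000), (33, 20_000)]:
    total = max_tasks(gpus)
    verdict = "fits" if fits_client_cap(gpus, cap) else "exceeds the cap"
    print(f"{gpus} spoofed GPUs, cap {cap}: {total} tasks, {verdict}")
```

With these numbers, 33 spoofed GPUs overshoot the 10000 cap by 100 tasks, while either dropping to 25 GPUs or moving to the 20000 cap leaves plenty of headroom.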
ID: 2025567 · Report as offensive
Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2025721 - Posted: 31 Dec 2019, 16:30:19 UTC

No outrage today.... it will be a Confluence: A Thorsday Outrage as the first of the decade! This must be an omen...

Weekly Outage will be on Thursday
Because of the holidays, we are conducting the weekly maintenance outage on Thursday this week.
31 Dec 2019, 16:09:25 UTC

ID: 2025721 · Report as offensive
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2025753 - Posted: 31 Dec 2019, 21:30:34 UTC - in response to Message 2025721.  

No outrage today.... it will be a Confluence: A Thorsday Outrage as the first of the decade! This must be an omen...

Weekly Outage will be on Thursday
Because of the holidays, we are conducting the weekly maintenance outage on Thursday this week.
31 Dec 2019, 16:09:25 UTC


So instead of a Tuesday "hangover" we get a Thursday "hangover" just in time to start drinking (again) on Friday.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2025753 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2025758 - Posted: 31 Dec 2019, 21:39:07 UTC

It may be a Thursday "hangover" for you Tom, but it'll be P.O.E.T.S. Day down under. ;-)

Cheers.
ID: 2025758 · Report as offensive
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2025761 - Posted: 31 Dec 2019, 21:42:33 UTC - in response to Message 2025758.  

It may be a Thursday "hangover" for you Tom, but it'll be P.O.E.T.S. Day down under. ;-)

Cheers.


Now that I have a reference, I would say it's not just "down under" either. ;)

Tom
A proud member of the OFA (Old Farts Association).
ID: 2025761 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2025876 - Posted: 1 Jan 2020, 10:05:04 UTC

Just checked my systems to find a bunch of downloads in extended backoff. A couple of retries & Suspend/Enable Networking got them moving again.
Grant
Darwin NT
ID: 2025876 · Report as offensive


 
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.