The Server Issues / Outages Thread - Panic Mode On! (118)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2027791 - Posted: 15 Jan 2020, 23:18:08 UTC - in response to Message 2027749.  

Scheduler request happened and it took about 30 seconds to respond, but it responded and acknowledged all of them.

Didn't get any new tasks though.


Update:

2020-01-15 16:58:59 SETI@home [sched_op_debug] Starting scheduler request
2020-01-15 16:58:59 SETI@home Sending scheduler request: To fetch work.
2020-01-15 16:58:59 SETI@home Reporting 2 completed tasks, requesting new tasks
2020-01-15 16:58:59 SETI@home [sched_op_debug] CPU work request: 1685721.26 seconds; 0.00 idle CPUs
2020-01-15 16:59:04 SETI@home Scheduler request completed: got 21 new tasks
2020-01-15 16:59:04 SETI@home [sched_op_debug] Server version 709
2020-01-15 16:59:04 SETI@home Project requested delay of 303 seconds
2020-01-15 16:59:04 SETI@home [sched_op_debug] estimated total CPU job duration: 69015 seconds
2020-01-15 16:59:04 SETI@home [sched_op_debug] Deferring communication for 5 min 3 sec
2020-01-15 16:59:04 SETI@home [sched_op_debug] Reason: requested by project


I received some work. Once.

So it's just going to be slow going, but it'll all work out eventually. Be patient.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 2027791
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 2027792 - Posted: 15 Jan 2020, 23:18:16 UTC - in response to Message 2027787.  
Last modified: 15 Jan 2020, 23:22:09 UTC

The 200/call (if that's what is being talked about) is a hard limit.
Unless I misunderstand, it's more about having increased the total CPU limit from 100 to 200, and the total per GPU limit from 100 to 400 (now 300). This caught a lot of people by surprise when it happened (again, lack of communication) and required rethinking cache size settings for folks.
[edit] Would also be nice if those cache limits could be set per-project instead of (or in addition to) per-client across all projects, but it is what it is.
ID: 2027792
Cherokee150
Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 2027793 - Posted: 15 Jan 2020, 23:22:30 UTC

During my 20+ years with SETI we have had several outages that have lasted this long or longer. The good thing is that they are truly very few, and very far apart. If you look at the major corporations with large staff who have had many outages over the years, you have to wonder how Berkeley's very small staff has done so well! (Actually, I know why, because I have visited them. They are some of the most dedicated, enthusiastic, and extremely intelligent people I have ever met!!!)

As to the questions about the backoff times getting longer and longer, it is a very astute bit of programming by the Berkeley staff. You see, each time the software has another backoff without a successful download, it lengthens the backoff time. This is necessary because, as time goes on, more and more computers begin begging for tasks. Without this code, when SETI's servers do come back online, the massive flood of requests for tasks acts like a DoS attack and overloads their servers. This causes a new problem, which then greatly lengthens the time it takes to get back to normal. (Think of how effective deliberate DoS attacks have been at bringing down major websites for long periods of time.)

Don't worry, though, they even thought to code an automatic reset back to a shorter wait time before the backoff gets too long. This ensures that none of us will be left waiting too long before our computers ask for tasks.
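A minimal sketch of the schedule described above, assuming a simple doubling backoff that resets to the short wait once it would pass a ceiling. The function name and the 60-second base, factor of 2 and 4-hour ceiling are hypothetical illustration values, not the BOINC client's actual code or settings (real clients typically also add random jitter so hosts don't retry in lockstep).

```python
def backoff_delays(failures: int, base: float = 60.0, factor: float = 2.0,
                   ceiling: float = 4 * 3600.0) -> list[float]:
    """Seconds to wait before each retry, for `failures` consecutive failed requests.

    Hypothetical parameters for illustration only; not BOINC's real values.
    """
    delays = []
    delay = base
    for _ in range(failures):
        delays.append(delay)
        delay *= factor  # each failure without a successful download lengthens the wait
        if delay > ceiling:
            delay = base  # reset back to a shorter wait before it gets too long
    return delays


if __name__ == "__main__":
    # prints [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3840.0, 7680.0, 60.0, 120.0]
    print(backoff_delays(10))
```

Doubling the wait cuts how often each waiting host knocks on the scheduler as an outage drags on, which is the load reduction described above; the reset keeps any one host from sitting idle for too long.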

Like I said, the staff at Berkeley is very good at what they do!!! :-)
ID: 2027793
Thomas Womack
Volunteer tester
Joined: 7 Sep 03
Posts: 4
Credit: 4,912,298
RAC: 64
United States
Message 2027794 - Posted: 15 Jan 2020, 23:26:38 UTC

Hi all, this is my first post here, even though I have been processing work units for years and NEVER had a problem getting timely downloads or uploads until today. It seems that something is very much amiss with the SETI servers that isn't reflected on the status page, or maybe I just don't know how to read it. At any rate, I can't seem to get new work units at all today, and I'm wondering if the powers that be know about the problem or if this is new. I would love an update on what's happening so I know what to do with the many devices I have running this program. Should I just shut them off and wait for a large star to appear in the western sky? Could we have an update on what's happening????

thanks,

Tom
ID: 2027794
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2027796 - Posted: 15 Jan 2020, 23:32:08 UTC

I believe I mentioned that some time ago. To lower the Total number of outstanding tasks on the Server, simply lower the Cache limit to One Day. The machines that can complete 5000 tasks a day will get 5000 tasks a day no matter what kind of task limits you set. On the other hand, the machines that only complete 100 tasks a day will have 100 tasks on them instead of 1000. Why should you have all those tasks assigned to machines that don't need them? All you are doing is causing problems for the Server by having 1000 tasks on a machine that completes 100 a day.
Again, if a machine completes 10000 tasks a day, then it will get 10000 tasks a day, no matter whether they sit in the In Progress column or the Pending column. If you lower In Progress, then Pending will rise to reach 10000. It doesn't matter which column the tasks are in; it still totals 10000 to the Server at the end of the day.
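To put numbers on the argument above, here is a rough sketch assuming each host holds `cache_days` worth of its own daily output and ignoring the per-device limits. The function names are hypothetical and the host rates are example figures taken from this post.

```python
def outstanding_tasks(daily_rates: list[int], cache_days: int) -> int:
    """Tasks sitting on hosts if each host caches `cache_days` of its own output (no per-device limits)."""
    return sum(rate * cache_days for rate in daily_rates)


def daily_returns(daily_rates: list[int]) -> int:
    """Tasks sent back per day: set by crunching speed, not by cache size."""
    return sum(daily_rates)


hosts = [5000, 100, 100, 100]  # one fast host and three 100-a-day hosts (example figures)
for days in (10, 1):
    # prints "10 53000 5300", then "1 5300 5300"
    print(days, outstanding_tasks(hosts, days), daily_returns(hosts))
```

Cutting the cache from ten days to one shrinks what the database has to track tenfold in this example, while the tasks returned per day stay the same.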
ID: 2027796
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2027798 - Posted: 15 Jan 2020, 23:34:41 UTC

8.24 is now available on main for AMD GPUs: https://setiathome.berkeley.edu/apps.php

Just as Eric promised.
ID: 2027798
SalemWill
Joined: 30 Jul 05
Posts: 1
Credit: 228,091
RAC: 5
United States
Message 2027799 - Posted: 15 Jan 2020, 23:35:44 UTC - in response to Message 2027794.  

I am having the same issue. It states there is no new work to be downloaded.
ID: 2027799
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 2027800 - Posted: 15 Jan 2020, 23:37:44 UTC - in response to Message 2027796.  

... To lower the Total number of outstanding tasks on the Server simply lower the Cache limit to One Day. ..
Agreed. Would be more effective, but again the issue is that since it's a BOINC limit rather than a SETI limit, I'm not sure how you do that? Perhaps it's a reflection of how users and the internet have changed over the past 20 years since this was all originally designed. The model no longer fits, in that one regard.
ID: 2027800
Pierre A Renaud
Joined: 3 Apr 99
Posts: 998
Credit: 9,101,544
RAC: 65
Canada
Message 2027801 - Posted: 15 Jan 2020, 23:39:37 UTC - in response to Message 2027794.  

A normal Tuesday maintenance outage turned into a lengthy 24+ hour outage (for reasons to be specified by the staff when they can), which is rather rare. Now the servers are catching up with demand. Come to this thread to get the latest news on server issues and outages.

Apr 3, 1999 - May 3, 2020
ID: 2027801
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2027802 - Posted: 15 Jan 2020, 23:39:55 UTC - in response to Message 2027773.  

Good News - Beta is giving out work
Bad News - Main is somewhat constipated with the RTS at >1,000,000 and nothing getting out (for me)
As others have said - something has been done to the servers.


. . You are not alone ...

Stephen

:(
ID: 2027802
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2027803 - Posted: 15 Jan 2020, 23:40:01 UTC - in response to Message 2027798.  

8.24 is now available on main for AMD GPUs: https://setiathome.berkeley.edu/apps.php

Just as Eric promised.


Will that help with the late-model AMD GPU headaches, or is that strictly a driver fix?

Tom
A proud member of the OFA (Old Farts Association).
ID: 2027803
Darrell Wilcox Project Donor
Volunteer tester
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2027804 - Posted: 15 Jan 2020, 23:41:12 UTC - in response to Message 2027790.  

@ rob smith

Hey! I remember those days! It took 96 hours (4 days) on my 32 MB, 133 MHz PC for a single WU.
ID: 2027804
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2027806 - Posted: 15 Jan 2020, 23:43:12 UTC
Last modified: 15 Jan 2020, 23:49:27 UTC

Wed 15 Jan 2020 05:41:54 PM CST | SETI@home | Scheduler request completed: got 49 new tasks

The next update was nada tasks....
Tom
A proud member of the OFA (Old Farts Association).
ID: 2027806
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2027807 - Posted: 15 Jan 2020, 23:45:07 UTC - in response to Message 2027784.  

The server status page seems to update only once every few hours, but the last time it updated it said there were over a million results ready to send. However, even when that information was fresh, both my computers got just 'Project has no tasks available' over and over. Are the anonymous platform systems being discriminated against again, like during Christmas?

Edit: right after I typed that my bigger box received over a hundred tasks!


. . The Phantom of the Fora strikes again ...

Stephen

:)
ID: 2027807
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2027809 - Posted: 15 Jan 2020, 23:46:54 UTC - in response to Message 2027785.  

For the vast majority of users (maybe more than 50K), who run on a "set & forget" basis and produce maybe 20 WU per day or less, a large WU cache is not needed and unnecessarily increases the size of the DB. That change is what we are talking about.
And the proper way to deal with that, for users of any production volume, is to set realistic cache size limits so that the process can self-regulate, rather than flailing about trying to find a sweet spot in externally imposed limits. If everyone set their caches for a max of 1-2 days, and stuck to that, actual device limits would be unneeded. Assuming, of course, that the client calculated the requirement accurately.


. . +1

. . Personally I have always argued that one day's worth of work is plenty for a cache size.
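A small sketch of how a cache setting and the per-device limits interact, assuming the 200 CPU and 300-per-GPU figures mentioned in this thread act as a per-host CPU cap and a per-GPU cap (my reading, not confirmed by the project). The function name and the host figures are hypothetical.

```python
CPU_LIMIT = 200          # assumed per-host CPU task limit (figure quoted in this thread)
GPU_LIMIT_PER_GPU = 300  # assumed per-GPU task limit (figure quoted in this thread)


def cached_tasks(cpu_per_day: float, gpu_per_day_each: float, n_gpus: int,
                 cache_days: float) -> tuple[int, int]:
    """Return (cpu_tasks, gpu_tasks) a host would hold for a given cache size, after the limits."""
    cpu = min(round(cpu_per_day * cache_days), CPU_LIMIT)
    gpu = n_gpus * min(round(gpu_per_day_each * cache_days), GPU_LIMIT_PER_GPU)
    return cpu, gpu


# A "set & forget" host doing ~20 CPU tasks a day: the cache setting decides what it holds.
print(cached_tasks(20, 0, 0, 10))     # (200, 0)
print(cached_tasks(20, 0, 0, 1))      # (20, 0)
# A fast host with 4 GPUs at ~2000 tasks/day each: the limits bind even with a 1-day cache.
print(cached_tasks(300, 2000, 4, 1))  # (200, 1200)
```

For the slow host a one-day cache is what trims its share of the database; for the fast host the limits, not the cache setting, decide how much it holds.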

Stephen

<shrug>
ID: 2027809
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2027810 - Posted: 15 Jan 2020, 23:48:34 UTC - in response to Message 2027787.  

The 200/call (if that's what is being talked about) is a hard limit. The actual "ready for dispatch" queue is 200 work units long, so anyone needing more than that is going to need more than one call to fill their cache.


. . Nope, it's the 200/CPU and 300/GPU limits that are causing the angst ...

Stephen

<shrug>
ID: 2027810
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2027812 - Posted: 15 Jan 2020, 23:55:05 UTC - in response to Message 2027798.  

8.24 is now available on main for AMD GPUs: https://setiathome.berkeley.edu/apps.php
Just as Eric promised.


. . Now to get that message to the card owners ...

Stephen

:)
ID: 2027812
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2027814 - Posted: 16 Jan 2020, 0:08:15 UTC - in response to Message 2027800.  

... To lower the Total number of outstanding tasks on the Server simply lower the Cache limit to One Day. ..
Agreed. Would be more effective, but again the issue is that since it's a BOINC limit rather than a SETI limit, I'm not sure how you do that?
The number of tasks you receive per period of time is controlled by the app's APR number. That's why you receive few tasks until the first 11 tasks are completed and the APR number is set. The higher the APR, the more tasks are sent. This is why the bug in the APR number on Server version 715 is so troubling: Server version 715 will stop updating the APR number after about a day or so, instead of updating it as tasks are completed. Hopefully the CERN people are fixing that bug as well as the Anonymous platform bug.
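A simplified sketch of the relationship described above, assuming the scheduler estimates each task's runtime as its floating-point operation estimate divided by the app version's APR and keeps adding tasks until the client's requested seconds of work are filled. This is not BOINC's actual scheduler code; the function name, the 5e13 FLOP estimate, the APR values and the `available` cap are hypothetical.

```python
def tasks_to_send(request_seconds: float, fpops_est: float, apr_flops: float,
                  available: int) -> int:
    """Number of tasks that fit a work request, given the app version's APR (FLOPS)."""
    est_runtime = fpops_est / apr_flops         # estimated seconds per task
    fits = int(request_seconds // est_runtime)  # whole tasks that fit the request
    return min(fits, available)                 # never more than are allowed/ready


# Same one-day request; doubling the APR halves the runtime estimate and roughly doubles the tasks.
print(tasks_to_send(86400, 5e13, 2e10, 1000))  # 34
print(tasks_to_send(86400, 5e13, 4e10, 1000))  # 69
```

If the APR stops updating, as described for Server version 715, the runtime estimate goes stale and so does the number of tasks sent per request.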
ID: 2027814
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2027815 - Posted: 16 Jan 2020, 0:15:05 UTC

What is blocking the tasks in the 'Results ready to send' queue from going to the clients requesting work?

The RTS is so high that the splitters have stopped, and it stays high despite that. So it looks like no one is getting much work. I got one big batch of tasks, and only 'project has no jobs' before and after that.
ID: 2027815
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2027817 - Posted: 16 Jan 2020, 0:20:46 UTC - in response to Message 2027815.  

I think it's the same for all of us, Ville Saari. The servers are probably busy handling incoming completed work and are prioritizing that over sending out new tasks. Just my guess.
ID: 2027817
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.