The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2038882 - Posted: 19 Mar 2020, 9:47:40 UTC

Yep, back to doing real work. Just got a few hundred for each of the starving clients.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2038898 - Posted: 19 Mar 2020, 11:45:12 UTC

Immediately after I got my caches filled, I started getting just '0 tasks' returns and the caches are depleting again...
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19065
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2038903 - Posted: 19 Mar 2020, 12:28:29 UTC - in response to Message 2038898.  

Immediately after I got my caches filled, I started getting just '0 tasks' returns and the caches are depleting again...

For me it was only for about 30 mins around 12 noon GMT that there were no new tasks. The cache has been re-filled since then.
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2038906 - Posted: 19 Mar 2020, 12:33:12 UTC

Starting to get "Project has no tasks available."

Looks like the brick fell off the accelerator pedal...
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2038907 - Posted: 19 Mar 2020, 12:33:43 UTC

Results returned and awaiting validation hits another record high. I wonder if it'll make it to 18 million by the 31st?
Grant
Darwin NT
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2038909 - Posted: 19 Mar 2020, 12:35:35 UTC - in response to Message 2038906.  

Starting to get "Project has no tasks available."

Looks like the brick fell off the accelerator pedal...
All the hosts that were in backoff mode are now contacting the Scheduler & returning work and asking for more. Demand exceeds supply.
Grant
Darwin NT
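Grant's point is that backoff only postpones demand; it doesn't remove it. A minimal Python sketch of a generic randomized exponential backoff shows why. This is an assumption for illustration, not the BOINC client's actual code, and BASE_DELAY and MAX_DELAY are made-up values:

```python
import random

# Hypothetical parameters: the BOINC client's real backoff limits are not
# given in this thread.
BASE_DELAY = 60.0        # seconds to wait after the first failure
MAX_DELAY = 4 * 3600.0   # never wait longer than this


def next_retry_delay(failures, rng=random):
    """Seconds to wait after `failures` consecutive failed scheduler requests.

    The nominal delay doubles with each failure up to MAX_DELAY; a random
    factor between 0.5 and 1 spreads hosts out so they don't retry in lockstep.
    """
    # Clamp the exponent so huge failure counts can't overflow the float.
    nominal = min(MAX_DELAY, BASE_DELAY * 2 ** min(failures, 32))
    return rng.uniform(nominal / 2, nominal)


if __name__ == "__main__":
    rng = random.Random(42)
    for n in range(0, 10, 3):
        print(n, "failures ->", round(next_retry_delay(n, rng)), "s")
```

Because every delay is capped at MAX_DELAY, any host whose last failed request came before the servers recovered will be back within MAX_DELAY of the recovery. The whole backlog of backed-off hosts arrives within that window, on top of the hosts that never backed off.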
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2038937 - Posted: 19 Mar 2020, 15:03:01 UTC

And then I couldn't post to the BOINC forums anymore. Posts stay in perpetual trying to send mode. Oh well.
Ooh same here
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2038939 - Posted: 19 Mar 2020, 15:12:40 UTC - in response to Message 2038909.  

Starting to get "Project has no tasks available."

Looks like the brick fell off the accelerator pedal...
All the hosts that were in backoff mode are now contacting the Scheduler & returning work and asking for more. Demand exceeds supply.


. . Well I swapped over to the NBN tonight and now the wheels have well and truly fallen off. I can contact other sites but BOINC is borked. One machine uploads, reports and downloads just fine. The next machine can upload OK, but attempts to report get "cannot contact server, internet is OK". The other two machines cannot upload anything: total fail and almost immediate project backoff. :(

. . AAArrrggghhh!!! :(

Stephen

:(
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19065
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2038940 - Posted: 19 Mar 2020, 15:16:42 UTC - in response to Message 2038939.  
Last modified: 19 Mar 2020, 15:17:40 UTC

I have no idea why but the word KISS popped into my head ;-)
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2039017 - Posted: 20 Mar 2020, 1:22:06 UTC

Apparently the servers dried up again.

The SSP stopped updating and both of my computers stopped getting any new work. Scheduler requests still work but always return 0 tasks.

The number of results in the database was at a new record when the SSP still updated: 24.186 million.
Wiggo
Joined: 24 Jan 00
Posts: 34766
Credit: 261,360,520
RAC: 489
Australia
Message 2039019 - Posted: 20 Mar 2020, 1:38:25 UTC

I can't say that I'm having any problems here at all really as my 2 main rigs are topping up on every 2nd or 3rd request.

Cheers.
juan BFP Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2039030 - Posted: 20 Mar 2020, 2:18:36 UTC

No new WUs. When one does eventually drop into my backyard (obviously a resend), it is crunched almost instantly because my host is programmed to automatically start any resend first.

For now the large cache is holding. Hope that will be fixed during the night.
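juan doesn't say how his host is set up. A minimal Python sketch of the ordering rule, assuming a hypothetical Task record with an is_resend flag and a deadline (neither is taken from BOINC's own code):

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    is_resend: bool   # hypothetical flag: reissued after another copy failed or timed out
    deadline: float   # report deadline, e.g. as a Unix timestamp


def run_order(tasks):
    """Resends first, then fresh tasks; earliest deadline first within each group.

    False sorts before True, so the key `not is_resend` puts resends at the front.
    """
    return sorted(tasks, key=lambda t: (not t.is_resend, t.deadline))


if __name__ == "__main__":
    queue = [Task("a", False, 300), Task("b", True, 500),
             Task("c", False, 100), Task("d", True, 200)]
    print([t.name for t in run_order(queue)])  # ['d', 'b', 'c', 'a']
```

One plausible reason to run resends first: a resend usually means the other copies of that workunit are already back and waiting on this one, so finishing it quickly lets the workunit validate sooner.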
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2039035 - Posted: 20 Mar 2020, 2:51:25 UTC - in response to Message 2039030.  
Last modified: 20 Mar 2020, 2:54:00 UTC

This situation may sort itself out. Possibly when results in progress drop below a certain number, work will get sent out again. I am just guessing; I have no hard facts. I have 15 tasks waiting for my GPU and 23 currently running on my CPU; when they are finished I will be out of work until the server gives me more. I think a lot of us are in this boat. As I was posting this, I got given around 20 CPU tasks. Good luck, everyone.
Dr Who Fan
Volunteer tester
Joined: 8 Jan 01
Posts: 3214
Credit: 715,342
RAC: 4
United States
Message 2039039 - Posted: 20 Mar 2020, 3:12:22 UTC

The server will probably limp along on what can be done remotely until further notice:
California Governor Issues ‘Stay at Home’ Order for Residents
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2039042 - Posted: 20 Mar 2020, 3:17:00 UTC

We currently have 14 partial blc files to split. The 16 splitting channels usually work on 11 files. The splitters are going at full tilt right now, filling up the RTS queue (YES!), but I fear not only that we will run out of files to split, but also that the splitting rate will drop as fewer files are being split.

We still have Arecibo files, so we won't run out today, though.

Just wondering if they will add more tomorrow. ???
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2039060 - Posted: 20 Mar 2020, 5:57:50 UTC

Well, the Web site is almost dead. The forums aren't completely dead, just mostly dead.
And the Scheduler is MIA again.
I'm sure we did this yesterday.
Grant
Darwin NT
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2039076 - Posted: 20 Mar 2020, 7:23:42 UTC - in response to Message 2039039.  

The server will probably limp along on what can be done remotely until further notice
The only things that can't be done remotely are hardware upgrades or pressing the reset button of a crashed and unresponsive server. Servers are managed remotely even when the person doing it is physically on site because servers in server racks rarely even have any local keyboards and displays.

And even when there is a local console, no one wants to use it unless he has to because server rooms are too cold and too noisy places to work in.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2039078 - Posted: 20 Mar 2020, 7:43:41 UTC

What I think the Seti staff should do is write and run a simple script that moves the following into separate backup storage: the database rows of workunits that are waiting for assimilation but not for any unreturned results, the result rows linked to those workunits, and the workunit and result files associated with them.

That would remove about half of all the results in the database, along with the memory pressure they cause, which would probably resolve the server problems for long enough to last until work distribution is stopped.

And then, after it is stopped and the database starts shrinking, those backed-up workunits could be returned to the assimilation queue to be processed.
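Ville's proposal amounts to a move-and-restore on two tables. A minimal Python sketch using the standard sqlite3 module, assuming a hypothetical toy schema: the tables workunit and result, the backup tables workunit_parked and result_parked, and the columns awaiting_assimilation and state are all invented for illustration and are not BOINC's real schema. It covers only the database rows; moving the associated workunit and result files is left out.

```python
import sqlite3

# Toy schema for illustration only; BOINC's real tables have many more
# columns and encode states as integer codes, not these names.
SCHEMA = """
CREATE TABLE workunit (id INTEGER PRIMARY KEY, awaiting_assimilation INTEGER);
CREATE TABLE result (id INTEGER PRIMARY KEY, workunit_id INTEGER, state TEXT);
CREATE TABLE workunit_parked (id INTEGER PRIMARY KEY, awaiting_assimilation INTEGER);
CREATE TABLE result_parked (id INTEGER PRIMARY KEY, workunit_id INTEGER, state TEXT);
"""


def park_idle_workunits(conn):
    """Move workunits that await assimilation and have no result still out
    with a volunteer, plus their result rows, into the parked tables.
    Returns the number of workunits moved."""
    ids = [(row[0],) for row in conn.execute(
        """SELECT w.id FROM workunit w
           WHERE w.awaiting_assimilation = 1
             AND NOT EXISTS (SELECT 1 FROM result r
                             WHERE r.workunit_id = w.id
                               AND r.state = 'in_progress')""")]
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO workunit_parked SELECT * FROM workunit WHERE id = ?", ids)
        conn.executemany(
            "INSERT INTO result_parked SELECT * FROM result WHERE workunit_id = ?", ids)
        conn.executemany("DELETE FROM result WHERE workunit_id = ?", ids)
        conn.executemany("DELETE FROM workunit WHERE id = ?", ids)
    return len(ids)


def unpark_all(conn):
    """Return every parked row to the live tables, e.g. once work
    distribution has stopped and the assimilators can catch up."""
    with conn:
        conn.execute("INSERT INTO workunit SELECT * FROM workunit_parked")
        conn.execute("INSERT INTO result SELECT * FROM result_parked")
        conn.execute("DELETE FROM workunit_parked")
        conn.execute("DELETE FROM result_parked")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO workunit VALUES (?, ?)", [(1, 1), (2, 1), (3, 0)])
    conn.executemany("INSERT INTO result VALUES (?, ?, ?)",
                     [(10, 1, "returned"), (11, 1, "returned"),
                      (20, 2, "returned"), (21, 2, "in_progress"),
                      (30, 3, "in_progress")])
    print(park_idle_workunits(conn))  # 1: only workunit 1 qualifies
```

Doing the copies and deletes inside one transaction means a failure halfway through can't leave a workunit's rows in both places or in neither. A real version would also have to stop other daemons from touching those rows between the SELECT and the move.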
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2039079 - Posted: 20 Mar 2020, 7:51:12 UTC - in response to Message 2039076.  

The server will probably limp along on what can be done remotely until further notice
The only things that can't be done remotely are hardware upgrades or pressing the reset button of a crashed and unresponsive server. Servers are managed remotely even when the person doing it is physically on site because servers in server racks rarely even have any local keyboards and displays.

And even when there is a local console, no one wants to use it unless he has to because server rooms are too cold and too noisy places to work in.
I've visited two BOINC server rooms - the Einstein ATLAS cluster 10 years ago, and SETI's last summer. Both had a 'crash cart' parked in a corner - a trolley with monitor, keyboard and sundry useful tools. Probably a mouse too, though most server work seems to be done at the command line.

In SETI's case, each equipment rack is locked with its own security code. Eric opened one door to plug in the crash cart, and remotely shut down a different server which needed a hard disk replacing. That done and tested, he wheeled another trolley to the server we needed to upgrade, adjusted the working height, and slid that server out of the rack and onto the worksurface.

I don't remember it being seriously cold, but certainly 'pleasantly cool' compared to the California summer outside. And I've still got the disposable earplugs I was advised to wear. Souvenir!

Fun fact: Eric doesn't even have University authority to enter the server CoLo by himself. We had to meet Jeff Cobb outside to go through the formalities and be signed in.
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2039080 - Posted: 20 Mar 2020, 7:51:48 UTC - in response to Message 2039078.  

Sounds good. That said, I do not know how to write such a thing, and is it worth writing one for 11 or 12 days?


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.