The Server Issues / Outages Thread - Panic Mode On! (118)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)

JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2029562 - Posted: 27 Jan 2020, 21:42:12 UTC - in response to Message 2029559.  
Last modified: 27 Jan 2020, 21:42:33 UTC

Yup, stuck uploads on and off for days/weeks.
They clear though without any intervention.

The problem with fast multi-GPU hosts is that sometimes they don't clear fast enough. In my case, with 20+ stuck uploads, BOINC won't request new work.
ID: 2029562 · Report as offensive
xpozd
Joined: 26 Jan 15
Posts: 88
Credit: 280,183
RAC: 1
Canada
Message 2029563 - Posted: 27 Jan 2020, 21:44:02 UTC - in response to Message 2029505.  

@Richard Haselgrove
Thanks for that. I re-installed everything with newer versions,
and that took care of the issue with the messages.
I can download new tasks again now.

I appreciate getting new tasks,
but it's like the tasks are showing a countdown for the deadline.
Each task I got shows 52d,14:37:xx, and the xx changes like a countdown.
I'm guessing that's normal; I've just never seen that before.
I'll just run it as usual and see where it goes.

thanks

  • win7starter
  • boinc: 7.14.2
  • boinc tasks: 1.78
  • Lunatics Win32 v0.44

ID: 2029563 · Report as offensive
xpozd
Joined: 26 Jan 15
Posts: 88
Credit: 280,183
RAC: 1
Canada
Message 2029588 - Posted: 28 Jan 2020, 0:01:47 UTC - in response to Message 2029563.  

Each task I got shows 52d,14:37:xx, and the xx changes like a countdown.
I'm guessing that's normal; I've just never seen that before.


Turns out that's a setting in BoincTasks that I don't usually use.
So I guess all is solved for now.

  • win7starter
  • boinc: 7.14.2
  • boinc tasks: 1.78
  • Lunatics Win32 v0.44

ID: 2029588 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1646
Credit: 12,921,799
RAC: 89
New Zealand
Message 2029604 - Posted: 28 Jan 2020, 2:17:52 UTC
Last modified: 28 Jan 2020, 3:00:19 UTC

I am amazed: as I write, the return rate is 7000 more results per hour than there is in total in the RTS queue. I am aware there are a few shorties out there, plus some noise bombs.

I must say there has been a vast purging improvement, because the result number is now in the 700,000 range; I cannot remember the last time I saw it that low.
RTS is now more than the return rate.
ID: 2029604 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029606 - Posted: 28 Jan 2020, 3:16:58 UTC - in response to Message 2029604.  
Last modified: 28 Jan 2020, 3:23:30 UTC

I am amazed: as I write, the return rate is 7000 more results per hour than there is in total in the RTS queue.
They are intentionally keeping the RTS queue short to keep the result table in the database from growing too big to fit in RAM.
If/when the huge validation queue shrinks, then the RTS queue should return to its normal size.

In my opinion, during this coming downtime they should shut down only the scheduler and splitters but let the rest of the system run until the validators, assimilators and purgers have no more work to do, and only then shut the rest of the system down and do their normal downtime stuff.
ID: 2029606 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1646
Credit: 12,921,799
RAC: 89
New Zealand
Message 2029616 - Posted: 28 Jan 2020, 5:09:29 UTC - in response to Message 2029606.  

I am amazed: as I write, the return rate is 7000 more results per hour than there is in total in the RTS queue.
They are intentionally keeping the RTS queue short to keep the result table in the database from growing too big to fit in RAM.
If/when the huge validation queue shrinks, then the RTS queue should return to its normal size.

In my opinion, during this coming downtime they should shut down only the scheduler and splitters but let the rest of the system run until the validators, assimilators and purgers have no more work to do, and only then shut the rest of the system down and do their normal downtime stuff.

Good point. I had forgotten about that; I thought it was only before last week's outage. I read in another thread that this week's outage may be 24 hours.
ID: 2029616 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13959
Credit: 208,696,464
RAC: 304
Australia
Message 2029626 - Posted: 28 Jan 2020, 7:48:26 UTC - in response to Message 2029557.  

I have had stuck upload problems with both 7.14.2 & 7.16.3 for some weeks I guess.

Ditto

I've never had a stuck upload, just plenty of uploads instantly timing out on their first attempt. Over the last couple of days there have been periods when they would time out (not always instantly, but within a few seconds) on their second and even third attempts.

Downloads, on the other hand: I've been having plenty of sticky ones there. Came home to many of them today, mostly on my Linux system (no hosts file setting). They just sit there, timer counting away but no data moving. Disabling & re-enabling network access generally gets them going again.
Grant
Darwin NT
ID: 2029626 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22815
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2029636 - Posted: 28 Jan 2020, 9:07:11 UTC - in response to Message 2029606.  

Shutting everything down apart from the splitters & scheduler wouldn't help as they are both in continuous dialogue with the main database, which wouldn't be running.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2029636 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13959
Credit: 208,696,464
RAC: 304
Australia
Message 2029642 - Posted: 28 Jan 2020, 10:10:41 UTC

Both the AP and MB deleters appear to have given up; they're both developing significant backlogs.
Grant
Darwin NT
ID: 2029642 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029653 - Posted: 28 Jan 2020, 13:11:49 UTC - in response to Message 2029636.  

Shutting everything down apart from the splitters & scheduler wouldn't help as they are both in continuous dialogue with the main database, which wouldn't be running.
I said shutting down just the scheduler and splitters. The database would obviously still be running, but without the load from people requesting and reporting work, so the backlogged processes would have a good opportunity to catch up.
ID: 2029653 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2029671 - Posted: 28 Jan 2020, 15:56:29 UTC

Still up?
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2029671 · Report as offensive
Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7381
Credit: 44,181,323
RAC: 238
United States
Message 2029675 - Posted: 28 Jan 2020, 16:37:37 UTC - in response to Message 2029671.  

Still up?

Hi Ian,

Could the maintenance be postponed in order to allow the DB and servers to settle in after last week's DB redo?

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2029675 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2029678 - Posted: 28 Jan 2020, 16:48:11 UTC - in response to Message 2029675.  

Don't think so, they usually put out a notice if they plan to skip a maintenance day.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2029678 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029679 - Posted: 28 Jan 2020, 16:53:46 UTC - in response to Message 2029678.  

Don't think so, they usually put out a notice if they plan to skip a maintenance day.
Having said that, we normally have an automatic 'advance warning' notice on the front page from midnight PST on the day in question - and it ain't there.
ID: 2029679 · Report as offensive
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2029680 - Posted: 29 Jan 2020, 4:36:15 UTC

and we are back!!
ID: 2029680 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029682 - Posted: 29 Jan 2020, 4:43:28 UTC - in response to Message 2029680.  
Last modified: 29 Jan 2020, 5:02:58 UTC

and we are back!!
Back yes, but how long until we can actually get new work? My GPU still has plenty of stuff to do but the CPU is dry in 30 minutes.

Also, it looks like I can't get any CPU work for a long time even if some were available. My client wants to report all the GPU results first, so my CPU queue stays 'full' of completed work and the server won't give me more. Suspending completed tasks apparently doesn't prevent the client from reporting them either, so I can't force it to report the CPU stuff first.
ID: 2029682 · Report as offensive
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 2029685 - Posted: 29 Jan 2020, 5:41:26 UTC
Last modified: 29 Jan 2020, 5:45:32 UTC

I am getting "project servers may be temporarily down" message, apparently since around 17.15Z on 28/1/20 but the SSP does not show an associated lag - anyone else?
Member of the 20 Year Club



ID: 2029685 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029686 - Posted: 29 Jan 2020, 5:57:29 UTC - in response to Message 2029685.  
Last modified: 29 Jan 2020, 6:00:21 UTC

I am getting "project servers may be temporarily down" message, apparently since around 17.15Z on 28/1/20 but the SSP does not show an associated lag - anyone else?
Servers are congested from all the empty hosts bombarding them with requests for work, so most requests fail. In this situation you can make your requests succeed a bit more reliably if you set 'no new work', so that your client only reports work and doesn't ask for more.
ID: 2029686 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13959
Credit: 208,696,464
RAC: 304
Australia
Message 2029687 - Posted: 29 Jan 2020, 5:59:42 UTC - in response to Message 2029685.  

I am getting "project servers may be temporarily down" message
Never seen that message before.
Just the usual "Project has no tasks available". Scheduler response time is better than it has been recently after outages, and I have been able to report work with no issues.
But the forums are in extreme slow motion at present, and I expect it'll be hours before we can get any work; it takes a long time after the system comes up after an outage for the splitters to start up again these days.
Grant
Darwin NT
ID: 2029687 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029688 - Posted: 29 Jan 2020, 6:06:15 UTC - in response to Message 2029687.  
Last modified: 29 Jan 2020, 6:07:58 UTC

Never seen that message before. Just the usual "Project has no tasks available". Scheduler response time is better than it has been recently after outages, and I have been able to report work with no issues.
I have had some issues. Just after the downtime ended I was able to report tasks for a while without problems but when the big masses woke up from their long backoffs, I started getting just timeouts and internal server errors. 10 or 15 minutes ago the requests started working again.

And right now I got my first new task after dt! Just a single one.
ID: 2029688 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.