Panic Mode On (113) Server Problems?

Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1959759 - Posted: 11 Oct 2018, 16:37:10 UTC - in response to Message 1959756.  

Falling . . . falling . . . . . falling
The website never fixed the problems with the servers before the outage. And the servers still are not right. Too many tasks not validated. Impossible to look at any of your hosts because the task list never refreshes, you only see the spinner.
You mean the human being - probably singular - driving maintenance didn't do it. Not enough donations means not enough staff.
ID: 1959759 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1959763 - Posted: 11 Oct 2018, 16:54:58 UTC

Yesterday about this time of day there was an outage as well.
Possibly something to do with the heavy assimilation load plus an additional daily script?
ID: 1959763 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22189
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1959780 - Posted: 11 Oct 2018, 18:37:14 UTC

It's not dead, it's just slumbering in a dark corner....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1959780 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Avatar

Send message
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1959786 - Posted: 11 Oct 2018, 18:59:08 UTC
Last modified: 11 Oct 2018, 19:04:00 UTC

For everyone bellyaching about the "No tasks available" tag when there are tasks shown as available:

That full number (500k+) of WUs is not held in a table in memory. The table in memory that the scheduler uses to fill requests for work is much smaller (about 400 tasks, last time I checked...) and refills once a second (at least, the last time someone explained this to me it was "refills once a second"…). When you get the "no tasks available" message, it's this 400-WU buffer that has run out... and with the long (5 minutes plus a few seconds) wait between honored requests for work, you may run into it repeatedly if demand for WUs is high...

In short, SETI is showing the NORMAL behavior for itself when there is high demand for WU's...
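
A toy model may make the buffer argument above easier to follow. This is only a sketch under the figures quoted in the post (about 400 slots, one refill per second); the class and function names are hypothetical and are not taken from the actual scheduler code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeederBuffer:
    """Toy model of a small in-memory task buffer refilled on a fixed tick."""
    capacity: int = 400   # slots held in memory (figure quoted in the post)
    available: int = 400  # tasks currently left in the buffer

    def refill(self) -> None:
        # Assumption: the buffer is topped back up to capacity once per second.
        self.available = self.capacity

    def request(self, wanted: int) -> int:
        # Grant whatever is left; a grant of 0 models "no tasks available".
        granted = min(wanted, self.available)
        self.available -= granted
        return granted


def simulate_second(buffer: FeederBuffer, requests: List[int]) -> List[int]:
    """Serve every request that arrives within one refill interval."""
    buffer.refill()
    return [buffer.request(n) for n in requests]


if __name__ == "__main__":
    # Six hosts each ask for 100 tasks within the same second.
    print(simulate_second(FeederBuffer(), [100] * 6))
```

In this model the first four requests are filled and the last two get nothing, even though far more work may exist in the database behind the buffer.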

Hello, from Albany, CA!...
ID: 1959786 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1959800 - Posted: 11 Oct 2018, 21:05:45 UTC

No new work for 30+ mins, panic?
ID: 1959800 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1959801 - Posted: 11 Oct 2018, 21:07:06 UTC

The last work request from any of my hosts that netted anything other than 0 was an hour ago. Caches falling fast.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1959801 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1959805 - Posted: 11 Oct 2018, 21:40:39 UTC - in response to Message 1959801.  

The last work request from any of my hosts that netted anything other than 0 was an hour ago. Caches falling fast.


All is working fine from here. Receiving new work as normal.
ID: 1959805 · Report as offensive
Profile lunkerlander
Avatar

Send message
Joined: 23 Jul 18
Posts: 82
Credit: 1,353,232
RAC: 4
United States
Message 1959810 - Posted: 11 Oct 2018, 22:36:43 UTC - in response to Message 1959782.  
Last modified: 11 Oct 2018, 22:41:17 UTC

Ah well, it's not as if they're analyzing the tasks we crunch, so there's millions and millions of finished tasks just sitting there for years and years.....
ET may have been found years ago, but I seriously doubt we'll ever know.
Nebula too, like NTPCkr will likely just disappear from the discussions.

So, it's really only the credit hounds that's in dire need of an endless flow of new tasks.
Not for scientific reasons, only for credit reasons :-)


I've thought about this too! With the amount of time, effort, and money that has been put into this project for so long, the data that have been created haven't been put to the best use. I think that it would benefit the project scientifically more to invest resources in Nebula than to buy more GPUs to crunch more work units.

How many people are working on Nebula? Is it just one person in his spare time? Perhaps a team of people could collaborate and troubleshoot problems with Nebula. You guys have been a great resource for everyone here in the forums when it comes to helping get computer and BOINC issues solved. If even 1% of us users had some programming knowledge, they could offer support. If others would donate half of what a new GPU would cost towards finishing the Nebula backend, it would do much more for the project than just producing new data from workunits that aren't being analysed.

I'm not trying to sound negative. I'm very interested in this project and would like to see it succeed. I just think a lot more emphasis needs to be given to Nebula, perhaps even via discussions here on this forum.
ID: 1959810 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1959813 - Posted: 11 Oct 2018, 22:57:19 UTC - in response to Message 1959810.  

Unfortunately, we as contributors don't have an explicit say as to where our donations are used. We don't even know whether the money from specific fundraising programs, like last year's hardware purchase donation event, actually gets used to buy hardware or not. The project is not very transparent in that regard. Occasionally we get little tidbits of information about past fundraising. I just hope that my donation last year actually bought hardware for Parkes as it was proposed. Now I suppose we should start a hardware purchase fundraiser for the servers since they are the ones constantly having issues this year.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1959813 · Report as offensive
Profile Unixchick Project Donor
Avatar

Send message
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1959835 - Posted: 12 Oct 2018, 4:38:20 UTC

I don't think we have enough files loaded to get us through the weekend. While the seti team is great about giving us more on the weekends, I thought maybe they should load the data on Friday, and hopefully the system will behave and then they can have a weekend without worrying about it.
ID: 1959835 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1959840 - Posted: 12 Oct 2018, 6:20:03 UTC - in response to Message 1959786.  

In short, SETI is showing the NORMAL behavior for itself when there is high demand for WU's...

No, it's not.
Generally after an outage, if there is work in the Ready-to-send buffer, a request will result in some work, even after extended outages. It's actually very unusual to not get work when there are WUs ready to send. The days of getting multiple "Project has no tasks available" messages were in the distant past.
Given the present low demand for work due to the long GBT WU runtimes & lack of Arecibo work, there is some other issue causing the Scheduler to not issue work, or to not have work to issue.
It is not normal behaviour.
Grant
Darwin NT
ID: 1959840 · Report as offensive
Profile Kissagogo27 Special Project $75 donor
Avatar

Send message
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 1959870 - Posted: 12 Oct 2018, 11:41:01 UTC

Hardware changes have increased the demand for tasks ...

The server still uses the old rules, which assume a " normal user with a normal crunching machine "

new hardware with old rules won't work !

It's time for a Major Upgrade ! nah ! that's all

but have the minds changed too ?

we have the money ... some or more , but , who makes the decisions ?

the same old-minded ones ?

it's time to give the anthill a kick ..

Sorry for any inconvenience this may cause, but this must be done ! not to evolve is to disappear indeed
ID: 1959870 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1959873 - Posted: 12 Oct 2018, 12:15:23 UTC

The Server had been working fine since the last change allowing Arecibo VLARs on GPUs. Recently some problem has developed that causes the Server to lose contact with the RTS and a few other items, which is obvious when the As Of* times start rising. When that happens, tasks are not sent to Hosts and the As Of* times keep increasing. This is NOT Normal. If you look right now the As Of* times are a little off. That is not normal either, but it isn't causing problems right now. It will probably cause problems later though; something just isn't right. The other day the Splitters were running when contact was lost. They kept running until contact was restored, and had split enough that the total was around 1.2 million tasks when contact was restored. When you see those impossible Creation Rate numbers, it's because contact has been restored after a large gap, and the numbers have to be large enough to make up for the difference. That's how you can get a 156/sec number. It wasn't really 156/sec, but it had to make up for the difference between 600K and 1200K. This is NOT Normal. Hopefully the problem will be identified and solved soon. Looking at the SSP right now, I see As Of numbers off. I'm expecting the thing to croak at any time.
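
The 156/sec figure can be checked with simple arithmetic. The sketch below assumes the status page derives the rate as a plain difference quotient (change in count divided by an averaging window); that formula, the function names, and the three-hour outage used for comparison are assumptions for illustration, not something stated in the thread.

```python
def apparent_rate(before: int, after: int, window_seconds: float) -> float:
    """Rate implied by a counter jump divided over an averaging window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (after - before) / window_seconds


def implied_window_seconds(before: int, after: int, reported_rate: float) -> float:
    """Window length that would turn a counter jump into the reported rate."""
    if reported_rate <= 0:
        raise ValueError("reported_rate must be positive")
    return (after - before) / reported_rate


if __name__ == "__main__":
    # Figures from the post: the total went from ~600K to ~1200K, shown as 156/sec.
    window = implied_window_seconds(600_000, 1_200_000, 156)
    print(f"implied window: {window:.0f} s (~{window / 60:.0f} min)")

    # Hypothetical: if the splitters really ran unseen for 3 hours,
    # the true average rate would be much lower than the displayed one.
    print(f"rate over 3 h: {apparent_rate(600_000, 1_200_000, 3 * 3600):.1f}/sec")
```

Under that assumption, a jump of 600K tasks shown as 156/sec corresponds to a window of roughly 64 minutes. If the splitting actually took longer than that window, the displayed rate overstates what the splitters were doing.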
ID: 1959873 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1959875 - Posted: 12 Oct 2018, 12:29:36 UTC - in response to Message 1959873.  

The SSP isn't updated continuously - there's normally a 10 minute or 20 minute interval between updates.

Also, not every value is updated simultaneously. It's normal to see one or more As of* values between the main page updates. I wouldn't take any notice of an As of* less than 20 minutes - and at the moment, the largest is 6 minutes.
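
The rule of thumb above can be written as a small filter. This is a sketch only: the 20-minute threshold is the figure from this post rather than a project setting, and the entry names in the example are hypothetical.

```python
from datetime import timedelta
from typing import Dict, List

# Rule of thumb from the post: ignore "As of" ages under 20 minutes.
STALE_AFTER = timedelta(minutes=20)


def stale_entries(as_of_ages: Dict[str, timedelta],
                  threshold: timedelta = STALE_AFTER) -> List[str]:
    """Return the names whose "As of" age is at or beyond the threshold."""
    return sorted(name for name, age in as_of_ages.items() if age >= threshold)


if __name__ == "__main__":
    # Hypothetical readings; the largest value mentioned in the post was 6 minutes.
    ages = {
        "ready_to_send": timedelta(minutes=6),
        "splitter_status": timedelta(minutes=2),
    }
    print(stale_entries(ages))  # nothing is old enough to worry about
```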
ID: 1959875 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1959876 - Posted: 12 Oct 2018, 12:33:17 UTC - in response to Message 1959875.  

The thing will croak later today, probably just after everyone leaves for the weekend...watch.
ID: 1959876 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1959885 - Posted: 12 Oct 2018, 12:49:12 UTC - in response to Message 1959876.  

The thing will croak later today, probably just after everyone leaves for the weekend...watch.

Murphy's law could explain that!
ID: 1959885 · Report as offensive
Profile Unixchick Project Donor
Avatar

Send message
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1959904 - Posted: 12 Oct 2018, 13:41:17 UTC

Something is definitely still wrong with the system. I'd like to think that it is related to handling the last mess and thus won't happen once things are back to normal, but it is probably something else.

It really bothers me that the throttle on splitting isn't working. If the machine is going to take a break from handing out WUs then the splitters should stop. It really shouldn't be taking a 30 minute break from handing out WUs anyway.

In good news, Results returned and awaiting validation is now a healthier 4 million instead of 10 million. It could probably still go lower. Results for db purge is at 5 million and will hopefully drop once the other numbers are lower and the system doesn't have to spend time cleaning up the backlog from the last mess.
ID: 1959904 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1959906 - Posted: 12 Oct 2018, 13:59:15 UTC - in response to Message 1959782.  

And back again at "Project has no tasks available "

Ah well, it's not as if they're analyzing the tasks we crunch, so there's millions and millions of finished tasks just sitting there for years and years.....
ET may have been found years ago, but I seriously doubt we'll ever know.
Nebula too, like NTPCkr will likely just disappear from the discussions.

So, it's really only the credit hounds that's in dire need of an endless flow of new tasks.
Not for scientific reasons, only for credit reasons :-)


. . You are so very cynical ...

Stephen

:(
ID: 1959906 · Report as offensive
RickToTheMax

Send message
Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1959967 - Posted: 12 Oct 2018, 22:36:13 UTC

It is holding up so far.. wonder if we will survive Saturday and Sunday tho..!
And we've got enough tapes for the weekend now.
ID: 1959967 · Report as offensive
Speedy
Volunteer tester
Avatar

Send message
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1960238 - Posted: 14 Oct 2018, 6:24:22 UTC

Is anybody else noticing a higher than usual number of pending validation results? Nothing looks out of kilter on the SS page. It could just be down to my having a new GPU.
ID: 1960238 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.