Panic Mode On (105) Server Problems?

Message boards : Number crunching : Panic Mode On (105) Server Problems?

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1852440 - Posted: 3 Mar 2017, 1:35:47 UTC

Patience rewarded. Caches are filling back up again after the toggle.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1852440
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1852458 - Posted: 3 Mar 2017, 4:14:49 UTC - in response to Message 1852440.  

Patience rewarded. Caches are filling back up again after the toggle.

Same here.
Grant
Darwin NT
ID: 1852458
Wiggo
Joined: 24 Jan 00
Posts: 36569
Credit: 261,360,520
RAC: 489
Australia
Message 1852460 - Posted: 3 Mar 2017, 4:28:30 UTC - in response to Message 1852440.  

Patience rewarded. Caches are filling back up again after the toggle.

My pendings are shrinking while my current valids have gone through the roof.

Cheers.
ID: 1852460
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1852461 - Posted: 3 Mar 2017, 4:42:54 UTC - in response to Message 1852460.  

My pendings are shrinking while my current valids have gone through the roof.

Likewise.
I'm expecting another boost in about 10 days as another batch of outstanding work gets re-issued.
Grant
Darwin NT
ID: 1852461
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852471 - Posted: 3 Mar 2017, 6:36:24 UTC

What can I say?
Once again, the kitties have had no trouble getting their caches full, with no meowing around with settings.
I still have no clue why some of you are having such difficulties.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852471
UniMatrixZ
Joined: 2 Feb 01
Posts: 102
Credit: 30,826,065
RAC: 3
Sweden
Message 1852478 - Posted: 3 Mar 2017, 7:00:29 UTC

My Linux machine was having trouble this morning, getting the "Project has no tasks available" message.
It was down to 11 GPU WUs, then all of a sudden, boom, cache full.
Strange problem!

"SETI is probably the most important quest of our time,
and it amazes me that governments and corporations
are not supporting it sufficiently."- Arthur C. Clarke 2006
ID: 1852478
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1852514 - Posted: 3 Mar 2017, 9:58:17 UTC

News about the full-Amazon outage of last Tuesday: it was a big one.
At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

Removing a significant portion of the capacity caused each of these systems to require a full restart. While these subsystems were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.

I wonder what the team member will have to talk about at his/her next review, if he/she's still working there. :)
ID: 1852514
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1852652 - Posted: 3 Mar 2017, 22:21:29 UTC

Hoping some new tapes will get tossed towards the splitters before today is done, else it will be a long cold weekend...
Last BLC is about done.
ID: 1852652
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852702 - Posted: 4 Mar 2017, 2:03:30 UTC

Heads-up messages sent to Eric and Jeff...
All I can do except crunch.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852702
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852703 - Posted: 4 Mar 2017, 2:08:01 UTC

Jeff says that new data should be appearing over the next few hours.
So, there should be no panic.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852703
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1852706 - Posted: 4 Mar 2017, 2:13:51 UTC

Well, there's a problem with the tasks named 14dc10ab.26018.24607.5.32.*. I have had several that run to 0.003% and then seemingly get stuck on my new RX470. They crash after 4 minutes, or immediately when I exit & restart BOINC.
ID: 1852706
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852709 - Posted: 4 Mar 2017, 2:23:25 UTC - in response to Message 1852706.  

Well, there's a problem with the tasks named 14dc10ab.26018.24607.5.32.*. I have had several that run to 0.003% and then seemingly get stuck on my new RX470. They crash after 4 minutes, or immediately when I exit & restart BOINC.

Maybe run nVidia?
I am sorry that I cannot really diagnose this, since all my cards online are NV.
Maybe somebody else can step in.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852709
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1852714 - Posted: 4 Mar 2017, 2:42:01 UTC - in response to Message 1852706.  
Last modified: 4 Mar 2017, 2:57:54 UTC

Well, there's a problem with the tasks named 14dc10ab.26018.24607.5.32.*. I have had several that run to 0.003% and then seemingly get stuck on my new RX470. They crash after 4 minutes, or immediately when I exit & restart BOINC.

Just forced my Manager to run them, SoG and GTX 1070s. 5min 5-10sec run time.
Edit- 5min 2-15sec outliers.

How long are they running to get to the 0.003% point? Generally the first 15-20 seconds on my system are the CPU setting up the WU and the percentage done stays at 0%, then it starts counting up as the GPU starts processing.
It looks like yours are failing at the point the GPU starts to crunch. What application are you running? Have you tried a re-boot? Any recent driver changes?
Grant
Darwin NT
ID: 1852714
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852716 - Posted: 4 Mar 2017, 2:53:31 UTC

I run anything the servers send.
And I do not cry if some things do not get me creds at the same rate as others.
There is a scientific point for every WU the project sends out, and that is what I signed up for.
The rest of you should get on board and stop whining.

I am currently running 8 computers, each with at least 2 GPUs on board.
And I spend a lot of time on the boards.
So IF the servers are coughing up furballs, I am very commonly the first one to notice it, and advise the authorities.

If the project is not sending out work to everybody that requests it, I step in quickly.
And usually within a short time of my notifying them, the admins step in and fix it.

This is a volunteer project. Nobody ever promised all work, all creds, all the time.
And as long as the servers are up, I am rather tired of the vomiting here at times.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852716
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1852717 - Posted: 4 Mar 2017, 2:56:21 UTC - in response to Message 1852703.  

Thanks Mark for being the liaison between the NC forums members and the scientists. Looks like Jeff is on top of the issue.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1852717
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1852719 - Posted: 4 Mar 2017, 3:07:33 UTC - in response to Message 1852716.  

I am rather tired of the vomiting here at times.

And some of us are rather tired of you abusing people who are just pointing out an issue, no whining or bleating involved.

Since late December, there have been issues with getting work, even when there is plenty available. Sometimes it's not as bad as others, and other times it's a major hassle, so we point it out.
There are times the servers are up, but the web site, forums & Scheduler have been either missing in action, or extremely slow to respond, so we point it out.
We were running low on work to split, so someone pointed that out.
Someone is having issues with some work units, and mentioned it.

It's not whinging, whining, bleating or vomiting. It's just pointing out an issue.
But if you really want someone to start carrying on, then keep abusing those that are just pointing out issues and I'm sure they'll give you something to really complain about. Or get the issues fixed, then there will be no need to point them out in the first place.
I'm in favour of the second option.
Grant
Darwin NT
ID: 1852719
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852720 - Posted: 4 Mar 2017, 3:08:39 UTC - in response to Message 1852717.  
Last modified: 4 Mar 2017, 3:12:40 UTC

Thanks Mark for being the liaison between the NC forums members and the scientists. Looks like Jeff is on top of the issue.

The software gets tangled up at times.
Did any one of you ever have to reboot a computer to solve a problem?
I shall bet you have.

The Seti servers are not Google level servers. The project does not have that kind of money, and as much as I wish, I do not have that much to donate to the project.
The Seti project does not have Google level backups, though they deserve it.

The project is run on some simple rack level servers in a remote location.
They are prone to errors like my own are.
EXCEPT.............
When they have a minute of down time, half the world notices it.
Nobody notices if the kitties are down. BTW, I have two rigs down at the moment, and am not in a big hurry to pick them back up.

So, the moral of the story is this.............
We need a new Panic Mode thread, please.
And everybody should not use it unless the project is in danger mode.

There are other threads, and you can even create your own if you think your situation requires individual attention.

Could you PLEASE stop using the 'Panic Mode' thread for any other purposes?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852720
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1852721 - Posted: 4 Mar 2017, 3:19:30 UTC - in response to Message 1852720.  

We need a new Panic Mode thread, please.
And everybody should not use it unless the project is in danger mode.

No, we don't need a new Panic Mode thread, and it is not for when the project is in danger mode.
It's for when there are issues with the project, for reporting them and discussing them.

About the only off topic posts here are about the Amazon outage, you complaining about people mentioning project issues (in the very thread that was started for them), Keith thanking you, and me responding to you.
4 out of 59 posts is bugger all in my book.
Grant
Darwin NT
ID: 1852721
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1852724 - Posted: 4 Mar 2017, 3:21:39 UTC - in response to Message 1852720.  


The software gets tangled up at times.

What Grant, I, and others have pointed out is that the software is tangled up ALL the time, ever since December, when the project put in the fix for the 8.22 ATI app users. That screwed up things for the Nvidia users. What I find infuriating is the project not acknowledging that there is an issue.

As Grant eloquently wrote, that is not whining or whinging. It is simply pointing out there is a problem.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1852724
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1852728 - Posted: 4 Mar 2017, 3:38:14 UTC
Last modified: 4 Mar 2017, 3:41:25 UTC

I shall never report a problem to Eric again.
You can get your own avenues.
As I am one of the top users on Seti, I have a path through his many filters.
Good luck.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1852728


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.