The Server Issues / Outages Thread - Panic Mode On! (119)

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2036409 - Posted: 7 Mar 2020, 2:36:15 UTC

Has anyone considered there may be something about those 14 million results that the system just doesn't care for? If I were looking over some of those results I believe I would balk at those tasks which validated with a Quorum of 1. It would be nice if those tasks could be extracted and examined, even better if they could be temporarily removed and the system run without them to see how the system worked. They certainly don't seem to be in a hurry to leave by themselves.
ID: 2036409

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13012
Credit: 208,696,464
RAC: 304
Australia
Message 2036405 - Posted: 7 Mar 2020, 2:24:10 UTC - in response to Message 2036399.

He was referring to this one:
Yep.
You can see a sharp drop in the Results returned and awaiting validation numbers. Unfortunately, it was only a drop in the ocean.
Grant
Darwin NT
ID: 2036405

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1590
Credit: 12,921,799
RAC: 89
New Zealand
Message 2036404 - Posted: 7 Mar 2020, 2:21:56 UTC - in response to Message 2036399.

Thanks Keith, I thought that's the one he was referring to.
ID: 2036404

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 11776
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2036399 - Posted: 7 Mar 2020, 2:17:09 UTC - in response to Message 2036395.

He was referring to this one:

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2036399

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1590
Credit: 12,921,799
RAC: 89
New Zealand
Message 2036395 - Posted: 7 Mar 2020, 2:05:12 UTC - in response to Message 2036390.

I think that Eric's script to clear out the validator queue a bit helped for a while.
Looking at the graphs, it sorted out a few hundred thousand results.
The huge backlog waiting on results to be returned remains; it just had a bit shaved off the top.

Hi Grant, are you referring to the results in progress and validation graph?
ID: 2036395

AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036394 - Posted: 7 Mar 2020, 1:59:37 UTC - in response to Message 2036384.
Last modified: 7 Mar 2020, 2:10:22 UTC

The splitters were briefly turned on and the RTS buffer filled to normal levels. But that was sucked dry in about 30 minutes from all the empty hosts. Now back to no work but hard to verify since the replica is so far behind now.

I think that Eric's script to clear out the validator queue a bit helped for a while.



06-Mar-2020 14:19:39 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:24:47 [SETI@home] Scheduler request completed: got 65 new tasks
06-Mar-2020 14:29:52 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:34:59 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 14:40:07 [SETI@home] Scheduler request completed: got 51 new tasks
06-Mar-2020 14:45:16 [SETI@home] Scheduler request completed: got 7 new tasks
06-Mar-2020 14:50:23 [SETI@home] Scheduler request completed: got 26 new tasks
06-Mar-2020 15:07:56 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 15:33:30 [SETI@home] Scheduler request completed: got 15 new tasks
06-Mar-2020 15:38:37 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 15:43:46 [SETI@home] Scheduler request completed: got 5 new tasks
06-Mar-2020 15:48:53 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 15:54:05 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 15:59:12 [SETI@home] Scheduler request completed: got 1 new tasks
06-Mar-2020 16:04:19 [SETI@home] Scheduler request completed: got 2 new tasks
06-Mar-2020 16:09:26 [SETI@home] Scheduler request completed: got 2 new tasks
06-Mar-2020 16:14:33 [SETI@home] Scheduler request completed: got 2 new tasks
06-Mar-2020 16:19:40 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 16:24:49 [SETI@home] Scheduler request completed: got 5 new tasks
06-Mar-2020 16:29:58 [SETI@home] Scheduler request completed: got 0 new tasks
06-Mar-2020 16:35:05 [SETI@home] Scheduler request completed: got 3 new tasks
06-Mar-2020 16:40:13 [SETI@home] Scheduler request completed: got 0 new tasks

It lasted a little while. Nothing since 00:40 UTC. Edit: the replica is only 104 minutes behind now. What's an hour and a half between friends?
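
For anyone who wants to tally a log excerpt like the one above instead of eyeballing it, here is a rough Python sketch. It assumes every line has exactly the "Scheduler request completed: got N new tasks" shape quoted in this post; the `summarize` helper and the `sample` list are made up for illustration (the sample is only a handful of the lines above, not the whole log), and timestamps are treated as local client time.

```python
import re
from datetime import datetime

# Matches event-log lines of the form quoted in this post (assumed format).
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Scheduler request completed: got (?P<n>\d+) new tasks$"
)


def summarize(lines):
    """Return (requests, empty_requests, total_tasks, last_success_time)."""
    requests = empty = total = 0
    last_success = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a scheduler reply line
        requests += 1
        n = int(m.group("n"))
        total += n
        if n == 0:
            empty += 1
        else:
            # Remember when the most recent non-empty reply arrived.
            last_success = datetime.strptime(m.group("ts"), "%d-%b-%Y %H:%M:%S")
    return requests, empty, total, last_success


# A few lines copied from the excerpt above (not the full log).
sample = [
    "06-Mar-2020 14:19:39 [SETI@home] Scheduler request completed: got 0 new tasks",
    "06-Mar-2020 14:24:47 [SETI@home] Scheduler request completed: got 65 new tasks",
    "06-Mar-2020 14:29:52 [SETI@home] Scheduler request completed: got 0 new tasks",
    "06-Mar-2020 16:35:05 [SETI@home] Scheduler request completed: got 3 new tasks",
    "06-Mar-2020 16:40:13 [SETI@home] Scheduler request completed: got 0 new tasks",
]

requests, empty, total, last_success = summarize(sample)
print(f"{requests} requests, {empty} empty, {total} tasks, last success {last_success}")
```

Feeding it the full log text (one line per list entry) gives the same kind of summary for the whole window.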
ID: 2036394

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13012
Credit: 208,696,464
RAC: 304
Australia
Message 2036390 - Posted: 7 Mar 2020, 1:11:01 UTC - in response to Message 2036384.

I think that Eric's script to clear out the validator queue a bit helped for a while.
Looking at the graphs, it sorted out a few hundred thousand results.
The huge backlog waiting on results to be returned remains; it just had a bit shaved off the top.
Grant
Darwin NT
ID: 2036390

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 11776
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2036384 - Posted: 7 Mar 2020, 0:49:48 UTC

The splitters were briefly turned on and the RTS buffer filled to normal levels. But that was sucked dry in about 30 minutes from all the empty hosts. Now back to no work but hard to verify since the replica is so far behind now.

I think that Eric's script to clear out the validator queue a bit helped for a while.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2036384

TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 20328
Credit: 33,933,039
RAC: 23
United States
Message 2036375 - Posted: 7 Mar 2020, 0:24:06 UTC

Don't be angry! Somehow, I just got a FULL queue of 150 GPU Tasks PER Card, x2.

Looking forward to Crunching tonight! 😀

Haven't had work since Tuesday's Outrage! 😱😱😱
Was beginning to think I wouldn't have work
all the rest of the month!

Got lucky on two Request passes on the Server
and filled the queue to capacity just a few minutes
ago.

Crunching will begin, again, on Hackintosh-Andromeda
in just about an hour and a half, at 6 PM Pacific.
With the CUDA90 App, (v0.97), from TBar, I think this
queue will last a few hours. Then, it's anybody's guess
IF the Servers will keep me working throughout the rest
of the night.

Good Luck to everyone.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 2036375

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13012
Credit: 208,696,464
RAC: 304
Australia
Message 2036365 - Posted: 6 Mar 2020, 22:43:41 UTC - in response to Message 2036361.

Current result creation rate * 403.7581/sec
Now the Scheduler needs to start dishing them out.
Most requests result in "Project has no tasks available". And as we've found over the last couple of months, post-outage recoveries are rather protracted these days.
Grant
Darwin NT
ID: 2036365

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 11776
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2036363 - Posted: 6 Mar 2020, 22:34:11 UTC

I was able to persuade tasks to download one at a time. I think the download server is unable to support all the HTTP requests. It might be smart to reduce the number of connections back to the default of 2.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2036363

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 1,004
United States
Message 2036361 - Posted: 6 Mar 2020, 22:24:56 UTC
Last modified: 6 Mar 2020, 22:26:18 UTC

Current result creation rate * 403.7581/sec
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 2036361

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13012
Credit: 208,696,464
RAC: 304
Australia
Message 2036359 - Posted: 6 Mar 2020, 22:19:32 UTC

Managed to pick up a couple of WUs, but downloads are presently not happening without a lot of "Retry pending transfers" action.
Grant
Darwin NT
ID: 2036359

Ville Saari
Joined: 30 Nov 00
Posts: 1123
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036344 - Posted: 6 Mar 2020, 21:36:50 UTC

Changing the deadlines has no effect whatsoever on the assimilation blockage.
ID: 2036344

AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036342 - Posted: 6 Mar 2020, 21:15:58 UTC - in response to Message 2036340.

It's time to crunch or shut down.
ID: 2036342

AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036340 - Posted: 6 Mar 2020, 21:07:28 UTC - in response to Message 2036337.

And I did send word to Eric about the servers being tied in a knot.
Whether there is much he can do about it is an open question at this point.
Make use of the Resend Deadline feature: set the deadline for resends to 3 days. Set the deadline for any new work (AP included) to 2 weeks.
The short deadline on resends will clear out the ever-increasing massive backlog (although I'm guessing it will take a week or so to have a significant impact). The 2-week deadline on all initial-release work will stop the backlog from recurring in the short time the project is still going to be issuing new work.


At this point, reduce it to 10 days. We are at the finish line.


We could reduce it to 5 in another week.
ID: 2036340

AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036337 - Posted: 6 Mar 2020, 21:04:21 UTC - in response to Message 2036335.
Last modified: 6 Mar 2020, 21:04:51 UTC

And I did send word to Eric about the servers being tied in a knot.
Whether there is much he can do about it is an open question at this point.
Make use of the Resend Deadline feature: set the deadline for resends to 3 days. Set the deadline for any new work (AP included) to 2 weeks.
The short deadline on resends will clear out the ever-increasing massive backlog (although I'm guessing it will take a week or so to have a significant impact). The 2-week deadline on all initial-release work will stop the backlog from recurring in the short time the project is still going to be issuing new work.


At this point, reduce it to 10 days. We are at the finish line.
ID: 2036337

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13012
Credit: 208,696,464
RAC: 304
Australia
Message 2036335 - Posted: 6 Mar 2020, 21:00:56 UTC - in response to Message 2036292.
Last modified: 6 Mar 2020, 21:02:59 UTC

And I did send word to Eric about the servers being tied in a knot.
Whether there is much he can do about it is an open question at this point.
Make use of the Resend Deadline feature: set the deadline for resends to 3 days. Set the deadline for any new work (AP included) to 2 weeks.
The short deadline on resends will clear out the ever-increasing massive backlog (although I'm guessing it will take a week or so to have a significant impact). The 2-week deadline on all initial-release work will stop the backlog from recurring in the short time the project is still going to be issuing new work.
Grant
Darwin NT
ID: 2036335

Ian&Steve C.
Joined: 28 Sep 99
Posts: 3158
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2036334 - Posted: 6 Mar 2020, 21:00:36 UTC - in response to Message 2036333.

The guys in the lab are working on it. Hopefully it can come back in working order as a result.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2036334

AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036333 - Posted: 6 Mar 2020, 20:57:11 UTC

The Replica DB is now 65 minutes behind.
ID: 2036333
 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.