The Server Issues / Outages Thread - Panic Mode On! (119)


Author  Message

Ville Saari

Joined: 30 Nov 00
Posts: 1123
Credit: 49,177,052
RAC: 82,530
Finland
Message 2034433 - Posted: 28 Feb 2020, 18:26:57 UTC - in response to Message 2034377.

While that's been affirmed, I'm not seeing any improvement on the number of items waiting to be satisfied.
I don't think any real improvement will happen before whatever problem is preventing the assimilator from assimilating has been found and fixed.
ID: 2034433
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2034377 - Posted: 28 Feb 2020, 6:39:53 UTC - in response to Message 2034303.

I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect.
Yes they are. It was stated by Eric that this is being done to try to keep the system within its RAM limits; I just can't remember where that post was made, or whether Eric actually made it or it was passed along, ATM.

Cheers.



While that's been affirmed, I'm not seeing any improvement on the number of items waiting to be satisfied.
ID: 2034377
Profile Wiggo "Democratic Socialist"

Joined: 24 Jan 00
Posts: 18713
Credit: 261,360,520
RAC: 489
Australia
Message 2034362 - Posted: 28 Feb 2020, 4:30:50 UTC - in response to Message 2034307.
Last modified: 28 Feb 2020, 4:31:09 UTC

...it was stated by Eric that this is being done to try and keep the system within its RAM limits, I just can't remember where that post was made and whether Eric actually made it or it was passed along ATM.
It's actually right in the News forum so visible in the BOINC client. (Edit: of course it was there yesterday and now that I post this it's disappeared lol.)
Thanks Mr Kevvy, I was suddenly very pressed for time just as I started to make that post. :-)

Cheers.
ID: 2034362
Profile Mr. Kevvy Crowdfunding Project Donor * Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 15 May 99
Posts: 3202
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2034307 - Posted: 27 Feb 2020, 23:12:35 UTC - in response to Message 2034303.
Last modified: 27 Feb 2020, 23:22:50 UTC

...it was stated by Eric that this is being done to try and keep the system within its RAM limits, I just can't remember where that post was made and whether Eric actually made it or it was passed along ATM.


It's actually right in the News forum so visible in the BOINC client. (Edit: of course it was there yesterday and now that I post this it's disappeared lol.)
ID: 2034307
Profile Wiggo "Democratic Socialist"

Joined: 24 Jan 00
Posts: 18713
Credit: 261,360,520
RAC: 489
Australia
Message 2034303 - Posted: 27 Feb 2020, 23:04:48 UTC - in response to Message 2034295.
Last modified: 27 Feb 2020, 23:05:59 UTC

I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect.
Yes they are. It was stated by Eric that this is being done to try to keep the system within its RAM limits; I just can't remember where that post was made, or whether Eric actually made it or it was passed along, ATM.

Cheers.
ID: 2034303
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 18752
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2034295 - Posted: 27 Feb 2020, 22:33:04 UTC

I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2034295
Profile Jimbocous Project Donor
Volunteer tester

Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 2034292 - Posted: 27 Feb 2020, 22:28:05 UTC - in response to Message 2034169.
Last modified: 27 Feb 2020, 22:35:36 UTC

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
Maybe related, but I think it is more to do with how much work the host requests.
When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks.
That's my experience, too. I now have two machines in the 'high RAC' category (top 100): they were both completely dry yesterday morning. I did a little Einstein backup work while the servers were sorting themselves out, but once work started flowing, I ramped them up gently by requesting an hour of work at a time (0.05 days) and increasing the cache a step at a time as they filled up. Reached full cache by evening, with just a little tweak any time I happened to be passing.

Sounds like a reality, not "an illusion". Main cruncher cache here is around 25%~50% [my error]. No heartburn; work is getting assigned and completed, but it seems clear that there's more than "first come, first served" going on.
ID: 2034292
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2034224 - Posted: 27 Feb 2020, 18:48:43 UTC - in response to Message 2034169.

Validation Pending is still steadily growing; it looks like around 23 million objects waiting to be satisfied. Still getting work though, despite the RTS showing a pretty steady 0. I even fell asleep in the wrong configuration the night before last, decreasing my Pending column below its normal average, but I'm well over that again. This poor system needs a break.
ID: 2034224
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14141
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2034169 - Posted: 27 Feb 2020, 8:32:03 UTC - in response to Message 2034167.

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
Maybe related, but I think it is more to do with how much work the host requests.
When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks.
That's my experience, too. I now have two machines in the 'high RAC' category (top 100): they were both completely dry yesterday morning. I did a little Einstein backup work while the servers were sorting themselves out, but once work started flowing, I ramped them up gently by requesting an hour of work at a time (0.05 days) and increasing the cache a step at a time as they filled up. Reached full cache by evening, with just a little tweak any time I happened to be passing.
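The ramp-up both posts describe is done by hand in the BOINC manager, but the logic amounts to a few lines. This is a hypothetical sketch, not any BOINC API: the 0.05-day step and the 0.6-day ceiling are the figures quoted in the posts, and `next_cache_days` is an invented name.

```python
def next_cache_days(current, got_work, step=0.05, ceiling=0.6):
    """Hypothetical sketch of the manual ramp-up described above:
    hold the cache request at ~an hour of work until the servers
    start sending tasks, then raise it one step at a time."""
    if not got_work:
        return step                      # keep requests small while dry
    return min(current + step, ceiling)  # step up, capped at 0.6 days

# Once work starts flowing: 0.05 -> 0.10 -> ... -> 0.60, then hold.
cache = 0.05
for _ in range(15):
    cache = next_cache_days(cache, got_work=True)
print(round(cache, 2))  # 0.6
```

The point of starting small is that a request for an hour of work is more likely to be partially satisfied by a struggling scheduler than a request for days of work.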
ID: 2034169
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13873
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034167 - Posted: 27 Feb 2020, 8:17:43 UTC - in response to Message 2034131.
Last modified: 27 Feb 2020, 8:25:22 UTC

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.

Maybe related, but I think it is more to do with how much work the host requests.
When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks.

Also, here are some numbers on tasks downloaded and validated since 08:06:31 on 26th Feb, ~24 hours ago.

After 12 hours, at ~20:00 26th:
Downloaded = 345; In Progress = 150; Valid = 86
Processed = 345 - 150 = 195
Percentage of tasks downloaded and validated in 12 hours = 100 * 86 / 195 = 44.1%

After 24 hours, at ~08:00 27th:
Downloaded = 523; In Progress = 150; Valid = 253
Processed = 523 - 150 = 373
Percentage of tasks downloaded and validated in 24 hours = 100 * 253 / 373 = 67.8%

I only crunch on the GPU, so it is fairly simple just to scroll through the pages, count each page, and add up the page counts.

[edit] Prior to 08:06 yesterday the SETI cache was empty.
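The arithmetic above can be sanity-checked with a few lines; the counts are the ones quoted in the post, and "In Progress" tasks are subtracted because they haven't had a chance to validate yet:

```python
# Of the tasks that have actually been processed (downloaded minus
# still-in-progress), what fraction has validated so far?
def validated_pct(downloaded, in_progress, valid):
    processed = downloaded - in_progress
    return 100 * valid / processed

print(round(validated_pct(345, 150, 86), 1))   # after 12 hours -> 44.1
print(round(validated_pct(523, 150, 253), 1))  # after 24 hours -> 67.8
```

The rising percentage suggests the validators were slowly catching up over the second half of the day.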
ID: 2034167
Ville Saari

Joined: 30 Nov 00
Posts: 1123
Credit: 49,177,052
RAC: 82,530
Finland
Message 2034146 - Posted: 27 Feb 2020, 3:23:54 UTC - in response to Message 2034141.

Could it be that the assimilation process is the problem?
There has clearly been some problem with assimilation for the last several weeks, but it could be in many different places. It could be the throughput of the BOINC database somehow hitting the assimilator harder than the other processes. Or it could be a problem in the assimilator program itself. Or the throughput of the science databases. Or the throughput of the upload filesystem that holds the result files the assimilator needs to read.
ID: 2034146
Ville Saari

Joined: 30 Nov 00
Posts: 1123
Credit: 49,177,052
RAC: 82,530
Finland
Message 2034144 - Posted: 27 Feb 2020, 3:09:55 UTC - in response to Message 2034131.

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
It is an illusion. Everyone has the same priority, but the higher your RAC, the more successful scheduler requests you need to keep your cache from depleting.

If every 12th request wins the lottery and gets some work, then you get some work about once every hour. That may be all a slow host needs to refill its cache to the brim, but it is nowhere near the one-hour production of a fast host.
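Ville's point can be put in numbers with a toy model. The 1-in-12 success rate is the figure from the post; the request interval, batch size, and consumption rates below are made-up values purely for illustration:

```python
# In the example, one scheduler request in twelve succeeds, i.e. a
# host gets roughly one batch of work per hour on average. Whether
# that keeps the cache full depends on how fast the host burns tasks.
def shortfall(tasks_per_win, burn_per_hour, hours=24,
              requests_per_hour=12, win_rate=1/12):
    wins = hours * requests_per_hour * win_rate   # ~1 win per hour
    received = wins * tasks_per_win
    return hours * burn_per_hour - received       # >0 means the cache drains

print(shortfall(tasks_per_win=20, burn_per_hour=5))    # -360.0: slow host builds a surplus
print(shortfall(tasks_per_win=20, burn_per_hour=100))  # 1920.0: fast host falls ever further behind
```

Both hosts have identical luck with the scheduler; only the consumption rate differs, which is why it can feel like high-RAC hosts are being deprioritised.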
ID: 2034144
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13873
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034141 - Posted: 27 Feb 2020, 2:54:08 UTC

Could it be that the assimilation process is the problem?

How difficult is it to translate the data we produce, plus all the other necessary details, and put it into the science database?

This is what the Server Status page says:
sah_assimilator/ap_assimilator: Takes scientific data from validated results and puts them in the SETI@home (or Astropulse) database for later analysis.
ID: 2034141
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 2034135 - Posted: 27 Feb 2020, 2:20:40 UTC - in response to Message 2034131.

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.


+1
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 2034135
Profile Jimbocous Project Donor
Volunteer tester

Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 2034131 - Posted: 27 Feb 2020, 1:41:16 UTC

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
ID: 2034131
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2034126 - Posted: 27 Feb 2020, 1:17:51 UTC - in response to Message 2034125.

and, of course, after I post, I receive work!
ID: 2034126
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2034125 - Posted: 27 Feb 2020, 1:12:09 UTC

Work can be hard to come by. All I've gotten over the past few hours is "Project has no tasks available" in the log. Just need to be patient.
ID: 2034125
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034123 - Posted: 27 Feb 2020, 1:02:31 UTC - in response to Message 2034116.

That's what you get for assuming. Since making that post, the machine that had run out of work now has 400 tasks instead of zero. I had absolutely nothing to do with it.
ID: 2034123
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13873
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034116 - Posted: 27 Feb 2020, 0:45:07 UTC - in response to Message 2034108.

Now the 2nd out of 3 machines has run out of work: https://setiathome.berkeley.edu/results.php?hostid=6796479
That leaves 1 machine still working. I suppose when that one runs out of work I'll just shut everything down and brag about how much money I'm saving on electricity.

I can only assume the problem is at your end. I've had very few problems since 08:00 on the 26th UTC.
ID: 2034116
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034108 - Posted: 27 Feb 2020, 0:11:39 UTC

Now the 2nd out of 3 machines has run out of work: https://setiathome.berkeley.edu/results.php?hostid=6796479
That leaves 1 machine still working. I suppose when that one runs out of work I'll just shut everything down and brag about how much money I'm saving on electricity.
ID: 2034108
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.