Panic Mode On (105) Server Problems?

Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1857111 - Posted: 22 Mar 2017, 23:44:39 UTC - in response to Message 1857089.  

My machines were getting dribs and drabs up to about an hour ago. Since then all have filled up, so I guess the bottleneck is slowly disappearing (Keith, Grant: I appreciate that you have issues that others do not appear to be having). My only issue with this outage was that I misjudged its depth, and so didn't squirrel enough WUs aside to try to get all the way through without being affected by the "bun fight" at the back end. I wish we could get a 1- or 2-day buffer of WUs; that would be really helpful. As a side note, I also run POGS every now and then, and it runs with a standard 5-day buffer.
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1857113 - Posted: 22 Mar 2017, 23:55:04 UTC - in response to Message 1857111.  

I wish we could get a 1- or 2-day buffer of WUs; that would be really helpful.


That would require a change to the current limit of 100 work units per GPU/CPU. I run out of work in 7 hours...
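For what it's worth, in the stock BOINC server code a cap like that is just a scheduler setting in the project's config.xml. A rough sketch, assuming SETI@home uses the standard options (values illustrative; the stock scheduler actually multiplies these by the host's CPU/GPU counts):

    <config>
        <!-- Cap on in-progress CPU jobs per host. -->
        <max_wus_in_progress>100</max_wus_in_progress>
        <!-- Separate cap for GPU jobs. -->
        <max_wus_in_progress_gpu>100</max_wus_in_progress_gpu>
    </config>

So raising the number would be a small change in itself; the cost, as discussed below, is in what it does to the database.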
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1857116 - Posted: 23 Mar 2017, 0:08:07 UTC

And correct me if I am wrong: the reason they won't do that is because the database would become too big and unmanageable. Right?
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1857123 - Posted: 23 Mar 2017, 0:20:28 UTC - in response to Message 1857116.  

The current limits were an accident. It really wasn't supposed to be 100 per GPU and CPU; it ended up that way because of how they coded it. The limit was supposed to be even lower. (At least, that is how I remember it being explained to me.)

But if you allowed people to download as much as they want, then you run into issues with thousands (maybe even millions) of lost work units having to time out if someone decides simply not to return them.

So I think the idea was to restrict the number to what they thought was a manageable amount...

Let's see what insight some of the others can provide...
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1857140 - Posted: 23 Mar 2017, 1:32:57 UTC

Both of my machines are back to full after spending most of the day empty.

The back-offs can be murder in a challenge.

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1857151 - Posted: 23 Mar 2017, 2:24:28 UTC - in response to Message 1857123.  

The current limits were an accident. It really wasn't supposed to be 100 per GPU and CPU; it ended up that way because of how they coded it. The limit was supposed to be even lower. (At least, that is how I remember it being explained to me.)

But if you allowed people to download as much as they want, then you run into issues with thousands (maybe even millions) of lost work units having to time out if someone decides simply not to return them.

So I think the idea was to restrict the number to what they thought was a manageable amount...

Let's see what insight some of the others can provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

Hello, from Albany, CA!...
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1857155 - Posted: 23 Mar 2017, 3:00:13 UTC - in response to Message 1857151.  

The current limits were an accident...
Let's see what insight some of the others can provide...

Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hour maintenance outages each week. I hate to pay for electricity while machines are twiddling their figurative thumbs...
betreger Project Donor
Joined: 29 Jun 99
Posts: 11416
Credit: 29,581,041
RAC: 66
United States
Message 1857161 - Posted: 23 Mar 2017, 3:44:09 UTC - in response to Message 1857155.  

When they aren't crunching, the electricity usage goes way down. Something to consider: SETI owes Einstein a lot of love for allowing us to use their Atlas cluster to run Nebula. Einstein would be a very worthwhile project to run when you are out of work.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1857166 - Posted: 23 Mar 2017, 3:59:41 UTC - in response to Message 1857155.  

The current limits were an accident...
Let's see what insight some of the others can provide...

Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hour maintenance outages each week. I hate to pay for electricity while machines are twiddling their figurative thumbs...

I almost never ran out of CPU work in the past... at least when the outages were less than 10-12 hours. It's a different situation now with the Ryzen 1700X machine. Man! That machine really chews through the CPU units now. I would definitely appreciate an increase to 200 units per CPU on that machine. 200 units per GPU would also be appreciated.

My machines never really go idle. When the SETI work dries up, they automatically move on to MilkyWay and Einstein work. That still keeps the CPU sort of busy, if only feeding the GPUs.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Stephen "Heretic" Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857169 - Posted: 23 Mar 2017, 5:13:11 UTC - in response to Message 1857151.  

The current limits were an accident. The limit was supposed to be even lower. (At least, that is how I remember it being explained to me.)

So I think the idea was to restrict the number to what they thought was a manageable amount...

Let's see what insight some of the others can provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...


. . 200 sounds pretty good to me; two of my three machines can chew through their present allotments in 6 to 8 hours. That is not very comforting, as the outages take longer and longer each week. But I don't think you will convince the decision makers to go along with that proposal.

Stephen

<shrug>
Stephen "Heretic" Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857172 - Posted: 23 Mar 2017, 5:18:39 UTC - in response to Message 1857155.  

The current limits were an accident...
Let's see what insight some of the others can provide...

Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hour maintenance outages each week. I hate to pay for electricity while machines are twiddling their figurative thumbs...


. . 8 to 11 hours? You've got it good. This week it was over 30 hours for me. After about 14 hours I could upload results AND report, but I did not regain access to the message boards or any new work until over 30 hours had passed. For me it started about 2 am yesterday AEDT and did not finally end (return to some sort of normalcy) until approximately 9 am today AEDT. Very disheartening.

Stephen

:(
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1857173 - Posted: 23 Mar 2017, 5:25:19 UTC - in response to Message 1857155.  

The current limits were an accident...
Let's see what insight some of the others can provide...

Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hour maintenance outages each week.

And bump it up to 1,000 per GPU to get through these extra-long extended outages.


Unfortunately, that would probably make the outages longer, as the database would require more maintenance.
Maybe it's time for some new hardware to meet the demands?
Grant
Darwin NT
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1857174 - Posted: 23 Mar 2017, 5:26:07 UTC

BTW- WTF happened?
Grant
Darwin NT
Stephen "Heretic" Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857175 - Posted: 23 Mar 2017, 5:33:11 UTC - in response to Message 1857161.  
Last modified: 23 Mar 2017, 5:39:05 UTC

When they aren't crunching, the electricity usage goes way down. Something to consider: SETI owes Einstein a lot of love for allowing us to use their Atlas cluster to run Nebula. Einstein would be a very worthwhile project to run when you are out of work.


. . Yep, the power consumption goes down, that's true, but two or three machines sitting idle while still chewing up 100 W to 200 W each for anything from 4 to 12 hours, or sometimes longer, is a pointless expenditure and a waste.

. . Having said that, I had set up E@H as my backup project for when I run out of work from SETI. It took a while to get it right, but I did, and that was under Windows. After moving two machines to Linux last week that setup was gone, and I had neglected to redo it before this week's outage, so I was without work for many hours. When I tried to set it up again I could not remember my password (getting old sucks), so I had to sort all that out, and that took a few hours too. But I now have it working under Linux and can feel at ease again, knowing that when I fail to get work from SETI the machines can still be productive.

Stephen

:)
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1857176 - Posted: 23 Mar 2017, 5:33:47 UTC

Just had a look in my manager's Event Log; I'm getting the odd "Scheduler error: Couldn't connect to server."
When it's OK, the response is nice and quick - within 3 seconds.
Grant
Darwin NT
Stephen "Heretic" Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857177 - Posted: 23 Mar 2017, 5:37:33 UTC - in response to Message 1857174.  

BTW- WTF happened?


. . The shadow knows!

. . But no-one else ...

Stephen

:(
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1857182 - Posted: 23 Mar 2017, 6:19:19 UTC - in response to Message 1857172.  

. . 8 to 11 hours? You've got it good. This week it was over 30 hours for me. After about 14 hours I could upload results AND report, but I did not regain access to the message boards or any new work until over 30 hours had passed. For me it started about 2 am yesterday AEDT and did not finally end (return to some sort of normalcy) until approximately 9 am today AEDT. Very disheartening.

Stephen

:(

Totally different issues, I think. And yes, this week I saw the same ~30 hrs you did, and though I did manage to fill most caches right after maintenance and before the crash, I was again out of work in a few more hours.

It's reasonable to ask that there be enough work cached to allow people to remain productive during scheduled maintenance events, with enough cushion to cover the oddball overtime of such an event.

It's a very different discussion to talk about what happens when a mess-up like the other day occurs. It's not reasonable to have enough cached to get through something like that. BUT, I do think it's only fair to ask that when an issue like that occurs we are given an explanation, and that steps are taken to minimize recurrence.

Resources are always an issue, sure, but I don't think enough consideration is given to keeping the systems up and running. For a while I was optimistic that some efforts were being made to improve communication with "the masses". I'm not so sure now, and there's no excuse for that.
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1857184 - Posted: 23 Mar 2017, 6:41:20 UTC

The project has NEVER promised to keep us fed with work; they have ALWAYS told us to have standby projects to fall back on in the event of (extended) outages.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1857185 - Posted: 23 Mar 2017, 6:55:28 UTC - in response to Message 1857184.  

The project has NEVER promised to keep us fed with work; they have ALWAYS told us to have standby projects to fall back on in the event of (extended) outages.

But it would be nice if it were possible to carry at least 24 hours of work, or even 48. The option is there in the settings; it would be appreciated if it actually worked.
Why not raise the server-side limits but restrict all users to a maximum of a 2-day cache, leaving more work available to the faster crunchers to see them through these outages while still limiting the load on the database?

If there is no data available to crunch, then there's no work.
But if there is work available and the project has issues, it would be nice to be able to continue processing it.

It has been said that there will (eventually) be way more work than the present user base can process in a reasonable time, and that more crunchers are needed. But it doesn't matter how many crunchers there are if they can't get the work. And even if the servers spend most of a week down, if during their uptime people are able to get enough work to tide them through to the next uptime, people will continue to crunch.
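For reference, the 2-day cap already exists on the client side; it's the server-side per-host limit that renders it moot. The client's buffer is just the work-buffer preferences, e.g. in a global_prefs_override.xml (a sketch using the stock BOINC client options; values illustrative):

    <global_preferences>
        <!-- Try to keep at least this many days of work on hand. -->
        <work_buf_min_days>2.0</work_buf_min_days>
        <!-- Extra days to fetch on top of the minimum. -->
        <work_buf_additional_days>0.0</work_buf_additional_days>
    </global_preferences>

Enforcing a 2-day maximum for everyone would have to happen in the scheduler, though, since the server can't trust clients to limit themselves.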
Grant
Darwin NT
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1857189 - Posted: 23 Mar 2017, 8:06:59 UTC

There was an announcement around about Christmas that they were running out of the first batch of BLP data and were reprocessing selected Arecibo data using the latest apps. It would appear that some new BLP data has arrived, but who knows how much.

BTW Chris - there is an easier way to trigger backup projects: set SETI's priority (resource share) to 1000 and the other projects' to zero. This way SETI will get work whenever it is available, and the other projects will only deliver work when you have run out of SETI work. If you use the "locations" feature you can have sets of projects with different priorities.
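(For anyone wanting to try this: the "priority" here is BOINC's resource share, set in each project's web preferences. A project with a share of 0 is treated as a backup project and is only asked for work when everything else is dry. The client simply records whatever the website sends, roughly like this in its per-project account file - a hedged sketch, the exact layout may differ:)

    <account>
        <master_url>http://setiathome.berkeley.edu/</master_url>
        <project_name>SETI@home</project_name>
        <!-- Set on the project website; 0 would mark a backup project. -->
        <resource_share>1000</resource_share>
    </account>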
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?