Panic Mode On (106) Server Problems?

Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1868180 - Posted: 19 May 2017, 16:40:00 UTC

The project has no tasks?
I'm starving.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1868180 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1868181 - Posted: 19 May 2017, 16:41:19 UTC - in response to Message 1868163.  

And cats manage the workload ...:)
Alas, my cats have very few management skills. They may be angels with fur, but manage only themselves.. ;-)

Well, at least the one you have posted as an avatar always manages to give me a smile


Great to hear that she's good for something, besides being a drama queen ! That's my Zarah... a real doll, and a champion to boot.
Humans may rule the world...but bacteria run it...
ID: 1868181 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1868185 - Posted: 19 May 2017, 16:55:52 UTC - in response to Message 1868174.  

You really have to wonder just what besides availability of tasks the scheduler looks at before delivering a "no tasks available" response?
Gotta be something along the lines of
With the lag the server was showing validating and assimilating, it makes me think this outage was slow database related.
ID: 1868185 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1868188 - Posted: 19 May 2017, 17:07:58 UTC - in response to Message 1868185.  
Last modified: 19 May 2017, 17:08:46 UTC

Gotta be something along the lines of
With the lag the server was showing validating and assimilating, it makes me think this outage was slow database related.


I also note there are four BLC splitter jobs that haven't progressed in several hours, but then again there are hundreds of thousands of work units in the to-send queue.

Whatever it is, I wrote to Dr. K... not that I have any special e-mail priority :^) But perhaps it will get fixed. I notice that my larger machines finally got some work, though not nearly enough for a full queue, so it is improving, probably as other machines' queues fill, albeit slower than they should.

Nothing worse SETI-wise than a work stoppage on a Friday and facing a weekend of machines sitting idle.
ID: 1868188 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1868195 - Posted: 19 May 2017, 17:39:57 UTC - in response to Message 1868180.  

The project has no tasks?
I'm starving.


Sorry,
seems to have been a temporary hiccup.

My mobile operator has enabled the latest speeds. The connection is not reliable yet. It is fast. That I must admit.

Last Result:
Download Speed: 36518 kbps (4564.8 KB/sec transfer rate)
Upload Speed: 31664 kbps (3958 KB/sec transfer rate)
Latency: 38 ms
Jitter: 2 ms
19.5.2017 at 20.37.01
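
For anyone checking the figures above: the KB/sec values are the kbps values divided by 8 (8 bits per byte). A minimal sketch of that arithmetic, with a helper name made up for illustration:

```python
def kbps_to_kbytes_per_sec(kbps: float) -> float:
    """Convert kilobits per second to kilobytes per second (8 bits per byte)."""
    return kbps / 8


# Figures taken from the speed test result quoted above.
download = kbps_to_kbytes_per_sec(36518)
upload = kbps_to_kbytes_per_sec(31664)

print(f"Download: {download:.2f} KB/sec")
print(f"Upload: {upload:.2f} KB/sec")
```

The download figure works out to 4564.75 KB/sec, which the speed test shows rounded to 4564.8.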


And that's over a mobile connection with a cable to the Sierra Wireless home station. I'm living in a small town.

All fine now.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1868195 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868202 - Posted: 19 May 2017, 18:02:18 UTC - in response to Message 1868180.  

The project has no tasks?
I'm starving.


. . I was but the caches have filled up finally.

:)
ID: 1868202 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1868230 - Posted: 19 May 2017, 20:35:18 UTC

The servers recovered quite nicely from this outage from what I can tell.
ID: 1868230 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868237 - Posted: 19 May 2017, 20:51:30 UTC - in response to Message 1868230.  

The servers recovered quite nicely from this outage from what I can tell.

I still have one laggard that isn't getting anything for some reason. Did the flip-flip several times to no effect. The other two computers have recovered from the shortage mostly. Again, the Windows 10 computer makes the fastest recovery for some reason. And it still is getting a primary diet of Arecibo standard angle range tasks for the CPU. Which I quickly reschedule to the GPUs and swap the BLC tasks they seem to primarily get. Nobody has convinced me that the servers don't see the Ryzen system as something completely different and therefore gets a different mix of tasks compared to the Windows 7 FX systems. This has gone on for 2 months now consistently. Random probability must be in the 5 nines now with the mix of Arecibo shorties or standard AR's being sent to the CPU exclusively.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1868237 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1868251 - Posted: 19 May 2017, 21:51:14 UTC - in response to Message 1868237.  
Last modified: 19 May 2017, 22:05:25 UTC

I still have one laggard that isn't getting anything for some reason. Did the flip-flip several times to no effect. The other two computers have recovered from the shortage mostly. Again, the Windows 10 computer makes the fastest recovery for some reason. And it still is getting a primary diet of Arecibo standard angle range tasks for the CPU. Which I quickly reschedule to the GPUs and swap the BLC tasks they seem to primarily get. Nobody has convinced me that the servers don't see the Ryzen system as something completely different and therefore gets a different mix of tasks compared to the Windows 7 FX systems. This has gone on for 2 months now consistently. Random probability must be in the 5 nines now with the mix of Arecibo shorties or standard AR's being sent to the CPU exclusively.

My i7 system has refilled with 90% GBT WUs, my C2D is mostly Arecibo.
The i7 still hasn't quite managed to get a full cache. Going to start flipping application settings again. Would be nice if they could fix that particular issue.

EDIT- Changed application preferences and cache filled.
Grant
Darwin NT
ID: 1868251 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1868256 - Posted: 19 May 2017, 21:54:53 UTC - in response to Message 1868188.  

I also note there are four BLC splitter jobs that haven't progressed in several hours,

More like a couple of weeks.
Grant
Darwin NT
ID: 1868256 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1868265 - Posted: 19 May 2017, 22:38:40 UTC
Last modified: 19 May 2017, 22:39:03 UTC

Surprisingly... I got a reply: unit production is up, but return time is way down. Conclusion: there was a "shorty storm". It should be over now, so we can return to "normal."*

*I need quotes around that here. 😀
ID: 1868265 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1868269 - Posted: 19 May 2017, 22:51:48 UTC - in response to Message 1868265.  
Last modified: 19 May 2017, 23:01:39 UTC

Surprisingly... I got a reply: unit production is up, but return time is way down. Conclusion: there was a "shorty storm". It should be over now, so we can return to "normal."*

No shorty storm (although there were quite a few shorties around)
Many caches were completely empty, plenty of others almost so; so work was being returned almost as soon as it went out.
As the caches refill, return times will gradually increase again until the caches are full and turnaround times are back to being cache size & work type dependent.
The graphs are very useful in these situations.
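
The reasoning above can be put into a rough model: if a host works through its cache in order, a new task waits behind everything already queued, so turnaround grows with cache depth. This is only a sketch under that first-in-first-out assumption. The function name, the 15-minute task time and the 4 parallel slots are invented for illustration and are not project figures.

```python
def estimated_turnaround_hours(cached_tasks: int, task_hours: float, parallel_slots: int) -> float:
    """Rough time between receiving a task and reporting it.

    Assumes a first-in-first-out cache: the new task waits for every task
    already queued ahead of it (spread over the parallel slots), then runs.
    """
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    wait_hours = (cached_tasks / parallel_slots) * task_hours
    return wait_hours + task_hours


# Hypothetical host: 15-minute tasks, 4 running at once.
for cached in (0, 50, 100):
    hours = estimated_turnaround_hours(cached, task_hours=0.25, parallel_slots=4)
    print(f"{cached:3d} tasks already cached: about {hours} hours to return")
```

With an empty cache the task comes back after roughly one task runtime, which would look like a sudden drop in return time on the server side even without a run of shorties.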
Grant
Darwin NT
ID: 1868269 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868273 - Posted: 19 May 2017, 22:58:51 UTC - in response to Message 1868251.  

Changing preferences didn't do it. But shutting down BOINC for five minutes and then restarting DID get the tasks flowing again for the recalcitrant computer.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1868273 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1868276 - Posted: 19 May 2017, 23:02:01 UTC - in response to Message 1868273.  

Changing preferences didn't do it. But shutting down BOINC for five minutes and then restarting DID get the tasks flowing again for the recalcitrant computer.

It really is weird, in a rather annoying way.
Grant
Darwin NT
ID: 1868276 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1868280 - Posted: 19 May 2017, 23:17:00 UTC - in response to Message 1868276.  

Changing preferences didn't do it. But shutting down BOINC for five minutes and then restarting DID get the tasks flowing again for the recalcitrant computer.

It really is weird, in a rather annoying way.

You know, that raises some very interesting points.
As you may recall, I'm typically the guy saying that I'm experiencing none of these issues, and generally have no trouble getting the caches full.
Since I did the whole GuppiRescheduler/QOpt thing, I use Windoze' Task Scheduler to shut down BOINC periodically (either every 4 hrs or every 8 hrs, depending on the machine) to run GR.
Wonder if there's any correlation there? Might be worth a look-see, for anyone experiencing this. Just a thought.
ID: 1868280 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1868282 - Posted: 19 May 2017, 23:23:35 UTC - in response to Message 1868280.  

I did much the same to restart BOINC after TBar noted getting tasks after a restart.

I made a script to stop/run/start rescheduling every 15 minutes for loading up for maintenance - I just set it to 6 minutes and managed to get ~150 tasks when tasks weren't flowing.
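
A stop/run/start cycle of this kind might look roughly like the sketch below. It is not the actual script. `boinccmd --quit` is the standard way to ask a running client to exit, but the rescheduler path and the client launch command are placeholders that depend on the install, and the optional pause is an assumption based on the five-minute shutdown mentioned earlier in the thread.

```python
import subprocess
import time
from typing import Callable, Sequence

Command = Sequence[str]

# Placeholder commands: adjust names and paths to the local install.
STOP_CMD: Command = ["boinccmd", "--quit"]          # ask the running client to exit
RESCHEDULE_CMD: Command = ["GuppiRescheduler.exe"]  # hypothetical rescheduler path
START_CMD: Command = ["boinc"]                      # hypothetical client launch command


def run_blocking(cmd: Command) -> None:
    """Run a command and wait for it, raising if it exits with an error."""
    subprocess.run(list(cmd), check=True)


def spawn_detached(cmd: Command) -> None:
    """Start a command without waiting for it to finish."""
    subprocess.Popen(list(cmd))


def restart_cycle(
    run: Callable[[Command], None] = run_blocking,
    spawn: Callable[[Command], None] = spawn_detached,
    sleep: Callable[[float], None] = time.sleep,
    pause_seconds: float = 0.0,
) -> None:
    """Stop the client, run the rescheduler, optionally pause, then restart."""
    run(STOP_CMD)
    run(RESCHEDULE_CMD)
    if pause_seconds > 0:
        sleep(pause_seconds)
    spawn(START_CMD)


def run_forever(interval_seconds: float, pause_seconds: float = 0.0) -> None:
    """Repeat the cycle at a fixed interval until interrupted."""
    while True:
        restart_cycle(pause_seconds=pause_seconds)
        time.sleep(interval_seconds)
```

Calling `run_forever(6 * 60)` would repeat the cycle every 6 minutes. A real script would probably also want to confirm the client has fully exited before touching its files, which this sketch does not do.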
ID: 1868282 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1868285 - Posted: 19 May 2017, 23:26:58 UTC - in response to Message 1868282.  

I did much the same to restart BOINC after TBar noted getting tasks after a restart.

I made a script to stop/run/start rescheduling every 15 minutes for loading up for maintenance - I just set it to 6 minutes and managed to get ~150 tasks when tasks weren't flowing.

That just makes it even weirder.
Grant
Darwin NT
ID: 1868285 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1868288 - Posted: 19 May 2017, 23:36:33 UTC - in response to Message 1868273.  
Last modified: 19 May 2017, 23:39:18 UTC

Changing preferences didn't do it. But shutting down BOINC for five minutes and then restarting DID get the tasks flowing again for the recalcitrant computer.


It does that... it's like it gets depressed after asking numerous times and gives up.

No shorty storm (although there were quite a few shorties around)


OK... perhaps a short shorty shower? Some partially galactic hydrogenated shortening?
ID: 1868288 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1868302 - Posted: 20 May 2017, 0:47:12 UTC

Was there a problem here?

I must have missed it.

Cheers.
ID: 1868302 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868303 - Posted: 20 May 2017, 0:49:25 UTC - in response to Message 1868269.  

Surprisingly... I got a reply: unit production is up, but return time is way down. Conclusion: there was a "shorty storm". It should be over now, so we can return to "normal."*

No shorty storm (although there were quite a few shorties around)
Many caches were completely empty, plenty of others almost so; so work was being returned almost as soon as it went out.
As the caches refill, return times will gradually increase again until the caches are full and turnaround times are back to being cache size & work type dependent.
The graphs are very useful in these situations.


. . It would seem that the staff in the "centre" are not quite up to speed because, like Grant, my caches were COMPLETELY EMPTY. Zero tasks of any kind. Not for GPU nor CPU. Zilch. Every request met with "No tasks available". Whatever the cause it was NOT a shorty storm.

Stephen

:(
ID: 1868303 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.