Panic Mode On (104) Server Problems?

Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1848573 - Posted: 13 Feb 2017, 23:57:36 UTC - in response to Message 1848563.  

Oh no, not degrading again.....

Save us Obi-Wan Korpela, you're our only hope...



You tell 'em, Fidel!
ID: 1848573 · Report as offensive
Wild6-NJ
Volunteer tester

Joined: 4 Aug 99
Posts: 43
Credit: 100,336,791
RAC: 140
Message 1848576 - Posted: 14 Feb 2017, 0:10:21 UTC
Last modified: 14 Feb 2017, 0:11:35 UTC

The progress on file 18dc09aa has been at a standstill all day (EST), so the 5 splitters working on it have essentially been offline.
All the work we see has been coming from the remaining 2 splitters.
ID: 1848576 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1848577 - Posted: 14 Feb 2017, 0:10:24 UTC

While 5 splitters are working on 1 file things will be in a bind. :-(

Cheers.
ID: 1848577 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1848591 - Posted: 14 Feb 2017, 3:23:09 UTC - in response to Message 1848576.  
Last modified: 14 Feb 2017, 3:23:37 UTC

Posted: 14 Feb 2017, 0:10:21 UTC;
The progress on file 18dc09aa has been at a standstill all day (EST), so the 5 splitters working on it have essentially been offline.
All the work we see has been coming from the remaining 2 splitters.
Hours later, nothing has changed.
File 18dc09aa = (5)
The 5 splitters are still stuck on one file without any tasks being split, leaving only 2 splitters working. The creation rate is still around 10/sec.
ID: 1848591 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1848608 - Posted: 14 Feb 2017, 5:37:05 UTC - in response to Message 1848591.  
Last modified: 14 Feb 2017, 5:44:28 UTC

Posted: 14 Feb 2017, 0:10:21 UTC;
The progress on file 18dc09aa has been at a standstill all day (EST), so the 5 splitters working on it have essentially been offline.
All the work we see has been coming from the remaining 2 splitters.
Hours later, nothing has changed.
File 18dc09aa = (5)
The 5 splitters are still stuck on one file without any tasks being split, leaving only 2 splitters working. The creation rate is still around 10/sec.


..Would that be less than a quarter of what is needed to be comfortable??

Stephen

PS: . . Ain't it marvellous, 3 rigs in starvation mode and less than 12 hours to the outage :(
. . . . . . .Oh well, there's always Einstein's Bar and Grill

PPS: . Small amendment, one of them just got some work ... yippee!

?
ID: 1848608 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1848613 - Posted: 14 Feb 2017, 6:13:24 UTC - in response to Message 1848608.  

Been out of work on one computer for over an hour. The next fastest computer will be out of work in half an hour. No tasks retrieved when asked for. My slowest computer might make it to morning. The SETI "Outrage" is coming earlier and earlier. Off to the Einstein Bar and Grill.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1848613 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1848618 - Posted: 14 Feb 2017, 7:16:30 UTC

Well, I better start up a heater before I retire for the night with the kitties.
As there seems to be little doubt that my GPUs are gonna start going cold.

Meowsigh.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1848618 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1848620 - Posted: 14 Feb 2017, 7:22:58 UTC

Yeah, kinda sucks to go into an outage with empty caches.
This can't be that difficult a problem to fix ...
ID: 1848620 · Report as offensive
Profile UniMatrixZ
Joined: 2 Feb 01
Posts: 102
Credit: 30,826,065
RAC: 3
Sweden
Message 1848621 - Posted: 14 Feb 2017, 7:37:46 UTC
Last modified: 14 Feb 2017, 7:39:01 UTC

Hmmm, and here we are again with empty caches, and the Tuesday outage is coming soon.
I really hope they can fix Centurion and that it will run more than a few days without crashing.

Do we know what the problem is with it?

"SETI is probably the most important quest of our time,
and it amazes me that governments and corporations
are not supporting it sufficiently."- Arthur C. Clarke 2006
ID: 1848621 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1848624 - Posted: 14 Feb 2017, 8:06:17 UTC - in response to Message 1848621.  

Hmmm, and here we are again with empty caches, and the Tuesday outage is coming soon.
I really hope they can fix Centurion and that it will run more than a few days without crashing.

Do we know what the problem is with it?

Unclear to me whether this is a problem with Centurion, or whether it's just down due to not having any work to do.
SSP shows no BLC tapes ready to split.
What's not clear is what process places tapes into place so PFB and GBT splitters can find them and do work.
ID: 1848624 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1848625 - Posted: 14 Feb 2017, 8:24:23 UTC

It's a manual process: someone removes the processed disks (we still call them tapes after all these years of them being disks), loads the unprocessed disks, mounts them and walks away; the splitters then start chewing. Some of this may be done remotely rather than in the CoLo, say in the SSL, but it always needs a human to do the loading from the transport cases into the disk array. That is why having a small number of disks on a Friday is often bad news by Monday morning.
As to no GBT data being loaded yesterday, that box of disks must be empty :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1848625 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1848637 - Posted: 14 Feb 2017, 11:54:05 UTC - in response to Message 1848620.  

Yeah, kinda sucks to go into an outage with empty caches.
This can't be that difficult a problem to fix ...


. . Sadly the evidence seems to indicate the opposite :(

Stephen

:-(
ID: 1848637 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1848638 - Posted: 14 Feb 2017, 11:56:12 UTC - in response to Message 1848624.  

Hmmm, and here we are again with empty caches, and the Tuesday outage is coming soon.
I really hope they can fix Centurion and that it will run more than a few days without crashing.

Do we know what the problem is with it?

Unclear to me whether this is a problem with Centurion, or whether it's just down due to not having any work to do.
SSP shows no BLC tapes ready to split.
What's not clear is what process places tapes into place so PFB and GBT splitters can find them and do work.


..I had the impression they have to be loaded, as in the appropriate storage device connected to the system.

Stephen

.
ID: 1848638 · Report as offensive
Profile Michel Makhlouta
Volunteer tester
Joined: 21 Dec 03
Posts: 169
Credit: 41,799,743
RAC: 0
Lebanon
Message 1848643 - Posted: 14 Feb 2017, 12:20:41 UTC

I did something wrong in the last outage and BOINC downloaded a massive cache of Einstein (Bar and Grill) tasks, so I have been stuck with them for days, with days still to come. I will take advantage of this outage to finish my remaining SETI WUs and install Linux on my main cruncher...

Just a side note... it took weeks for SETI to reach a RAC of 53K, while Einstein is at 280K right now and seems far from its peak...
ID: 1848643 · Report as offensive
Wild6-NJ
Volunteer tester

Joined: 4 Aug 99
Posts: 43
Credit: 100,336,791
RAC: 140
Message 1848662 - Posted: 15 Feb 2017, 0:52:01 UTC - in response to Message 1848576.  

The progress on file 18dc09aa has been at a standstill all day (EST), so the 5 splitters working on it have essentially been offline.
All the work we see has been coming from the remaining 2 splitters.


We're back from the outage and it looks like they removed that file from the queue. :)
ID: 1848662 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1848673 - Posted: 15 Feb 2017, 1:31:54 UTC - in response to Message 1848662.  


We're back from the outage and it looks like they removed that file from the queue. :)


But now 22au08aa seems to have cornered the market with 4 splitters.

Can't someone write some code to prohibit a splitter from working on a file with another splitter on it?

Something like:

Start splitter
Look at "no splitter list" file
Pick one at random
remove name from NSL file
Start splitting



AND: when mounting a "tape":
Manually add the name to the NSL file.
Or modify the mount command for it to do so automatically

Is that really hard? Or am I missing something?
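
For what it's worth, the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual splitter code: the function names `claim_tape` and `register_tape` and the layout of the NSL file (one tape name per line) are assumptions made up for this example. The one non-obvious part is the exclusive lock, since without it two splitters could read the list at the same moment and pick the same tape. It uses `fcntl`, so it assumes a Unix-like host.

```python
import fcntl
import random


def register_tape(nsl_path, tape_name):
    """Append a newly mounted tape to the NSL ("no splitter list") file."""
    with open(nsl_path, "a") as nsl:
        # Exclusive lock: wait until no splitter is rewriting the list.
        fcntl.flock(nsl, fcntl.LOCK_EX)
        nsl.write(tape_name + "\n")
    # Closing the file flushes the write and releases the lock.


def claim_tape(nsl_path):
    """Remove one tape name from the NSL file and return it.

    Returns None when the list is empty. Assumes the NSL file exists.
    """
    with open(nsl_path, "r+") as nsl:
        # Hold the lock for the whole read-modify-write cycle so two
        # splitters can never claim the same tape.
        fcntl.flock(nsl, fcntl.LOCK_EX)
        names = [line.strip() for line in nsl if line.strip()]
        if not names:
            return None
        choice = random.choice(names)  # "Pick one at random"
        names.remove(choice)           # "remove name from NSL file"
        # Rewrite the file with the remaining names.
        nsl.seek(0)
        nsl.truncate()
        nsl.writelines(name + "\n" for name in names)
        return choice
```

A splitter would call `claim_tape("nsl.txt")` at startup and begin splitting whatever name comes back, and the mount step would call `register_tape("nsl.txt", "22au08aa")`. One thing this sketch does not handle: if a splitter crashes partway through a tape, that tape is no longer on the list, so something would have to put it back.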
ID: 1848673 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1848684 - Posted: 15 Feb 2017, 2:11:12 UTC - in response to Message 1848662.  

The progress on file 18dc09aa has been at a standstill all day (EST), so the 5 splitters working on it have essentially been offline.
All the work we see has been coming from the remaining 2 splitters.


We're back from the outage and it looks like they removed that file from the queue. :)


. . But still no GBT splitters running and still "no tasks available" most of the time. :(

. . If this is going to be the norm I might have to rent a room at the Einstein Bar and Grill.

:(

Stephen

:(
ID: 1848684 · Report as offensive
Wild6-NJ
Volunteer tester

Joined: 4 Aug 99
Posts: 43
Credit: 100,336,791
RAC: 140
Message 1848687 - Posted: 15 Feb 2017, 2:16:39 UTC - in response to Message 1848673.  


We're back from the outage and it looks like they removed that file from the queue. :)


But now 22au08aa seems to have cornered the market with 4 splitters.

Can't someone write some code to prohibit a splitter from working on a file with another splitter on it?

Something like:

Start splitter
Look at "no splitter list" file
Pick one at random
remove name from NSL file
Start splitting



AND: when mounting a "tape":
Manually add the name to the NSL file.
Or modify the mount command for it to do so automatically

Is that really hard? Or am I missing something?


But at least there's progress on this file.
The splitter output is now about 30/s.
When the other file was frozen, the output was about 10/s.
ID: 1848687 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1848691 - Posted: 15 Feb 2017, 2:33:11 UTC - in response to Message 1848624.  

Hmmm, and here we are again with empty caches, and the Tuesday outage is coming soon.
I really hope they can fix Centurion and that it will run more than a few days without crashing.

Do we know what the problem is with it?

Unclear to me whether this is a problem with Centurion, or whether it's just down due to not having any work to do.
SSP shows no BLC tapes ready to split.
What's not clear is what process places tapes into place so PFB and GBT splitters can find them and do work.

The name of the process that loads the data for splitting is normally Eric or Jeff.
Unless they have some interns to delegate the task to.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1848691 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1848696 - Posted: 15 Feb 2017, 2:47:52 UTC

Why don't we start a fund for a robotic disk library? I used to do Tech Support on commercial tape backup robotic libraries and they aren't all that big. Should be able to find space for a robot at the CoLo or SSL or wherever the splitters are physically located. That would solve the issue of having to have a corporeal entity physically change out the disks as is done currently.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1848696 · Report as offensive


 