Panic Mode On (104) Server Problems?

Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1843676 - Posted: 21 Jan 2017, 23:18:11 UTC - in response to Message 1843637.  
Last modified: 21 Jan 2017, 23:21:07 UTC

The thickness of the dark green part at the beginning of the bar for a file being split indicates how many splitters are working on that 1 file, and sometimes all of them can be working on just 1 file. ;-)

That's probably it: there are so many channels in a GBT file that a single channel is too small to see. And all the GBT work I've got (other than re-sends) is from the one file.

So that little sliver of dark green at the start of blc2_2bit_guppi_57423_32060_HIP53824_0017 isn't a single splitter, but all of them on the one file.

So many channels per tape it's hard to tell how many are working a given tape, and the resolution is too fine to tell visually ...
ID: 1843676 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 36551
Credit: 261,360,520
RAC: 489
Australia
Message 1843681 - Posted: 21 Jan 2017, 23:41:54 UTC - in response to Message 1843676.  

The thickness of the dark green part at the beginning of the bar for a file being split indicates how many splitters are working on that 1 file, and sometimes all of them can be working on just 1 file. ;-)

That's probably it: there are so many channels in a GBT file that a single channel is too small to see. And all the GBT work I've got (other than re-sends) is from the one file.

So that little sliver of dark green at the start of blc2_2bit_guppi_57423_32060_HIP53824_0017 isn't a single splitter, but all of them on the one file.

So many channels per tape it's hard to tell how many are working a given tape, and the resolution is too fine to tell visually ...

Any mouse over magnifier app will help there if required. ;-)

Cheers.
ID: 1843681 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1843687 - Posted: 22 Jan 2017, 0:10:43 UTC - in response to Message 1843676.  
Last modified: 22 Jan 2017, 0:11:21 UTC

So that little sliver of dark green at the start of blc2_2bit_guppi_57423_32060_HIP53824_0017 isn't a single splitter, but all of them on the one file.

So many channels per tape it's hard to tell how many are working a given tape, and the resolution is too fine to tell visually ...

Yeah.
It works well for Arecibo files, but not so well for GBT. Once again, it would be nice if they could get the splitters to each work on a different file.
Grant
Darwin NT
ID: 1843687 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1843799 - Posted: 22 Jan 2017, 16:03:41 UTC - in response to Message 1843635.  

I do not know if it is just me, but every time I look I see 7 channels of GBT data being split???? or am I looking somewhere you are not?

Must be a different place.

Computing, Server Status page. Scroll down to splitter status, Breakthrough listen.
It shows 7 files that have completed channels in them (light green), but it shows only 1 channel in progress (actually being split- dark green).
Scroll down to Multibeam (Arecibo) and it shows 6 files with completed channels (light green), and 4 channels in progress (dark green).

Given the amount of GBT work I got overnight, and that the Ready-to-send buffer remains full, I suspect that more than one channel is being split, but it just isn't being displayed in the splitter status.

I had thought that you could look at the servers column on the Server Status page and count the active servers, with each one indicating a channel.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1843799 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1843815 - Posted: 22 Jan 2017, 18:08:14 UTC - in response to Message 1843799.  

I do not know if it is just me, but every time I look I see 7 channels of GBT data being split???? or am I looking somewhere you are not?

Must be a different place....

I had thought that you could look at the servers column on the Server Status page and count the active servers, with each one indicating a channel.

Yeah, that too :)
ID: 1843815 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1844076 - Posted: 23 Jan 2017, 18:03:30 UTC - in response to Message 1844062.  

Take a look at your pendings for Dec 5 and 6. There are tons of WUs still waiting for validation, even though they are finished by both _0 and _1.
Must be hundreds of thousands (if not millions) of those in total over all computers on the project.

I've checked a few computers randomly, and they all have WUs finished by both _0 and _1, still waiting for validation for the dates Dec 5 and 6.

Time to run some script perhaps. Don't they have something to warn about these "hanging" WUs?
That stuff can't be good for the size of the databases.

I'm seeing the same, after some random spot checks of my boxes.
Looks like it's all work where the last wingman reported on Dec 5-6-7.
ID: 1844076 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1844128 - Posted: 23 Jan 2017, 22:50:10 UTC - in response to Message 1844111.  

Take a look at your pendings for Dec 5 and 6. There are tons of WUs still waiting for validation, even though they are finished by both _0 and _1.
Must be hundreds of thousands (if not millions) of those in total over all computers on the project.

I've checked a few computers randomly, and they all have WUs finished by both _0 and _1, still waiting for validation for the dates Dec 5 and 6.

Time to run some script perhaps. Don't they have something to warn about these "hanging" WUs?
That stuff can't be good for the size of the databases.

I'm seeing the same, after some random spot checks of my boxes.
Looks like it's all work where the last wingman reported on Dec 5-6-7.

Well, let's hope they can fix it during tomorrow's (shall we say 12 hour long) outage.
And maybe also change the message about the 3-4 hour long outage. It has been years since we had any 3-4 hour long outages.
The norm now seems to be 9-12 hours.


. . Strange, out of my 3 rigs only one has tasks in this state but it has something like 45 of them. So if 1 in 3 has this problem then that, as you say, would add up to an awful lot of tasks in limbo on a global scale. I hope they are aware of the problem and again like you I hope they fix it tomorrow ...

Stephen

.
ID: 1844128 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1844146 - Posted: 24 Jan 2017, 0:50:55 UTC - in response to Message 1844128.  

. . Strange, out of my 3 rigs only one has tasks in this state but it has something like 45 of them. So if 1 in 3 has this problem then that, as you say, would add up to an awful lot of tasks in limbo on a global scale. I hope they are aware of the problem and again like you I hope they fix it tomorrow ...
Stephen
.

I'm seeing it on 3 of 5 rigs, all on jobs returned 6 Dec 2016, and all have been in that state since 6 Dec.
(i.e., both wingmen returned the work on 6 Dec)
Given that, no maintenance tomorrow will resolve this unless it is specifically addressed, I would guess.
Hopefully there's someone reading this thread that can get the report to the right folks ...
ID: 1844146 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1844154 - Posted: 24 Jan 2017, 1:43:07 UTC - in response to Message 1844146.  

. . Strange, out of my 3 rigs only one has tasks in this state but it has something like 45 of them. So if 1 in 3 has this problem then that, as you say, would add up to an awful lot of tasks in limbo on a global scale. I hope they are aware of the problem and again like you I hope they fix it tomorrow ...
Stephen
.

I'm seeing it on 3 of 5 rigs, all on jobs returned 6 Dec 2016, and all have been in that state since 6 Dec.
(i.e., both wingmen returned the work on 6 Dec)
Given that, no maintenance tomorrow will resolve this unless it is specifically addressed, I would guess.
Hopefully there's someone reading this thread that can get the report to the right folks ...

Out of ~1400 pending tasks I have ~60 from that time period pending. Of those, 14 are awaiting validation with two results that were returned on the 6th or 7th.

There is a process that will check the tasks again when their original Report deadline passes. Then they should be sent to the validator.
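
As a rough illustration of that kind of check (not the project's actual server code), here is a minimal Python sketch. The `Result` and `Workunit` classes, the quorum of 2, and the rule that the recheck fires once the latest report deadline among the results has passed are all assumptions made for the example; the dates are placeholders in the spirit of the ones mentioned in this thread.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Result:
    """One task of a workunit (hypothetical structure, for illustration only)."""
    name: str                     # e.g. "_0" or "_1"
    returned: Optional[datetime]  # None while the task is still out
    deadline: datetime            # original report deadline


@dataclass
class Workunit:
    wuid: int
    results: List[Result]
    validated: bool = False


def is_stuck(wu: Workunit, quorum: int = 2) -> bool:
    """True if enough results are back but the workunit is still unvalidated."""
    returned = [r for r in wu.results if r.returned is not None]
    return not wu.validated and len(returned) >= quorum


def recheck_time(wu: Workunit) -> datetime:
    """Assumed recheck point: the latest report deadline among the results."""
    return max(r.deadline for r in wu.results)


def due_for_recheck(wu: Workunit, now: datetime, quorum: int = 2) -> bool:
    """True once a stuck workunit's assumed recheck point has passed."""
    return is_stuck(wu, quorum) and now >= recheck_time(wu)


# Placeholder workunit: both results returned 6 Dec, deadlines in late Jan.
example = Workunit(
    wuid=1,
    results=[
        Result("_0", datetime(2016, 12, 6), datetime(2017, 1, 25)),
        Result("_1", datetime(2016, 12, 6), datetime(2017, 1, 26)),
    ],
)
```

With these placeholder dates, `is_stuck(example)` is true straight away, while `due_for_recheck(example, now)` only becomes true once `now` reaches the later of the two deadlines.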
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1844154 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1844179 - Posted: 24 Jan 2017, 4:44:28 UTC - in response to Message 1844154.  


Out of ~1400 pending tasks I have ~60 from that time period pending. Of those, 14 are awaiting validation with two results that were returned on the 6th or 7th.

There is a process that will check the tasks again when their original Report deadline passes. Then they should be sent to the validator.


. . Or dumped :( ... Trust no one!

Stephen

:)
ID: 1844179 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 36551
Credit: 261,360,520
RAC: 489
Australia
Message 1844181 - Posted: 24 Jan 2017, 4:48:06 UTC - in response to Message 1844179.  


Out of ~1400 pending tasks I have ~60 from that time period pending. Of those, 14 are awaiting validation with two results that were returned on the 6th or 7th.

There is a process that will check the tasks again when their original Report deadline passes. Then they should be sent to the validator.


. . Or dumped :( ... Trust no one!

Stephen

:)

Hal is correct and this has happened before, several times in fact over the years, but thankfully I have none of those here this time.

Cheers.
ID: 1844181 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 36551
Credit: 261,360,520
RAC: 489
Australia
Message 1844186 - Posted: 24 Jan 2017, 5:24:29 UTC - in response to Message 1844184.  


Hal is correct and this has happened before, several times in fact over the years, but thankfully I have none of those here this time.

Cheers.

Oh yes you have, and plenty of them too. Just one example from one of your computers:
https://setiathome.berkeley.edu/workunit.php?wuid=2349156167
That one has been in limbo since Dec 6. You have many more....

I should've gone back further it seems: there are 66 of them, but their deadlines are 25, 26, 27 January, so they should clear over the next few days.

Cheers.
ID: 1844186 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1844201 - Posted: 24 Jan 2017, 7:40:26 UTC - in response to Message 1844181.  


Out of ~1400 pending tasks I have ~60 from that time period pending. Of those, 14 are awaiting validation with two results that were returned on the 6th or 7th.

There is a process that will check the tasks again when their original Report deadline passes. Then they should be sent to the validator.


. . Or dumped :( ... Trust no one!

Stephen

:)

Hal is correct and this has happened before, several times in fact over the years, but thankfully I have none of those here this time.

Cheers.

I had one of these "stuck" tasks as an AP a couple months ago. The automatic mechanism finally did clear it after about 4 months of hanging around after it hit its deadline.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1844201 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1844245 - Posted: 25 Jan 2017, 4:28:51 UTC

I wonder at times if this is the way it will end (the project).

We go into the Tuesday outage...............and never emerge............

"Sour Grapes make a bitter Whine." <(0)>
ID: 1844245 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1844246 - Posted: 25 Jan 2017, 4:31:17 UTC - in response to Message 1844245.  

I'm getting work.
But once again it's taken changing & then changing again the application settings.
Even with all of that, getting work after this outage is proving more difficult than after the last one.
Grant
Darwin NT
ID: 1844246 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1844247 - Posted: 25 Jan 2017, 4:48:15 UTC - in response to Message 1844246.  

I'm getting work.
But once again it's taken changing & then changing again the application settings.
Even with all of that, getting work after this outage is proving more difficult than after the last one.

There's got to be something we're all missing here, though I have no clue what it could be.
As usual, all 5 boxes here are full-up (1800 total tasks) within 30 minutes after the outage.
What's different?
ID: 1844247 · Report as offensive
Cruncher-American Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1844249 - Posted: 25 Jan 2017, 4:57:14 UTC - in response to Message 1844247.  

As usual, all 5 boxes here are full-up (1800 total tasks) within 30 minutes after the outage.
What's different?


Maybe the server code has an exception for two-legged horses?
ID: 1844249 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1844250 - Posted: 25 Jan 2017, 5:06:06 UTC - in response to Message 1844247.  

I'm getting work.
But once again it's taken changing & then changing again the application settings.
Even with all of that, getting work after this outage is proving more difficult than after the last one.

There's got to be something we're all missing here, though I have no clue what it could be.
As usual, all 5 boxes here are full-up (1800 total tasks) within 30 minutes after the outage.
What's different?

Timing?
Being one of the first to request work after the servers come back online? Being geographically closer, allowing for less latency in requests?


What I'm still trying to figure out is why I have to change the Application settings once or twice a day to be able to keep getting work. Eventually, even after the post-outage congestion, the usual response from the Scheduler these days is "Project has no tasks available." Change the Application settings, then it has work available. At least for 12 or more hours. Then I get to do it all over again.
Grant
Darwin NT
ID: 1844250 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1844254 - Posted: 25 Jan 2017, 5:13:00 UTC - in response to Message 1844247.  

I'm getting work.
But once again it's taken changing & then changing again the application settings.
Even with all of that, getting work after this outage is proving more difficult than after the last one.

There's got to be something we're all missing here, though I have no clue what it could be.
As usual, all 5 boxes here are full-up (1800 total tasks) within 30 minutes after the outage.
What's different?

I sure wish we could figure out what mechanism is in play and how to configure around it. I haven't bothered playing around with preferences since it never seemed to have any effect on my machines. I am still getting work in 4-6 task downloads when I request work. About 4 in 5 requests end up with the "no tasks are available" message from the servers. It will take me about 2 days to get back to my full quota.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1844254 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1844258 - Posted: 25 Jan 2017, 5:48:32 UTC - in response to Message 1844249.  

Maybe the server code has an exception for two-legged horses?

Anyone can be half-assed, but I'll have you know it takes real talent to be half-horsed :)
lol
Too bad GIFs aren't supported; it runs very nicely.
ID: 1844258 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.