Panic Mode On (112) Server Problems?

Message boards : Number crunching : Panic Mode On (112) Server Problems?

Previous · 1 . . . 29 · 30 · 31 · 32 · 33 · Next

Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947346 - Posted: 1 Aug 2018, 22:44:27 UTC - in response to Message 1947286.  
Last modified: 1 Aug 2018, 22:49:26 UTC


. . I do find it disconcerting how they load new tapes and these immediately pre-empt any other tapes currently splitting...

Stephen

:(


I agree... it offends my OCD to not have them finish the previous data set before starting on a new one. The splitters do seem to do them in date order so they might be able to easily reorder the list on the server status page to make it a bit more orderly.

It is especially weird right now as they seem to have two sets of 14 processes running in two different sections of the data.

. . Others have said the same about the splitter priority, but each of the receiver sets we have had lately has covered the same date and time period (all recorded in the one night), yet the newer tape set still overrides the existing one ... though this time some of the Blc04 tapes are still splitting (maybe 6 or so) ...

. . And like yourself it upsets my OCD as well :)

Stephen

:)
ID: 1947346
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1947375 - Posted: 2 Aug 2018, 0:41:03 UTC

Probably not a panic moment as splitting is happening and the ready to send queue is staying full.
More like a curiosity or a possible problem

But the split files don't seem to be finishing and going away...
blc04_2bit_guppi_58227_04470_HIP53229_0010 52.42 GB (128)
blc04_2bit_guppi_58227_04819_HIP52575_0011 52.42 GB (128)
blc04_2bit_guppi_58227_05169_HIP53229_0012 52.42 GB (128)
blc04_2bit_guppi_58227_05505_HIP52675_0013 52.42 GB (128)
ID: 1947375
Filipe
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1947505 - Posted: 2 Aug 2018, 14:22:31 UTC
Last modified: 2 Aug 2018, 14:23:09 UTC

We are currently processing roughly 110,000 results an hour.

What would our processing rate need to be to match the rate at which data sets are created?
ID: 1947505
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947543 - Posted: 2 Aug 2018, 16:16:12 UTC - in response to Message 1947505.  

I think we get in trouble when the return rate gets up over 140K. Then the splitters can't keep up. So at 110K we should be in good shape.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947543
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947622 - Posted: 2 Aug 2018, 23:52:20 UTC - in response to Message 1947375.  

Probably not a panic moment as splitting is happening and the ready to send queue is staying full.
More like a curiosity or a possible problem

But the split files don't seem to be finishing and going away...
blc04_2bit_guppi_58227_04470_HIP53229_0010 52.42 GB (128)
blc04_2bit_guppi_58227_04819_HIP52575_0011 52.42 GB (128)
blc04_2bit_guppi_58227_05169_HIP53229_0012 52.42 GB (128)
blc04_2bit_guppi_58227_05505_HIP52675_0013 52.42 GB (128)

. . The lowest numbered Blc04 file (blc04...0010) is stuck, and nothing after it is "going away", as you put it. That 'tape' needs a bit of a kick ...

Stephen

:)
ID: 1947622
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947624 - Posted: 2 Aug 2018, 23:55:34 UTC - in response to Message 1947505.  

We are currently processing roughly 110,000 results an hour.

What would our processing rate need to be to match the rate at which data sets are created?

. . Well, we have spent the last few weeks processing output that Green Bank recorded in a single night, so I think it would need to be considerably higher than it is. But since the SETI server room could not cope with that level of processing, it is a moot point.

Stephen

:)
ID: 1947624
Filipe
Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1947672 - Posted: 3 Aug 2018, 10:03:54 UTC - in response to Message 1947624.  

. . Well, we have spent the last few weeks processing output that Green Bank recorded in a single night, so I think it would need to be considerably higher than it is. But since the SETI server room could not cope with that level of processing, it is a moot point.

Stephen

:)


So we would need 20-30x our current processing power to process everything in real time...
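A quick back-of-envelope check on that multiplier. The figures below are assumptions pulled from the thread ("a few weeks" to clear one night's recording), not measured values:

```python
# All figures are rough assumptions taken from the thread, not measurements.
current_rate = 110_000              # results returned per hour (Filipe's figure)
weeks_to_clear = 3                  # "the last few weeks" to clear one night's data
hours_spent = weeks_to_clear * 7 * 24
hours_recorded = 24                 # call one night's observing a day's worth

speedup = hours_spent / hours_recorded
print(f"required speedup: ~{speedup:.0f}x")   # ~21x with these assumptions
print(f"required rate: ~{current_rate * speedup:,.0f} results/hour")
```

With `weeks_to_clear` anywhere between 3 and 4, the result lands in the 20-30x range quoted above.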
ID: 1947672
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1947699 - Posted: 3 Aug 2018, 14:55:21 UTC - in response to Message 1947672.  
Last modified: 3 Aug 2018, 15:05:50 UTC

. . Well, we have spent the last few weeks processing output that Green Bank recorded in a single night, so I think it would need to be considerably higher than it is. But since the SETI server room could not cope with that level of processing, it is a moot point.

Stephen

:)


So we would need 20-30x our current processing power to process everything in real time...



For some reason we got an amazing load of data for this one day. Is it a special day? Was something interesting in space where the telescope was pointed that day? Or were there just funds and opportunity?
I don't think we get data every day from each source. There is a post showing the dates we have analyzed for Arecibo, but I don't know if there is something similar for Green Bank.
We did have a "hiccup" with the connection and data download from Green Bank not too long ago, so I can only guess we have a backlog to get through; but since they don't seem to be giving us Green Bank data in any date order that I can see, it is hard to tell how much we have left to work through.
ID: 1947699
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1947704 - Posted: 3 Aug 2018, 15:06:19 UTC - in response to Message 1947699.  
Last modified: 3 Aug 2018, 15:17:45 UTC

There is a post showing the dates we have analyzed for Arecibo , https://setiathome.berkeley.edu/forum_thread.php?id=74238 but I don't know if there is something similar for Greenbank.
That was me. There are a couple more months to add since then, but not a huge amount.

The Green Bank tapes are in the same database, but the names are a horrible mess - very difficult to parse automatically. I had a look through manually to see if they were worth charting, but I could only find 25 distinct numbers in the 'modified julian date' field - which feels wrong. Maybe I should spot-check a few of the actual recording dates in the workunit xml header and see if they match the MJD.

Edit - on a sample of one (task 6858969764), it works:

 <name>blc13_2bit_guppi_58227_07595_HIP52911_0019.30670.818.22.45.188.vlar</name>
  <group_info>
    <tape_info>
      <name>blc13_2bit_guppi_58227_07595_HIP52911_0019</name>
      <start_time>2458227.5896015</start_time>

      <time_recorded>Thu Apr 19 02:09:01 2018</time_recorded>
      <time_recorded_jd>2458227.5896004</time_recorded_jd>
MJD 58227 is indeed April 19 this year.
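That spot-check can be automated. A minimal sketch: the tape-name pattern is inferred from the examples in this thread, so treat the regex and the field meanings as assumptions rather than the official naming scheme:

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from names like blc13_2bit_guppi_58227_07595_HIP52911_0019;
# the field meanings (MJD, seconds-of-day?, target, sequence) are guesses.
TAPE_RE = re.compile(r"blc\d+_2bit_guppi_(\d{5})_(\d+)_([A-Za-z0-9]+)_(\d{4})")

def mjd_to_date(mjd: int) -> datetime:
    # MJD 0 is 1858-11-17 00:00 UTC; JD = MJD + 2400000.5.
    return datetime(1858, 11, 17) + timedelta(days=mjd)

name = "blc13_2bit_guppi_58227_07595_HIP52911_0019"
m = TAPE_RE.match(name)
mjd, target = int(m.group(1)), m.group(3)
print(mjd_to_date(mjd).date(), target)  # 2018-04-19 HIP52911
```

This agrees with the `<time_recorded>` field in the workunit header above; the fractional part of the JD (2458227.5896 − 2400000.5 → 0.0896 day ≈ 02:09 UTC) matches as well.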
ID: 1947704
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947731 - Posted: 3 Aug 2018, 17:16:22 UTC - in response to Message 1947704.  

There is a post showing the dates we have analyzed for Arecibo , https://setiathome.berkeley.edu/forum_thread.php?id=74238 but I don't know if there is something similar for Greenbank.
That was me. There are a couple more months to add since then, but not a huge amount.

The Green Bank tapes are in the same database, but the names are a horrible mess - very difficult to parse automatically. I had a look through manually to see if they were worth charting, but I could only find 25 distinct numbers in the 'modified julian date' field - which feels wrong. Maybe I should spot-check a few of the actual recording dates in the workunit xml header and see if they match the MJD.

Edit - on a sample of one (task 6858969764), it works:

 <name>blc13_2bit_guppi_58227_07595_HIP52911_0019.30670.818.22.45.188.vlar</name>
  <group_info>
    <tape_info>
      <name>blc13_2bit_guppi_58227_07595_HIP52911_0019</name>
      <start_time>2458227.5896015</start_time>

      <time_recorded>Thu Apr 19 02:09:01 2018</time_recorded>
      <time_recorded_jd>2458227.5896004</time_recorded_jd>
MJD 58227 is indeed April 19 this year.

. . Don't forget that to truly be "in real time" we would have to process all the data from Arecibo and ALL the recorders at Green Bank (and soon Parkes as well) within a 24-hour period. Currently we are running at about a 100th of that rate (or less).

Stephen

:)
ID: 1947731
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1947782 - Posted: 4 Aug 2018, 0:12:04 UTC - in response to Message 1947731.  


. . Don't forget that to truly be "in real time" we would have to process all the data from Arecibo and ALL the recorders at Green Bank (and soon Parkes as well) within a 24-hour period. Currently we are running at about a 100th of that rate (or less).

Stephen

:)


Let's just say in "close" to real time. It takes time for the data to be transmitted, prepared (is anything done to make the data ready to be split?), sent out to BOINC/SETI machines, returned (a 6-week deadline is given; longer if a workunit needs more than 2 results to confirm), and analyzed.

The point I was trying to make is that if you look at Richard's graphs (thanks!) of Arecibo data, we don't get data every day from Arecibo, and when we do, it is of varying amounts. We don't have to meet some goal of processing ALL the data when we don't (and probably won't) get ALL of it from Arecibo. I'm guessing we don't get ALL the Green Bank data either. Some of the data collected may belong to the project paying for the telescope time and not be accessible to SETI until a later date. It is hard to know how much Green Bank data they have in hand to give us, so it is hard to guess how we are doing, or how close we are to analyzing everything they have for us.

I'm looking forward to getting the Parkes datasets, but it might still be a while.

Unixchick
who still wonders about the 'Oumuamua dataset that was all errors.
ID: 1947782
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947783 - Posted: 4 Aug 2018, 0:29:33 UTC - in response to Message 1947782.  

Just like with Arecibo, we get our data regardless of what the telescope is doing for the researchers paying for the telescope time. We piggyback our own hardware on the scope. We just have no control over where the telescope is pointing unless we have input into choosing the target. We never had control at Arecibo. We might have some control over the GBT because it is aligned with the Breakthrough Listen program, and they are the ones with the funds to choose targets.

And yes, we can't use the raw data coming straight from the recorders. There is always going to be some pre-processing: mainly radar blanking for data from Arecibo, but I think there is also processing of data from the GBT, even though it is ostensibly cleaner.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947783
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1947849 - Posted: 4 Aug 2018, 9:18:00 UTC - in response to Message 1947783.  

Up to a point. For obvious reasons, we can't record our incredibly delicate data while Arecibo's high-power radar transmitter is in use. And our data recorder is currently tied to the ALFA antenna array: when the directing astronomers want a different piece of kit at the prime focus, we're not likely to record much of any use.

The Arecibo Observatory Telescope Schedule can be viewed online: not many people seem to be using ALFA at the moment, although the PALFA Galactic Plane Survey (P2030) is getting more time towards the end of the month.
ID: 1947849
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1947864 - Posted: 4 Aug 2018, 12:40:33 UTC - in response to Message 1947782.  

Let's just say in "close" to real time. It takes time for the data to be transmitted, prepared (is anything done to make the data ready to be split?), sent out to BOINC/SETI machines, returned (a 6-week deadline is given; longer if a workunit needs more than 2 results to confirm), and analyzed.

. . Yes, the "tapes" (as they used to be; it is still a useful term even though the data has not been on actual tape for quite a while) go through a preparation process before they can go to the splitters. From there the workunits go to the download servers and into the "hopper" until they are sent to a host; results come back to the upload server, which forwards them to the validators, and then into the science database, where they sit until they are dealt with by the back-end app (Nebula?). And when all that is done they are ...

The point I was trying to make is that if you look at Richard's graphs (thanks!) of Arecibo data, we don't get data every day from Arecibo, and when we do, it is of varying amounts. We don't have to meet some goal of processing ALL the data when we don't (and probably won't) get ALL of it from Arecibo. I'm guessing we don't get ALL the Green Bank data either. Some of the data collected may belong to the project paying for the telescope time and not be accessible to SETI until a later date. It is hard to know how much Green Bank data they have in hand to give us, so it is hard to guess how we are doing, or how close we are to analyzing everything they have for us.

. . I am pretty sure we do not get all the data from any telescope, though I don't know why. I thought the SETI recorders just piggybacked on the normal architecture of the scope, so we would get a copy of everything they observe. But I guess if the recorders are full we don't get any more data until those recordings are transferred back to Berkeley and fresh (empty) drives are mounted to record more.

I'm looking forward to getting the Parkes datasets, but it might still be a while.

. . Oh me too, I am very curious to see if it is just like the GBT data or something very different again. Not to mention the "local" input part of it. :)

who still wonders about the 'Oumuamua dataset that was all errors.

. . That could have been almost anything. Problems in the transfer process, a glitch in the prep for splitting causing them to split improperly or any of many other reasons. Maybe that will be sorted and we will see them again or maybe they are in the data trash can.

Stephen

:)
ID: 1947864
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1948091 - Posted: 6 Aug 2018, 1:07:13 UTC

For some reason my Xeon box has stopped downloading CPU tasks. It is only requesting GPU tasks.

Is there a server issue, or am I overrun with Rosetta tasks so the scheduler doesn't want to ship any more CPU tasks my way?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1948091
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1948094 - Posted: 6 Aug 2018, 1:26:36 UTC - in response to Message 1948091.  

Set sched_op_debug in Event Log Diagnostic flags and it will tell you whether you are overcommitted on cpu tasks.
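For reference, the same flag can be set by hand in cc_config.xml in the BOINC data directory (a minimal sketch; other log_flags entries can sit alongside it), then re-read from the Manager without restarting the client:

```xml
<cc_config>
  <log_flags>
    <!-- log scheduler RPC requests/replies, incl. how much work was requested -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```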
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1948094
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1948140 - Posted: 6 Aug 2018, 12:42:10 UTC - in response to Message 1948094.  

Set sched_op_debug in Event Log Diagnostic flags and it will tell you whether you are overcommitted on cpu tasks.


Thank you.
A proud member of the OFA (Old Farts Association).
ID: 1948140
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1948293 - Posted: 7 Aug 2018, 2:37:57 UTC - in response to Message 1947375.  
Last modified: 7 Aug 2018, 2:39:38 UTC

Probably not a panic moment as splitting is happening and the ready to send queue is staying full.
More like a curiosity or a possible problem

But the split files don't seem to be finishing and going away...
blc04_2bit_guppi_58227_04470_HIP53229_0010 52.42 GB (128)
blc04_2bit_guppi_58227_04819_HIP52575_0011 52.42 GB (128)
blc04_2bit_guppi_58227_05169_HIP53229_0012 52.42 GB (128)
blc04_2bit_guppi_58227_05505_HIP52675_0013 52.42 GB (128)

There seem to be 9 channels that have not been removed, totalling 419.33 GB of space. I am sure they will be removed during tomorrow's maintenance:
blc04_2bit_guppi_58227_04470_HIP53229_0010 52.42 GB (128)
blc04_2bit_guppi_58227_04819_HIP52575_0011 52.42 GB (128)
blc04_2bit_guppi_58227_05169_HIP53229_0012 52.42 GB (128)
blc04_2bit_guppi_58227_05505_HIP52675_0013 52.42 GB (128)
blc04_2bit_guppi_58227_05926_HIP53486_0014 52.42 GB (128)
blc04_2bit_guppi_58227_06261_HIP52809_0015 52.42 GB (128)
blc04_2bit_guppi_58227_06595_HIP53486_0016 52.42 GB (128)
blc04_2bit_guppi_58227_06928_HIP52893_0017 52.39 GB (128)
blc04_2bit_guppi_58227_07261_HIP53486_0018 52.42 GB (128)
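As a quick arithmetic check on that total (sizes copied from the list above): the quoted 419.33 GB matches the first eight files; all nine together come to a bit more:

```python
# File sizes (GB) as listed for blc04 ..._0010 through ..._0018.
sizes_gb = [52.42, 52.42, 52.42, 52.42,      # 0010-0013
            52.42, 52.42, 52.42, 52.39,      # 0014-0017
            52.42]                           # 0018
print(round(sum(sizes_gb), 2))       # 471.75 for all nine
print(round(sum(sizes_gb[:8]), 2))   # 419.33 for the first eight
```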

ID: 1948293
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1948294 - Posted: 7 Aug 2018, 2:58:40 UTC - in response to Message 1948293.  


There seem to be 9 channels that have not been removed, totalling 419.33 GB of space. I am sure they will be removed during tomorrow's maintenance:

. . Hi Speedy,

. . Those are not channels, they are tapes (disks, actually), and each one holds, I believe, 128 channels ...

. . I have got no idea how many tasks are spawned from each channel though.

Stephen

? ?
ID: 1948294
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1948298 - Posted: 7 Aug 2018, 3:53:47 UTC - in response to Message 1948293.  

Probably not a panic moment as splitting is happening and the ready to send queue is staying full.
More like a curiosity or a possible problem

But the split files don't seem to be finishing and going away...
blc04_2bit_guppi_58227_04470_HIP53229_0010 52.42 GB (128)
blc04_2bit_guppi_58227_04819_HIP52575_0011 52.42 GB (128)
blc04_2bit_guppi_58227_05169_HIP53229_0012 52.42 GB (128)
blc04_2bit_guppi_58227_05505_HIP52675_0013 52.42 GB (128)

There seems to be 8 9 channels that have not been removed. Totalling 419.33 GB of space. I am sure they will be removed after during tomorrow's maintenance
blc04_2bit_guppi_58227_04470_HIP53229_0010 52.42 GB (128)
blc04_2bit_guppi_58227_04819_HIP52575_0011 52.42 GB (128)
blc04_2bit_guppi_58227_05169_HIP53229_0012 52.42 GB (128)
blc04_2bit_guppi_58227_05505_HIP52675_0013 52.42 GB (128)
blc04_2bit_guppi_58227_05926_HIP53486_0014 52.42 GB (128)
blc04_2bit_guppi_58227_06261_HIP52809_0015 52.42 GB (128)
blc04_2bit_guppi_58227_06595_HIP53486_0016 52.42 GB (128)
blc04_2bit_guppi_58227_06928_HIP52893_0017 52.39 GB (128)
blc04_2bit_guppi_58227_07261_HIP53486_0018 52.42 GB (128)

My weird theory is that we have 28 channels running instead of the usual 14. They gave us extra channels to help speed up recovery after last Tuesday's outage. At some point 14 of the processes stayed stuck on the last file they did, trapped and not moving on to new files. Which seems fine to me, as they aren't needed.

Come Tuesday's (tomorrow's) post-outage recovery, all 28 channels will be running again to shorten recovery time. After recovery there might again be files in this state, since at some point only 14 channels are needed. Hopefully they have come up with a more graceful way for the channels to exit when no longer needed, though.
ID: 1948298


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.