Panic Mode On (94) Server Problems?

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1622461 - Posted: 2 Jan 2015, 18:49:29 UTC

Are we SETIzens at the end of the timeline? No new "tapes" from Arecibo?
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1622461
Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1628
Credit: 24,230,968
RAC: 26
United States
Message 1622615 - Posted: 3 Jan 2015, 3:07:31 UTC

The cricket graph shows something incoming. It appears somebody is spending part of their four day holiday extracting some more data.

As for me, my heat pump has a problem and is icing up, so please send me some APs to help keep the house warm. In any case, the repairman will turn up tomorrow and warmer weather is on its way to Phoenix, so don't worry about me.
ID: 1622615
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1622619 - Posted: 3 Jan 2015, 4:37:46 UTC

I would love to know, on a daily or weekly basis, how many gigabytes/terabytes of data are sent out to us crunchers and how much new data arrives at the servers to be split. If I remember correctly, they said we use about a petabyte a month.
Yes, it would appear that data is being sent at a reasonably fast rate of 240 MB/s. It was up to 408 a wee while ago.
ID: 1622619
Dena Wiltsie
Volunteer tester
Joined: 19 Apr 01
Posts: 1628
Credit: 24,230,968
RAC: 26
United States
Message 1622633 - Posted: 3 Jan 2015, 5:54:01 UTC - in response to Message 1622619.  

I would love to know, on a daily or weekly basis, how many gigabytes/terabytes of data are sent out to us crunchers and how much new data arrives at the servers to be split. If I remember correctly, they said we use about a petabyte a month.
Yes, it would appear that data is being sent at a reasonably fast rate of 240 MB/s. It was up to 408 a wee while ago.

The green is data sent to the outside world. The blue line is returned data or raw data being moved into the system. The cricket program looks at the computer complex so the names are reversed.
ID: 1622633
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1622664 - Posted: 3 Jan 2015, 7:25:18 UTC - in response to Message 1622633.  

I would love to know, on a daily or weekly basis, how many gigabytes/terabytes of data are sent out to us crunchers and how much new data arrives at the servers to be split. If I remember correctly, they said we use about a petabyte a month.
Yes, it would appear that data is being sent at a reasonably fast rate of 240 MB/s. It was up to 408 a wee while ago.

The green is data sent to the outside world. The blue line is returned data or raw data being moved into the system. The cricket program looks at the computer complex so the names are reversed.

Thanks for your explanation of the way data travels. It would appear that the majority of the data that was being sent across has now come across.
ID: 1622664
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1622668 - Posted: 3 Jan 2015, 7:44:39 UTC - in response to Message 1622633.  
Last modified: 3 Jan 2015, 7:48:27 UTC

I would love to know, on a daily or weekly basis, how many gigabytes/terabytes of data are sent out to us crunchers and how much new data arrives at the servers to be split. If I remember correctly, they said we use about a petabyte a month.
Yes, it would appear that data is being sent at a reasonably fast rate of 240 MB/s. It was up to 408 a wee while ago.

The green is data sent to the outside world. The blue line is returned data or raw data being moved into the system. The cricket program looks at the computer complex so the names are reversed.

Specifically, "inbound" and "outbound" are relative to the port on the router itself. "inbound" is "receive" and "outbound" is "transmit". But that router port is one hop adjacent to the servers, so the servers send data out to us..which comes IN to that port on the router and then gets forwarded to some other port that ends up coming out to us.

Reversing that path.. data that needs to be sent to the servers, comes from outside, goes through the router, and goes OUT of the port and goes to the servers.

Also don't forget that the values listed are megabits per second, not bytes. If you average 300Mbit/sec for a full 30-days, that comes out to 88.4 terabytes. ((300000000/8)*60*60*24*30/1024/1024/1024/1024) = 88.4.
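
A minimal sketch of the same arithmetic in Python, for anyone who wants to plug in other rates. The function name `monthly_volume_tib` is made up for illustration. Like the calculation above, it divides by 1024 four times, so the result is in binary terabytes (TiB); in decimal units the same 300 Mbit/sec comes to 97.2 TB.

```python
def monthly_volume_tib(rate_mbit_per_s: float, days: int = 30) -> float:
    """Convert a sustained rate in megabits per second to TiB moved over `days`."""
    bytes_per_second = rate_mbit_per_s * 1_000_000 / 8  # bits -> bytes
    seconds = days * 24 * 60 * 60
    return bytes_per_second * seconds / 1024**4  # bytes -> TiB


print(round(monthly_volume_tib(300), 1))  # 88.4
print(round(monthly_volume_tib(240), 1))  # 70.7
```

Either way, a sustained rate in the hundreds of megabits per second works out to tens of terabytes a month, well short of a petabyte.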

TL;DR: "in" and "out" are backwards.. green is us downloading, blue is us uploading...or tapes being sent to the servers.




Other than that, those AP splitters really tear through fresh tapes in a hurry. A couple new tapes got loaded and AP is already done with them in seemingly no time at all. I didn't actually see splitting happen, I just see there are some new tapes in the list, and AP is already marked as (done).
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1622668
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1622677 - Posted: 3 Jan 2015, 8:37:10 UTC - in response to Message 1622668.  
Last modified: 3 Jan 2015, 8:46:07 UTC

...Other than that, those AP splitters really tear through fresh tapes in a hurry. A couple new tapes got loaded and AP is already done with them in seemingly no time at all. I didn't actually see splitting happen, I just see there are some new tapes in the list, and AP is already marked as (done).

I'm afraid those "fresh" AP tapes you are referring to were aborted, just as those ~700 previously loaded AP channels were aborted. Normally, since the DB Explosion, it takes about a day to split one AP file, not a few minutes. It appears as though the APs have been shut down again; now we get rehashed MB files only. Hopefully someone is working to improve the AP 'system' while it is shut down; I'd hate to think it was just sitting there.

I'm seriously considering shutting down when my remaining APs are gone. I can't see rehashing old files with the same technology, I have limited resources that can be put to better use. Hopefully the AP side will be resurrected soon so we can analyze the years of older files with the AP Apps for the first time.
ID: 1622677
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1622764 - Posted: 3 Jan 2015, 11:59:47 UTC

This was posted by Mark Sattler on the BOINC boards; he has apparently e-mailed all at the lab.

Got word back from Matt that he believes these are datasets which have already had the AP work split and sent, but not the MB work.
So the splitters run through them quickly and send no AP outbound because it has already been done.

ID: 1622764
Zombu2
Volunteer tester
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1622767 - Posted: 3 Jan 2015, 12:30:30 UTC

I just wonder why nobody has bothered leaving some kind of info for us as to what is going on and why the servers keep screwing up.

I mean, a simple "hey guys, we know what's going on and are trying to fix it" and maybe a quick explanation would suffice, but they've been quiet for some time now, which is not always a good thing.
I came down with a bad case of i don't give a crap
ID: 1622767
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1622770 - Posted: 3 Jan 2015, 12:40:46 UTC - in response to Message 1622764.  

This was posted by Mark Sattler on the BOINC boards; he has apparently e-mailed all at the lab.

Got word back from Matt that he believes these are datasets which have already had the AP work split and sent, but not the MB work.
So the splitters run through them quickly and send no AP outbound because it has already been done.

As a laboratory plan, that makes perfect sense.

Unfortunately, Matt's recollection may not be perfect, as according to my records, all of the tapes (20) in the last two batches deployed have been processed for MB already, between 25 April 2012 and 04 May 2013 (hence all before sah v7, so at least the autocorrs are new). Would one of our AP specialists like to verify whether or not they have processed those tapes in the past, before we waste any more time redoing old work? I'll email my list to Matt.
ID: 1622770
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1622771 - Posted: 3 Jan 2015, 12:52:13 UTC - in response to Message 1622440.  

DG, I would estimate that I am seeing credits at about 65-75% of pre-V7 numbers, your observations seem in line with that.
Member of the 20 Year Club



ID: 1622771
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1622786 - Posted: 3 Jan 2015, 13:52:39 UTC - in response to Message 1622771.  

DG, I would estimate that I am seeing credits at about 65-75% of pre-V7 numbers, your observations seem in line with that.


Hopefully that means the analysis of AP7 jobs is being done more efficiently than AP6.

:D G

"Sour Grapes make a bitter Whine." <(0)>
ID: 1622786
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1622799 - Posted: 3 Jan 2015, 15:18:24 UTC - in response to Message 1622786.  

DG, I would estimate that I am seeing credits at about 65-75% of pre-V7 numbers, your observations seem in line with that.


Hopefully that means the analysis of AP7 jobs is being done more efficiently than AP6.

:D G

Given that we are no longer spending lots of CPU cycles performing blanking, I would imagine the efficiency has gone up.
The credit for non-blanked AP tasks still doesn't match that of APv6, but I didn't have enough in the last batch of tapes to be able to tell if it is still rising like it did at Beta.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1622799
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1622814 - Posted: 3 Jan 2015, 16:25:02 UTC

If I read the last posts correctly, we are now processing tapes already done by MB6, only now for MB7, and no APs are sent out because they have already been processed?
Well, energy is precious and expensive... :/
Aloha, Uli

ID: 1622814
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1622824 - Posted: 3 Jan 2015, 16:58:03 UTC - in response to Message 1622814.  

If I read the last posts correctly, we are now processing tapes already done by MB6, only now for MB7, and no APs are sent out because they have already been processed?

Well, no AP cruncher has yet verified from their logs that the tapes were split for AP, so the last bit is still speculation.
ID: 1622824
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1622834 - Posted: 3 Jan 2015, 18:04:42 UTC - in response to Message 1622770.  

This was posted by Mark Sattler on the BOINC boards; he has apparently e-mailed all at the lab.

Got word back from Matt that he believes these are datasets which have already had the AP work split and sent, but not the MB work.
So the splitters run through them quickly and send no AP outbound because it has already been done.

As a laboratory plan, that makes perfect sense.

Unfortunately, Matt's recollection may not be perfect, as according to my records, all of the tapes (20) in the last two batches deployed have been processed for MB already, between 25 April 2012 and 04 May 2013 (hence all before sah v7, so at least the autocorrs are new). Would one of our AP specialists like to verify whether or not they have processed those tapes in the past, before we waste any more time redoing old work? I'll email my list to Matt.

My personal feeling is that any data which involved blanking by AP v6 ought to be reprocessed by AP v7. The v7 processing is better, not just faster.

OTOH, splitting data which has never had Astropulse analysis done would be even better.
                                                                  Joe
ID: 1622834
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1622849 - Posted: 3 Jan 2015, 18:37:34 UTC - in response to Message 1622835.  

Unix Time included:

1371142991 ue 5891.074747 ct 2.277615 fe 7284208457849 nm ap_03ja12ai_B3_P1_00201_20130611_06225.wu_1 et 5.403999

1371049598 ue 5891.074747 ct 451.638500 fe 7284208457849 nm ap_24no12aa_B0_P0_00265_20130609_11038.wu_1 et 3062.580000

1366202287 ue 5819.592882 ct 2.293215 fe 7284208457849 nm ap_27fe13aa_B3_P1_00318_20130416_20549.wu_0 et 4.844000

1364762446 ue 756925.444735 ct 72.353260 fe 1821052114462310 nm ap_30jn12ad_B3_P1_00178_20130331_23423.wu_0 et 73.268999

13 Jun 2013, 12 Jun 2013, 17 Apr 2013, 31 Mar 2013 respectively. I saw MB on
21 Apr 2012, 15 Dec 2012, 17 Apr 2013, 04 Apr 2013

I saw 03ja12ai and 24no12aa again on 11 Jun 2013, so they may have been round the block three times in all...
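
The dates above can be reproduced from the first field of each job log line. A rough Python sketch, assuming each line is a Unix timestamp followed by alternating key/value tokens, as in the entries quoted here. The helper name `parse_job_log_line` is hypothetical, and the meaning of the individual keys is not interpreted.

```python
from datetime import datetime, timezone


def parse_job_log_line(line: str) -> dict:
    """Split one job log line into a UTC timestamp plus its key/value pairs."""
    parts = line.split()
    record = {"timestamp": datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)}
    # The remaining tokens alternate: key, value, key, value, ...
    for key, value in zip(parts[1::2], parts[2::2]):
        # "nm" carries the task name; every other value is numeric.
        record[key] = value if key == "nm" else float(value)
    return record


line = (
    "1371142991 ue 5891.074747 ct 2.277615 fe 7284208457849 "
    "nm ap_03ja12ai_B3_P1_00201_20130611_06225.wu_1 et 5.403999"
)
rec = parse_job_log_line(line)
print(rec["timestamp"].date().isoformat(), rec["nm"])
# 2013-06-13 ap_03ja12ai_B3_P1_00201_20130611_06225.wu_1
```

The timestamp is read as UTC, so an entry logged close to midnight may land on a different calendar day than it would in local time.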
ID: 1622849
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1622850 - Posted: 3 Jan 2015, 18:37:42 UTC - in response to Message 1622834.  

This was posted by Mark Sattler on the BOINC boards; he has apparently e-mailed all at the lab.

Got word back from Matt that he believes these are datasets which have already had the AP work split and sent, but not the MB work.
So the splitters run through them quickly and send no AP outbound because it has already been done.

As a laboratory plan, that makes perfect sense.

Unfortunately, Matt's recollection may not be perfect, as according to my records, all of the tapes (20) in the last two batches deployed have been processed for MB already, between 25 April 2012 and 04 May 2013 (hence all before sah v7, so at least the autocorrs are new). Would one of our AP specialists like to verify whether or not they have processed those tapes in the past, before we waste any more time redoing old work? I'll email my list to Matt.

My personal feeling is that any data which involved blanking by AP v6 ought to be reprocessed by AP v7. The v7 processing is better, not just faster.

OTOH, splitting data which has never had Astropulse analysis done would be even better.
                                                                  Joe


I would reprocess only tasks with considerable blanking involved perhaps, to remove all those false positives from Berkeley's random noise generator :)

Also, for MB6 => MB7 reprocessing it would be good to insist on task modification to enable only those searches that really need to be redone. Any changes in Gaussians or Spikes or Triplets, for example...

I don't like the idea of redoing ALL those searches just to get Autocorrelation too. That seems a much bigger waste than using unoptimized apps, so I can't tolerate it, LoL.
ID: 1622850
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1622854 - Posted: 3 Jan 2015, 18:45:00 UTC
Last modified: 3 Jan 2015, 19:01:31 UTC

I also have a few in one of my logs;
1353045081 ue 3032.998698 ct 296.531300 fe 90255764819206 nm ap_20au12af_B0_P0_00282_20121115_17547.wu_0 et 2369.125000
1369068852 ue 2357.488962 ct 163.640600 fe 8010747491955 nm ap_26fe13ae_B1_P0_00267_20130424_29756.wu_4 et 2211.312500
1366254837 ue 2303.864894 ct 241.031300 fe 1520550829888130 nm ap_27fe13aa_B0_P0_00202_20130415_31213.wu_0 et 2261.406250
1366472822 ue 2309.197982 ct 259.453100 fe 1524070667998063 nm ap_29ja13aa_B0_P0_00132_20130418_32436.wu_0 et 2242.140625
1364785407 ue 2341.336608 ct 1.437500 fe 1545282161133730 nm ap_30jn12ad_B3_P1_00007_20130331_23423.wu_0 et 3.437500

I also have a number of MBs from those files in my log.

If it was just a matter of loading a file that hadn't been run for APs, there were about 700 channels loaded just the other day that were removed before being split. One of them could be reloaded. So, there are AP files readily available if needed.
ID: 1622854
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1622867 - Posted: 3 Jan 2015, 19:23:11 UTC

Here's my run-through of my job_log:

First one's unix stamp is Aug 08, 2012:
1344411405.758627 ue 40443.513279 ct 44801.180000 fe 100366914056685.200000 nm ap_06my12ab_B6_P1_00397_20120728_06219.wu_0
1348411651.444180 ue 41462.286580 ct 41549.820000 fe 99312357715860.406000 nm ap_28jn12ab_B1_P1_00327_20120913_22283.wu_1
1353416891 ue 41154.808134 ct 44612.900000 fe 100748165223275 nm ap_27au12ab_B4_P1_00033_20121109_21887.wu_2 et 44755.875742
1354547709 ue 43096.733593 ct 41306.480000 fe 103747720196172 nm ap_27au12ab_B3_P0_00383_20121109_08471.wu_3 et 41408.402161
1354823446 ue 42485.311151 ct 36223.400000 fe 103782242333548 nm ap_01se12ab_B2_P1_00034_20121106_10784.wu_3 et 36360.880091
1365786085 ue 42041.303007 ct 43234.320000 fe 101221168606141 nm ap_30jn12ad_B0_P0_00323_20130331_14381.wu_2 et 43327.372242
1367166402 ue 41867.606117 ct 44128.500000 fe 102913816340793 nm ap_29ja13aa_B4_P0_00011_20130418_18429.wu_0 et 44210.891695
1367705446 ue 42201.410116 ct 44351.470000 fe 103734332406452 nm ap_26fe13ae_B6_P0_00170_20130424_29841.wu_0 et 44437.584740
1371698309 ue 43973.768278 ct 35040.840000 fe 107073863886764 nm ap_24no12aa_B3_P0_00177_20130610_00904.wu_0 et 35083.251799
1371782925 ue 43491.213005 ct 37562.650000 fe 105898866616610 nm ap_03ja12ai_B5_P0_00025_20130611_18331.wu_0 et 37606.444761

Last one's timestamp is June 21, 2013.
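
To compare a log like this against a list of tapes, the tape name can be pulled out of each task name. A small Python sketch, assuming the tape name is the second underscore-separated field (as in `ap_03ja12ai_...` for tape 03ja12ai). `tape_of` is a hypothetical helper, and the two tapes checked are the ones named earlier in the thread as having possibly gone round more than once.

```python
def tape_of(task_name: str) -> str:
    """Return the tape name, assumed to be the second '_'-separated field."""
    return task_name.split("_")[1]


# Task names copied from the job_log lines above.
task_names = [
    "ap_06my12ab_B6_P1_00397_20120728_06219.wu_0",
    "ap_28jn12ab_B1_P1_00327_20120913_22283.wu_1",
    "ap_27au12ab_B4_P1_00033_20121109_21887.wu_2",
    "ap_27au12ab_B3_P0_00383_20121109_08471.wu_3",
    "ap_01se12ab_B2_P1_00034_20121106_10784.wu_3",
    "ap_30jn12ad_B0_P0_00323_20130331_14381.wu_2",
    "ap_29ja13aa_B4_P0_00011_20130418_18429.wu_0",
    "ap_26fe13ae_B6_P0_00170_20130424_29841.wu_0",
    "ap_24no12aa_B3_P0_00177_20130610_00904.wu_0",
    "ap_03ja12ai_B5_P0_00025_20130611_18331.wu_0",
]

tapes_in_log = {tape_of(name) for name in task_names}
tapes_to_check = {"03ja12ai", "24no12aa"}

# Tapes from the check list that already show up as AP work in this log.
already_seen = sorted(tapes_in_log & tapes_to_check)
print(already_seen)  # ['03ja12ai', '24no12aa']
```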
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1622867


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.