Astropulse Beam 3 Polarity 1 Errors |
Message boards : Number crunching : Astropulse Beam 3 Polarity 1 Errors
| Author | Message |
|---|---|
|
Richard Haselgrove noticed that the "generate_envelope: num_ffts_performed < 100." error noted earlier in this thread seems to be happening only on one channel of the data: B3_P1. That reminded me of the Curiously Compressible Data seen at Beta last year. In fact it seems to be the same problem recurring a year later: one bit of the two in each sample is stuck. That triggers the noise detection code, all data is blanked, and there's nothing to generate the envelope from. I've sent an email to Josh and Eric, though there's probably not much they can do. The other 13 channels aren't affected, so skipping the 'tapes' completely would be overreacting. Joe | |
| ID: 890237 · | |
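A stuck bit of the kind described above can be looked for directly in a captured file. The following is only a minimal sketch: it assumes the 2-bit samples are packed four to a byte, with sample k in bits 2k and 2k+1, which is a guess and not something this thread states. `stuck_bits` is a hypothetical helper name.

```python
def stuck_bits(data: bytes) -> dict:
    """Report which bit of each 2-bit sample never changes.

    ASSUMPTION (not from the thread): four 2-bit samples per byte,
    sample k stored in bits 2k (low) and 2k+1 (high).
    Returns {"low": v, "high": v} where v is the stuck value (0 or 1),
    or None if that bit takes both values (or the input is empty).
    """
    seen = {"low": set(), "high": set()}
    for byte in data:
        for k in range(4):
            sample = (byte >> (2 * k)) & 0b11
            seen["low"].add(sample & 1)
            seen["high"].add(sample >> 1)
    return {
        name: (next(iter(values)) if len(values) == 1 else None)
        for name, values in seen.items()
    }
```

If one of the two entries comes back as 0 or 1 over a large block of data, that bit never changed, which is what a stuck bit would look like under the assumed layout.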
|
I've checked my captured files, and yes, they are super-compressible. | |
| ID: 890246 · | |
I've checked my captured files, and yes, they are super-compressible. The initial cases on Beta were seen in 27fe08ac data, and Josh's spot check indicated it lasted at least until March 5, 2008, so the durations so far seem about the same. We'll get a better look at how long it lasts this time, since there's data with 20, 21, and 22mr tags. Then it skips to 30mr; Arecibo was doing planetary radar in between. I did get a second one (from 14mr09aa), but I don't process nearly as much data as you, so I probably won't see any more for this episode. Joe | |
| ID: 890313 · | |
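Why a stuck bit makes the files "super-compressible" can be illustrated with a rough sketch. It uses random bytes as a stand-in for noisy 2-bit samples and forces one bit of every sample to zero; the four-samples-per-byte packing and the choice of which bit is stuck are assumptions made for illustration, not details from the thread.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller = more compressible)."""
    if not data:
        raise ValueError("empty input")
    return len(zlib.compress(data, 9)) / len(data)

# Stand-in for healthy data: every bit is unpredictable noise.
healthy = os.urandom(100_000)

# ASSUMPTION: four 2-bit samples per byte; clear the low bit of each sample
# to imitate a bit stuck at 0.
stuck = bytes(b & 0b10101010 for b in healthy)

print(f"healthy: {compression_ratio(healthy):.2f}")
print(f"stuck:   {compression_ratio(stuck):.2f}")
```

With half of the bits carrying no information, a general-purpose compressor should shrink the "stuck" buffer far more than the noise-like one, which barely compresses at all.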
|
Three people have crashed Workunit 438261330 so far. | |
| ID: 890392 · | |
Three people have crashed Workunit 438261330 so far. As Josef has identified, it's a stuck bit in the B3_P1 data channel - so a data problem, NOT an application problem. Quite where the bit is stuck awaits further investigation. Joe thinks it's seasonal (weather affecting the recorder at the observing site?), which would mess up one channel of the MB observations too (same data, different splitter). On the other hand, it might be an AP splitter issue - though I don't see why that would only affect one channel - but if so, MB should be undamaged. | |
| ID: 890488 · | |
The seasonality could as well be an annual cleanup campaign at Arecibo because someone from the NSF does an inspection each year. But I don't know what the air conditioning arrangements are where the multibeam recorder is either. The one picture I've seen is just a rack sitting against a wall. I'd give very high odds that the problem is in the recorder rack, most likely the cabling from the output of the quadrature detector for that channel. It might be on the detector card itself or the digital input card in the computer. Before the detector there's just a single channel of analog IF; after the digital input there's the usual computer data handling, and any stuck bit problems there would be affecting far more than we're seeing. But the recorder was designed and put together by Berkeley SSL, and they know the details better than I do. When an mb_splitter does some of that data, the obvious pattern will be lost in the FFTs to produce the 256 subbands. I think the effect will simply be a bunch of quiet WUs with less noise than usual. WU names will be like NNmr09ax.xxxx.xxxxx.10.x.xxx since beam 3 polarity 1 corresponds to s4_id 10. Joe | |
| ID: 890629 · | |
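For anyone who wants to pick the affected multibeam WUs out of a task list, the name pattern quoted above can be checked mechanically. This is a sketch that assumes the six dot-separated fields shown in that pattern, with s4_id in the fourth field; `is_b3_p1_multibeam` is a hypothetical helper, and real task names may carry extra suffixes that this does not handle.

```python
B3_P1_S4_ID = 10  # from the post above: beam 3 polarity 1 corresponds to s4_id 10

def is_b3_p1_multibeam(wu_name: str) -> bool:
    """True if the fourth dot-separated field of the WU name is s4_id 10.

    ASSUMPTION: names follow the six-field pattern quoted in the thread,
    NNmr09ax.xxxx.xxxxx.10.x.xxx; anything else is reported as False.
    """
    fields = wu_name.split(".")
    if len(fields) != 6:
        return False
    return fields[3] == str(B3_P1_S4_ID)

# Made-up names for illustration only, not real workunits.
print(is_b3_p1_multibeam("14mr09aa.1234.56789.10.8.123"))
print(is_b3_p1_multibeam("14mr09aa.1234.56789.9.8.123"))
```

The check says nothing about the recording date, so it would need to be combined with the tape tag at the front of the name to narrow things down to the affected period.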
|
http://setiathome.berkeley.edu/workunit.php?wuid=437881143 | |
| ID: 893302 · | |
http://setiathome.berkeley.edu/workunit.php?wuid=437881143 It's one of a whole block of demon WUs, which we identified over a week ago - see message 890237 further down this thread. It looks as if every B3_P1 WU between 12mr09 and 07ap09 suffers from the same problem. And the AP assimilator has been down for over 24 hours, which has backed up the work generation process (filled the available storage space, I guess) - no new AP work has been available for download since about 23:00 UTC last night. So I would expect that almost all of any AP work that people receive this morning will be resends of these demon WUs: you might consider opting out of AP temporarily if you have a bandwidth cap or other limited internet connection, because an 8 MB download every 9 seconds isn't really worth it. | |
| ID: 893304 · | |
|
Yeah, I've gone through about 10 of those across my 4 AP crunchers so far. Looks like I have 2-3 more to go. | |
| ID: 893557 · | |
|
Well that was fun. I just burned through about 15 of those B3_P1 WUs on two systems. Was doing some "housekeeping" with my Excel spreadsheet of all the r112 tasks I've done, and noticed a few B3_P1's, but noticed that I was _4 or _5 on them, so I suspended all the other tasks and only left all of my B3_P1's to run. After 3 seconds, they errored out and I requested 600,000 seconds of work, got four more APs, 3 of which were B3_P1's, so I repeated the process until I got rid of them all. | |
| ID: 894493 · | |
Thank you for helping clear these B3_P1 tasks. On the bright side, it looks as if all of your tasks are resends, because looking at the Server Status, all of the tapes with this erroring data show (Done) on your host 2889590. Is there a possibility of getting more data like this in the near future? Thanks in advance. ____________ Live in NZ y not join Smile City? | |
| ID: 894505 · | |
I think Matt said that we are done splitting the bad tapes, but now have thousands of 8 MB WUs that will just have to run through the system and get 6 errors. Other than that, I don't have a clue when the next tape with this "stuck bit" will surface again, but as Ozz pointed out, this was noticed over in beta this time last year, so it's possible that it may be a yearly thing, or just a total coincidence. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 894510 · | |
Thanks for the info. Is there any reason why tapes with erroring data on them can't be formatted and sent straight back to get fresh data on them, or is this data bundled with good data? Does anyone know what tapes this data is on, so we can watch the tapes move through the system? Thanks in advance. P.S. If I receive any of these units I'll let them run as soon as I notice them in my job queue. ____________ Live in NZ y not join Smile City? | |
| ID: 894513 · | |
I believe we have narrowed it down to somewhere in the 20-28feb09 tapes, and it's only the B3_P1 channel of those tapes, but I have one that is 22feb09ac and B3_P1, and it is fine. It has not been determined whether the problem is in the data recorder or in the splitters, but the data recorder seems more likely. As far as getting new data on those tapes, the hard drives get sent in batches and then filled up. The drives are either 500 GB or 750 GB, and get "split" up into ~50 GB files (it's a more manageable size for both in-house and off-site data storage). ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 894523 · | |
|
[quote] | |
| ID: 894527 · | |
Actually, the current best estimate for the problematic B3_P1 recording range is 12mr09 through 07ap09 - Josef Segur has been keeping track of them at Lunatics. If anyone has any hard evidence for an example in February, I'm sure Joe would be most interested - but we need a link to a result, please. | |
| ID: 894530 · | |
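Since the February/March mix-up comes down to reading tags like 22fe09ac and 22mr09ac, a small sketch of checking a tape tag against the 12mr09 to 07ap09 estimate may help. It assumes tags have the form day, two-letter month code, two-digit year, optional letter suffix, and it knows only the month codes that appear in this thread (fe, mr, ap). `tape_date` and `in_suspect_range` are hypothetical helpers.

```python
import re
from datetime import date

# Month codes seen in this thread only; other months are deliberately left out.
MONTHS = {"fe": 2, "mr": 3, "ap": 4}

FIRST_BAD = date(2009, 3, 12)  # 12mr09, current best estimate from the thread
LAST_BAD = date(2009, 4, 7)    # 07ap09

def tape_date(tag: str) -> date:
    """Parse a tag such as '22mr09ac' into a calendar date.

    ASSUMPTION: two-digit day, two-letter month code, two-digit year
    (taken as 20xx), then an optional lowercase suffix.
    """
    m = re.fullmatch(r"(\d{2})([a-z]{2})(\d{2})[a-z]*", tag)
    if not m or m.group(2) not in MONTHS:
        raise ValueError(f"unrecognised tape tag: {tag!r}")
    return date(2000 + int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))

def in_suspect_range(tag: str) -> bool:
    """True if the tag's date falls inside the estimated problem period."""
    return FIRST_BAD <= tape_date(tag) <= LAST_BAD

for tag in ("22fe09ac", "22mr09ac"):
    print(tag, in_suspect_range(tag))
```

The date test alone is not enough to flag a task: per the thread, only the B3_P1 channel of recordings in that period is affected.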
|
I suppose I mis-typed. I think I meant 22mr09ac instead of 22fe09ac. I might have missed making the PDF of the task ID for that particular task though. | |
| ID: 894883 · | |
|
Well, I thought I had found this one http://setiathome.berkeley.edu/workunit.php?wuid=443174608 dated 22fe09 so I started it and let it run about 10 minutes but it didn't fail. I then looked a little closer at the other wingmen that had errored out and saw all four of them had different problems. If it messes up on me when I get back to it I will report it here then. | |
| ID: 894940 · | |
|
Ok, running it now. 23.7% in and no problems. Another wingman has already completed it successfully and is waiting on me. | |
| ID: 895944 · | |
Ok, running it now. 23.7% in and no problems. Another wingman has already completed it successfully and is waiting on me. Yeah, that task dated 22fe09 was a full month before the problem showed up. That one should be fine. I figured out why I thought I had an extra one of those B3_P1 tasks. I had one that was B3_P1 and 22fe09ac and looked at the date on it too quickly. Somewhere in my mind that was the same as 22mr09ac. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 896021 · | |
| Copyright © 2013 University of California |