RED ALERT !!!! SHIELDS UP

Author	Message
Blurf Volunteer tester Send message Joined: 2 Sep 06 Posts: 8962 Credit: 12,678,685 RAC: 0	Message 695217 - Posted: 28 Dec 2007, 0:25:34 UTC Last modified: 28 Dec 2007, 0:25:51 UTC My pendings are up to 2800...anyone else's high....??? ID: 695217 ·

Sirius B Volunteer tester Send message Joined: 26 Dec 00 Posts: 24879 Credit: 3,081,182 RAC: 7	Message 695219 - Posted: 28 Dec 2007, 0:28:41 UTC - in response to Message 695217. My pendings are up to 2800...anyone else's high....??? Mine usually hovers between 2200 & 3100, but at the moment it is fairly low at 1900. ID: 695219 ·

Dissident Send message Joined: 20 May 99 Posts: 132 Credit: 70,320 RAC: 0	Message 695224 - Posted: 28 Dec 2007, 0:39:57 UTC - in response to Message 695217. Last modified: 28 Dec 2007, 0:41:03 UTC My pendings are up to 2800...anyone else's high....??? Yep, sitting at 4022...Been in that area for a while now. ID: 695224 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 695231 - Posted: 28 Dec 2007, 1:14:17 UTC - in response to Message 695171. I had grabbed a few results the other day, and they were NOT of the short-running variety based on their AR. I noticed a couple of results that the wingman had already reported that were "noisy", so I decided to start up processing on those two and go ahead and get them back in... Well, there's a bit of a problem... LOL The two hosts that had concluded that they were too noisy were running the stock application. My host with the optimized application typically runs into the '-9' overflow faster than the stock application does. Not this time. In fact, one of the results is still running at the 9 minute mark... So, one has to ask, is there a bug in the stock application? Yes, but without links to the WUs I don't know if they are instances of the known but undiagnosed bug that causes spurious overflows on Pulses occasionally. There was also one case of that on SETI Beta from a host running lunatics 2.2B months ago, so I have to say it is probably present in 2.4[V] as well, just less likely than stock. The other possibility is -9 overflow of the marginal kind where the whole WU might only have 35 signals so the time is not much reduced from a full run. The "quit on the 31st signal" setting makes that kind fairly rare, but not exceptionally so. Joe ID: 695231 ·
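To illustrate the "marginal" overflow Joe describes, here is a minimal sketch, assuming the application simply counts signals as it works through a task and quits once it reaches a fixed limit. The function name, the chunk model, and the limit of 31 (taken from the "quit on the 31st signal" remark) are assumptions made for illustration and are not taken from the SETI@home source.

```python
# Hypothetical sketch: NOT the SETI@home source, only an illustration of the
# early-exit behaviour described in the post above.

OVERFLOW_LIMIT = 31  # assumed from the "quit on the 31st signal" remark


def run_until_overflow(signals_per_chunk, limit=OVERFLOW_LIMIT):
    """Walk the chunks in order and stop once `limit` signals have been found.

    Returns (chunks_processed, signals_found, overflowed).
    """
    found = 0
    for done, count in enumerate(signals_per_chunk, start=1):
        found += count
        if found >= limit:
            return done, found, True
    return len(signals_per_chunk), found, False


# A genuinely noisy task: signals pile up in the first few chunks.
noisy = [12, 15, 9] + [0] * 97

# A marginal task: 35 signals spread evenly over 100 chunks.
marginal = [0] * 100
for k in range(35):
    marginal[k * 100 // 35] += 1

print(run_until_overflow(noisy))     # stops after 3 of 100 chunks
print(run_until_overflow(marginal))  # stops after 86 of 100 chunks
```

Under these assumptions the noisy task quits almost immediately, while the marginal one runs most of the way through before hitting the limit, which is why its run time is not much shorter than a full run.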

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 695233 - Posted: 28 Dec 2007, 1:19:34 UTC - in response to Message 695231. Last modified: 28 Dec 2007, 1:31:35 UTC I had grabbed a few results the other day, and they were NOT of the short-running variety based on their AR. I noticed a couple of results that the wingman had already reported that were "noisy", so I decided to start up processing on those two and go ahead and get them back in... Well, there's a bit of a problem... LOL The two hosts that had concluded that they were too noisy were running the stock application. My host with the optimized application typically runs into the '-9' overflow faster than the stock application does. Not this time. In fact, one of the results is still running at the 9 minute mark... So, one has to ask, is there a bug in the stock application? Yes, but without links to the WUs I don't know if they are instances of the known but undiagnosed bug that causes spurious overflows on Pulses occasionally. There was also one case of that on SETI Beta from a host running lunatics 2.2B months ago, so I have to say it is probably present in 2.4[V] as well, just less likely than stock. The other possibility is -9 overflow of the marginal kind where the whole WU might only have 35 signals so the time is not much reduced from a full run. The "quit on the 31st signal" setting makes that kind fairly rare, but not exceptionally so. Joe I'm tryin' Joe... just that everything is so slow... From what I recall, they were both pulses. Even if that is known, my point is, it is adding insult to injury. I'll edit this post with links to the workunits, once I can finally get there... WU 194971411 their resultid WU 194971291 their resultid Both of those were from different hosts. I have WU 194971411 completed and uploaded, but reporting is no-go. I didn't stop network transfers, so the result already uploaded. 
I could probably dredge through Norton Protected files and restore it, but I'll just wait for the reporting to work... ID: 695233 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 695253 - Posted: 28 Dec 2007, 2:17:41 UTC - in response to Message 695233. ... Even if that is known, my point is, it is adding insult to injury. I'll edit this post with links to the workunits, once I can finally get there... WU 194971411 their resultid WU 194971291 their resultid Both of those were from different hosts. I have WU 194971411 completed and uploaded, but reporting is no-go. I didn't stop network transfers, so the result already uploaded. I could probably dredge through Norton Protected files and restore it, but I'll just wait for the reporting to work... Thanks, I figured you'd be able to find them with less tries than I would have needed. Hmm, both WinXP hosts but it seems more frequent on Vista. Richard Haselgrove has about 3 cores share weighting on his Vista octo doing SETI Beta to help gather info. I do understand the "adding insult to injury", everything about the project seems slightly sour now. I just try to focus on the 50+ MiB of work going out and being crunched in spite of the frustrations. Joe ID: 695253 ·

Dissident Send message Joined: 20 May 99 Posts: 132 Credit: 70,320 RAC: 0	Message 695269 - Posted: 28 Dec 2007, 3:30:24 UTC My well is running dry as I have about 2 hours of work left for my 4400+ and then stillness. My laptop has more thankfully. Why are we so short of work? (a rhetorical question, I know...) ID: 695269 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 695270 - Posted: 28 Dec 2007, 3:34:16 UTC - in response to Message 695253. Last modified: 28 Dec 2007, 3:45:52 UTC Thanks, I figured you'd be able to find them with less tries than I would have needed. Hmm, both WinXP hosts but it seems more frequent on Vista. Richard Haselgrove has about 3 cores share weighting on his Vista octo doing SETI Beta to help gather info. I do understand the "adding insult to injury", everything about the project seems slightly sour now. I just try to focus on the 50+ MiB of work going out and being crunched in spite of the frustrations. Joe Well, the other completed as well... Here are the stderr.txt files from both (reporting is still no-go):
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE2 32-bit based on seti V5.15 'Ni!'
Rev: (R-2.4|xW|FFT:IPP_SSE2|Ben-Joe)
CPUID: 'AMD K8 Athlon 64 (San Diego)'
cpus: 1 cores: 1 threads: 1
cache: L1=64K L2=1024K L3=0K
features: mmx 3Dnow 3Dnow+ sse sse2 sse3
speed: 2750 MHz -- read megs/sec: L1=16615, L2=7585, RAM=3830
Work Unit Info
True angle range: 0.407464
Spikes Pulses Triplets Gaussians Flops
1 0 0 0 16435446539648
Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE2 32-bit based on seti V5.15 'Ni!'
Rev: (R-2.4|xW|FFT:IPP_SSE2|Ben-Joe)
CPUID: 'AMD K8 Athlon 64 (San Diego)'
cpus: 1 cores: 1 threads: 1
cache: L1=64K L2=1024K L3=0K
features: mmx 3Dnow 3Dnow+ sse sse2 sse3
speed: 2749 MHz -- read megs/sec: L1=16622, L2=7603, RAM=3822
Work Unit Info
True angle range: 0.407464
Spikes Pulses Triplets Gaussians Flops
0 1 6 1 16436374344601
So, one of the tasks didn't register any pulses at all. My guess is that my results will be correct (deemed "strongly similar") at some point down the road.
Another aspect I just thought of is that with reporting being nigh impossible, downloading should also be having problems. Additionally, slower computers with close deadlines may be having their deadlines expire and thus tasks are being reissued, which has no immediate major db impact, but enormous download consequences and result storage consequences...not to mention the dynamics of whether the original host is able to report back in while the new host hasn't started and then whether or not the new host supports 221 aborts AND manages to contact the scheduler before they start working on it... BLAH! Unless the project stops splitting these fast-running results, I don't know when we'll ever get back to some semblance of normalcy... Maybe one of you with a better relationship with Matt could suggest some sort of plan? From where I sit, it seems like he's content to just wait it out... ID: 695270 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 695278 - Posted: 28 Dec 2007, 4:18:06 UTC The two results I mentioned have now been reported. Both were reissued to new hosts, one of which runs the stock application and the other is running 2.2B. It will be interesting to note if the new hosts choke on the results or not... ID: 695278 ·

Sirius B Volunteer tester Send message Joined: 26 Dec 00 Posts: 24879 Credit: 3,081,182 RAC: 7	Message 695282 - Posted: 28 Dec 2007, 4:25:01 UTC - in response to Message 695269. My well is running dry as I have about 2 hours of work left for my 4400+ and then stillness. My laptop has more thankfully. Why are we so short of work? (a rhetorical question, I know...) That's what I can't understand - My 4200+ d/l'ed over 60 wu's & 3800+ d/l'ed over 40 wu's last night. Not sure what my 3700+ d/l'ed as I did not check - Yet other crunchers are not getting sufficient work. Yet we're all in the same boat with regards to the servers. Some gremlins in the system? ID: 695282 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 695301 - Posted: 28 Dec 2007, 5:11:54 UTC Man....this is going from bad to worse.... The current dataset is not only mostly shorties AGAIN, but out of the last 24 WUs I was actually able to snag on Harv2K-C2QUAD, 14 of them ran in 15 seconds or less. That is the only rig I am not running a cache on, so it has been doing a lot of Einstein and Rosetta the last couple of days. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 695301 ·

Sirius B Volunteer tester Send message Joined: 26 Dec 00 Posts: 24879 Credit: 3,081,182 RAC: 7	Message 695304 - Posted: 28 Dec 2007, 5:19:18 UTC - in response to Message 695301. 3700+ currently d/l'ing 12 wu's. That's 3 rigs with a total of 120 wu's yet others are getting none - somethings not right!!! ID: 695304 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 695314 - Posted: 28 Dec 2007, 6:33:50 UTC - in response to Message 695304. Last modified: 28 Dec 2007, 6:40:33 UTC 3700+ currently d/l'ing 12 wu's. That's 3 rigs with a total of 120 wu's yet others are getting none - somethings not right!!! Yeah, well I just encountered "the last straw" for me... My Intel host, which you can look at here has had 3 download errors tonight. So, fine... In the morning that system gets set to not get new work. It'll go through what it has, then it is being attached to Cosmology at an 80% resource share, leaving only 20% here. When / if someone decides to do something about the issue, I'll evaluate the situation and see if I put any allocation back to here or not... I wanted to reach 500K here, but if it takes me a year to do it, so be it... Edit: Oh, I just got to thinking that I dunno if it is possible to do resource allocations per host. If it isn't, boy isn't it swell we got Mr. Anderson to get those social networking features working! ID: 695314 ·

Odysseus Volunteer tester Send message Joined: 26 Jul 99 Posts: 1808 Credit: 6,701,347 RAC: 6	Message 695315 - Posted: 28 Dec 2007, 6:54:44 UTC - in response to Message 695314. Oh, I just got to thinking that I dunno if it is possible to do resource allocations per host. If it isn't, boy isn't it swell we got Mr. Anderson to get those social networking features working! Nope; it's a general preference. The best you can do is set up a venue with a different resource share and assign the host to it. Assuming, that is, that you have a venue to spare - and that the venue preferences are actually working ... ID: 695315 ·
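The venue workaround can be sketched as follows. This is only an illustration, assuming that resource shares are relative weights, so a project's slice is its share divided by the sum of all shares in the venue the host belongs to. The venue names, host names, and share values below are hypothetical, apart from the 80/20 split mentioned earlier in the thread.

```python
# Illustrative model only; venue names, host names and share values are
# assumptions, not read from any real BOINC account.

def share_fractions(shares):
    """Turn relative resource shares into fractions of the host's time."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {project: value / total for project, value in shares.items()}


# One set of shares per venue; a host picks up the shares of its venue.
venues = {
    "default": {"SETI@home": 100, "Cosmology": 100},
    "work": {"SETI@home": 20, "Cosmology": 80},
}
host_venue = {"amd-host": "default", "intel-host": "work"}


def fractions_for_host(host):
    """Look up the host's venue and return its per-project fractions."""
    return share_fractions(venues[host_venue[host]])


print(fractions_for_host("amd-host"))    # even split under these values
print(fractions_for_host("intel-host"))  # the 80/20 split from the thread
```

Moving a single host into a spare venue changes the split for that host alone, which is the closest thing to a per-host allocation described here.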

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 695316 - Posted: 28 Dec 2007, 7:10:07 UTC - in response to Message 695315. Oh, I just got to thinking that I dunno if it is possible to do resource allocations per host. If it isn't, boy isn't it swell we got Mr. Anderson to get those social networking features working! Nope; it's a general preference. The best you can do is set up a venue with a different resource share and assign the host to it. Assuming, that is, that you have a venue to spare - and that the venue preferences are actually working ... I was on my way here to edit my message, but you beat me to it. Yeah, I have venues to spare, but I recall that sometimes the venues didn't work and/or they'd get reset. Anyway, someone here needs to get the painful data out of the splitter queue. The way I interpret this event is influenced by the last event, when it was basically said "well, we're going to have to split this data sooner or later, so might as well get it done now". That's all well and good and stuff, but not for 4+ days. I'm sure it's comforting to know that the systems are holding up, but that's probably only because there's a "restrictor plate" on our side (the network bottleneck) so the systems aren't being hit as hard as they would otherwise... Also, who knows what the index rebuild is going to take now, with all the extra data? ID: 695316 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 695344 - Posted: 28 Dec 2007, 12:27:59 UTC - in response to Message 695172. Last modified: 28 Dec 2007, 12:34:25 UTC Still no mention of the source / cause of all the shorty work - just its effects. Isn't the cause well known? Isn't it simply the nature of what was recorded at the telescope?? Only if you can absolutely, categorically and 100% for all time rule out the possibility of there being any sort of a bug in the splitter code (or, indeed, any part of the pre-processing that happens prior to the issue of the task to your computer: Is the aiming point of the receiver being accurately recorded in the data stream? How much difference does the radar blanking timing signal make? Were these signals recorded with radar blanking active, or are we still pre-filtering recordings with radar pulses in, before they're issued? etc. etc.) ID: 695344 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 695346 - Posted: 28 Dec 2007, 12:56:33 UTC - in response to Message 695253. Thanks, I figured you'd be able to find them with less tries than I would have needed. Hmm, both WinXP hosts but it seems more frequent on Vista. Richard Haselgrove has about 3 cores share weighting on his Vista octo doing SETI Beta to help gather info. I do understand the "adding insult to injury", everything about the project seems slightly sour now. I just try to focus on the 50+ MiB of work going out and being crunched in spite of the frustrations. Joe Yes, it's still recording - Beta 2882015 looks like another candidate, but I won't be able to send you the file(s) until I get home. I do find it a bit odd that Matt has suddenly started entering all sorts of constructive technical dialogue with users about fiber channels, switch glitches etc. - all good stuff - but has made absolutely no comment on my queries about causes. Maybe he sees himself (and please don't think I'm being disparaging here) as an earthbound 'plumber', focussing on the mechanics of the server/network end of the project: is there a radio astronomer in the house? ID: 695346 ·

makosky Volunteer tester Send message Joined: 7 Jul 00 Posts: 56 Credit: 3,908,782 RAC: 0	Message 695351 - Posted: 28 Dec 2007, 13:33:46 UTC - in response to Message 694830. A LOT OF SERVERS NOT FUNCTIONING WITH NO NEW TECH NEWS. WE'RE GOING DOWN, CAPTAIN
feeder.i686 bruno Not Running
feeder.i686 ptolemy Not Running
file_deleter1 bruno Not Running
file_deleter2 bruno Not Running
file_deleter3 bruno Not Running
file_deleter4 bruno Not Running
file_deleter5 bruno Not Running
file_deleter6 bruno Not Running
db_purge.x86_64 thumper Not Running
transitioner1 vader Not Running
transitioner2 vader Not Running
transitioner3 bruno Not Running
transitioner4 bruno Not Running
vote_monitor bruno Not Running
transitioner5 vader Not Running
try and get tech news .. all i get is an error message , grrr
transitioner6 vader Not Running
sah_validate1 ptolemy Not Running
sah_validate2 ptolemy Not Running
sah_validate3 ptolemy Not Running
sah_validate4 ptolemy Not Running
sah_validate5 ptolemy Not Running
sah_validate6 ptolemy Not Running
fix_missing_results ptolemy Disabled
sah_assimilator1 ptolemy Not Running
sah_assimilator2 ptolemy Not Running
sah_assimilator3 ptolemy Not Running
sah_assimilator4 ptolemy Not Running
mb_splitter7 lando Disabled
mb_splitter8 lando Disabled
mb_splitter9 bambi Disabled
mb_splitter10 bambi Disabled
mb_splitter11 bambi Disabled
mb_splitter12 bambi Disabled
mb_splitter13 bambi Disabled
mb_splitter14 bambi Disabled
HEHEHEHHE ID: 695351 ·
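A pasted status listing like the one above is easier to take in once it is tallied per machine. The following is a rough sketch, assuming each entry has the form "daemon host state" and that the only states are "Not Running", "Disabled" and "Running" (the last one is an assumption; it does not appear in the post). The function name and the shortened sample are illustrative.

```python
import re
from collections import Counter

# Each entry is assumed to look like "<daemon> <host> <state>".
# "Running" is an assumed state; only the other two appear in the post.
STATUS_RE = re.compile(r"(\S+)\s+(\S+)\s+(Not Running|Disabled|Running)")


def tally_by_host(text):
    """Count entries per (host, state) pair in a pasted status listing."""
    counts = Counter()
    for _daemon, host, state in STATUS_RE.findall(text):
        counts[(host, state)] += 1
    return counts


# A shortened sample built from entries in the post above.
sample = (
    "feeder.i686 bruno Not Running feeder.i686 ptolemy Not Running "
    "transitioner1 vader Not Running mb_splitter7 lando Disabled "
    "mb_splitter8 lando Disabled"
)

for (host, state), n in sorted(tally_by_host(sample).items()):
    print(f"{host:10s} {state:12s} {n}")
```

The pattern works on the entries whether they sit on one line or on separate lines, and free-text remarks mixed into the listing are skipped because they never end in one of the known states.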

Aurora Borealis Volunteer tester Send message Joined: 14 Jan 01 Posts: 3075 Credit: 5,631,463 RAC: 0	Message 695359 - Posted: 28 Dec 2007, 16:02:06 UTC - in response to Message 695346. Last modified: 28 Dec 2007, 16:15:32 UTC I do find it a bit odd that Matt has suddenly started entering all sorts of constructive technical dialogue with users about fiber channels, switch glitches etc. - all good stuff - but has made absolutely no comment on my queries about causes. Maybe he sees himself (and please don't think I'm being disparaging here) as an earthbound 'plumber', focussing on the mechanics of the server/network end of the project: is there a radio astronomer in the house? Not odd at all actually, your analogy is not far off. As I understand it, Matt's principal role at the project is to keep the hardware/software and database working and fine tuned. His job is to keep the pipes unclogged and keep the work flowing in and out. No small feat considering the amount of data that is dealt with. Eric is in charge of most of the science related aspect of the project. This would be the science apps and the splitter software as well as the equipment/software at Arecibo. He also usually takes care of the boards. I believe he deals with most of the funding aspects of the project. Boinc V7.2.42 Win7 i5 3.33G 4GB, GTX470 ID: 695359 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 695367 - Posted: 28 Dec 2007, 16:53:21 UTC - in response to Message 695359. Last modified: 28 Dec 2007, 16:55:18 UTC His job is to keep the pipes unclogged and keep the work flowing in and out. No small feat considering the amount of data that is dealt with. Well, as indicated earlier in this thread, I connected to my Intel computer with VNC and set it to no new tasks. This will alleviate a minuscule percentage of the bandwidth clogging the system. ID: 695367 ·
