RED ALERT !!!! SHIELDS UP

Profile Blurf
Volunteer tester

Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 695217 - Posted: 28 Dec 2007, 0:25:34 UTC
Last modified: 28 Dec 2007, 0:25:51 UTC

My pendings are up to 2800...anyone else's high....???


ID: 695217
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 695219 - Posted: 28 Dec 2007, 0:28:41 UTC - in response to Message 695217.  

My pendings are up to 2800...anyone else's high....???


Mine usually hovers between 2200 & 3100, but at the moment it is fairly low at 1900.
ID: 695219
Dissident
Joined: 20 May 99
Posts: 132
Credit: 70,320
RAC: 0
Canada
Message 695224 - Posted: 28 Dec 2007, 0:39:57 UTC - in response to Message 695217.  
Last modified: 28 Dec 2007, 0:41:03 UTC

My pendings are up to 2800...anyone else's high....???


Yep, sitting at 4022...Been in that area for a while now.
ID: 695224
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 695231 - Posted: 28 Dec 2007, 1:14:17 UTC - in response to Message 695171.  

I had grabbed a few results the other day, and they were NOT of the short-running variety based on their AR. I noticed a couple of results that the wingman had already reported that were "noisy", so I decided to start up processing on those two and go ahead and get them back in...

Well, there's a bit of a problem... LOL

The two hosts that had concluded that they were too noisy were running the stock application. My host with the optimized application typically runs into the '-9' overflow faster than the stock application does. Not this time. In fact, one of the results is still running at the 9 minute mark...

So, one has to ask, is there a bug in the stock application?

Yes, but without links to the WUs I don't know if they are instances of the known but undiagnosed bug that causes spurious overflows on Pulses occasionally. There was also one case of that on SETI Beta from a host running lunatics 2.2B months ago, so I have to say it is probably present in 2.4[V] as well, just less likely than stock.

The other possibility is -9 overflow of the marginal kind where the whole WU might only have 35 signals so the time is not much reduced from a full run. The "quit on the 31st signal" setting makes that kind fairly rare, but not exceptionally so.
                                                                 Joe
ID: 695231
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695233 - Posted: 28 Dec 2007, 1:19:34 UTC - in response to Message 695231.  
Last modified: 28 Dec 2007, 1:31:35 UTC

I had grabbed a few results the other day, and they were NOT of the short-running variety based on their AR. I noticed a couple of results that the wingman had already reported that were "noisy", so I decided to start up processing on those two and go ahead and get them back in...

Well, there's a bit of a problem... LOL

The two hosts that had concluded that they were too noisy were running the stock application. My host with the optimized application typically runs into the '-9' overflow faster than the stock application does. Not this time. In fact, one of the results is still running at the 9 minute mark...

So, one has to ask, is there a bug in the stock application?

Yes, but without links to the WUs I don't know if they are instances of the known but undiagnosed bug that causes spurious overflows on Pulses occasionally. There was also one case of that on SETI Beta from a host running lunatics 2.2B months ago, so I have to say it is probably present in 2.4[V] as well, just less likely than stock.

The other possibility is -9 overflow of the marginal kind where the whole WU might only have 35 signals so the time is not much reduced from a full run. The "quit on the 31st signal" setting makes that kind fairly rare, but not exceptionally so.
                                                                 Joe


I'm tryin' Joe... just that everything is so slow... From what I recall, they were both pulses.

Even if that is known, my point is, it is adding insult to injury. I'll edit this post with links to the workunits, once I can finally get there...

WU 194971411
their resultid

WU 194971291
their resultid

Both of those were from different hosts. I have WU 194971411 completed and uploaded, but reporting is no-go. I didn't stop network transfers, so the result already uploaded. I could probably dredge through Norton Protected files and restore it, but I'll just wait for the reporting to work...
ID: 695233
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 695253 - Posted: 28 Dec 2007, 2:17:41 UTC - in response to Message 695233.  

...
Even if that is known, my point is, it is adding insult to injury. I'll edit this post with links to the workunits, once I can finally get there...

WU 194971411
their resultid

WU 194971291
their resultid

Both of those were from different hosts. I have WU 194971411 completed and uploaded, but reporting is no-go. I didn't stop network transfers, so the result already uploaded. I could probably dredge through Norton Protected files and restore it, but I'll just wait for the reporting to work...

Thanks, I figured you'd be able to find them with fewer tries than I would have needed. Hmm, both are WinXP hosts, but it seems more frequent on Vista. Richard Haselgrove has about 3 cores share weighting on his Vista octo doing SETI Beta to help gather info.

I do understand the "adding insult to injury", everything about the project seems slightly sour now. I just try to focus on the 50+ MiB of work going out and being crunched in spite of the frustrations.
                                                                 Joe
ID: 695253
Dissident
Joined: 20 May 99
Posts: 132
Credit: 70,320
RAC: 0
Canada
Message 695269 - Posted: 28 Dec 2007, 3:30:24 UTC

My well is running dry as I have about 2 hours of work left for my 4400+ and then stillness. My laptop has more thankfully.

Why are we so short of work? (a rhetorical question, I know...)
ID: 695269
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695270 - Posted: 28 Dec 2007, 3:34:16 UTC - in response to Message 695253.  
Last modified: 28 Dec 2007, 3:45:52 UTC


Thanks, I figured you'd be able to find them with fewer tries than I would have needed. Hmm, both are WinXP hosts, but it seems more frequent on Vista. Richard Haselgrove has about 3 cores share weighting on his Vista octo doing SETI Beta to help gather info.

I do understand the "adding insult to injury", everything about the project seems slightly sour now. I just try to focus on the 50+ MiB of work going out and being crunched in spite of the frustrations.
                                                                 Joe


Well, the other completed as well... Here are the stderr.txt files from both (reporting is still no-go):

Optimized SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit based on seti V5.15  'Ni!'
      Rev: (R-2.4|xW|FFT:IPP_SSE2|Ben-Joe)
    CPUID: 'AMD K8 Athlon 64 (San Diego)'
     cpus: 1 cores: 1 threads: 1   cache: L1=64K  L2=1024K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3  
    speed: 2750 MHz  -- read megs/sec: L1=16615, L2=7585, RAM=3830

Work Unit Info
True angle range:  0.407464

Spikes Pulses Triplets Gaussians Flops
   1      0       0        0     16435446539648

Optimized SETI@Home Enhanced application

Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
  Version: Windows SSE2 32-bit based on seti V5.15  'Ni!'
      Rev: (R-2.4|xW|FFT:IPP_SSE2|Ben-Joe)
    CPUID: 'AMD K8 Athlon 64 (San Diego)'
     cpus: 1 cores: 1 threads: 1   cache: L1=64K  L2=1024K L3=0K
 features: mmx 3Dnow 3Dnow+ sse sse2 sse3  
    speed: 2749 MHz  -- read megs/sec: L1=16622, L2=7603, RAM=3822

Work Unit Info
True angle range:  0.407464

Spikes Pulses Triplets Gaussians Flops
   0      1       6        1     16436374344601


So, one of the tasks didn't register any pulses at all. My guess is that my results will be correct (deemed "strongly similar") at some point down the road.
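
To make the comparison above concrete, here is a minimal Python sketch that reads the signal summary table from stderr text like the two excerpts above and totals the reported signals. The function names and the overflow threshold of 31 are assumptions for illustration only (the threshold is borrowed from the "quit on the 31st signal" remark earlier in the thread); none of this is taken from the SETI@home source code.

```python
# Column names as printed in the stderr summary table quoted above.
COLUMNS = ["spikes", "pulses", "triplets", "gaussians", "flops"]

# Assumption: borrowed from the "quit on the 31st signal" remark in this
# thread; the real application's limit may be defined differently.
OVERFLOW_SIGNAL_COUNT = 31


def parse_signal_summary(stderr_text):
    """Return a dict of the counts found under the summary header line."""
    lines = stderr_text.splitlines()
    for i, line in enumerate(lines):
        if [word.lower() for word in line.split()] == COLUMNS:
            # The values are on the first non-blank line after the header.
            for value_line in lines[i + 1:]:
                if value_line.strip():
                    return dict(zip(COLUMNS, map(int, value_line.split())))
    raise ValueError("no signal summary table found")


def total_signals(summary):
    """Add up every signal type, leaving the flops figure out."""
    return sum(count for name, count in summary.items() if name != "flops")


def looks_like_overflow(summary):
    """True if the signal total reaches the assumed overflow threshold."""
    return total_signals(summary) >= OVERFLOW_SIGNAL_COUNT


# The two summary tables from the stderr excerpts above.
RESULT_A = """Spikes Pulses Triplets Gaussians Flops
   1      0       0        0     16435446539648
"""
RESULT_B = """Spikes Pulses Triplets Gaussians Flops
   0      1       6        1     16436374344601
"""

if __name__ == "__main__":
    for label, text in (("first", RESULT_A), ("second", RESULT_B)):
        summary = parse_signal_summary(text)
        print(label, total_signals(summary), looks_like_overflow(summary))
```

Under that assumed threshold, neither of these two results comes anywhere near an overflow, which is the mismatch with the wingmen's "noisy" results being discussed here.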

Another aspect I just thought of: with reporting being nigh impossible, downloading should also be having problems. Additionally, slower computers with close deadlines may be having their deadlines expire, so their tasks are being reissued. That has no immediate major db impact, but it has enormous download and result storage consequences. Then there are the dynamics of whether the original host is able to report back in while the new host hasn't started, and whether or not the new host supports 221 aborts AND manages to contact the scheduler before it starts working on the task... BLAH!

Unless the project stops splitting these fast-running results, I don't know when we'll ever get back to some semblance of normalcy... Maybe one of you with a better relationship with Matt could suggest some sort of plan? From where I sit, it seems like he's content to just wait it out...
ID: 695270
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695278 - Posted: 28 Dec 2007, 4:18:06 UTC

The two results I mentioned have now been reported. Both were reissued to new hosts, one of which runs the stock application and the other is running 2.2B. It will be interesting to note if the new hosts choke on the results or not...
ID: 695278
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 695282 - Posted: 28 Dec 2007, 4:25:01 UTC - in response to Message 695269.  

My well is running dry as I have about 2 hours of work left for my 4400+ and then stillness. My laptop has more thankfully.

Why are we so short of work? (a rhetorical question, I know...)


That's what I can't understand - My 4200+ d/l'ed over 60 wu's & 3800+ d/l'ed over 40 wu's last night. Not sure what my 3700+ d/l'ed as I did not check - Yet other crunchers are not getting sufficient work.

Yet we're all in the same boat with regards to the servers. Some gremlins in the system?
ID: 695282
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 695301 - Posted: 28 Dec 2007, 5:11:54 UTC

Man....this is going from bad to worse....
The current dataset is not only mostly shorties AGAIN, but out of the last 24 WUs I was actually able to snag on Harv2K-C2QUAD, 14 of them ran in 15 seconds or less. That is the only rig I am not running a cache on, so it has been doing a lot of Einstein and Rosetta the last couple of days.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 695301
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 695304 - Posted: 28 Dec 2007, 5:19:18 UTC - in response to Message 695301.  

3700+ currently d/l'ing 12 wu's. That's 3 rigs with a total of 120 wu's yet others are getting none - something's not right!!!
ID: 695304
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695314 - Posted: 28 Dec 2007, 6:33:50 UTC - in response to Message 695304.  
Last modified: 28 Dec 2007, 6:40:33 UTC

3700+ currently d/l'ing 12 wu's. That's 3 rigs with a total of 120 wu's yet others are getting none - something's not right!!!


Yeah, well I just encountered "the last straw" for me... My Intel host, which you can look at here, has had 3 download errors tonight. So, fine... In the morning that system gets set to not get new work. It'll go through what it has, then it is being attached to Cosmology at an 80% resource share, leaving only 20% here. When / if someone decides to do something about the issue, I'll evaluate the situation and see if I put any allocation back to here or not... I wanted to reach 500K here, but if it takes me a year to do it, so be it...

Edit: Oh, I just got to thinking that I dunno if it is possible to do resource allocations per host. If it isn't, boy isn't it swell we got Mr. Anderson to get those social networking features working!
ID: 695314
Odysseus
Volunteer tester
Joined: 26 Jul 99
Posts: 1808
Credit: 6,701,347
RAC: 6
Canada
Message 695315 - Posted: 28 Dec 2007, 6:54:44 UTC - in response to Message 695314.  

Oh, I just got to thinking that I dunno if it is possible to do resource allocations per host. If it isn't, boy isn't it swell we got Mr. Anderson to get those social networking features working!

Nope; it’s a general preference. The best you can do is set up a venue with a different resource share and assign the host to it.

Assuming, that is, that you have a venue to spare—and that the venue preferences are actually working …
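
As a rough illustration of the venue workaround, here is a small Python sketch of how resource shares turn into fractions of a host's time. It assumes, as I understand BOINC, that a resource share is a relative weight, so a project's fraction is its share divided by the sum of the shares of all attached projects. The venue names and the 100/100 split are hypothetical; only the 80/20 split comes from the post being replied to.

```python
def share_fractions(shares):
    """Convert relative resource shares into fractions that sum to 1."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {project: share / total for project, share in shares.items()}


# Hypothetical per-venue preferences: one venue keeps an even split,
# the other carries the 80/20 split mentioned above.
VENUE_SHARES = {
    "home": {"SETI@home": 100, "Cosmology@Home": 100},
    "work": {"SETI@home": 20, "Cosmology@Home": 80},
}

if __name__ == "__main__":
    for venue, shares in VENUE_SHARES.items():
        for project, fraction in share_fractions(shares).items():
            print(f"{venue}: {project} gets {fraction:.0%}")
```

A host assigned to the second venue would then spend roughly a fifth of its time here, while hosts left in the first venue keep their even split.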


ID: 695315
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695316 - Posted: 28 Dec 2007, 7:10:07 UTC - in response to Message 695315.  

Oh, I just got to thinking that I dunno if it is possible to do resource allocations per host. If it isn't, boy isn't it swell we got Mr. Anderson to get those social networking features working!

Nope; it’s a general preference. The best you can do is set up a venue with a different resource share and assign the host to it.

Assuming, that is, that you have a venue to spare—and that the venue preferences are actually working …



I was on my way here to edit my message, but you beat me to it. Yeah, I have venues to spare, but I recall that sometimes the venues didn't work and/or they'd get reset.

Anyway, someone here needs to get the painful data out of the splitter queue. The way I interpret this event is influenced by the last event, when it was basically said "well, we're going to have to split this data sooner or later, so might as well get it done now". That's all well and good and stuff, but not for 4+ days. I'm sure it's comforting to know that the systems are holding up, but that's probably only because there's a "restrictor plate" on our side (the network bottleneck) so the systems aren't being hit as hard as they would otherwise... Also, who knows what the index rebuild is going to take now, with all the extra data?
ID: 695316
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 695344 - Posted: 28 Dec 2007, 12:27:59 UTC - in response to Message 695172.  
Last modified: 28 Dec 2007, 12:34:25 UTC

Still no mention of the source / cause of all the shorty work - just its effects.

Isn't the cause well known? Isn't it simply the nature of what was recorded at the telescope??

Only if you can absolutely, categorically and 100% for all time rule out the possibility of there being any sort of a bug in the splitter code (or, indeed, any part of the pre-processing that happens prior to the issue of the task to your computer: Is the aiming point of the receiver being accurately recorded in the data stream? How much difference does the radar blanking timing signal make? Were these signals recorded with radar blanking active, or are we still pre-filtering recordings with radar pulses in, before they're issued? etc. etc.)
ID: 695344
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 695346 - Posted: 28 Dec 2007, 12:56:33 UTC - in response to Message 695253.  

Thanks, I figured you'd be able to find them with fewer tries than I would have needed. Hmm, both are WinXP hosts, but it seems more frequent on Vista. Richard Haselgrove has about 3 cores share weighting on his Vista octo doing SETI Beta to help gather info.

I do understand the "adding insult to injury", everything about the project seems slightly sour now. I just try to focus on the 50+ MiB of work going out and being crunched in spite of the frustrations.
                                                                 Joe

Yes, it's still recording - Beta 2882015 looks like another candidate, but I won't be able to send you the file(s) until I get home.

I do find it a bit odd that Matt has suddenly started entering all sorts of constructive technical dialogue with users about fiber channels, switch glitches etc. - all good stuff - but has made absolutely no comment on my queries about causes. Maybe he sees himself (and please don't think I'm being disparaging here) as an earthbound 'plumber', focussing on the mechanics of the server/network end of the project: is there a radio astronomer in the house?
ID: 695346
Profile makosky
Volunteer tester

Joined: 7 Jul 00
Posts: 56
Credit: 3,908,782
RAC: 0
United States
Message 695351 - Posted: 28 Dec 2007, 13:33:46 UTC - in response to Message 694830.  

A LOT OF SERVERS NOT FUNCTIONING WITH NO NEW TECH NEWS
WE'RE GOING DOWN, CAPTAIN

feeder.i686 bruno Not Running
feeder.i686 ptolemy Not Running
file_deleter1 bruno Not Running
file_deleter2 bruno Not Running
file_deleter3 bruno Not Running
file_deleter4 bruno Not Running
file_deleter5 bruno Not Running
file_deleter6 bruno Not Running
db_purge.x86_64 thumper Not Running
transitioner1 vader Not Running
transitioner2 vader Not Running
transitioner3 bruno Not Running
transitioner4 bruno Not Running
vote_monitor bruno Not Running
transitioner5 vader Not Running


Try and get tech news... all I get is an error message, grrr
transitioner6 vader Not Running
sah_validate1 ptolemy Not Running
sah_validate2 ptolemy Not Running
sah_validate3 ptolemy Not Running
sah_validate4 ptolemy Not Running
sah_validate5 ptolemy Not Running
sah_validate6 ptolemy Not Running
fix_missing_results ptolemy Disabled
sah_assimilator1 ptolemy Not Running
sah_assimilator2 ptolemy Not Running
sah_assimilator3 ptolemy Not Running
sah_assimilator4 ptolemy Not Running
mb_splitter7 lando Disabled
mb_splitter8 lando Disabled
mb_splitter9 bambi Disabled
mb_splitter10 bambi Disabled
mb_splitter11 bambi Disabled
mb_splitter12 bambi Disabled
mb_splitter13 bambi Disabled
mb_splitter14 bambi Disabled


HEHEHEHHE
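
For anyone who wants to tally a listing like the one above rather than eyeball it, here is a minimal Python sketch. It assumes each status row is "process host status" and that the status is one of the three strings listed in `KNOWN_STATUSES`; both the row layout and the function names are assumptions made for illustration, and the sample is a shortened excerpt of the list above with one stray non-status line thrown in.

```python
from collections import Counter

# Assumption: the status column only ever holds one of these strings.
# "Not Running" comes first so it is matched before the shorter "Running".
KNOWN_STATUSES = ("Not Running", "Running", "Disabled")


def parse_status_line(line):
    """Return (process, host, status), or None if the line is not a status row."""
    text = line.strip()
    for status in KNOWN_STATUSES:
        if text.endswith(" " + status):
            fields = text[: -len(status)].split()
            if len(fields) == 2:
                return fields[0], fields[1], status
            return None
    return None


def summarise(listing):
    """Count rows per status, and rows that are not running per host."""
    rows = [row for row in map(parse_status_line, listing.splitlines()) if row]
    by_status = Counter(status for _, _, status in rows)
    down_by_host = Counter(host for _, host, status in rows if status != "Running")
    return by_status, down_by_host


# Shortened excerpt of the listing above, plus one stray remark line.
SAMPLE = """feeder.i686 bruno Not Running
file_deleter1 bruno Not Running
transitioner1 vader Not Running
(a stray remark typed in the middle of the list)
sah_validate1 ptolemy Not Running
fix_missing_results ptolemy Disabled
mb_splitter7 lando Disabled
"""

if __name__ == "__main__":
    by_status, down_by_host = summarise(SAMPLE)
    print(dict(by_status))
    print(dict(down_by_host))
```

Lines that do not fit the three-column pattern are skipped rather than treated as errors, so a comment pasted into the middle of the list does not break the count.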

ID: 695351
Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 695359 - Posted: 28 Dec 2007, 16:02:06 UTC - in response to Message 695346.  
Last modified: 28 Dec 2007, 16:15:32 UTC

I do find it a bit odd that Matt has suddenly started entering all sorts of constructive technical dialogue with users about fiber channels, switch glitches etc. - all good stuff - but has made absolutely no comment on my queries about causes. Maybe he sees himself (and please don't think I'm being disparaging here) as an earthbound 'plumber', focussing on the mechanics of the server/network end of the project: is there a radio astronomer in the house?

Not odd at all actually, your analogy is not far off. As I understand it, Matt's principal role at the project is to keep the hardware/software and database working and fine tuned. His job is to keep the pipes unclogged and keep the work flowing in and out. No small feat considering the amount of data that is dealt with.

Eric is in charge of most of the science-related aspects of the project. This would be the science apps and the splitter software as well as the equipment/software at Arecibo. He also usually takes care of the boards. I believe he deals with most of the funding aspects of the project.

Boinc V7.2.42
Win7 i5 3.33G 4GB, GTX470
ID: 695359
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695367 - Posted: 28 Dec 2007, 16:53:21 UTC - in response to Message 695359.  
Last modified: 28 Dec 2007, 16:55:18 UTC

His job is to keep the pipes unclogged and keep the work flowing in and out. No small feat considering the amount of data that is dealt with.


Well, as indicated earlier in this thread, I connected to my Intel computer with VNC and set it to no new tasks. This will alleviate a minuscule percentage of the bandwidth clogging the system.
ID: 695367


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.