Thousand Island (Feb 23 2009)


Message boards : Technical News : Thousand Island (Feb 23 2009)

Madas91
Joined: 7 Apr 04
Posts: 1
Credit: 127,253
RAC: 0
United Kingdom
Message 868961 - Posted: 24 Feb 2009, 8:11:37 UTC

Just finished my last CUDA unit :( still have 14 complete to upload :(

Don't let my graphics card drop below 85C or it will think it's holiday time. My upload list is finally looking smaller, but no new work yet.

Hope the indigestion goes soon... keep up all the good work anyway.

At least now I know it's not just me, I'm happier... how does that work?
____________

KWSN Sir Clark
Volunteer tester
Joined: 17 Aug 02
Posts: 128
Credit: 216,696
RAC: 0
United Kingdom
Message 868966 - Posted: 24 Feb 2009, 8:40:33 UTC

I'm doing my bit.

Successfully crunching some WUs, but I've set SETI and SETI Beta to NNW (No New Work) until this dies down.

I've got 17 other projects to crunch for (all on one PC) so I'm not gonna run out of work.

Hope it gets sorted soon....
____________

littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 868975 - Posted: 24 Feb 2009, 10:02:42 UTC

Thanks for filling us in on the latest hiccup, Matt.

I have a little to add, though:

My laptop received a message to the effect that a newer version of BOINC was available.

I downloaded the new version, and upon installing it, lost my entire cache of WU's, including a "long distance" AP unit which had almost finished.

Not only that, I found my laptop had to be re-attached to the project.

All the lost WU's will have to be resent again, adding to the bottleneck.

It might be a good idea to try two things here:

1) During times of high bandwidth usage, turn off the downloading to clients of updated BOINC versions.

2) Installing a newer version of BOINC never used to cause a PC to detach from the project. Can we please return to this state?

Cheers

lgm
____________

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4225
Credit: 1,041,735
RAC: 344
United States
Message 868978 - Posted: 24 Feb 2009, 10:24:03 UTC - in response to Message 868911.

gomeyer wrote:
...
[edit] BTW, I still think these rollouts should only be launched earlier in the week. [/edit]

Astropulse v5 was rolled out Wednesday February 11 at about 5 pm Berkeley time. No obvious problem for 9 days, then whammo!
Joe

Ageless
Joined: 9 Jun 99
Posts: 12284
Credit: 2,570,564
RAC: 593
Netherlands
Message 868987 - Posted: 24 Feb 2009, 10:49:28 UTC - in response to Message 868975.

2) Installing a newer version of BOINC never used to cause a PC to detach from the project. Can we please return to this state?

Upgrading to a new BOINC does not detach you from your projects. But what can happen is that the data is not migrated to the correct place, or you chose a wrong location for the Data directory, one that BOINC is not allowed to read.

Please post a request for help about that elsewhere; let's not do that in this thread. I am sure that the old work is still on your computer and could possibly be recovered.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,178,062
RAC: 192
Australia
Message 868993 - Posted: 24 Feb 2009, 11:19:18 UTC - in response to Message 868960.

But then there's problem 2. An application download checksum error (a) doesn't cause exponential backoff and (b) causes all workunits also requested by this particular client to be errored out and resent.



This would explain why I have processed virtually nothing for the past month while my computer has been running almost 24x7?! After a week of processing an AP WU it gets errored out?!


@ Batman

Not the same problem as yours.

In the problem Matt was referring to, no processing could be done because the application which does the crunching had not been downloaded yet.

If you did a week of processing then you must have already downloaded the application!
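
For readers unfamiliar with the term, the "exponential backoff" in the quoted text means doubling the wait between retries after each failure, so a struggling server is not hammered by repeated requests. Below is a minimal sketch of the general technique only. The function name, the 60-second base, the 4-hour cap, and the jitter range are all hypothetical values chosen for illustration; they are not BOINC's actual retry parameters.

```python
import random


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, jitter=True, rng=random):
    """Return seconds to wait before retry number `attempt` (0-based).

    All defaults are illustrative assumptions, not BOINC's real settings.
    """
    # Double the delay on every failed attempt, but never exceed the cap.
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Randomize within [delay/2, delay] so clients do not retry in lockstep.
        delay = rng.uniform(delay / 2, delay)
    return delay


if __name__ == "__main__":
    for attempt in range(6):
        print(attempt, backoff_delay(attempt, jitter=False))
```

Without something like this, a failing download that is retried immediately keeps adding load at exactly the moment the link is saturated, which is the behaviour the quoted text describes for checksum errors.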

Kaylie
Joined: 26 Jul 08
Posts: 39
Credit: 332,100
RAC: 0
United States
Message 868999 - Posted: 24 Feb 2009, 12:13:02 UTC

All uploaded but still having trouble downloading...

Would it be possible to move the servers closer to the node? $80,000 would buy a small trailer to put the equipment in if there is a place to locate it.

( My dumb ideas are flowing far more freely than WU's lately. )

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 869022 - Posted: 24 Feb 2009, 14:23:16 UTC - in response to Message 868978.

gomeyer wrote:
...
[edit] BTW, I still think these rollouts should only be launched earlier in the week. [/edit]

Astropulse v5 was rolled out Wednesday February 11 at about 5 pm Berkeley time. No obvious problem for 9 days, then whammo!
Joe

Thanks Joe.

I certainly agree with the whammo bit, but what changed? Did we just hit some limit or tipping point in the system after 9 days that suddenly pushed things over the edge? Things just don't suddenly go bump in the night by themselves.

Oh well, as has been correctly pointed out by many, there is no lack of other projects to run.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8458
Credit: 48,636,538
RAC: 80,739
United Kingdom
Message 869025 - Posted: 24 Feb 2009, 14:42:38 UTC - in response to Message 869022.

'whammo' happened at about 23:00 lab time on Friday 20 February (07:00 UTC Saturday 21 Feb). The '-200' errors started much earlier - e.g. task 1166928853, 18 Feb 2009 1:45:56 UTC.

So I suspect it wasn't a deliberate change, but a tipping point - the only other possibility is that I think I've seen an increase in the amount of Astropulse_v5 v5.03 being sent out, perhaps as Astropulse v5.00 WUs clear out of the database, which might have nudged the system into instability.
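
The lab-time to UTC conversion in the post above can be checked with a few lines of Python. This is only a sketch: it assumes "lab time" means Pacific Standard Time at a fixed UTC-8 offset (daylight saving was not in effect in February 2009), and the helper name `lab_to_utc` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Berkeley lab time in February is PST, a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8), "PST")


def lab_to_utc(year, month, day, hour, minute=0):
    """Convert a lab-local (PST) wall-clock time to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=PST)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # 23:00 lab time on Friday 20 February 2009
    whammo_utc = lab_to_utc(2009, 2, 20, 23)
    print(whammo_utc.isoformat())
```

The result falls on 21 February at 07:00 UTC, matching the time given in the post.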

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,272,539
RAC: 11,824
United States
Message 869026 - Posted: 24 Feb 2009, 14:55:43 UTC - in response to Message 869025.

Maybe that was the length of time it took for the problem machines to get to them: they tried to work on the new WUs, found the client missing, and started returning them as -200.
____________


PROUD MEMBER OF Team Starfire World BOINC

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 869032 - Posted: 24 Feb 2009, 15:06:52 UTC - in response to Message 869026.
Last modified: 24 Feb 2009, 15:07:45 UTC

Maybe that was the length of time it took for the problem machines to get to them: they tried to work on the new WUs, found the client missing, and started returning them as -200.

Considering the number of 10 day queues out there, "9 days later" would make a lot of sense. Interesting observation!

[edit] and Murphy would account for the fact that it happened on a weekend [/edit]

Jesse Viviano
Joined: 27 Feb 00
Posts: 95
Credit: 474,230
RAC: 0
United States
Message 869038 - Posted: 24 Feb 2009, 15:25:27 UTC
Last modified: 24 Feb 2009, 16:09:02 UTC

If CNS finishes drawing up the plan to get Gigabit Ethernet between Hurricane Electric and the Space Sciences Laboratory, maybe it could submit the plan to compete for the stimulus package money :-D . (The closest SONET speed near one gigabit would be OC-24, but that speed is apparently not commercially viable. The closest commercially viable speed apparently is OC-48.)

However, it would be a long shot, because California probably needs all that money to stay out of needing to file for bankruptcy due to the subprime mortgage crisis that started in California and Florida that is currently destroying the tax base.
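
The SONET comparison above follows from the fact that an OC-n line rate is n times the OC-1 rate of 51.84 Mbit/s. A rough sketch of the arithmetic is below. The list of levels is an assumption chosen for illustration, the function names are hypothetical, and the figures are gross line rates, not usable payload.

```python
OC1_KBPS = 51_840  # OC-1 gross line rate: 51.84 Mbit/s, kept in kbit/s to avoid float error

# Assumption: a short illustrative list of OC levels, not an exhaustive catalogue.
LEVELS = (1, 3, 12, 24, 48, 192)


def oc_rate_kbps(n):
    """Gross line rate of OC-n in kbit/s."""
    return n * OC1_KBPS


def smallest_oc_at_least(target_kbps, levels=LEVELS):
    """Return the lowest listed OC level whose line rate meets the target, or None."""
    for n in sorted(levels):
        if oc_rate_kbps(n) >= target_kbps:
            return n
    return None


if __name__ == "__main__":
    gigabit = 1_000_000  # 1 Gbit/s in kbit/s
    for n in LEVELS:
        print(f"OC-{n}: {oc_rate_kbps(n) / 1000:.2f} Mbit/s")
    print("Smallest level covering 1 Gbit/s: OC-%d" % smallest_oc_at_least(gigabit))
```

OC-12 (622.08 Mbit/s) falls short of a gigabit, OC-24 (1244.16 Mbit/s) is the first level above it, and OC-48 doubles that to 2488.32 Mbit/s.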

Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 142
Credit: 6,466,200
RAC: 0
Canada
Message 869076 - Posted: 24 Feb 2009, 17:05:53 UTC
Last modified: 24 Feb 2009, 17:08:00 UTC

2/24/2009 12:04:52 PM|SETI@home|Message from server: No work available for the applications you have selected. Please check your settings on the web site.


Erm, not seen that one before. It took 3 update requests to finally get some new work, and even then some of them had HTTP errors and are download pending.

Will be patient and wait and see what happens after the outage which should be starting sometime soon California time.
____________

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,275,946
RAC: 542
United States
Message 869114 - Posted: 24 Feb 2009, 17:33:25 UTC - in response to Message 869022.
Last modified: 24 Feb 2009, 17:37:16 UTC

I certainly agree with the whammo bit, but what changed?


In my view, the tipping point came right after Lunatics released the new optimised AP 5.03 app, then everyone started downloading new AP WUs to run on the new (r112) opt app.

I myself am making a specialty of running only the older AP opt app (r103) and cleaning up many of the old-version WUs that are being returned late or aborted. I am keeping my 4-day cache filled easily.

Three cheers and thanks to those wonderful, skilled programmers who can take full advantage of the CPU hardware features and get both MB and AP WUs to complete in half the time. THANK YOU !!!

Whit

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8458
Credit: 48,636,538
RAC: 80,739
United Kingdom
Message 869128 - Posted: 24 Feb 2009, 22:55:59 UTC - in response to Message 869114.

I certainly agree with the whammo bit, but what changed?

In my view, the tipping point came right after Lunatics released the new optimised AP 5.03 app, then everyone started downloading new AP WUs to run on the new (r112) opt app.

The release posts for Lunatics r112 (v5.03 equivalent) are timed at 23:32 UTC on 21 Feb (Lunatics site), and 00:13 UTC on 22 Feb (here) - at least 16 hours after 'whammo'.

Although some of us have been running r112 since 13 February in an attempt to confirm accurate validations (unfortunately v5.03 was never tested at Beta, so no pre-release validation tests could be carried out there), I don't think you can say that the Lunatics release 'caused' this event.

[We did think of claiming that the release timing was deliberate - "download from Lunatics if you can't get an official copy from Berkeley" - but that would have been untrue: the timing was pure coincidence].

BTW, congratulations on the effort to clean up v5.00 - but it is possible to run both versions together until the cleanup is complete.
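
The "at least 16 hours" figure can be checked from the timestamps quoted in this thread: 'whammo' at 07:00 UTC on 21 Feb and the earlier of the two release posts at 23:32 UTC on 21 Feb. A small sketch follows; the helper name `gap_hm` is made up for this example.

```python
from datetime import datetime, timezone


def gap_hm(start, end):
    """Return the gap between two datetimes as (hours, minutes)."""
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    return hours, remainder // 60


# Timestamps as quoted in the thread, both in UTC.
whammo = datetime(2009, 2, 21, 7, 0, tzinfo=timezone.utc)
lunatics_release = datetime(2009, 2, 21, 23, 32, tzinfo=timezone.utc)

if __name__ == "__main__":
    hours, minutes = gap_hm(whammo, lunatics_release)
    print(f"Release followed 'whammo' by {hours} h {minutes} min")
```

That gives a gap of 16 hours 32 minutes for the Lunatics site post, and the post on this board came later still.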

Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,538,216
RAC: 0
United States
Message 869350 - Posted: 25 Feb 2009, 15:29:32 UTC

Over 1500 Mahalos to You! Tanks yea for all you guys doing. Also hope we can help for make more bandwidth. Everything seems 5x5 now. Got all work uploaded and got nearly full caches on all 3 machines. Woo hoo, life is good. Take care everyone! -Brandon
____________
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov


Copyright © 2014 University of California