Ricochet (Jun 02 2011)


Message boards : Technical News : Ricochet (Jun 02 2011)

Author Message
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1112205 - Posted: 1 Jun 2011, 23:09:44 UTC

Long time no speak. I've been out of town and/or busy and/or admittedly falling out of the habit of posting to the forums.

So I was gone last week (camping in various remote corners of Utah, mostly) and like clockwork a lot of server problems hit the fan once I was out of contact. Among other things, the raw data storage server died (but has since been recovered), oscar wedged up for no reason (a power cycle fixed that) and Jeff's desktop had some issues as well (nothing a replacement power supply couldn't handle).

Then we had the holiday weekend of course, but we all returned here yesterday and continued handling the fallout from all that, as well as the usual weekly outage stuff. We're still using thumper as the active raw data storage server and worf is now where we're keeping the science backups. Basically they switched roles for the time being, until we let this all incubate and decide what to do next, if anything.

This morning we brought the projects down to replace some DIMMs (they have been sending complaints to the OS) on thumper. One thing I kinda loathe about professional computing in general is poor documentation - a problem compounded by chronic zero-index vs. one-index confusion, and physical hardware labels vs. how they are depicted in the software. Long story short, despite all kinds of effort to determine exactly which DIMMs were broken, it wasn't until after we did the surgery and brought everything back online that we found out we probably replaced the wrong ones. Oops. We'll have to do this again sometime soon.
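To make the zero-index vs. one-index trap concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function name, the idea that the software reports slots starting at 0, and the idea that the board labels start at 1. It does not describe thumper's actual numbering.

```python
def os_index_to_board_label(os_index: int, labels_start_at: int = 1) -> int:
    """Convert a zero-based slot index (as software might report it)
    to the number printed on the board (hypothetical one-based labels)."""
    if os_index < 0:
        raise ValueError("slot index cannot be negative")
    return os_index + labels_start_at


# Hypothetical report: the software flags slots 2 and 5 (zero-based).
flagged = [2, 5]

# The slots to physically pull, if the board labels are one-based.
to_pull = [os_index_to_board_label(i) for i in flagged]
print(to_pull)  # [3, 6]
```

Pulling the sticks labeled 2 and 5 instead would replace two healthy DIMMs and leave the bad ones in place, which is the kind of mix-up described above.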

There are some broken Astropulse results clogging one of the validators (which is why it shows up in red on the status page). We'll have to figure out an automated way to detect these results and push them through (it's a real pain to do by hand). In the meantime, this is causing our workunit storage server to be quite full, and might hamper other workunit development sooner rather than later.

Gripes and server issues aside, there is continuing happy progress. I'm still tinkering with visualization stuff for web-based analysis of our candidates (for private and potential public use), and we have tons of data from the Kepler mission arriving here any day now, which will be fun to play with.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

SciManStev
Project donor
Volunteer tester
Joined: 20 Jun 99
Posts: 4876
Credit: 82,952,874
RAC: 39,703
United States
Message 1112208 - Posted: 1 Jun 2011, 23:17:45 UTC

Thank you for the news Matt! We love hearing what goes on at your end. The new data from the Kepler mission sounds very interesting.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4139
Credit: 33,443,021
RAC: 20,083
United Kingdom
Message 1112210 - Posted: 1 Jun 2011, 23:18:11 UTC - in response to Message 1112205.

Thanks for the update Matt,

Claggy

Bill G
Project donor
Joined: 1 Jun 01
Posts: 348
Credit: 42,964,981
RAC: 50,972
United States
Message 1112211 - Posted: 1 Jun 2011, 23:19:47 UTC - in response to Message 1112207.
Last modified: 1 Jun 2011, 23:20:10 UTC

Sounds like you are keeping a close rein on things. Thanks for the efforts.
____________

Sliver
Project donor
Joined: 18 May 11
Posts: 281
Credit: 7,191,152
RAC: 738
United States
Message 1112232 - Posted: 2 Jun 2011, 1:15:26 UTC - in response to Message 1112211.

Thanks for the insight on what has been giving you guys trouble. It's quite interesting to hear all the curve balls you are thrown and how you are quickly able to adapt! Well done, and thanks for the update ;)
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 30,972,903
RAC: 20,685
United States
Message 1112242 - Posted: 2 Jun 2011, 3:17:20 UTC - in response to Message 1112205.

I'm beginning to learn how to read your news with a British accent using Rowan Atkinson's voice as I read it.

Certainly makes things far more entertaining and a right bit hilarious as I read about the server issues. Hope you don't mind at all.

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 24481
Credit: 33,788,035
RAC: 24,167
Germany
Message 1112275 - Posted: 2 Jun 2011, 7:58:05 UTC

Thanks for the update Matt.

____________

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32036
Credit: 13,715,084
RAC: 27,889
United Kingdom
Message 1112299 - Posted: 2 Jun 2011, 10:26:27 UTC

Thanks Matt.

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11932
Credit: 14,609,767
RAC: 12,441
United States
Message 1112343 - Posted: 2 Jun 2011, 15:15:49 UTC - in response to Message 1112242.

I'm beginning to learn how to read your news with a British accent using Rowan Atkinson's voice as I read it.

Certainly makes things far more entertaining and a right bit hilarious as I read about the server issues. Hope you don't mind at all.

LOL! How many people besides me tried it after reading this reply? I think I was drifting between British, Australian, and Boston, but yes it was interesting.

Thanks for the update, Matt. And I must say, I really REALLY like the new stats in the header and footer of the All Tasks for Computer _______ pages. Since I got my new computer, it got extremely difficult to add these numbers up by hand. I gave up on it after a couple of days; now I know again.

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress. Does this mean the computer's GPU failed? Obviously, I'm still getting video output. The card is an nvidia GT 440 (not the latest and greatest, but adequate to my primary need). I restarted the computer a couple of times while I was messing with it and did not check the BOINC manager again, so maybe it recovered and I don't know it. I have different preferences for this machine to allow it to use most of its potential, whereas I restrict my others a bit for their overall health and performance of other apps.

David
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8629
Credit: 51,423,701
RAC: 50,565
United Kingdom
Message 1112351 - Posted: 2 Jun 2011, 15:37:07 UTC - in response to Message 1112343.

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress. Does this mean the computer's GPU failed? Obviously, I'm still getting video output. The card is an nvidia GT 440 (not the latest and greatest, but adequate to my primary need). I restarted the computer a couple of times while I was messing with it and did not check the BOINC manager again, so maybe it recovered and I don't know it. I have different preferences for this machine to allow it to use most of its potential, whereas I restrict my others a bit for their overall health and performance of other apps.

David

Yes, it would be better to continue the conversation in Number Crunching if this doesn't answer the point.

"GPU missing; ..." (in Windows) is most likely the result of using Fast User Switching or Remote Desktop without stopping and restarting BOINC: GPUs only run if the user who started the BOINC session, and the user currently active at the console, are one and the same. If that doesn't apply to you, ask again in NC.
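The rule above can be sketched as a simple check. This is only an illustration of the condition as described in this post, not BOINC's actual code; the function name and the use of `None` for "nobody at the console" are assumptions.

```python
from typing import Optional


def gpu_usable(boinc_started_by: str, console_user: Optional[str]) -> bool:
    """Return True only if the user who started BOINC is also the user
    currently active at the console (hypothetical helper)."""
    if console_user is None:
        # Assumption: nobody active at the console means no match.
        return False
    return boinc_started_by == console_user


# Same user started BOINC and is sitting at the console.
print(gpu_usable("david", "david"))   # True

# Fast User Switching: someone else is now the active console user.
print(gpu_usable("david", "guest"))   # False

# Logged off: no active console user at all.
print(gpu_usable("david", None))      # False
```

In the last two cases, tasks that need the GPU would be expected to sit at "GPU missing" until BOINC is stopped and restarted by the active console user.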

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1958
Credit: 10,429,152
RAC: 8,467
United States
Message 1112355 - Posted: 2 Jun 2011, 15:43:34 UTC

A few things to work on, now that the major fires have been stomped out:

1) The "client connection stats" page hasn't been updated since before the big black-out last year.

2) The "Multi-Beam Data Recorder Status" shows "34206m ago" - that's ~ 24 days...

These assume, of course, that the BOINC server software hasn't been changed so that they are unavailable. Also, if the "Pending Credit" page is permanently gone, could someone delete the link?
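For anyone checking the "~ 24 days" figure in item 2, the conversion is just minutes divided by minutes per day. A minimal sketch (the helper name is made up for illustration):

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def minutes_to_days(minutes: float) -> float:
    """Convert an age in minutes to an age in days."""
    return minutes / MINUTES_PER_DAY


age_days = minutes_to_days(34206)
print(f"{age_days:.2f} days")  # 23.75 days
```

So "34206m ago" is a little under 24 days.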
____________
.

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1112436 - Posted: 2 Jun 2011, 19:00:47 UTC - in response to Message 1112355.

1) The "client connection stats" page hasn't been updated since before the big black-out last year.


For some reason this is a big pain to keep working (and obviously low priority to keep kicking back into working mode). Will try to look into that again soon.

2) The "Multi-Beam Data Recorder Status" shows "34206m ago" - that's ~ 24 days...


Oh yeah that. There was a cluster of power/security concern issues at Arecibo a few weeks back and lots of things haven't been adjusted to work with new networking/security regimes yet. So we haven't gotten telescope info up here in real time for a while, hence the big delays. Also will look into that again soon.

Also, if the "Pending Credit" page is permanently gone, could someone delete the link?


Wait... what's the situation here (I have zero pending credit, so the page link works - it just says pending credit: 0.00 and shows no tasks)?

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12704
Credit: 7,193,176
RAC: 15,526
United States
Message 1112460 - Posted: 2 Jun 2011, 19:51:56 UTC - in response to Message 1112436.

Also, if the "Pending Credit" page is permanently gone, could someone delete the link?


Wait... what's the situation here (I have zero pending credit, so the page link works - it just says pending credit: 0.00 and shows no tasks)?

- Matt

There are two places for pending credit now. The one on your account page is now dead. I think a BOINC server update killed it. IIRC there is now a server side switch for it to be active. The other pending credit is on your tasks page and works.

____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8629
Credit: 51,423,701
RAC: 50,565
United Kingdom
Message 1112461 - Posted: 2 Jun 2011, 19:53:48 UTC - in response to Message 1112436.

Also, if the "Pending Credit" page is permanently gone, could someone delete the link?

Wait... what's the situation here (I have zero pending credit, so the page link works - it just says pending credit: 0.00 and shows no tasks)?

- Matt

Back on or around 8 March (this year), David Anderson made a change in the back-end BOINC server code - specifically, changeset [23118] for sched_result.cpp - which meant that no calculated value for "claimed credit" was put in the database when a result was reported - David wants us all to use CreditNew instead.

The 'pending credit' list, as its name suggests, only shows pending tasks which have a non-zero claimed credit - so no newly-reported results have appeared on the pages since 8 March. Most people, like you, now have empty lists - we've been comparing notes in Pending Credit List Has Almost Disappeared - but I've still got five left.....
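As a rough sketch of why the list emptied out: if the page only shows pending results with a non-zero claimed credit, then any result reported after the change (claimed credit never filled in) drops off. The class and field names below are invented for illustration and are not the real BOINC schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    name: str              # hypothetical result name
    pending: bool          # still waiting for validation
    claimed_credit: float  # 0.0 if the server never calculated one


def pending_credit_list(results: List[Result]) -> List[Result]:
    """Keep only pending results that have a non-zero claimed credit."""
    return [r for r in results if r.pending and r.claimed_credit > 0]


results = [
    Result("reported_before_change", pending=True, claimed_credit=42.5),
    Result("reported_after_change", pending=True, claimed_credit=0.0),
    Result("already_validated", pending=False, claimed_credit=30.0),
]

shown = pending_credit_list(results)
print([r.name for r in shown])  # ['reported_before_change']
```

Once the older results validate, nothing is left to show, which matches the empty lists most people are seeing.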

Yes, the link is pretty much redundant now, and unless David has a major change of heart, won't be coming back. You might as well save the space.

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11932
Credit: 14,609,767
RAC: 12,441
United States
Message 1112707 - Posted: 3 Jun 2011, 16:01:15 UTC - in response to Message 1112351.

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress. Does this mean the computer's GPU failed? Obviously, I'm still getting video output. <snip> I restarted the computer a couple of times while I was messing with it and did not check the BOINC manager again, so maybe it recovered and I don't know it.<snip>

David

Yes, it would be better to continue the conversation in Number Crunching if this doesn't answer the point.

"GPU missing; ..." (in Windows) is most likely the result of using Fast User Switching or Remote Desktop without stopping and restarting BOINC: GPUs only run if the user who started the BOINC session, and the user currently active at the console, are one and the same. If that doesn't apply to you, ask again in NC.

Thanks for the advice. I did check it again this morning and I still had all the "GPU missing"s. Based on your comments, what I'll do is remove BOINC as a scheduled task (even though it runs under the same user name), restart the computer, and start BOINC manually, then not log off (I log off for security and so I can use Remote Desktop Connection, but this computer has a Home version so I can't Remote in anyway). If this doesn't work, I'll ask in Crunching.

David
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11932
Credit: 14,609,767
RAC: 12,441
United States
Message 1113902 - Posted: 6 Jun 2011, 15:03:12 UTC - in response to Message 1112707.

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress.<snip>

David

"GPU missing; ..." (in Windows) is most likely the result of using Fast User Switching or Remote Desktop without stopping and restarting BOINC: GPUs only run if the user who started the BOINC session, and the user currently active at the console, are one and the same. If that doesn't apply to you, ask again in NC.

Thanks for the advice. I did check it again this morning and I still had all the "GPU missing"s. Based on your comments, what I'll do is remove BOINC as a scheduled task (even though it runs under the same user name), restart the computer, and start BOINC manually, then not log off.

David

The situation seems to be resolved. I removed the Scheduled task and stopped and restarted BOINC without restarting the computer. All the Seti tasks then showed ready to run.

But then I got a bit stupid. I looked some more and found about 10 Einstein units due in 2-3 days (ONE of which was running in high-priority mode), so I suspended Seti so it would finish the Einsteins. What I failed to do was set Einstein to no new tasks, so when it finished the 10 and Seti was suspended, it downloaded 100+ new Einsteins, also with short deadlines. -sigh-

How does BOINC decide what to work on, anyway? I see it nearly finished with tasks that are due in a month and a half, while others that are due in a week aren't started.

David
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8629
Credit: 51,423,701
RAC: 50,565
United Kingdom
Message 1113918 - Posted: 6 Jun 2011, 15:26:22 UTC - in response to Message 1113902.

How does BOINC decide what to work on, anyway? I see it nearly finished with tasks that are due in a month and a half, while others that are due in a week aren't started.

David

Unless BOINC is under extreme deadline pressure, it runs tasks (within one project) in the order they're received from the project's servers. Any more explanation than that is for NC.
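A very simplified sketch of that ordering, for one project only. This is not the real BOINC scheduler; the names are invented, and sorting by earliest deadline when under pressure is an assumption added here to show the contrast.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Task:
    name: str
    received: datetime  # when the task arrived from the project's servers
    deadline: datetime  # when the result is due back


def run_order(tasks: List[Task], deadline_pressure: bool = False) -> List[Task]:
    """Normal case: run in the order received.
    Under deadline pressure (assumption): earliest deadline first."""
    if deadline_pressure:
        return sorted(tasks, key=lambda t: t.deadline)
    return sorted(tasks, key=lambda t: t.received)


tasks = [
    Task("B", received=datetime(2011, 5, 25), deadline=datetime(2011, 6, 13)),
    Task("A", received=datetime(2011, 5, 1), deadline=datetime(2011, 7, 20)),
]

print([t.name for t in run_order(tasks)])                          # ['A', 'B']
print([t.name for t in run_order(tasks, deadline_pressure=True)])  # ['B', 'A']
```

In the normal case task A runs first even though it is due more than a month later than B, simply because it arrived first. That is the pattern David describes.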

Glad you got your GPU back, anyway.

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1958
Credit: 10,429,152
RAC: 8,467
United States
Message 1113994 - Posted: 6 Jun 2011, 17:59:13 UTC - in response to Message 1112436.

1) The "client connection stats" page hasn't been updated since before the big black-out last year.


For some reason this is a big pain to keep working (and obviously low priority to keep kicking back into working mode). Will try to look into that again soon.

2) The "Multi-Beam Data Recorder Status" shows "34206m ago" - that's ~ 24 days...


Oh yeah that. There was a cluster of power/security concern issues at Arecibo a few weeks back and lots of things haven't been adjusted to work with new networking/security regimes yet. So we haven't gotten telescope info up here in real time for a while, hence the big delays. Also will look into that again soon.

[snip]
- Matt


I know both are low priority problems, that's why I don't bother posting them when there are bigger fish to fry...

____________
.

Twisted
Joined: 27 May 99
Posts: 81
Credit: 1,878,607
RAC: 34
United States
Message 1115138 - Posted: 9 Jun 2011, 17:48:42 UTC - in response to Message 1112205.


This morning we brought the projects down to replace some DIMMs (they have been sending complaints to the OS) on thumper. One thing I kinda loathe about professional computing in general is poor documentation - a problem compounded by chronic zero-index vs. one-index confusion, and physical hardware labels vs. how they are depicted in the software. Long story short, despite all kinds of effort to determine exactly which DIMMs were broken, it wasn't until after we did the surgery and brought everything back online that we found out we probably replaced the wrong ones. Oops. We'll have to do this again sometime soon.


Hey Matt,

Do you guys ever use the service processor on Thumper? I've seen spurious errors thrown on the x4500s and x4600s. If you look in the logs in the service processor, it will give the exact DIMMs. If you have a hard memory failure, there is a little button on the system board near the DIMMs; pressing it lights up the failed DIMM. It runs off a small cap, so it doesn't last long, but long enough.

Let me know if I can be of assistance!

Kevin


____________
"Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein

Copyright © 2014 University of California