Ricochet (Jun 02 2011)

Message boards : Technical News : Ricochet (Jun 02 2011)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1112205 - Posted: 1 Jun 2011, 23:09:44 UTC

Long time no speak. I've been out of town and/or busy and/or admittedly falling out of the habit of posting to the forums.

So I was gone last week (camping in various remote corners of Utah, mostly) and like clockwork a lot of server problems hit the fan once I was out of contact. Among other things, the raw data storage server died (but has since been recovered), oscar wedged up for no reason (a power cycle fixed that) and Jeff's desktop had some issues as well (nothing a replacement power supply couldn't handle).

Then we had the holiday weekend of course, but we all returned here yesterday and continued handling the fallout from all that, as well as the usual weekly outage stuff. We're still using thumper as the active raw data storage server and worf is now where we're keeping the science backups. Basically they switched roles for the time being, until we let this all incubate and decide what to do next, if anything.

This morning we brought the projects down to replace some DIMMs (they have been sending complaints to the OS) on thumper. One thing I kinda loathe about professional computing in general is poor documentation - a problem compounded by chronic zero-index vs. one-index confusion, and physical hardware labels vs. how they are depicted in the software. Long story short, despite all kinds of effort to determine exactly which DIMMs were broken, it wasn't until after we did the surgery and brought everything back on line that we found out we probably replaced the wrong ones. Oops. We'll have to do this again sometime soon.
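
As a rough illustration of the zero-index vs. one-index trap: the sketch below assumes software that reports a zero-based DIMM index and a board whose printed labels may start at either 0 or 1. The function name, the `labels_start_at` parameter and the "DIMM n" label format are all made up for the example and are not thumper's real layout.

```python
def os_index_to_board_label(os_index: int, labels_start_at: int = 1) -> str:
    """Translate a zero-based index reported by software into a board label.

    Hypothetical: real hardware may group slots by CPU or bank, so always
    confirm the numbering scheme against the vendor documentation.
    """
    if os_index < 0:
        raise ValueError("index must be non-negative")
    return f"DIMM {os_index + labels_start_at}"


reported = 3  # index as the software reports it (assumed zero-based)
print(os_index_to_board_label(reported))     # DIMM 4 if the board counts from 1
print(os_index_to_board_label(reported, 0))  # DIMM 3 if the board counts from 0
```

The same reported index points at two different physical slots depending on which convention the silkscreen uses, which is exactly how the wrong DIMM ends up being pulled.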

There are some broken astropulse results clogging one of the validators (which is why it shows up in red on the status page). We'll have to figure out an automated way to detect these results and push them through (it's a real pain to do by hand). In the meantime, this is causing our workunit storage server to be quite full, and might hamper other workunit development sooner rather than later.

Gripes and server issues aside, there is continuing happy progress. I'm still tinkering with visualization stuff for web-based analysis of our candidates (for private and potential public use), and we have tons of data from the Kepler mission arriving here any day now, which will be fun to play with.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1112205 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1112207 - Posted: 1 Jun 2011, 23:15:36 UTC

Thank you for once again informing us of the trials and tribulations in the Seti server closet.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1112207 · Report as offensive
Profile SciManStev Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6658
Credit: 121,090,076
RAC: 0
United States
Message 1112208 - Posted: 1 Jun 2011, 23:17:45 UTC

Thank you for the news Matt! We love hearing what goes on at your end. The new data from the Kepler mission sounds very interesting.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1112208 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1112210 - Posted: 1 Jun 2011, 23:18:11 UTC - in response to Message 1112205.  

Thanks for the update Matt,

Claggy
ID: 1112210 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1112211 - Posted: 1 Jun 2011, 23:19:47 UTC - in response to Message 1112207.  
Last modified: 1 Jun 2011, 23:20:10 UTC

Sounds like you are keeping a close rein on things. Thanks for the efforts.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1112211 · Report as offensive
Profile Akio
Joined: 18 May 11
Posts: 375
Credit: 32,129,242
RAC: 0
United States
Message 1112232 - Posted: 2 Jun 2011, 1:15:26 UTC - in response to Message 1112211.  

Thanks for the insight on what has been giving you guys trouble. It's quite interesting to hear all the curve balls you are thrown and how you are quickly able to adapt! Well done, and thanks for the update ;)
ID: 1112232 · Report as offensive
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1112242 - Posted: 2 Jun 2011, 3:17:20 UTC - in response to Message 1112205.  

I'm beginning to learn how to read your news with a British accent using Rowan Atkinson's voice as I read it.

Certainly makes things far more entertaining and a right bit hilarious as I read about the server issues. Hope you don't mind at all.
ID: 1112242 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34381
Credit: 79,922,639
RAC: 80
Germany
Message 1112275 - Posted: 2 Jun 2011, 7:58:05 UTC

Thanks for the update Matt.



With each crime and every kindness we birth our future.
ID: 1112275 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1112343 - Posted: 2 Jun 2011, 15:15:49 UTC - in response to Message 1112242.  

I'm beginning to learn how to read your news with a British accent using Rowan Atkinson's voice as I read it.

Certainly makes things far more entertaining and a right bit hilarious as I read about the server issues. Hope you don't mind at all.

LOL! How many people besides me tried it after reading this reply? I think I was drifting between British, Australian, and Boston, but yes it was interesting.

Thanks for the update, Matt. And I must say, I really REALLY like the new stats in the header and footer of the All Tasks for Computer _______ pages. Since I got my new computer, it got extremely difficult to add these numbers up by hand. I gave up on it after a couple of days; now I know again.

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress. Does this mean the computer's GPU failed? Obviously, I'm still getting video output. The card is an nvidia GT 440 (not the latest and greatest, but adequate to my primary need). I restarted the computer a couple of times while I was messing with it and did not check the BOINC manager again, so maybe it recovered and I don't know it. I have different preferences for this machine to allow it to use most of its potential, whereas I restrict my others a bit for their overall health and performance of other apps.

David
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1112343 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112351 - Posted: 2 Jun 2011, 15:37:07 UTC - in response to Message 1112343.  

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress. Does this mean the computer's GPU failed? Obviously, I'm still getting video output. The card is an nvidia GT 440 (not the latest and greatest, but adequate to my primary need). I restarted the computer a couple of times while I was messing with it and did not check the BOINC manager again, so maybe it recovered and I don't know it. I have different preferences for this machine to allow it to use most of its potential, whereas I restrict my others a bit for their overall health and performance of other apps.

David

Yes, it would be better to continue the conversation in Number Crunching if this doesn't answer the point.

"GPU missing; ..." (in Windows) is most likely the result of using Fast User Switching or Remote Desktop without stopping and restarting BOINC: GPUs only run if the user who started the BOINC session, and the user currently active at the console, are one and the same. If that doesn't apply to you, ask again in NC.
ID: 1112351 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1112355 - Posted: 2 Jun 2011, 15:43:34 UTC

A few things to work on, now that the major fires have been stomped out:

1) The "client connection stats" page hasn't been updated since before the big black-out last year.

2) The "Multi-Beam Data Recorder Status" shows "34206m ago" - that's ~ 24 days...

These assume, of course, that the BOINC server software hasn't been changed so that they are unavailable. Also, if the "Pending Credit" page is permanently gone, could someone delete the link?
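
For reference, the arithmetic behind the "~ 24 days" figure for "34206m ago" can be checked with a quick sketch (the function name is only for illustration):

```python
MINUTES_PER_DAY = 60 * 24  # 1440 minutes in a day


def minutes_to_days(minutes: int) -> float:
    """Convert an age in minutes, as shown on the status page, to days."""
    return minutes / MINUTES_PER_DAY


print(f"{minutes_to_days(34206):.2f} days")  # 23.75 days
```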
.

Hello, from Albany, CA!...
ID: 1112355 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1112436 - Posted: 2 Jun 2011, 19:00:47 UTC - in response to Message 1112355.  

1) The "client connection stats" page hasn't been updated since before the big black-out last year.


For some reason this is a big pain to keep working (and obviously low priority to keep kicking back into working mode). Will try to look into that again soon.

2) The "Multi-Beam Data Recorder Status" shows "34206m ago" - that's ~ 24 days...


Oh yeah, that. There was a cluster of power/security concern issues at Arecibo a few weeks back and lots of things haven't been adjusted to work with new networking/security regimes yet. So we haven't gotten telescope info up here in real time for a while, hence the big delays. Also will look into that again soon.

Also, if the "Pending Credit" page is permanently gone, could someone delete the link?


Wait... what's the situation here (I have zero pending credit, so the page link works - it just says pending credit: 0.00 and shows no tasks)?

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1112436 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 31013
Credit: 53,134,872
RAC: 32
United States
Message 1112460 - Posted: 2 Jun 2011, 19:51:56 UTC - in response to Message 1112436.  

Also, if the "Pending Credit" page is permanently gone, could someone delete the link?


Wait... what's the situation here (I have zero pending credit, so the page link works - it just says pending credit: 0.00 and shows no tasks)?

- Matt

There are two places for pending credit now. The one on your account page is now dead. I think a BOINC server update killed it. IIRC there is now a server side switch for it to be active. The other pending credit is on your tasks page and works.

ID: 1112460 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112461 - Posted: 2 Jun 2011, 19:53:48 UTC - in response to Message 1112436.  

Also, if the "Pending Credit" page is permanently gone, could someone delete the link?

Wait... what's the situation here (I have zero pending credit, so the page link works - it just says pending credit: 0.00 and shows no tasks)?

- Matt

Back on or around 8 March (this year), David Anderson made a change in the back-end BOINC server code (specifically, changeset 23118, for sched_result.cpp) which meant that no calculated value for "claimed credit" was put in the database when a result was reported. David wants us all to use CreditNew instead.

The 'pending credit' list, as its name suggests, only shows pending tasks which have a non-zero claimed credit - so no newly-reported results have appeared on the pages since 8 March. Most people, like you, now have empty lists - we've been comparing notes in Pending Credit List Has Almost Disappeared - but I've still got five left.....
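
To make the effect concrete, here is a minimal sketch of the filtering described above. It is not the actual BOINC page code: the `Result` record, its field names and the sample values are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record; the real database schema is not shown in this thread.
    name: str
    validated: bool
    claimed_credit: float


def pending_credit_list(results):
    """Keep only results that are still pending AND have a non-zero claimed credit."""
    return [r for r in results if not r.validated and r.claimed_credit > 0]


results = [
    Result("reported_before_change", validated=False, claimed_credit=42.5),
    Result("reported_after_change", validated=False, claimed_credit=0.0),
]

# Only the older result survives the filter; the newer one is pending too,
# but it has no claimed credit stored, so it never appears on the page.
print([r.name for r in pending_credit_list(results)])  # ['reported_before_change']
```

As the older results validate and drop off, nothing replaces them, which is why most lists are now empty.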

Yes, the link is pretty much redundant now, and unless David has a major change of heart, won't be coming back. You might as well save the space.
ID: 1112461 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1112707 - Posted: 3 Jun 2011, 16:01:15 UTC - in response to Message 1112351.  

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress. Does this mean the computer's GPU failed? Obviously, I'm still getting video output. <snip> I restarted the computer a couple of times while I was messing with it and did not check the BOINC manager again, so maybe it recovered and I don't know it.<snip>

David

Yes, it would be better to continue the conversation in Number Crunching if this doesn't answer the point.

"GPU missing; ..." (in Windows) is most likely the result of using Fast User Switching or Remote Desktop without stopping and restarting BOINC: GPUs only run if the user who started the BOINC session, and the user currently active at the console, are one and the same. If that doesn't apply to you, ask again in NC.

Thanks for the advice. I did check it again this morning and I still had all the "GPU missing"s. Based on your comments, what I'll do is remove BOINC as a scheduled task (even though it runs under the same user name), restart the computer, and start BOINC manually, then not log off (I log off for security and so I can use Remote Desktop Connection, but this computer has a Home version so I can't Remote in anyway). If this doesn't work, I'll ask in Crunching.

David
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1112707 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1113902 - Posted: 6 Jun 2011, 15:03:12 UTC - in response to Message 1112707.  

Maybe I should ask this in Number Crunching... I looked at my new machine's BOINC manager yesterday for the first time since the day I installed it, I think. Almost half of the tasks listed on the tasks tab had a status of "GPU missing; Ready to run." None of them showed any progress.<snip>

David

"GPU missing; ..." (in Windows) is most likely the result of using Fast User Switching or Remote Desktop without stopping and restarting BOINC: GPUs only run if the user who started the BOINC session, and the user currently active at the console, are one and the same. If that doesn't apply to you, ask again in NC.

Thanks for the advice. I did check it again this morning and I still had all the "GPU missing"s. Based on your comments, what I'll do is remove BOINC as a scheduled task (even though it runs under the same user name), restart the computer, and start BOINC manually, then not log off.

David

The situation seems to be resolved. I removed the Scheduled task and stopped and restarted BOINC without restarting the computer. All the Seti tasks then showed ready to run.

But then I got a bit stupid. I looked some more and found about 10 Einstein units due in 2-3 days (ONE of which was running in high-priority mode), so I suspended Seti so it would finish the Einsteins. What I failed to do was set Einstein to no new tasks, so when it finished the 10 and Seti was suspended, it downloaded 100+ new Einsteins, also with short deadlines. -sigh-

How does BOINC decide what to work on, anyway? I see it nearly finished with tasks that are due in a month and a half, while others that are due in a week aren't started.

David
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1113902 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1113918 - Posted: 6 Jun 2011, 15:26:22 UTC - in response to Message 1113902.  

How does BOINC decide what to work on, anyway? I see it nearly finished with tasks that are due in a month and a half, while others that are due in a week aren't started.

David

Unless BOINC is under extreme deadline pressure, it runs tasks (within one project) in the order they're received from the project's servers. Any more explanation than that is for NC.
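
As a simplified sketch of that rule (not BOINC's real scheduler): within a single project, sort by arrival time and ignore the deadline. The `Task` record and the sample dates are invented for the example, and the deadline-pressure case is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    # Hypothetical record for illustration only.
    name: str
    received: datetime
    deadline: datetime


def run_order(tasks):
    """Order one project's tasks first-received, first-run (no deadline pressure)."""
    return sorted(tasks, key=lambda t: t.received)


tasks = [
    Task("short_deadline", datetime(2011, 6, 4), datetime(2011, 6, 11)),
    Task("long_deadline", datetime(2011, 6, 1), datetime(2011, 7, 15)),
]

# The task due in six weeks runs first simply because it arrived first.
print([t.name for t in run_order(tasks)])  # ['long_deadline', 'short_deadline']
```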

Glad you got your GPU back, anyway.
ID: 1113918 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1113994 - Posted: 6 Jun 2011, 17:59:13 UTC - in response to Message 1112436.  

1) The "client connection stats" page hasn't been updated since before the big black-out last year.


For some reason this is a big pain to keep working (and obviously low priority to keep kicking back into working mode). Will try to look into that again soon.

2) The "Multi-Beam Data Recorder Status" shows "34206m ago" - that's ~ 24 days...


Oh yeah, that. There was a cluster of power/security concern issues at Arecibo a few weeks back and lots of things haven't been adjusted to work with new networking/security regimes yet. So we haven't gotten telescope info up here in real time for a while, hence the big delays. Also will look into that again soon.

[snip]
- Matt


I know both are low priority problems, that's why I don't bother posting them when there are bigger fish to fry...

.

Hello, from Albany, CA!...
ID: 1113994 · Report as offensive
justsomeguy

Joined: 27 May 99
Posts: 84
Credit: 6,084,595
RAC: 11
United States
Message 1115138 - Posted: 9 Jun 2011, 17:48:42 UTC - in response to Message 1112205.  


This morning we brought the projects down to replace some DIMMs (they have been sending complaints to the OS) on thumper. One thing I kinda loathe about professional computing in general is poor documentation - a problem compounded by chronic zero-index vs. one-index confusion, and physical hardware labels vs. how they are depicted in the software. Long story short, despite all kinds of effort to determine exactly which DIMMs were broken, it wasn't until after we did the surgery and brought everything back on line that we found out we probably replaced the wrong ones. Oops. We'll have to do this again sometime soon.


Hey Matt,

Do you guys ever use the service processor on Thumper? I've seen erroneous errors thrown on the x4500s and x4600s. If you look in the logs in the service processor, it will give the exact DIMMs. If you have a hard memory failure, there is a little button on the system board near the DIMMs; it will light up the failed DIMM when pressed. It runs off a small capacitor that doesn't last long, but long enough.

Let me know if I can be of assistance!

Kevin


"Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein

ID: 1115138 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.