Up (Dec 07 2010)


Message boards : Technical News : Up (Dec 07 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1054106 - Posted: 8 Dec 2010, 22:56:50 UTC

Another quick note instead of a new thread:

We did finally uncover the one configurable necessary to make use of the vast amount of extra memory on oscar. Just to start we doubled the informix read/write buffer space, and that seems to be working. The processes are larger in size, and for the first time since we turned everything back on our bandwidth is totally maxed out, which leads me to believe things are pretty darn smooth.

Tomorrow we'll quadruple the buffer space from where it is now. Mwha ha ha ha.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Dave
Joined: 29 Mar 02
Posts: 774
Credit: 23,193,139
RAC: 0
United Kingdom
Message 1054112 - Posted: 8 Dec 2010, 23:10:03 UTC
Last modified: 8 Dec 2010, 23:10:45 UTC

LOL nice 1 - starting to get a bit of VFM...

APCyberax
Volunteer tester
Joined: 6 Jun 01
Posts: 29
Credit: 2,000,348
RAC: 0
United Kingdom
Message 1054114 - Posted: 8 Dec 2010, 23:15:04 UTC

Got some WUs for my PC. Shame my server missed them all, as that's now idle. I've left it with no tasks so I can get right back to the search :)

Thanks guys for your hard work. Hope you had some money left in the donations for a drink. It's on us.
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8465
Credit: 48,910,660
RAC: 76,239
United Kingdom
Message 1054131 - Posted: 9 Dec 2010, 0:09:11 UTC - in response to Message 1054106.
Last modified: 9 Dec 2010, 0:10:27 UTC

Hope you had some money left in the donations for a drink. It's on us.

Tomorrow we'll quadruple the buffer space from where it is now. Mwha ha ha ha.

Sounds like they've started already - and as a donor, I don't begrudge them a drop. Cheers, Matt!

droctagon
Joined: 11 Jul 10
Posts: 1
Credit: 8,543,805
RAC: 0
Bulgaria
Message 1054134 - Posted: 9 Dec 2010, 0:24:34 UTC

Thanks for the info guys, I'm finally back in the project, at full speed :). Cheers!

KZ3AB
Joined: 1 Mar 00
Posts: 6
Credit: 2,441,861
RAC: 1,359
United States
Message 1054148 - Posted: 9 Dec 2010, 1:45:41 UTC

Hot damn! I finally received some numbers to crunch.
Thanks for all the hard work.


____________

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11162
Credit: 13,946,888
RAC: 12,367
United States
Message 1054266 - Posted: 9 Dec 2010, 14:31:58 UTC - in response to Message 1054021.

No need for an announcement. The BOINC software was written to take care of sites that have no work for an extended period. Look at LHC@home.


The problem is that many people (including myself) had set Seti to NNT during the downtime. They might not change it back to ANT until they know the project is back up and new work is being created.

Justin


The thing is, there was no reason to set it to NNT. As stated, BOINC is designed to automatically handle times of no work available. I set both Seti and Einstein to NNT when I started having trouble, but that was because I didn't really know what my problem was (and in fact I still don't know if Seti will also be a problem, but tonight I have to go down to the basement to do some laundry, so I'll start Seti then).

That said, I am glad an announcement was made, almost as much as I'm glad the project is back up!

David
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Whitesnake
Volunteer tester
Joined: 28 Aug 99
Posts: 12
Credit: 9,923,878
RAC: 0
Switzerland
Message 1054273 - Posted: 9 Dec 2010, 15:15:38 UTC

Thanks to everybody for this very good job.
AP WUs are starting again.
With all these latest advances in the systems, and the Drake equation brought up to date, it puts things in a whole different perspective.
____________

rflulling
Joined: 2 Mar 02
Posts: 6
Credit: 627,423
RAC: 0
United States
Message 1054436 - Posted: 9 Dec 2010, 23:31:14 UTC

What incredible timing. I was just talking to a Cisco engineer yesterday. He was telling me about a really kick-butt server they have developed that has more RAM slots than you can shake a stick at. The idea was a low-cost system that lets an administrator load all the slots with lower-cost memory (smaller sizes), then upgrade later as costs fall. I really don't know much about the hardware, but when he started telling me about the blades, the first thing that came to mind was SETI and its update status.

The Cisco engineer said that most likely SETI has already spoken with a Cisco rep. I'll trust that's true, but if not, you guys might be missing out. Maybe next upgrade? He said the hardware was 100% compatible with all existing architecture without any alterations needed. The hardware also favors virtualized environments where the hypervisor watches over all.

Anyway, I am totally stoked to see the servers back up and running. My work unit average has been dropping hard for weeks. Looking forward to the Rocket trail take off.

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1054438 - Posted: 10 Dec 2010, 0:03:53 UTC

One more minor update: We continue to beat up on oscar - long story short we're finding our biggest hurdle in utilizing the server to its maximum potential is probably the stripe size on the RAID subsystem (which is set to the factory default as it's hard to predict these bottlenecks until everything is turned on). I think I can adjust it live - and we'll try these sorts of tests/updates early next week. In the meantime, more testing with what we got...
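
For anyone wondering why stripe size can be a bottleneck, here is a minimal sketch, assuming plain round-robin striping (RAID-0 style, no parity). The function name, disk count, and sizes are hypothetical and say nothing about how oscar's array is actually laid out; it only shows how the stripe size changes the number of disks a single read has to touch.

```python
def disks_touched(offset, length, stripe_size, n_disks):
    """Return the set of disk indexes touched by a read of `length` bytes at `offset`.

    Assumption: simple round-robin striping, no parity or mirroring.
    """
    if length <= 0:
        return set()
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    # Chunk k lives on disk k modulo the number of disks.
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}


KIB = 1024
# Hypothetical 4-disk array, one aligned 64 KiB read:
one_disk = disks_touched(0, 64 * KIB, 64 * KIB, 4)   # stripe as large as the read
all_disks = disks_touched(0, 64 * KIB, 16 * KIB, 4)  # stripe a quarter of the read
```

With the larger stripe the read stays on one disk, so other disks are free to serve other requests; with the smaller stripe every disk takes part in the same read. Which of the two is better depends on the workload, which is presumably why it has to be tested.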

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 12395
Credit: 6,705,195
RAC: 8,660
United States
Message 1054457 - Posted: 10 Dec 2010, 1:46:57 UTC - in response to Message 1054438.

One more minor update: We continue to beat up on oscar - long story short we're finding our biggest hurdle in utilizing the server to its maximum potential is probably the stripe size on the RAID subsystem (which is set to the factory default as it's hard to predict these bottlenecks until everything is turned on). I think I can adjust it live - and we'll try these sorts of tests/updates early next week. In the meantime, more testing with what we got...

- Matt

If you can change it live, that's quite a trick.

Thanks for the progress report.

____________

DanLM
Joined: 2 Mar 04
Posts: 4
Credit: 490,568
RAC: 0
United States
Message 1054499 - Posted: 10 Dec 2010, 3:38:58 UTC

Thank you for all the work. I had just built three new FreeBSD (old) machines for SETI when the project went offline. Lol, timing is everything. I thought I had done something wrong setting up SETI because I had no WUs for those machines.

But all is well for both the seti project and all the machines I have crunching numbers.

Again, thank you.
____________

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 94
United States
Message 1054501 - Posted: 10 Dec 2010, 3:43:24 UTC - in response to Message 1054438.

Matt, if you need to take anything down, no worries. It's when it crashes unexpectedly that we start to cringe. We spend a lot of time and energy re-arranging those 0's and 1's... hate to see them scattered on the floor.
____________

Janice

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,138
RAC: 415
United States
Message 1054509 - Posted: 10 Dec 2010, 4:26:16 UTC

Okay, I know I'll get a lot of flak for this question, but since I don't know the WU system flow, I'll ask it.

From looking at the Server Status Page, I believe that there are (at least) two problems in evidence: 1.) Low WU creation rate, and 2.) Slow assimilation rate.

My logic tells me that the BOINC database server (Carolyn) is where the WUs are assigned among all the client computers (and probably affects the creation rates), while the Science database (Oscar) only comes into play at the time of assimilation.

So, am I correct in the belief that tinkering with Oscar's DB settings would affect the assimilation speed, and tinkering with Carolyn's DB settings would affect the WU creation rate (and deletions, purges, etc)?

I realize that the internal file server, Ptolemy, is probably in the mix too, but, to my knowledge, that server has not yet been changed over to its replacement, which will be Thumper. So, no settings on that machine have been altered as yet.

Thanks in advance for enlightening me.

Whit

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 94
United States
Message 1054511 - Posted: 10 Dec 2010, 4:40:10 UTC - in response to Message 1054509.

Actually I do not see a slow creation or assimilation rate. The "tapes" are only going to put out work units so fast. The assimilation is not seriously backlogged, the queries are not out of line on Carolyn, and Jocelyn seems to be keeping up just fine. The results in the field are going up at a fair rate, and turn around time is not out of line. Can they do better? Probably, but give them a chance to tune it. The biggest thing is, we are working again, have not crashed, do not seem to be over-stressing the servers yet.
Once they get it all tuned, Oscar should be able to handle anything Thumper did with lots of room to spare.

The most impressive change I have seen is we are getting our units through the transfers MUCH faster than I have seen. This seems to be saving bandwidth.

Further questions are probably best in number crunching. We pick everything apart in there.
____________

Janice

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11162
Credit: 13,946,888
RAC: 12,367
United States
Message 1054721 - Posted: 10 Dec 2010, 17:36:54 UTC - in response to Message 1053986.


Thanks to everyone in the lab for all their hard work. I guess I'll have to find an excuse to go down to the basement and wake up BOINC. (I removed it from Windows scheduled tasks when I started having Einstein trouble, so it only runs if I log on.) I will only let it have 1 task until I know if SETI will also cause a crash, though.

David

Greetings David,

Not to veer off topic but...

I have been reading so much about others having a problem running Einstein. This has really got me stumped now. I have been having problems with my i7 PC ever since I re-attached to Einstein 30-some days ago. I cannot get to the bottom of the problem. Here's a link to a thread I started on the Einstein forum. Please read that thread and let me know if any of it pertains to your problem with Einstein.

Thank you! :)

Keep on BOINCing...! :)

Siran,

I've spent most of a day and a half of "work" time reading your thread at Einstein. I'll make a more detailed response there when I finish it (almost done). However, let me say here that I finally started BOINC last night. I sat and watched it get no tasks for about half an hour before I gave up and went upstairs. My status shows I got three tasks 10 minutes after that, one 5 minutes later, and two more in the wee hours of the morning. None has been returned yet, but I just checked my radioreference.com feed and it's still online after over 16 hours of crunching Seti. That's not a definitive proof of no problem, though. I'll be satisfied when it exceeds 36 hours of uptime.

One thing jumped out at me in your Einstein thread: you use Avira as your antivirus. I use Avira on this one computer, because it came preinstalled. I do, however, have an unused license for Bitdefender (which I use on my 2 laptops), so maybe I'll install that.

I have a theory that Seti sort of regulates Einstein. That is, the Einstein problem builds up over time, and when BOINC switches from Einstein to Seti, it clears up whatever the problem is so it never builds up to a large enough degree to cause a freeze/crash. I think tomorrow morning, I'll try letting it have another Einstein unit to test this theory (assuming it hasn't crashed on Seti before then). This theory is based on the crashes first starting to happen at about the time Seti went offline. (Otoh.... I never had a problem during the regular 3-day Seti outages, so maybe this theory is not even worth the effort I've put into typing it.)

David
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,138
RAC: 415
United States
Message 1054747 - Posted: 10 Dec 2010, 19:29:54 UTC - in response to Message 1054511.
Last modified: 10 Dec 2010, 19:31:12 UTC

Actually I do not see a slow creation or assimilation rate. The "tapes" are only going to put out work units so fast. The assimilation is not seriously backlogged, the queries are not out of line on Carolyn, and Jocelyn seems to be keeping up just fine. The results in the field are going up at a fair rate, and turn around time is not out of line. Can they do better? Probably, but give them a chance to tune it. The biggest thing is, we are working again, have not crashed, do not seem to be over-stressing the servers yet.
Once they get it all tuned, Oscar should be able to handle anything Thumper did with lots of room to spare.

The most impressive change I have seen is we are getting our units through the transfers MUCH faster than I have seen. This seems to be saving bandwidth.

Further questions are probably best in number crunching. We pick everything apart in there.


Sorry to disagree, but the current WU creation rate of 10 or 12 per second is much slower than the 30 to 40 per second that we used to see.

As for assimilation, it has always been slow, unable to keep up with validated work, and catch-up assimilation took about a week after SETI went to sleep for the month. Look at the Scarecrow graph. Pending assimilations are growing unabated, and SETI isn't even going full speed yet.
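
To put the backlog point in concrete terms, here is a rough sketch: a queue keeps growing for as long as results are validated faster than they are assimilated. The function name and the rates in the example are made up for illustration; they are not figures from the Server Status Page or the Scarecrow graph.

```python
def backlog_after(seconds, arrival_rate, service_rate, start=0.0):
    """Estimate a queue's length after `seconds`, given constant rates per second.

    arrival_rate: results entering the queue per second (hypothetical).
    service_rate: results leaving the queue per second (hypothetical).
    The queue can never go below zero.
    """
    growth = (arrival_rate - service_rate) * seconds
    return max(0.0, start + growth)


# Hypothetical: 12 results/s validated, 10 results/s assimilated, for one hour.
growing = backlog_after(3600, 12.0, 10.0)
# Hypothetical: rates reversed, starting from a backlog of 1000.
draining = backlog_after(3600, 10.0, 12.0, start=1000.0)
```

Even a small, steady gap between the two rates adds up quickly, and the queue only drains once the assimilation rate is higher than the arrival rate.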

The benefit we are getting from the slow WU creation (and the longer period between "connect" (5 minutes instead of 10 seconds)) is that the internet link is not saturated, and downloads are proceeding extremely well. Probably, there are no (or very few) ghosts being created.

But no one answered the main thrust of my question. All you did was pick apart my contention that there is some slowness to work out.

My question was, "My logic tells me that the BOINC database server (Carolyn) is where the WUs are assigned among all the client computers (and probably affects the creation rates), while the Science database (Oscar) only comes into play at the time of assimilation.

So, am I correct in the belief that tinkering with Oscar's DB settings would affect the assimilation speed, and tinkering with Carolyn's DB settings would affect the WU creation rate (and deletions, purges, etc)?"

Thanks again for your responses.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 1054769 - Posted: 10 Dec 2010, 21:28:44 UTC - in response to Message 1054747.
Last modified: 10 Dec 2010, 21:29:15 UTC

The benefit we are getting from the slow WU creation (and the longer period between "connect" (5 minutes instead of 10 seconds)) is that the internet link is not saturated, and downloads are proceeding extremely well. Probably, there are no (or very few) ghosts being created.

Don't underestimate the value of this particular benefit.

If I worked in the lab, I would have spent the last month trying to find a way to slow things down so that 200,000 hosts don't max out the bandwidth just hammering for more work.

BOINC-equipped computers make an incredible botnet, and there is no real difference between hungry hosts and a DoS attack.
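
A back-of-the-envelope sketch of that load, using the two numbers from this thread (200,000 hosts, and a connect interval of 5 minutes instead of 10 seconds). It assumes the worst case, where every host is out of work and asks again at exactly that interval; the function name is made up, and real BOINC clients apply their own backoff, so treat the result as an upper bound rather than a measurement.

```python
def requests_per_second(n_hosts, retry_interval_s):
    """Aggregate scheduler requests per second if every host retries at a fixed interval."""
    return n_hosts / retry_interval_s


HOSTS = 200_000  # figure quoted in this thread

every_10_seconds = requests_per_second(HOSTS, 10)
every_5_minutes = requests_per_second(HOSTS, 5 * 60)
reduction = every_10_seconds / every_5_minutes
```

Under those assumptions the longer interval cuts the request rate by a factor of 30, from about 20,000 requests per second to under 700.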

It seems like holding down the work creation rate is doing that nicely.

It could just be a happy accident, or it could be intentional. Either way, it seems to be working.

... and I think there is more to the answer to your other questions than just pointing at Carolyn or Oscar.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3238
Credit: 31,690,868
RAC: 6,138
Netherlands
Message 1054776 - Posted: 10 Dec 2010, 21:55:16 UTC - in response to Message 1054769.

Apparently everything is working well: since a single work request, all 3 quads plus the CUDA cards (GTS250 and GTX480) have work, at the moment mostly CUDA (6.08) WUs.

A good job, and the servers seem to be working OK.

Although I never had trouble downloading WUs or executables.
Got a lot of work, enough for 3-4 days, which is my cache size.


____________

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,138
RAC: 415
United States
Message 1054828 - Posted: 11 Dec 2010, 1:48:39 UTC
Last modified: 11 Dec 2010, 2:12:33 UTC

The Berkeley Boys have found the magic cure!

They put Oscar on WU splitting duty, freeing up Vader to concentrate his efforts on the assimilation task.

The result: Splitters are cranking out about 30+ WU per second and the Ready-To-Send queue is building nicely.

And the Waiting-to-Assimilate queue is dropping like a rock. Paradise at last!

Now you are getting your money's worth! Congratulations yet again !!! (Now I have server envy)


Copyright © 2014 University of California