Feature Rich (Jan 14 2008)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 700088 - Posted: 14 Jan 2008, 22:23:56 UTC

Things ran quite well over the weekend. Looks like we added the right index to the mysql database to reduce the slow "validator fix" queries. A note about general BOINC/mysql implementation/design: there are a lot of features in BOINC that seem excessive from a single-project perspective, but they exist because every project has different needs. Project-specific factors (server power, workunit processing times, number of active users, min quorum, etc.) make some features less helpful. In the case of "resend lost workunits" (see last thread), this feature, implemented mostly for the benefit of Einstein@home, was most definitely weighing down our database server. We turned it off and have been running smoothly since. There were assumptions this would lead to greater problems down the line (fearing many results would sit on disk longer waiting for their redundant pairing to return), but in fact our "results returned and waiting for validation" number has been stable (if not slowly decreasing) since I made the change. Nevertheless, at some point soon we will see if we can optimize/reimplement this code, and Eric is actually making adjustments to the splitter which will perhaps create fewer "fast runners."
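The post doesn't say which index was added or what the "validator fix" queries look like. As a rough illustration of why a single index can turn a slow query into a fast one, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The `result` table, its columns, and the index name `idx_result_workunitid` are hypothetical simplifications, not the real BOINC schema. It captures the query plan before and after creating an index on the filtered column.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column in all versions

def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table; NOT the actual BOINC schema.
    conn.execute(
        "CREATE TABLE result ("
        " id INTEGER PRIMARY KEY,"
        " workunitid INTEGER,"
        " validate_state INTEGER)"
    )
    conn.executemany(
        "INSERT INTO result (workunitid, validate_state) VALUES (?, ?)",
        [(i // 2, i % 3) for i in range(10_000)],
    )
    sql = "SELECT id, validate_state FROM result WHERE workunitid = ?"

    # Without an index, the planner has to scan every row.
    before = query_plan(conn, sql, (42,))
    conn.execute("CREATE INDEX idx_result_workunitid ON result (workunitid)")
    # With the index, the planner can seek directly to matching rows.
    after = query_plan(conn, sql, (42,))
    conn.close()
    return before, after

if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The "before" plan reports a full table SCAN; the "after" plan reports a SEARCH using the new index. On a busy production table, that difference is what separates a query that drags down the server from one that barely registers.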

Our new-hardware-to-obtain priorities are shifting. Namely, we need a router (we're not ignoring discussion about this on other threads, but we are limited in what we can use for various configuration/policy reasons). We also need a new KVM - our current one in the closet is maxed out and we'd like to get more stuff in there ASAP. We also need three new desktop systems. Dan's using an old, sloooow Solaris system which is out of support. Bob is on a slightly faster Solaris system, but needs a safe mysql test sandbox. Josh's old super-cheap Windows/Intel box is basically a glorified console server.

Had some minor issues due to the root drive on bruno filling up on Sunday. I scanned the drive and found only 4GB of stuff, while "df" was showing 40GB. Eric eventually found a deleted-yet-open file - an infinitely growing httpd log. Apparently httpd log rotation broke at some point, but we cleaned this up. Annoying, but harmless.
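The 4GB-versus-40GB gap is the classic signature of a file that was unlinked while a process still held it open. `df` counts the file's blocks, which are not freed until the last descriptor is closed, but a directory scan can no longer find the file. The post doesn't say how Eric tracked it down (`lsof +L1`, which lists open files with a link count below one, is a common way). Below is a minimal Python sketch of the same idea. It assumes a Linux-style /proc filesystem (bruno's OS isn't stated in the post), and `find_deleted_open_files` is a hypothetical helper name.

```python
import os

DELETED_SUFFIX = " (deleted)"

def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, path, size_bytes) for open files that have been unlinked.

    Assumes a Linux-style /proc, where the symlink /proc/<pid>/fd/<n>
    points at the file's old path with " (deleted)" appended once the
    file has been removed from the directory tree.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
                if not target.endswith(DELETED_SUFFIX):
                    continue
                # stat() follows the /proc link to the still-allocated inode,
                # so st_size approximates the space df still counts as used.
                size = os.stat(fd_path).st_size
            except OSError:
                continue  # descriptor was closed while we were looking
            found.append((int(entry), int(fd), target[: -len(DELETED_SUFFIX)], size))
    return found

if __name__ == "__main__":
    # Largest offenders first, e.g. a runaway log that rotation "deleted".
    for pid, fd, path, size in sorted(find_deleted_open_files(), key=lambda t: -t[3]):
        print(f"pid {pid} fd {fd}: {path} ({size / 2**20:.1f} MiB)")
```

Truncating the file through its descriptor, or restarting the process that holds it (httpd here), is what actually releases the space; deleting the path again does nothing.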

Due to increased load in general, I changed the server db stats to update every hour (instead of half hour). Actually it's becoming clearer as we increase active user load and I'm populating credited_job, etc. that the mysql database might be our bottleneck du jour any jour now. There were also some issues with the user-of-the-day selection process which I tracked down and fixed this morning.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 700088
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 700105 - Posted: 14 Jan 2008, 23:43:38 UTC


Thanks for the Post Matt - good job being done by All others @ Berkeley too . . .

< best of luck with the router prob. - hopefully it's fixed in the near future


BOINC Wiki . . .

Science Status Page . . .
ID: 700105
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 700161 - Posted: 15 Jan 2008, 4:23:55 UTC - in response to Message 700088.  
Last modified: 15 Jan 2008, 4:29:23 UTC

In the case of "resend lost workunits" (see last thread) this feature, implemented mostly for the benefit of Einstein@home, was most definitely weighing down our database server. We turned this off and have been running smoothly since. There were assumptions this would lead to greater problems down the line (fearing many results will be sitting on disk longer waiting for their redundant pairing to return) but in fact our "results returned and waiting for validation" number has been stable (if not slowly decreasing) since I made the change. Nevertheless, at some point soon we will see if we could optimize/reimplement this code, and Eric is actually making adjustments to the splitter which will perhaps create less "fast runners."


I assume you missed my "complaint" about a problem with 22fe07ah? I had 23 results from that dataset, and all 23 completed in under 90 seconds (3.7ish seconds apiece). The reduction in the stat you're referencing would be greatly influenced by something like this. They were all "fast runners" anyway (they already had short deadlines), so they probably bumped people with larger caches into "High Priority" / "Earliest Deadline First" mode, and you'd get a whole slew of results coming back pretty quickly, dropping that figure and giving a transient drop in turnaround times.

Also, the resends only matter if someone has had a problem and blown away their local data files while the server still believes they have a result, or if there were blips in the download process like those experienced here back in May/June '07, when the server tagged a host as having a task but the download server never sent it to the host.

IOW, the stability and/or drop in that stat may not mean anything at all in the grand scheme of things... Way too early to tell...IMO.

I decided to grab some more work on my AMD today and out of 16, 5 of them are guaranteed "fast runners" (short deadline). Beyond that, who knows how many may be noisy and overflow? It could be 0, or it could be 5-10 more... (I've finished 1 out of the set so far).
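The resend mechanism discussed above boils down to a set comparison, and the database lookup it needs on each scheduler contact is presumably where the load Matt mentions came from. The sketch below is a simplified, hypothetical model, not BOINC's actual scheduler code. The server takes the results it believes are in progress on a host, compares them with the results the client reports still holding, and treats anything on record but missing from the client as a resend candidate. The function name and data shapes are illustrative assumptions.

```python
from typing import Iterable, Set

def find_lost_results(server_in_progress: Iterable[str],
                      client_reported: Iterable[str]) -> Set[str]:
    """Results the server thinks a host has, but the host no longer reports.

    server_in_progress: names of results recorded as sent to this host and
        not yet returned (on a real server this would presumably require a
        database query for every scheduler request from the host).
    client_reported: names of results the client says it still holds.
    """
    return set(server_in_progress) - set(client_reported)

if __name__ == "__main__":
    # Hypothetical result names for illustration only.
    on_server = ["wu_a_0", "wu_b_1", "wu_c_0"]
    on_client = ["wu_a_0"]  # e.g. after a project reset wiped local files
    print(sorted(find_lost_results(on_server, on_client)))
```

Running it prints `['wu_b_1', 'wu_c_0']`. The set difference itself is cheap. The cost lies in fetching `server_in_progress` for every host on every contact, which fits the observation that whether the feature pays off depends on each project's load.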


Our new-hardware-to-obtain priorities are shifting. Namely, we need a router (we're not ignoring discussion about this on other threads but we are limited to what we can use for various configuration/policy reasons).


The 3800 series would be good, but you may want the 7200 series... I wouldn't stay within the 2800 series, but politics may come into play...


Due to increased load in general, I changed the server db stats to update every hour (instead of half hour). Actually it's becoming clearer as we increase active user load and I'm populating credited_job, etc. that the mysql database might be our bottleneck du jour any jour now. There were also some issues with the user-of-the-day selection process which I tracked down and fixed this morning.


The professor at school said that "nobody" would run an enterprise operation on anything but SQL Server 2005, Oracle, DB2, or equivalent... and that MySQL would be for "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-)
ID: 700161
Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 143
Credit: 6,652,341
RAC: 0
Canada
Message 700168 - Posted: 15 Jan 2008, 4:47:03 UTC

Since getting new work in the early hours of today, I have returned 44 results, all with completion times of just over an hour and deadlines of January 22nd. Of the 165 tasks I still have to process, a lot are going to finish in just over an hour on my machine.

Looking briefly at some of them, I also see that my wingman on some will not complete before the January 22nd deadline.

It's a little annoying that a fast machine running BOINC 24/7 isn't being sent work that will at least tax its dual processors a bit. Not that I really care much about RAC, but that is dropping like a stone as well with so many fast results.

Anyways the results are all from 12ja07ad 8995 and 12ja07af 8995 if anyone cares.

Nite nite from Montreal, off to get some shut eye and then be probed by a doctor during a medical exam tomorrow. Maybe if aliens came and took me away tonight they could save me the hassle of driving so damn far to the docs and beam the results to my doc.
ID: 700168
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 700258 - Posted: 15 Jan 2008, 15:38:01 UTC

What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus?
ID: 700258
Tony Li
Joined: 21 May 01
Posts: 6
Credit: 1,337,747
RAC: 0
United States
Message 700318 - Posted: 16 Jan 2008, 1:39:44 UTC - in response to Message 700258.  

What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus?


Indeed. Please let me know what could work and let me see what I can do.

ID: 700318
seti@elrcastor.com
Volunteer tester
Joined: 30 Jan 00
Posts: 35
Credit: 4,879,559
RAC: 0
United States
Message 700404 - Posted: 16 Jan 2008, 7:08:53 UTC - in response to Message 700318.  

What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus?


Indeed. Please let me know what could work and let me see what I can do.


A Cisco 3825 would probably be a step in the right direction
ID: 700404
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 700452 - Posted: 16 Jan 2008, 11:44:16 UTC - in response to Message 700404.  

What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus?


Indeed. Please let me know what could work and let me see what I can do.


A Cisco 3825 would probably be a step in the right direction


> best price: CISCO 3825 INTEGRATED SERVICE ROUTER W/AC PWR (MPN: CISCO3825) $5,518.00 - No Tax - Free Shipping from Corporate Computer Solutions

right?

ID: 700452
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 700500 - Posted: 16 Jan 2008, 14:24:04 UTC - in response to Message 700404.  

What kind of router would be a suitable replacement? Anything particular in mind that would be "approved" by campus?


Indeed. Please let me know what could work and let me see what I can do.


A Cisco 3825 would probably be a step in the right direction


But is that "campus approved"?
ID: 700500
seti@elrcastor.com
Volunteer tester
Joined: 30 Jan 00
Posts: 35
Credit: 4,879,559
RAC: 0
United States
Message 700533 - Posted: 16 Jan 2008, 16:18:15 UTC

Probably. If they liked the 2811, they should definitely like the 3825.
ID: 700533
Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 700552 - Posted: 16 Jan 2008, 17:15:21 UTC

Cisco's 3800 series model comparison shows that the 3800 series of routers would be inadequate for someone who needs at least 100 Mbps of routing speed, because the 3825 maxes out at half the speed of a T3 (22.5 Mbps), and the 3845 maxes out at the speed of a full T3 (45 Mbps). I think you may want to investigate the 7200 series of routers. The 7600 series seems to be overkill for your application, unless you have plans to get Internet speeds beyond OC-48.
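The arithmetic behind that conclusion, sketched in Python. It uses only the figures quoted in the post (the 3825 at half a T3, the 3845 at a full T3, a nominal T3 of 45 Mbps) rather than independent benchmarks, and treats 100 Mbps as the target. The helper name `shortfall_factors` is illustrative.

```python
T3_MBPS = 45.0         # nominal T3 rate as used in the post (exactly 44.736 Mbps)
REQUIRED_MBPS = 100.0  # minimum routing speed named in the post

# Ratings as quoted in the post, expressed as fractions of a T3.
RATED_MBPS = {
    "Cisco 3825": 0.5 * T3_MBPS,  # half a T3 -> 22.5 Mbps
    "Cisco 3845": 1.0 * T3_MBPS,  # full T3   -> 45 Mbps
}

def shortfall_factors(rated, required):
    """How many times its rating each router would need to meet `required`."""
    return {model: required / mbps for model, mbps in rated.items()}

if __name__ == "__main__":
    for model, factor in shortfall_factors(RATED_MBPS, REQUIRED_MBPS).items():
        verdict = "OK" if factor <= 1 else f"falls short by a factor of {factor:.1f}"
        print(f"{model}: {RATED_MBPS[model]:.1f} Mbps, {verdict}")
```

Neither model clears the bar. The 3845 would need roughly 2.2 times its quoted rating and the 3825 about 4.4 times, which is why the post points toward the 7200 series instead.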
ID: 700552
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 700589 - Posted: 16 Jan 2008, 18:20:46 UTC

For the record:

1. We have a 7301 on the other end of the tunnel. It's handling the load just fine (average 40% CPU load, compared to 90% on the 2811). So the 7301 may seem like overkill, but we plan on eventually doubling (tripling? quadrupling?) our bandwidth capabilities at some point.

2. Part of the consideration is that in the closet we have a 2811, a switch for machines in the closet going into the 2811, and a switch for inter-closet traffic with an uplink to the campus LAN. One hefty unit could combine two of these functions, if not all three (it needs at least 36 ports, though, if not 48).

3. Just so we're clear: campus isn't the entity holding us back in our selection process. They have suggestions about which brands/specs of hardware to use in any given situation, but so far they have been quite willing to work with whatever we come up with. In some cases they are more strict about what to do and what to use, but we tend to agree with them.

4. If there's any politics, policy, etc. it has to do with stuff I'm less comfortable about discussing publicly - namely our various benefactors who helped us in the past and may perhaps again in the future.

I (actually all of us) have been bogged down with other fires lately, but we'll address this issue at some point, and will let you know exactly what we want if we need help. Until then, I vastly appreciate the informative comments. I have zero time to do any research.

- Matt

ID: 700589
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 700699 - Posted: 16 Jan 2008, 23:57:07 UTC - in response to Message 700589.  

For the record:

1. We have a 7301 on the other end of the tunnel. It's handling the load just fine (average 40% cpu load, compared to 90% of the 2811). So the 7301 may seem like overkill but we plan on eventually doubling (tripling? quadrupling?) our bandwidth capabilities at some point.


Yes, always a balancing act between too much and too little info... :-)

I was leaning more towards the 7200 series with a possible "overkill" of the 7600 series, but since you're already talking about those classes, you might want to go ahead and go for 7600s and/or Catalysts...
ID: 700699
whawn
Joined: 11 Apr 00
Posts: 18
Credit: 1,053,191
RAC: 2
United States
Message 700767 - Posted: 17 Jan 2008, 6:30:37 UTC - in response to Message 700161.  

The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-)


Well, I dunno if Matt will want to, but Sun Microsystems will probably have a little to say on the subject, since it looks like that company is buying MySQL:
press release
ID: 700767
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 700787 - Posted: 17 Jan 2008, 9:38:59 UTC - in response to Message 700767.  

The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-)


Well, I dunno if Matt will want to, but Sun Microsystems Will probably have a little to say on the subject, since it looks like that company is buying mySQL:
press release


full text: Sun Microsystems Announces Agreement to Acquire MySQL


ID: 700787
Jan Schotsmans
Joined: 27 Oct 00
Posts: 98
Credit: 92,693
RAC: 0
Belgium
Message 700809 - Posted: 17 Jan 2008, 14:42:39 UTC - in response to Message 700767.  

The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-)


Considering Google, Nokia, Facebook and a lot of other big names use MySQL, I think your professor is full of the brown stuff and talking out of his old outdated behind.
ID: 700809
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 715108 - Posted: 19 Feb 2008, 2:26:24 UTC - in response to Message 700088.  
Last modified: 19 Feb 2008, 2:27:10 UTC

In the case of "resend lost workunits" (see last thread) this feature, implemented mostly for the benefit of Einstein@home, was most definitely weighing down our database server. We turned this off and have been running smoothly since... Nevertheless, at some point soon we will see if we could optimize/reimplement this code...

I realise the whole team is always under a lot of stress and heavy workloads, but I was just wondering if this feature could be switched back on? Or is the database server still in a tricky state? I found this feature really useful back when it was still active, and right now I have eight 'lost' workunits which are otherwise going to end with a "No Reply"... :(

It's not a tragedy if you can't switch it back on right now, but I would appreciate it if you would, even for a moment, consider the possibility. Thanks.
Soli Deo Gloria
ID: 715108
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 715334 - Posted: 19 Feb 2008, 16:23:59 UTC - in response to Message 700161.  
Last modified: 19 Feb 2008, 16:42:18 UTC

The professor at school said that "nobody" would run an enterprise operation on anything but either SQL Server 2005, Oracle, DB2, or equivalent...and that mySQL would be "Mom and Pop" types... I don't know enough to debate it, but perhaps you could do some rebuttal? :-)


He is wrong.

While it's good for "Mom and Pop" types, it is also good for industrial enterprise operations and is more accessible than MSSQL etc. A large number of programmers are familiar with it, which drives down development costs and makes it more attractive. However, real MySQL gurus are still worth their weight in platinum...

I am CTO of a company developing social networking applications, and I set up a MySQL cluster of multiple masters dealing with thousands of queries a second. The company's future is staked on it, and like Facebook et al., it is hardly a Mom and Pop operation!
ID: 715334
Yellow Horror
Joined: 10 Jun 03
Posts: 3
Credit: 10,157,045
RAC: 7
Russia
Message 722778 - Posted: 6 Mar 2008, 22:24:26 UTC - in response to Message 715108.  

I was just wondering if this feature could be switched back on? Or is the database server still in a tricky state? I found this feature really useful back when it was still active, and right now I have eight 'lost' workunits which are otherwise going to end with a "No Reply"... :(

It's not a tragedy if you can't switch it back on right now, but I would appreciate it if you would, even for a moment, consider the possibility. Thanks.
+1.
I've returned to the SETI@home project after a long gap and, being unfamiliar with the new interface and too curious, pressed the "Reset Project" button before the pop-up warning about losing my workunits appeared. Then I pressed "Yes" in the plain Yes/No dialog (why in the world is there no BIG RED WARNING symbol in this dialog?!) and whoa: all 26 of my downloaded tasks disappeared into the thin Æther and are therefore stuck on the server awaiting their deadlines. I know well that this is not a tragedy for the project or for me. But why not just allow me to correct the consequences of my inadvertence somehow?

ID: 722778
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 724186 - Posted: 10 Mar 2008, 14:13:02 UTC


. . . Here's a Prayer for Matt (he's been out sick) and the Hope that he's Better Today


ID: 724186


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.