Posts by SunMicrosystemsLLG

1) Message boards : Number crunching : s@h gets a new Sun server (Message 382192)
Posted 30 Jul 2006 by Profile SunMicrosystemsLLG

Oh, I thought that somebody at Sun had a weak spot for Seti@Home and then offered a "beta test" server. I've never heard of a "beta test" server before; beta-test software, yes, but not beta-test hardware. Either it works or it doesn't...

It's not really a beta test for the hardware, just a 'try and buy' type scheme. It allows the customer to find out if the hardware is suited to their job (beta testing its usefulness) rather than whether the hardware actually works ...
Since SETI is research/academia, Sun (and other manufacturers) usually look favourably on letting these tests take place, hopefully for (massively) extended periods of time.
2) Message boards : Number crunching : s@h gets a new Sun server (Message 379684)
Posted 27 Jul 2006 by Profile SunMicrosystemsLLG
I haven't even had a chance to play with one of these 'Thumpers' yet...

Hopefully SETI will get to keep it - I think Sun's current Try & Buy program allows for some machines to stay with testers who put them to good use (the good use benefits Sun AND the end user).
3) Message boards : Number crunching : Sun Grid (Message 283087)
Posted 16 Apr 2006 by Profile SunMicrosystemsLLG

We were part of the engineering team that was involved in some of the early work on the SunGrid and from time to time we used to put the grids (2x 256 CPU Opteron clusters) onto SETI while it wasn't being used.

There is little or no modification required to get BOINC/SETI to run on the SunGrid, you just need a scheduler, such as N1 Grid Engine, to distribute the jobs across the nodes.
Each node runs its own instance of the app, much the same way as S@H members' SETI farms run it.
Splitting an individual work unit and sharing it across the grid would have been a big task, so it makes sense to just run the app the way BOINC intended.
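The "scheduler distributes one ordinary client instance per node" approach described above could be sketched as an N1 Grid Engine job script. This is a hypothetical illustration, not a script from the original setup; the queue name, paths and the step of copying the client binary are all my own assumptions.

```shell
#!/bin/sh
# Hypothetical N1 Grid Engine job script: each submitted job lands on
# one node and runs an ordinary BOINC client there, exactly as a
# standalone S@H host would. Queue name and paths are illustrative.
#$ -N boinc_seti
#$ -q all.q

WORK_DIR=/var/tmp/boinc_`hostname`     # per-node working directory
mkdir -p "$WORK_DIR"
cp /opt/boinc/boinc_client "$WORK_DIR"  # client from a shared install
cd "$WORK_DIR"
exec ./boinc_client                     # run in the node-local directory
```

Submitting the script once per node (repeated `qsub boinc_seti.sh`) would let the grid engine spread the instances across the cluster, with no modification to BOINC itself.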

SunGrid doesn't force the customer to use its own distribution method.
There are 2 flavours of SunGrid: the Utility Grid and the Commercial Grid. The Commercial Grid offers more flexibility: after some initial configuration work by Sun and the customer, a re-buildable environment is created which the customer can return to time and time again on a contract basis to run their apps. This environment can be Solaris or Linux, MPI enabled, N1 Grid Engine enabled, etc.
Sun will work with the customer to create the grid framework they need.

But to go back to your first point, using SunGrid for S@H (if you have to pay for it ...) is probably hard to justify.
We earned the majority of our credit over a period of 2-3 months last year; we worked out that if we had to pay for the time we used, it would have been around $100,000. The pricing model and infrastructure are designed to offer a competitive alternative for a company investing in and running a datacentre, not a publicly accessible and ultimately free infrastructure such as BOINC.

(formerly SunGrid_LLG_ORT)
4) Questions and Answers : Unix/Linux : Problem running Boinc on T2000 and Solaris 10 ! (Message 261630)
Posted 13 Mar 2006 by Profile SunMicrosystemsLLG

We also had some problems trying to get BOINC to benchmark on T2000s.

Since the UltraSPARC T1 CPU has up to 8 physical cores and 4 threads executing per core, 32 'virtual CPUs' are presented to Solaris.
This means the BOINC application tries to launch 32 instances of the benchmarking process.
Unfortunately, the UltraSPARC T1 has only 1 FPU shared between the 8 physical cores, so all 32 benchmarking processes are fighting over a single floating point unit.

The only way around it that I got to work was to set up a zone on the system and add a processor set to the zone which contained only a single CPU. This way the BOINC app would only launch one benchmarking process and there would be no contention at the FPU.
Once the benchmarking process had finished I could then add more CPUs to the processor set. Be careful not to add too many: I added CPUs 0, 4, 8 & 12 (virtual CPUs from different cores) so that each vCPU assigned to SETI had its own L1 cache.

I didn't get as far as assigning 1 vCPU from each core, so I'm not sure how performance degrades as you increase the vCPU count.
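A minimal sketch of that workaround, using `psrset` directly rather than the full Zone-plus-processor-set configuration described above. The virtual CPU IDs, the processor set ID and the way the client is launched are illustrative assumptions.

```shell
# Create a processor set containing only virtual CPU 0; psrset prints
# the new set's ID (assumed to be 1 below).
psrset -c 0

# Start the BOINC client inside that set, so only one benchmarking
# process can execute at a time and there is no FPU contention.
psrset -e 1 ./boinc_client &

# Once benchmarking has completed, widen the set with one vCPU from
# each of three more cores, giving each SETI instance its own L1 cache.
psrset -a 1 4 8 12
```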

Hope this helps.
5) Message boards : Number crunching : CPU Benchmarking on multi-core CPUs (Message 238853)
Posted 28 Jan 2006 by Profile SunMicrosystemsLLG
An idea might be to go into your account on the web page and set it to only 1 cpu. Then after that cpu is benchmarked set it to 2 cpus, then 3 cpus, etc.

I don't think BOINC looks at the CPU count in your preferences when benchmarking. I can see 32 benchmarking processes from the 'prstat' output under Solaris when I start BOINC and it tells me it's running benchmarks, but my preferences limit S@H to a maximum of 2 processors.

Have you tried using the generic, non optimized, version of the client first then loading the optimized version?

Not yet, I already had Stefan's clients after using them on another SPARC box. I'll have a go with another client...
6) Message boards : Number crunching : CPU Benchmarking on multi-core CPUs (Message 238820)
Posted 28 Jan 2006 by Profile SunMicrosystemsLLG
I'm trying to run S@H on a few SunFire T2000s we have here in the lab but I'm having some problems getting past the benchmarking stage of the application.

I'm trying to run Stefan Urbat's boinc_4.19_sparc-sun-solaris2.7 and his boinc_4.43_ultrasparc3VIS-sun-solaris2.9 clients, both with no luck ..

The CPU inside the server is the new UltraSPARC T1, which has 8 cores and 4 concurrent threads per core, meaning the OS sees the processor as 32 CPUs ... Unfortunately this means 32 instances of the benchmarking program are fired up when I first connect to SETI, and because the CPU only has one FPU shared between the 8 cores, the benchmarking continually times out as 32 benchmarking threads fight over 1 FPU ...

The single time the benchmarking completed on 2 of the boxes it rated the CPU at:

Measured floating point speed 10 million ops/sec
Measured integer speed 10 million ops/sec

Unfortunately I lost the .xml on those 2 hosts due to another problem and I now have 3 machines which have been running and restarting benchmarks for the past 12 hours ...

Is there anything I can do to either;

1) Make the BOINC/S@H app run the benchmark against less 'CPUs' ?
(I've tried reducing the 'p_ncpus' field in client_state.xml to a lower number, but the client changes it back every time I start BOINC).

2) Add something to the .xml config files so it skips the benchmarking completely. Even if it means putting the 10 MFLOPS rating the machine previously got back into the files.

The plan, when these machines are eventually up and running, is to only run 4, maybe 8, instances of SETI across the 32 virtual CPUs, but I can't get that far because the benchmarking program can't complete.

I'm considering setting up processor pools and Solaris Zones to restrict BOINC to a few CPUs to see if I can get the benchmarking to run successfully, but I'm wondering if anyone knows of an easier way?
7) Message boards : Number crunching : Intel “Paxville” Dual Core Xeon (Message 182543)
Posted 26 Oct 2005 by Profile SunMicrosystemsLLG

Why would you want something that slow?

XT3s are slow ?

Crikey, what are you running ?
8) Message boards : Number crunching : RAC Race again!!! (Message 181176)
Posted 22 Oct 2005 by Profile SunMicrosystemsLLG

Well our application is perhaps a little worse, but processing the same input over and over when we could be contributing a little bit for the same load seemed a little crazy. Assuming things go well this weekend, we'll start ramp-up on our app at the beginning of next week, then seti will be a thing of the past for these machines.

So far we've had only a couple of early-life machine failures from this load; very impressive.

We are in a similar situation, except we don't know what applications we'll be running from one project to the next.

So far no hard failures but it has allowed us to do some thermal profiling, check our power requirements and flag some minor issues we may not have found until we started running some 'proper' work.

Hopefully we'll be getting some brand new machines in a month or so, so we'll probably put them to work on S@H to burn them in ...
9) Message boards : Number crunching : RAC Race again!!! (Message 181063)
Posted 22 Oct 2005 by Profile SunMicrosystemsLLG

I'm guessing that seti provides a good simulation of a large scale render farm for things like heat output, particularly if they're rack mounted servers that may be prone to overheating.

Yep, out of all the jobs we have run on our grid in the last 6-7 months (since we built it), S@H/BOINC has sustained the highest CPU usage, and therefore thermal load, of any of the applications.
We figure if it can survive SETI it will pretty much survive anything ...
10) Message boards : Number crunching : Thinking about a stack... Would this work? (Message 172569)
Posted 28 Sep 2005 by Profile SunMicrosystemsLLG
For those out there with experience replacing farm machines, would this be a "new" computer to the project?

I think so.

Every time we rebuild the 'SETI OS image' for the grid (which boots diskless) we get new host IDs in the computer list.
(That's why I merged all of my old, inactive hosts and messed up the computer stats ...)
11) Message boards : Number crunching : Thinking about a stack... Would this work? (Message 172371)
Posted 27 Sep 2005 by Profile SunMicrosystemsLLG
You could use a combination of PXE boot and flash drives.
Boot the OS over the network, and install BOINC, the science apps and WUs on the flash drive; then you won't lose work.

If you are going to boot the OS over a network you might as well NFS-mount a working directory on each node from the boot server and let the nodes run BOINC and save work to that directory.

Then everything is centralised, which saves trying to manage (and restore) the nodes from many flash drives.
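The centralised layout suggested above might look like this on Solaris; the server name, export path and mount point are hypothetical.

```shell
# On the boot server, export a directory tree with one subdirectory
# per node (path and options are illustrative):
share -F nfs -o rw /export/boinc

# On each network-booted node, mount that node's own subdirectory and
# run BOINC from it, so work units live on the server rather than on
# per-node flash drives:
mount -F nfs bootserver:/export/boinc/`hostname` /boinc
cd /boinc && ./boinc_client
```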
12) Message boards : Number crunching : CPU loading and thermal stress (Message 171813)
Posted 25 Sep 2005 by Profile SunMicrosystemsLLG
Well the whole point of us subscribing to S@H is to perform reliability testing.

Although we normally only have the nodes running Seti for a week at a time, the boxes are in near-constant use on a variety of jobs. This does mean we are more likely to see early-life failures rather than long-term 'wear and tear', but we have yet to see any problems at all caused by heat and constant 100% CPU loading.

The idle temps of the CPUs are normally ~40°C, and during crunching they vary between 48-56°C depending on the server's location in the rack. That is with passive heatsinks and system fans.

Long term it might have a slight effect, but probably no more than powering your machine on and off (heat and power cycling) rather than leaving it on to crunch.
13) Message boards : Number crunching : What is the Fastest RAC for a Single PC you have seen? (Message 171617)
Posted 25 Sep 2005 by Profile SunMicrosystemsLLG

I worked out why! (I think).

By any chance, did you merge them ALL into that 1 host?

It seems as though it ought to be possible.

Yeah, that's what I did ... looking back it even seemed like a stupid idea at the time ... it cleaned up our list, but it has obviously messed up the S@H stats for the top hosts.
14) Message boards : Number crunching : What is the Fastest RAC for a Single PC you have seen? (Message 171615)
Posted 25 Sep 2005 by Profile SunMicrosystemsLLG
The first two seem a bit fishy to me, but the best machines average about 1700 credits a day.


The top 'host' in that list is mine ... Every time we put the grid back onto S@H it generates new host IDs for each machine (each node boots diskless, a new copy of its diskless image is created each time we re-attach to the project, and S@H thinks it's a new machine).
We merged all of the old, redundant hosts to clean up our computer list - there were hundreds and hundreds of inactive host IDs ... unfortunately we didn't think it through properly, and now we appear to have one dual-processor Opteron box which has been online less than a week and crunched well over one million credits with a RAC of over 45,000.
Obviously it hasn't, but I'm not sure there is an easy way to fix it.

So, yes, you can certainly ignore the RAC from that host.
15) Message boards : Number crunching : Seti Farms / Stacks Gallery & Questions! (Message 171299)
Posted 24 Sep 2005 by Profile SunMicrosystemsLLG

Sempron= AthlonXP.

LOL no, I wish!

Go to AMD's website and read about it! At the same clock speed (~2GHz) my XP2600 crunches work units in about 2hrs 5min and the Sempron in 3hrs 10min, so there is quite a difference!

Semprons are AthlonXPs with either the Thoroughbred B or Barton cores and the same cache.
But while the Barton XPs had a 200MHz FSB, all Semprons are 166MHz.

Also, the Sempron numbering scheme isn't directly comparable to the AthlonXP numbering scheme - if you check benchmarks of an AthlonXP 3000+ against a Sempron 3000+ there is a significant difference. The AthlonXP's number relates to a performance rating for mainstream desktop CPUs (P4, AthlonXP), while the Sempron's rating relates to a performance rating for budget CPUs (Celeron, Sempron).

There is a CPU History Chart on Tom's Hardware.
16) Message boards : Number crunching : Closed*** SETI/BOINC Milestones (TM) III ***Closed (Message 171260)
Posted 24 Sep 2005 by Profile SunMicrosystemsLLG
Now I have reached the very first place in the Spanish Ranking.
My current credits are above 790K

You can check it here


I've just passed 1,000,000 SETI credits and 30,000 RAC overnight.
17) Message boards : Number crunching : How long does it take you? (Message 171152)
Posted 24 Sep 2005 by Profile SunMicrosystemsLLG
On average, approx 1hr 30mins per WU on a 2.6GHz AMD Opteron 252 (similar clock speed to an Athlon64 FX-55) running 100% S@H.

Although that is on a standard Solaris 10 x86 client from Stefan Urbat's site, and I'm not sure how much optimisation has gone into it - so I don't know if that is confusing the list rather than confirming anything ...
18) Message boards : Number crunching : Seti Farms / Stacks Gallery & Questions! (Message 170379)
Posted 21 Sep 2005 by Profile SunMicrosystemsLLG

My Unix skills have gotten mighty rusty because I now work for a company with a more-than-one-person IT department, and I'm not the server guy... but how do you have several machines boot off the same NFS file system yet retain their own individual settings (they would all think they have the same IP address, current work unit, etc.)? Is there an instance of this NFS image for each machine spawned dynamically on the server? Or does the NFS hold the relatively static stuff and a RAMdisk holds the "personalized" information?

Very basically, the parts of the OS that are common to every host (/opt and /usr, for example) can be mounted from one read-only directory on the NFS server. All the parts that need to be specific to each host (/etc and /var, for example) are cloned, configured with each host's information, and then each host mounts its own host-specific directories.
Each host does not mount one entire OS image.

That way BOINC can be placed in /opt (with its data directory elsewhere, since /opt is read-only), and an update to the BOINC install on the NFS server means all hosts immediately receive the update and just require a restart.

It's similar to the way RedHat's redhat-config-netboot utility works with 'snapshot' NFS mounts for each client supplementing a 'global' OS image.
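As an illustration of that shared/per-host split, a diskless node's /etc/vfstab might contain entries along these lines. The server name, export paths and node name are all hypothetical, and the node's root filesystem (including /etc itself) would be mounted by the diskless boot process rather than via vfstab.

```shell
# Common OS pieces: read-only mounts from a single shared tree.
# Host-specific pieces: read-write mounts from a per-node clone.
#device to mount                 device to fsck  mount point  FS type  fsck pass  mount at boot  options
nfssrv:/export/os/usr            -               /usr         nfs      -          yes            ro
nfssrv:/export/os/opt            -               /opt         nfs      -          yes            ro
nfssrv:/export/hosts/node01/var  -               /var         nfs      -          yes            rw
```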
19) Message boards : Number crunching : Seti Farms / Stacks Gallery & Questions! (Message 170305)
Posted 21 Sep 2005 by Profile SunMicrosystemsLLG

If the server has such an image available, it assigns an address to the booting machine and sends it the image. The image is copied into RAM and then executed like a virtual boot drive.

You can also configure PXE to allow your machines to boot off an NFS server rather than use a RAMdisk, which is useful if you don't have a huge amount of RAM in your machines or if you want to be able to change the 'live' image your compute nodes are booted from.
We boot 128 nodes over NFS from one server, performance is still good and all the system RAM is still available for the application.
20) Message boards : Number crunching : Closed*** SETI/BOINC Milestones (TM) III ***Closed (Message 169969)
Posted 20 Sep 2005 by Profile SunMicrosystemsLLG
750,000 credits on the same day that I'm User of the Day.

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.