Posts by AlphaLaser

1) Message boards : Number crunching : ok GPU geeks explain something to this geek. (Message 1213712)
Posted 3 Apr 2012 by Profile AlphaLaser
Post:
There are already "GCPU" devices available: AMD's APUs.
They can run SETI on both the CPU and GPU parts. OpenCL is supported for the GPU part (and for the CPU too, btw).

A shared memory bus allows effective GPU implementations of some algorithms that previously suffered badly from relatively slow PCIe bus transfers (our current SETI apps don't use bus transfers much, but some benefit is possible for them too, especially for heavily blanked AP tasks).

Intel's Sandy Bridge as well. I don't think its GPU has OpenCL support though; that will probably change very soon. The CPU supports OpenCL, at least I can see it as an OpenCL device. Look at Sandy Bridge, Ivy Bridge, Haswell, and Broadwell on Wikipedia to see where they are headed: toward a System on a Chip, or SoC. I don't know how much they can put on a single die, but they do have superior process technology. SoC integration will help with memory transfer rates. A big fat NVIDIA 580 would be nice, but you still have to shovel the data back and forth over the bus.


OpenCL should be supported in Intel's Ivy Bridge IGPU (source), which should be available in a few months or so. Haswell should see more exciting gains in GPU performance, and especially from AVX2, which adds more vector-processing capability to the CPU core itself.
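As a hedged aside, here is a minimal C sketch of how an application can enumerate the CPU and GPU OpenCL devices discussed above via the standard OpenCL platform API. It assumes an OpenCL SDK is installed (link with -lOpenCL); the buffer sizes are arbitrary illustration values.

/* Minimal OpenCL device enumeration sketch. Lists each platform's
 * CPU and GPU devices, which is how a host app would discover an
 * APU's or IGP's OpenCL support. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    for (cl_uint p = 0; p < nplat; p++) {
        char name[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        printf("Platform: %s\n", name);

        cl_device_id devs[16];
        cl_uint ndev = 0;
        /* CL_DEVICE_TYPE_ALL returns CPU and GPU devices alike */
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                           16, devs, &ndev) != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < ndev; d++) {
            char dname[256];
            cl_device_type type;
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_TYPE,
                            sizeof(type), &type, NULL);
            printf("  %s device: %s\n",
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU" :
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "Other",
                   dname);
        }
    }
    return 0;
}

On an APU or a Sandy Bridge/Ivy Bridge system with the right drivers, the CPU (and, where supported, the IGP) would show up in this listing.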
2) Message boards : Number crunching : ok GPU geeks explain something to this geek. (Message 1213522)
Posted 3 Apr 2012 by Profile AlphaLaser
Post:
One way to see it: CPUs are latency-oriented architectures. They focus on executing a small number of instruction streams as quickly as possible. That is done with speculative, out-of-order execution; the downside is that a lot of transistors are needed to make it work. GPUs tackle the compute problem by being throughput-oriented. Rather than a few complex but fast cores, a GPU consists of many much simpler "cores" (some people prefer to call them vector or SIMD lanes, by virtue of them being so simple). Because they are so simple, they can't all execute independent instruction streams, and there are a number of other restrictions that can make GPUs ill-suited for certain tasks. That's why systems ultimately still need a CPU even when they only execute GPU tasks: the CPU still has to coordinate the GPUs and interact with other system components such as disks, network, etc.
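To make the lockstep-lanes point concrete, here is a toy plain-C sketch of my own (not real GPU code) of how SIMD-style hardware handles a branch: every lane computes both paths and a per-lane mask selects the result, which is why divergent code can cost the sum of both branches.

/* Toy illustration of why SIMD lanes can't follow independent
 * instruction streams: on a branch, the hardware effectively
 * executes BOTH paths for all lanes and uses a per-lane mask to
 * keep the right result. */
#include <stdio.h>

#define LANES 8

int main(void) {
    float x[LANES] = {1, -2, 3, -4, 5, -6, 7, -8};
    float out[LANES];

    for (int lane = 0; lane < LANES; lane++) {
        int mask = (x[lane] >= 0.0f);        /* per-lane predicate     */
        float taken    = x[lane] * 2.0f;     /* "if" path, all lanes   */
        float nottaken = -x[lane];           /* "else" path, all lanes */
        out[lane] = mask ? taken : nottaken; /* masked select          */
    }

    for (int lane = 0; lane < LANES; lane++)
        printf("%g ", out[lane]);
    printf("\n");
    return 0;
}

Both "taken" and "nottaken" are computed for every lane regardless of the predicate, which is the throughput trade-off the paragraph above describes.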
3) Message boards : Number crunching : Is this a Boinc bug?????? (Message 1201394)
Posted 1 Mar 2012 by Profile AlphaLaser
Post:
Connectivity detection in Vista/7 is described here:

http://technet.microsoft.com/en-us/library/cc766017%28WS.10%29.aspx

Based on this article, loss of internet connectivity would not be detected until the OS re-polls the NCSI server; I'm not sure how often it does that.
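For illustration only, here is a rough POSIX C sketch of the kind of active probe NCSI performs: fetch a small known file over HTTP and check the body. The probe host and expected content below match what is documented for NCSI, but the real polling policy and fallback logic are Windows internals, and this sketch reads only a single recv for brevity.

/* Rough sketch mimicking NCSI's active HTTP probe. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int main(void) {
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("www.msftncsi.com", "80", &hints, &res) != 0) {
        puts("no connectivity (DNS failed)");
        return 1;
    }
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        puts("no connectivity (TCP failed)");
        return 1;
    }
    const char *req = "GET /ncsi.txt HTTP/1.1\r\n"
                      "Host: www.msftncsi.com\r\n"
                      "Connection: close\r\n\r\n";
    send(fd, req, strlen(req), 0);

    char buf[2048] = {0};
    ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);
    close(fd);
    freeaddrinfo(res);

    /* The documented body for this probe URL is "Microsoft NCSI". */
    if (n > 0 && strstr(buf, "Microsoft NCSI"))
        puts("internet connectivity detected");
    else
        puts("captive portal or no connectivity");
    return 0;
}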
4) Message boards : Number crunching : Smartphone crunching (Message 1200260)
Posted 27 Feb 2012 by Profile AlphaLaser
Post:

Also, the 3960X has much higher MFLOPS than you think; in other words, you read the wrong part of the test. What in God's name would make you think a handheld device had as much or more crunching power as a 6-core CPU?


Nowhere do I even suggest that; in fact, I was hoping to show the opposite. Despite the recent innovations in smartphone SoCs, there's still a large gap between them and desktop CPUs. What is interesting, though, is that those chips are advancing quickly and may eventually become powerful enough to be relevant for volunteer computing. Maybe not SETI at first, as other projects may be less FP-intensive or have smaller, more manageable WU lengths.

I think it's odd that the AMD FX-8150 outran the 3960X on the FPU test, considering that the 8150 only has 4 FPUs and shares them.


That is weird. In most benchmarks I've seen, Bulldozer suffers from that FPU contention, even compared to Phenom II. For example:

5) Message boards : Number crunching : Smartphone crunching (Message 1200123)
Posted 26 Feb 2012 by Profile AlphaLaser
Post:
You are right that LINPACK may not offer the best method for comparison. I just looked up more LINPACK benchmark results on x86 and found two more wildly different numbers.

Core i7-3960X (Stock, HT On) - 64.9275 GFLOPS
Core i7-3960X (4.5 GHz, HT On) - 133.9831 GFLOPS (http://www.xbitlabs.com/articles/cpu/display/core-i7-3960x-3930k_12.html)

There is an OC in the second case, but that alone wouldn't account for so much difference: 4.5 GHz is only about 36% above the 3.3 GHz stock clock, yet the score more than doubles. I suspect different setups or benchmark settings prevent this from being an apples-to-apples comparison when different people run the benchmark.

I'm not even counting the added cost of the bandwidth usage, etc., that you'd incur because of the downloads/uploads. Figuring about 14 WUs a month comes out to 5.1 MB, which is pretty much beyond the standard smartphone contracted bandwidth allowance.

For all that money wasted on a smartphone, you could just as easily have bought a cheap laptop and beaten your smartphone handily.


5.1 MB is not a big deal for many data plans from major carriers in the U.S., but even if it were, the smartphone client could offer an option to restrict data transfers to when a WiFi connection is available. My phone uses the router in the house when it's in range and the cell data plan when I'm elsewhere; it probably gets a WiFi connection often enough that a reasonable cache would keep it busy when there is no WiFi.
6) Message boards : Number crunching : Smartphone crunching (Message 1199860)
Posted 26 Feb 2012 by Profile AlphaLaser
Post:
Recently AnandTech posted a review of Qualcomm's Krait CPU, a 1.5 GHz ARM-compatible dual-core CPU on a 28nm process that should appear in handsets later this year. Benchmarks show a substantial jump in scores over current smartphone chips.

Linpack results (stresses FPU and memory subsystem):

Single threaded: 106.794 MFLOPS
Multi threaded: 218.197 MFLOPS

To put that into perspective with x86 CPUs:

Phenom II @ 3.0 GHz: 1412.83 MFLOPS
Core i7 860: 2004.31 MFLOPS

(source)
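For anyone curious where such MFLOPS figures come from, here is a toy C sketch of my own (not Linpack itself, which solves a dense linear system): count floating-point operations, time them, and divide. The array size and repetition count are arbitrary; compile with optimization (e.g. -O2) for a meaningful number.

/* Toy MFLOPS measurement sketch: time a fixed number of
 * floating-point operations and divide by the elapsed time. */
#include <stdio.h>
#include <time.h>

#define N (1 << 20)
#define REPS 100

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < REPS; r++)
        for (int i = 0; i < N; i++)
            y[i] = y[i] + 0.5f * x[i];   /* 2 FLOPs per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mflops = 2.0 * N * REPS / secs / 1e6;
    /* Print y[0] as a checksum so the loop isn't optimized away. */
    printf("%.1f MFLOPS (checksum %g)\n", mflops, y[0]);
    return 0;
}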

7) Message boards : Number crunching : Smartphone crunching (Message 1196478)
Posted 17 Feb 2012 by Profile AlphaLaser
Post:
I think you mistake Moore's law for the inability to make a CPU that is thermally stable without a massive HSF or watercooling. The current trend is to keep the HSF size as-is for stock use; aftermarket HSFs/watercooling allow users much higher CPU speeds. Likewise, the reduction in current running through a processor is necessary because of the shrinking circuitry on a chip, which leaves ever less space between the microcircuitry. So decreasing voltage and power is a logical progression to solve a problem.


The TDP was not reduced to 77W because keeping it at 95W would have been technically infeasible. From the AnandTech review:


I would say the reduction in TDP is the main reason why the specifications (not performance) are so similar to Sandy Bridge. If Intel had kept the 90W TDP, higher frequencies would have been likely and we might have even seen a hex-core part without a loss in frequency...


The driver for reducing TDP is almost certainly market demand: there isn't enough of it for such a hex-core chip versus a similarly performing chip that's significantly more energy efficient.
8) Message boards : Number crunching : Smartphone crunching (Message 1196456)
Posted 17 Feb 2012 by Profile AlphaLaser
Post:
The grand assumption with smartphones is that CPU and GPU technology will suddenly stop advancing. That's hardly evident from current CPU lines, Sandy Bridge and Bulldozer alike, and GPUs are advancing faster than ever before. To say that a smartphone is suddenly going to leapfrog these technologies is silly. I wouldn't want a smartphone, and I only use my recently purchased laptop for traveling. The recent inundation of tablets and iPads is pretty much a fad: it's new, and every status seeker is out to have the latest, greatest gadget regardless of how outrageously it's priced or how limited its capability. It looks flashy, so the little child must have that flashy item. Bah.


I'm not saying smartphones are going to leap over larger-form-factor computers, but if you look at technology trends, the focus going forward is going to be much more about minimizing power consumption than about increasing performance. Just look at the upcoming Intel Ivy Bridge CPUs: the top TDP bins have dropped to 77W from the current 95W Sandy Bridges, and current leaks show only marginal gains in stock frequency and IPC. Looking further ahead there is Haswell, which incorporates still more power-saving features important for portable devices. Mainstream CPUs have not seen a "Moore's law"-like increase in core counts because the need for it is not there in the typical desktop market.
9) Message boards : Number crunching : Smartphone crunching (Message 1192899)
Posted 9 Feb 2012 by Profile AlphaLaser
Post:
Take that and the lack of an FPU and you get a very, very slow CPU.


Not true. ARM CPUs with VFP or NEON (SIMD) have hardware FP support.

Also, Android will not be exclusively ARM in the future. Don't forget Intel's Medfield x86 platform, which looks very promising.
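To illustrate that hardware FP support, here is a minimal C sketch using the NEON intrinsics from arm_neon.h. It assumes compiling for an ARM target with NEON enabled (e.g. gcc with -mfpu=neon); the values are arbitrary.

/* Minimal NEON sketch: four single-precision adds issue as one
 * SIMD operation, showing FP is done in hardware and vectorized. */
#include <stdio.h>
#include <arm_neon.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {0.5f, 0.5f, 0.5f, 0.5f};
    float c[4];

    float32x4_t va = vld1q_f32(a);      /* load 4 floats          */
    float32x4_t vb = vld1q_f32(b);
    float32x4_t vc = vaddq_f32(va, vb); /* 4 adds, one instruction */
    vst1q_f32(c, vc);

    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}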
10) Message boards : Number crunching : Smartphone crunching (Message 1192688)
Posted 9 Feb 2012 by Profile AlphaLaser
Post:
Let's not forget NVIDIA claims Tegra 3 has roughly mobile Core 2 Duo performance. Not i7 performance, but definitely significant.

11) Message boards : Technical News : Upwards and Onwards (May 28 2009) (Message 906572)
Posted 12 Jun 2009 by Profile AlphaLaser
Post:
Yeah, it's not so much a problem with DCF itself but rather that it should initialize to near 1.0.

Isn't there a built-in mechanism on the server side to automatically adjust the estimation data sent to newly attached hosts, based on work previously returned by other hosts? Or perhaps doing that would add too much load? Otherwise, it seems like a nice feature for some projects to have.

DCF would work much better on a per-app basis, though. I would go even further and say that projects should perhaps be able to define categories of work (for SETI, that would mean one per group of ARs); that would provide flexibility for projects whose apps have different modes of operation or different categories of input.
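As a toy sketch of the per-app idea (my own illustration, NOT BOINC's actual algorithm, which among other things adjusts DCF asymmetrically), each app could keep its own correction factor and nudge it toward the observed actual/estimated ratio of each completed task:

/* Hypothetical per-app duration correction sketch. */
#include <stdio.h>

struct app { const char *name; double dcf; };

/* Estimated runtime = raw estimate scaled by the app's DCF. */
static double estimate(const struct app *a, double raw_secs) {
    return raw_secs * a->dcf;
}

/* After a task finishes, move DCF 10% toward the observed ratio. */
static void update(struct app *a, double raw_secs, double actual_secs) {
    double ratio = actual_secs / raw_secs;
    a->dcf += 0.1 * (ratio - a->dcf);
}

int main(void) {
    struct app mb = {"multibeam", 1.0};  /* new hosts start at 1.0 */
    update(&mb, 10000.0, 4000.0);        /* ran faster than estimated */
    update(&mb, 10000.0, 4200.0);
    printf("%s: DCF=%.3f, next estimate for 10000s raw: %.0fs\n",
           mb.name, mb.dcf, estimate(&mb, 10000.0));
    return 0;
}

With one such factor per app (or per work category), a slow AP app wouldn't distort the estimates for MB tasks the way a single host-wide DCF does.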
12) Message boards : Number crunching : Optimization (Message 906571)
Posted 12 Jun 2009 by Profile AlphaLaser
Post:
One thing that can help on the processes front is to have BOINC run at the login screen, where fewer processes such as explorer and the system-tray icons are running. To be honest, I don't recall the specific BOINC 6.x installer settings that enable that, or whether it's now the default behavior.
13) Message boards : Number crunching : Optimization (Message 906321)
Posted 11 Jun 2009 by Profile AlphaLaser
Post:
Could someone give some suggestions on programs to better optimize your computer while running SETI? I already use something called Game Booster that stops unnecessary Windows processes, etc., but I was wondering what else is being used to make my computer run leaner and meaner. I'm running Vista, and I know Vista has been a real pain at times; it seems like it's hogging resources out of nowhere.


Hi, for Vista one thing I recommend on a dedicated cruncher is disabling the Aero interface. Everything will look more like the "Windows 95" UI, but you will notice more free RAM and a more responsive interface. It might also have an impact on CUDA crunching (I haven't verified this, though). This can be done as follows:

1) Go to Control Panel
2) Click System and Maintenance
3) Click Performance Information and Tools
4) On the left pane, click Adjust visual effects
5) On the Visual effects tab, click the button next to Adjust for best performance
14) Message boards : Technical News : Upwards and Onwards (May 28 2009) (Message 906083)
Posted 11 Jun 2009 by Profile AlphaLaser
Post:
Don't forget that the first AP task a user gets will be with the stock app (so ~80 hours for that machine), but BOINC will estimate it at DCF=1.0 rather than the DCF of ~0.4 typical for stock AP.

If stock apps don't have DCF=~1, it means the run time estimate is WRONG, and the admins should fix it.

Actually, while I agree that DCF should be around 1.0, there is an advantage:

A brand-new computer defaults to 1.0, which means the first work requests will be small, and grow as DCF converges on the right number.

Agreed, but with some concern about AP tasks.
A new host would see AP tasks estimated at about 2.5 times the actual time. As the tasks are long to start with, this could lead an owner who only wants to contribute a limited number of hours to assume they take more time than they are willing to give within the deadline.


I have to agree with Nicolas. Most new users aren't going to know about the existence of DCF, among other things, and it is reasonable for them to expect the initial estimates to be nearly correct from the get-go. For SETI the runtimes are generally predictable ahead of time, and in the worst case tasks end sooner rather than later (-9 overflows and such). Keeping in mind this is a "first impressions" moment, users ought to get intuitive feedback from the client with minimal surprises. Using DCF to deal with overfetch by new and "untrusted" hosts sounds like a kludge when we should be thinking about a dedicated mechanism for handling it.
15) Message boards : Technical News : CPU count (May 13 2009) (Message 904669)
Posted 7 Jun 2009 by Profile AlphaLaser
Post:
How does a CPU just die and the server continue to function?


Some very high-end servers implement lockstep processing, which is somewhat analogous to RAID 1 but for CPUs. Similar technology exists for RAM redundancy, such as Chipkill, which enables recovery from the hardware failure of an entire RAM module. SETI doesn't necessarily use these, but it goes to show that what makes server platforms unique, and not just a rebranding of consumer chips, is sometimes their certified ability to recover from disaster scenarios that would be showstoppers for a normal desktop. In the IT world this is called Reliability, Availability, and Serviceability (RAS), and it is important for mission-critical systems and business mainframes.
16) Message boards : Technical News : New Toys (Jun 01 2009) (Message 903832)
Posted 5 Jun 2009 by Profile AlphaLaser
Post:
6 cores? Sounds like the Penryn-derived Dunnington. If you look at the die in the linked article, it's monstrous: basically three Core 2 dies glued together along with some extra L3 cache. It should serve the project well!
17) Message boards : Technical News : Comment Control (Jun 03 2009) (Message 903829)
Posted 5 Jun 2009 by Profile AlphaLaser
Post:
It would also be possible to generate a local-timezone date using JavaScript, and of course fall back to a standard like GMT when JavaScript is not available.
18) Message boards : Number crunching : I guess the burn-in is over? (Message 903472)
Posted 4 Jun 2009 by Profile AlphaLaser
Post:
Here are the docs on the available prefetch instructions specified by SSE (a usage sketch follows the list):

- prefetcht0
- prefetcht1
- prefetcht2
- prefetchnta
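From C, these map onto the _mm_prefetch intrinsic from xmmintrin.h. A minimal sketch of my own (the 64-element lookahead distance is an arbitrary illustration value, and real gains depend heavily on the access pattern):

/* Sketch of issuing SSE prefetches via _mm_prefetch; the hint
 * constants correspond to the four instructions listed above:
 * _MM_HINT_T0 -> prefetcht0 (all cache levels),
 * _MM_HINT_T1 -> prefetcht1, _MM_HINT_T2 -> prefetcht2 (outer
 * levels), _MM_HINT_NTA -> prefetchnta (minimize cache pollution). */
#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    static float data[1 << 16];
    float sum = 0.0f;

    for (int i = 0; i < (1 << 16); i++) {
        /* Request a line well ahead of the current position. */
        if (i + 64 < (1 << 16))
            _mm_prefetch((const char *)&data[i + 64], _MM_HINT_T0);
        sum += data[i];
    }
    printf("%g\n", sum);
    return 0;
}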
19) Message boards : Number crunching : I guess the burn-in is over? (Message 902759)
Posted 2 Jun 2009 by Profile AlphaLaser
Post:
since for example the L3 did not exist on most x86 machines until recently.


I believe the first appearance of an L3 cache on the x86 architecture was the K6-III: with L2 cache built into the processor, the L2 cache on the motherboard became an L3 cache. The first sighting of L3 on Intel chips, as far as I can remember, was the Pentium 4 Extreme Edition on the Socket 478 platform; those chips had 16K L1, 512K L2, and 2MB L3.

Back then, no additional coding was needed for existing applications to take advantage of the third-level cache.


Interesting that you note the K6-III. L3 was also used on some high-end Xeons, and very large amounts are used on Itaniums. With AMD's new Phenoms and the Core i5/i7, L3 is becoming more of a "standard" feature, no longer exclusive to enthusiast parts like the P4EE or server chips.
20) Message boards : Number crunching : I guess the burn-in is over? (Message 902715)
Posted 2 Jun 2009 by Profile AlphaLaser
Post:
I believe there's limited cache control in the x86 ISA; for example, the OS can prevent certain memory regions from being cached or change the write-back caching policy. My guess is that very few, if any, of these options are available to user-mode programs like SETI. Also, it's probable that those features keep a great deal of the actual hardware implementation abstracted away, since for example the L3 did not exist on most x86 machines until recently.

Most cache-related optimizations come from loop transformations and data-access patterns that help the CPU prefetch and cache the data. This Locality of reference article explains the principles behind caching pretty well.
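To make the locality point concrete, a small C sketch of my own: both loops sum the same matrix, but the traversal order changes cache behavior dramatically (the matrix size is an arbitrary illustration value).

/* Locality-of-reference sketch: row-order walks memory
 * sequentially (cache- and prefetch-friendly), while column-order
 * strides by a whole row per access and misses far more often. */
#include <stdio.h>

#define N 1024
static float m[N][N];

int main(void) {
    float sum = 0.0f;

    /* Good: inner loop follows the row-major layout. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];

    /* Bad: inner loop jumps N*sizeof(float) bytes per access. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];

    printf("%g\n", sum);
    return 0;
}

On most machines the first pair of loops runs several times faster than the second, even though both do identical arithmetic.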

