Message boards :
Number crunching :
Please rise the limits... just a little...
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations. That is the case. When we had the glitch where VLARs were going to GPUs a few months back, my GTX 460 & 560 Ti had no problems with them, & I had no system slow-downs or sluggish response issues. Grant Darwin NT |
Wedge009 Send message Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
I didn't have any GUI unresponsiveness or anything like that whenever VLAR WUs went on NV GPUs, pre-Fermi, but run-times were horribly long. With the current lack of WUs (maybe I'm just unlucky), I've redirected a single VLAR WU to an NV GPU, just to see what happens - also no lack of responsiveness, but the run-time looks to be very long. Judging by current progress, it'll be nearly an hour before it's completed, compared with 2 or 8 minutes for non-VLAR WUs. Ouch. Soli Deo Gloria |
Mike Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
I'm aware of the crashing issue on lower-end GPUs. But obviously the concept of opportunity cost is completely lost. No matter, it's still the case that GPU calculations are not quite as precise as those made on the CPU. Precision is definitely not a problem. All current OpenCL apps are at 99.9% accuracy. I'm almost certain it's the same with all CUDA builds. With each crime and every kindness we birth our future. |
Wedge009 Send message Joined: 3 Apr 99 Posts: 451 Credit: 431,396,357 RAC: 553 |
I was referring to the inherent limitations of the hardware, not the applications. But I know a lot of hard work has gone into improving the accuracy of the applications - many thanks to all involved in that. Soli Deo Gloria |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
What on earth have a) VLARs not going to NV and b) the accuracy of the different apps to do with the current limits on tasks in progress - i.e. with the topic of this thread?! But since the OP seems to be fine with it... So, after I've had a good scream in a dedicated place, I'll be with you and try to sift fact from fiction. A person who won't read has no advantage over one who can't read. (Mark Twain) |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Ok... Let me do the easier one first - precision. [On second thought, it's not easy at all...]

Disclaimer: I'm neither omniscient nor do I speak ex cathedra. Some points may end up slightly wrong. Don't blame me if your eyes glaze over.

Has any of you ever heard of the law of error propagation? Or of the different precisions computers can work at? How simply do I need to put it? Let me cite Gauss: 'Nothing shows mathematical illiterateness as much as the excessive accuracy of numbers.' As a scientist, the rule is to carry as much precision as feasible through the whole calculation and only round to a sensible number of decimal places when presenting the result.

For us, precision is vital. Minimal errors expand to large ones. So, we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires. For that reason sometimes lower precision is used. Also higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and, due to the hardware available at the time (end of 2008), was single precision. That invariably leads to small discrepancies. This is most notable in the number of signals reported - when a signal is very close to the reporting threshold, it may be reported by one app but not the other.

The validator has been designed to cope with the inherent variance of numerical calculations. In such cases it will get another opinion - and if the mismatch is only one signal (to be precise, if more than 50% of the results match), both (all three) results will be valid. Obviously this shortcoming is annoying to an excellent engineer and programmer.
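[Editor's aside: the error-propagation point can be seen in a few lines of Python. This is only an illustration, not SETI@home code - the same running sum, kept once at single and once at double precision, drifts apart as rounding errors accumulate.]

```python
import numpy as np

# Illustration of error propagation (not SETI code): add 0.1 a hundred
# thousand times, once in float32 and once in float64. Each float32 add
# rounds to ~7 decimal digits; those tiny errors compound step by step.
n = 100_000
single = np.float32(0.0)
double = 0.0
for _ in range(n):
    single = single + np.float32(0.1)  # rounds at every step
    double = double + 0.1              # rounds too, but at ~16 digits

print(single, double)  # single drifts visibly away from 10000
```

The exact size of the single-precision drift depends on the accumulation order, which is why two apps doing the "same" maths at different precisions can disagree about a signal sitting right on a threshold.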
That led to improvements in the precision of the optimised app. x32f was an initial release - x38g already came with precision improvements that were further refined in x41g and x41zc. The aim was not so much to get better precision as to improve cross-platform accuracy.

At this point it became clear that any further improvements to bring CPU and GPU results closer had to come from the CPU side. This led to 6.98 and 6.99 going into beta testing. (Well, it looks like 6.99 is finally going to go out as 7.00 or so.)

Sorry, I neglected to mention ATI/OpenCL. Precision improvements similar to those developed for NV were incorporated a bit later. So when V7 finally reaches main, all the applications will be as close in precision as the hardware allows, and we should see far fewer cross-platform inconclusives. Some other issues that lead to inconclusives (such as the processing order) will be looked at in detail and hopefully addressed in the next development cycle. A person who won't read has no advantage over one who can't read. (Mark Twain) |
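[Editor's aside: the ">50% of the results match" rule William describes earlier in the thread could be sketched roughly as below. This is a hypothetical simplification for illustration, not the actual SETI@home validator; `signals_a` and `signals_b` are invented names standing for the sets of signal IDs each result reported.]

```python
def results_agree(signals_a, signals_b):
    # Hypothetical sketch of the rule: two results are treated as
    # consistent when more than half of the reported signals are
    # common to both. Signals are modelled here as hashable IDs.
    if not signals_a and not signals_b:
        return True  # both empty: trivially consistent
    common = len(set(signals_a) & set(signals_b))
    total = max(len(signals_a), len(signals_b))
    return common / total > 0.5

# A one-signal mismatch near the reporting threshold still validates:
print(results_agree({"s1", "s2", "s3"}, {"s1", "s2"}))  # → True (2 of 3 match)
print(results_agree({"s1"}, {"s2"}))                    # → False (no overlap)
```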
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
At risk of being declared off-topic... Let me cite Gauss: 'Nothing shows mathematical illiterateness as much as the excessive accuracy of numbers.' "BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10^-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer." (New Scientist magazine) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
For us, precision is vital. Minimal errors expand to large ones. So, we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires. For that reason sometimes lower precision is used. Also higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Actually, my understanding is that the vast majority of the SETI search mathematics can be performed perfectly adequately at single precision: modern CPU floating-point units (anyone else remember having to buy a separate 8087 or 80287 chip to do maths?) can all do IEEE 754 double-precision maths, but it's slower. Single precision was chosen for speed where it was 'good enough'.

But there are some (small) parts of the code where the errors build up to such a degree that they absolutely have to be done at double precision. No problem on a CPU, but as William says, the early GPUs didn't have double-precision hardware support. Instead, an algorithm was used to simulate double precision where needed, but to start with it didn't fully reach the IEEE specification. Later CUDA optimisations (1) used double-precision hardware where available, or (2) used a better (and fully IEEE-compliant) algorithm when emulation was needed on older cards. |
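[Editor's aside: the emulation Richard describes belongs to the family of compensated-arithmetic techniques. As a rough illustration of the idea - Kahan summation, a related trick, not the actual CUDA emulation code - here is single-precision summation that carries an explicit compensation term to recover most of the rounding error without any double-precision hardware:]

```python
import numpy as np

def kahan_sum_f32(values):
    # Compensated (Kahan) summation carried out entirely in float32.
    # 'c' tracks the rounding error lost at each step and feeds it
    # back in, so the result is far closer to the exact sum than a
    # naive float32 accumulation.
    s = np.float32(0.0)
    c = np.float32(0.0)
    for v in values:
        y = np.float32(v) - c
        t = s + y
        c = (t - s) - y   # the low-order bits that 's + y' discarded
        s = t
    return float(s)

vals = [np.float32(0.1)] * 100_000

naive = np.float32(0.0)
for v in vals:
    naive = naive + v     # plain float32 sum: errors compound

compensated = kahan_sum_f32(vals)
print(naive, compensated)  # compensated stays near 10000; naive drifts
```

The actual double-precision emulation on old GPUs used pairs of floats per value (a "double-single" representation), but the underlying idea is the same: capture the bits each rounding step throws away instead of losing them.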
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Let me cite Gauss: 'Nothing shows mathematical illiterateness as much as the excessive accuracy of numbers.' In short - precision & accuracy aren't the same thing. A very rough definition: precision relates to the number of places you measure to; accuracy is how close to the actual answer you are. E.g. a result of 103456.1 is very low in precision, but if the correct value is 103456.0 it is very accurate. A result of 102137.76532987643 is very precise, but with the correct value actually being 103456.0 it isn't very accurate. Grant Darwin NT |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
. . .
processor       : 243
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
apicid          : 243
initial apicid  : 243
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr mca pat fxsr ht syscall lm rep_good nopl lahf_lm
bogomips        : 2190.18
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
|
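[Editor's aside: the logical-CPU count in output like that can be read off by counting "processor" stanzas with grep. The snippet below works on a small inline sample file, since the full 244-processor dump is abridged above; the path `/tmp/cpuinfo.sample` is just an illustration.]

```shell
# Write a tiny stand-in for /proc/cpuinfo (the real file lists one
# "processor : N" stanza per logical CPU).
cat > /tmp/cpuinfo.sample <<'EOF'
processor : 0
processor : 1
processor : 243
EOF

# Count lines starting with "processor" - one per logical CPU.
grep -c '^processor' /tmp/cpuinfo.sample
```

On the machine ivan quotes, `grep -c '^processor' /proc/cpuinfo` would print 244.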
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
$ cat /proc/cpuinfo Your point being? A person who won't read has no advantage over one who can't read. (Mark Twain) |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
$ cat /proc/cpuinfo Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...). |
RottenMutt Send message Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
I can return GPU tasks as fast as every 20 seconds, queue drains in 30 minutes!!! |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65736 Credit: 55,293,173 RAC: 49 |
The 100 WUs I get last me about 2 hours and 35 minutes on this Zotac closed-loop liquid-cooled Infinity Edition GTX 580, and I have the equipment for 2 more of these cards - that would be about 51 minutes each. So I hear ya; if it was 100 WUs per GPU I sure wouldn't mind... The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
I empathise and agree that the limits are too low and need to go up. As for the scheduled maintenance period, I can't get through it without things running dry. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
$ cat /proc/cpuinfo In that case you would simply run multiple instances of BOINC, each configured for a specific number of processors. I have considered running two instances on my 24-core box to give it more than a few hours of cache, but this way it gets some PG work done too. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.