Message boards :
Number crunching :
Please rise the limits... just a little...
Volunteer tester · Joined: 25 Jun 00 · Posts: 126 · Credit: 4,968,173 · RAC: 0
My two cents' worth on this topic... I think an increase in the WU allotment is a good idea, HOWEVER, I think it should only be implemented for systems and users that meet X criteria, where X is a numerical value based on how many WUs said user and machine have done and how long they have been crunching. Example: a + b + c = x, where a = number of years the user has been actively crunching (if you started crunching in 2000 but have only been actively crunching since 2008, a would be 5), b = number of years or months the machine has been actively crunching, and c = number of processors usable for crunching. This way only people with proven records get more work. Anybody else think this is a good way to go?

I RTFM and it was WYSIWYG, then I found out it was a PEBKAC error
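The proposed scoring could be sketched like this (a purely hypothetical illustration; the base limit and how x scales the quota are my assumptions, not anything the project has announced):

```python
def wu_limit(years_active_user, years_active_host, processors,
             base_limit=100):
    # Hypothetical quota formula from the post above: x = a + b + c.
    # Here x is simply added to an assumed base per-host task limit,
    # so hosts and users with a longer proven record get more work.
    x = years_active_user + years_active_host + processors
    return base_limit + x

# A user actively crunching since 2008 (5 years) on a 4-core host
# that has been crunching for 3 years:
print(wu_limit(5, 3, 4))  # 112
```

The exact weighting is of course up for debate; the point is only that the limit becomes a function of demonstrated reliability rather than a flat number.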
Wedge009 · Volunteer tester · Joined: 3 Apr 99 · Posts: 448 · Credit: 302,497,264 · RAC: 91,374
> In that case GPUs shouldn't be used at all then. ~sigh~

This is not a fair conclusion to come to. GPUs are less precise than CPUs, but in most cases they are still 'good enough'. A problem arises when results returned to the server match fairly closely, but not closely enough for the validator to make a call on their validity. Sometimes this may be due to the reduced precision a GPU has compared with a CPU. That is all I meant when I said that CPUs are still valuable to the science of this project.

> I've done hundreds of VLARs on my GPUs at every angle that they sent me.

By 'quicker MBs' I'm guessing you mean the plain WUs not labelled as VLAR. This is certainly an interesting choice. There doesn't appear to be a substantial difference between the run-times for VLAR and non-VLAR MB WUs on a CPU (excluding 'shorties'), yet there is certainly a huge difference in run-times between VLAR and non-VLAR MB WUs on NV GPUs - at least for the ones that you're not using. But if you choose to run the VLAR WUs on the GPU, that's certainly your right.

> The problem is in people giving out false information to other people that don't know it's not true. That you and others have not been able to run VLARs on NVIDIA GPUs does not make it true in all instances.

Who says it is false information? I never said that I could not run VLAR WUs on NV GPUs, only that there is a severe performance penalty in doing so. It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations. If so, that is good news. But it's not fair to flatly declare that this is false information. Certainly, a great many contributors - including myself - found it beneficial for stability and performance reasons to redirect VLAR WUs to the CPU (despite the hit to the APR count) before the server did this automatically, and some even went so far as to write scripts to automate the process.

Soli Deo Gloria
Grant (SSSF) · Volunteer tester · Joined: 19 Aug 99 · Posts: 9618 · Credit: 123,892,611 · RAC: 83,211
> It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations.

That is the case. When we had the glitch a few months back where VLARs were going to GPUs, my GTX 460 & 560 Ti had no problems with them, and I had no system slow-downs or sluggish response issues.

Grant
Darwin NT
Wedge009 · Volunteer tester · Joined: 3 Apr 99 · Posts: 448 · Credit: 302,497,264 · RAC: 91,374
I didn't have any GUI unresponsiveness or anything like that whenever VLAR WUs went on NV GPUs, pre-Fermi, but run-times were horribly long. With the current lack of WUs (maybe I'm just unlucky), I've redirected a single VLAR WU to an NV GPU just to see what happens - also no lack of responsiveness, but the run-time looks to be very long. Judging by current progress, it'll be nearly an hour before it's completed, compared with 2 to 8 minutes for non-VLAR WUs. Ouch.

Soli Deo Gloria
Volunteer tester · Joined: 17 Feb 01 · Posts: 31003 · Credit: 61,795,819 · RAC: 30,445
> I'm aware of the crashing issue on lower-end GPUs. But obviously the concept of opportunity cost is completely lost. No matter, it's still the case that GPU calculations are not quite as precise as those made on the CPU.

Precision is definitely not a problem. All current OpenCL apps are at 99.9% accuracy. I'm almost certain it's the same with all CUDA builds.

With each crime and every kindness we birth our future.
Wedge009 · Volunteer tester · Joined: 3 Apr 99 · Posts: 448 · Credit: 302,497,264 · RAC: 91,374
I was referring to the inherent limitations of the hardware, not the applications. But I know a lot of hard work has gone into improving the accuracy of the applications - many thanks to all involved in that.

Soli Deo Gloria
Volunteer tester · Joined: 14 Feb 13 · Posts: 2037 · Credit: 17,689,662 · RAC: 0
What on earth have a) VLAR not going to NV and b) the accuracy of the different apps to do with the current limits on tasks in progress - i.e. with the topic of this thread?! But since the OP seems to be fine with it... So, after I've had a good scream in a dedicated place, I'll be with you and try to sift fact from fiction.

A person who won't read has no advantage over one who can't read. (Mark Twain)
Volunteer tester · Joined: 14 Feb 13 · Posts: 2037 · Credit: 17,689,662 · RAC: 0
Ok... Let me do the easier one first - precision. [On second thought it's not easy at all...]

Disclaimer: I'm neither omniscient nor do I speak ex cathedra. Some points may end up slightly wrong. Don't blame me if your eyes glaze over.

Has any of you ever heard of the law of error propagation? Or of the different precisions computers can work at? How simply do I need to put it? Let me cite Gauss: 'Nothing shows mathematical illiterateness as much as the excessive accuracy of numbers.' As a scientist, the rule is to carry as much precision as feasible through the whole calculation and only round to a sensible number of decimal places when presenting the result.

For us, precision is vital. Minimal errors expand to large ones, so we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space, so there is a trade-off between the accuracy of a calculation and the memory it requires. For that reason lower precision is sometimes used. Higher precision can also sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Now, let's look at the stock-application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and, due to the hardware available at the time (end of 2008), was single precision. That invariably leads to small discrepancies. This is most noticeable in the number of signals reported - when a signal is very close to the reporting threshold it may be reported by one app but not the other.

The validator has been designed to cope with the inherent variance of numerical calculations. In such cases it will get another opinion - and if the mismatch is only one signal (to be precise, if more than 50% of the results match), both (all three) results will be valid.

Obviously this shortcoming is annoying to an excellent engineer and programmer.
That led to improvements in the precision of the optimised app. x32f was an initial release; x38g already came with precision improvements that were further refined in x41g and x41zc. The aim was not so much to get better precision as to improve cross-platform accuracy.

At this point it became clear that any further improvements to bring CPU and GPU results closer had to come from the CPU side. This led to 6.98 and 6.99 going into beta testing. (Well, it looks like 6.99 is finally going to go out as 7.00 or so.)

Sorry, I neglected to mention ATI/OpenCL. Precision improvements similar to those developed for NV were incorporated a bit later.

So when V7 finally reaches main, all the applications will be as close in precision as the hardware allows and we should see far fewer cross-platform inconclusives. Some other issues that lead to inconclusives (such as the processing order) will be looked at in detail and hopefully addressed in the next development cycle.

A person who won't read has no advantage over one who can't read. (Mark Twain)
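William's point about small errors expanding can be seen directly by accumulating the same value at two precisions. A minimal stdlib-only sketch, using `struct` to round each step to IEEE 754 single precision (this is an illustration of error propagation in general, not of the actual SETI apps):

```python
import struct

def f32(x):
    # Round a Python double to IEEE 754 single precision and back,
    # simulating float32 arithmetic step by step.
    return struct.unpack('f', struct.pack('f', x))[0]

total64 = 0.0
total32 = 0.0
for _ in range(100_000):
    total64 += 0.1                      # double precision
    total32 = f32(total32 + f32(0.1))   # simulated single precision

# The double-precision sum stays extremely close to 10000; the
# single-precision sum drifts visibly as rounding errors accumulate.
print(total64)
print(total32)
```

The same calculation, the same inputs - only the precision carried through the intermediate steps differs, and the results diverge. This is exactly why two apps at different precisions can disagree on a signal sitting near a threshold.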
Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 11892 · Credit: 115,705,144 · RAC: 70,350
At risk of being declared off-topic...

> Let me cite Gauss: 'Nothing shows mathematical illiterateness as much as the excessive accuracy of numbers.'

"BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10^-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer." (New Scientist magazine)
Volunteer tester · Joined: 19 Nov 00 · Posts: 40475 · Credit: 41,351,963 · RAC: 1,803
I think that William's recent post was very helpful indeed - thank you.

> ...excessive accuracy of numbers.'

@Richard, I can think of some around here.....
Richard Haselgrove · Volunteer tester · Joined: 4 Jul 99 · Posts: 11892 · Credit: 115,705,144 · RAC: 70,350
> For us, precision is vital. Minimal errors expand to large ones. So, we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires. For that reason sometimes lower precision is used. Also higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Actually, my understanding is that the vast majority of the SETI search mathematics can be performed perfectly adequately at single precision: modern CPU floating-point units (anyone else remember having to buy a separate 8087 or 80287 chip to do maths?) can all do IEEE 754 double-precision maths, but it's slower. Single precision was chosen for speed where it was 'good enough'.

But there are some (small) parts of the code where the errors build up to such a degree that they absolutely have to be done at double precision. No problem on a CPU but, as William says, the early GPUs didn't have double-precision hardware support. Instead, an algorithm was used to simulate double precision where needed, though to start with it didn't fully reach the IEEE specification. Later CUDA optimisations (1) used double-precision hardware where available, or (2) used a better (and fully IEEE-compliant) algorithm when emulation was needed on older cards.
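The 'simulate extra precision in software' idea Richard describes generally works by carrying the rounding error of each operation alongside the main value. As a rough illustration of that principle (not the actual CUDA emulation code), Kahan compensated summation keeps a correction term so the per-step rounding errors don't accumulate:

```python
def kahan_sum(values):
    # Compensated summation: 'c' carries the low-order bits lost when
    # each addition is rounded, effectively extending the working
    # precision of ordinary hardware floats.
    total = 0.0
    c = 0.0
    for x in values:
        y = x - c
        t = total + y
        c = (t - total) - y   # rounding error of this addition
        total = t
    return total

vals = [0.1] * 1_000_000
naive = sum(vals)              # plain summation drifts
compensated = kahan_sum(vals)  # stays much closer to the true sum
```

Double-double (two-float) arithmetic on pre-Fermi GPUs is the same trick taken further: every value, not just a running sum, is stored as a high part plus an error part.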
Grant (SSSF) · Volunteer tester · Joined: 19 Aug 99 · Posts: 9618 · Credit: 123,892,611 · RAC: 83,211
> Let me cite Gauss: 'Nothing shows mathematical illiterateness as much as the excessive accuracy of numbers.'

In short: precision and accuracy aren't the same thing. A very rough definition: precision relates to the number of places you measure to; accuracy is how close to the actual answer you are. E.g. a result of 103456.1 is very low in precision, but if the correct value is 103456.0 it is very accurate. A result of 102137.76532987643 is very precise, but with the correct value actually being 103456.0 it isn't very accurate.

Grant
Darwin NT
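Grant's distinction fits in a few lines of Python (the numbers are purely illustrative):

```python
true_value = 103456.0

low_precision = 103456.1             # one decimal place carried
high_precision = 102137.76532987643  # many digits carried

# Accuracy is distance from the true value, and is independent of
# how many digits the result happens to carry:
print(abs(low_precision - true_value))   # ~0.1   -> very accurate
print(abs(high_precision - true_value))  # ~1318  -> not accurate
```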
Volunteer tester · Joined: 5 Mar 01 · Posts: 782 · Credit: 284,654,624 · RAC: 105,987
```shell
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
...
processor       : 243
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
apicid          : 243
initial apicid  : 243
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr mca pat fxsr ht syscall lm rep_good nopl lahf_lm
bogomips        : 2190.18
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
```
Volunteer tester · Joined: 14 Feb 13 · Posts: 2037 · Credit: 17,689,662 · RAC: 0
> $ cat /proc/cpuinfo

Your point being?

A person who won't read has no advantage over one who can't read. (Mark Twain)
Volunteer tester · Joined: 5 Mar 01 · Posts: 782 · Credit: 284,654,624 · RAC: 105,987
> $ cat /proc/cpuinfo

Just the irony of a 100-WU limit when there are machines that can run 244 tasks at a time (theoretically...).
Joined: 15 Mar 01 · Posts: 1011 · Credit: 230,274,184 · RAC: 0
I can return GPU tasks as fast as one every 20 seconds - the queue drains in 30 minutes!!!
Volunteer tester · Joined: 30 Nov 03 · Posts: 60532 · Credit: 44,444,609 · RAC: 6,876
The 100 WUs I get last me about 2 hours and 35 minutes on this Zotac closed-loop liquid-cooled Infinity Edition GTX 580, and I have equipment for 2 more of these cards - that would be about 51 minutes each. So I hear ya; if it were 100 WUs per GPU, I sure wouldn't mind...

What is BSG Robotech-Saga-Wiki SW-wiki
Lionel · Joined: 25 Mar 00 · Posts: 672 · Credit: 432,354,537 · RAC: 37,018
I empathise and agree that the limits are too low and need to go up. As for the scheduled maintenance period, I can't get through it without things running dry.
Volunteer tester · Joined: 11 Sep 99 · Posts: 6529 · Credit: 183,025,873 · RAC: 50,502
> $ cat /proc/cpuinfo

In that case you would simply run multiple instances of BOINC, each configured for a specific number of processors. I have considered running two instances on my 24-core box to give it more than a few hours of cache, but this way it gets some PG work done too.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
©2018 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.