Please raise the limits... just a little...

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1346046 - Posted: 13 Mar 2013, 7:39:13 UTC - in response to Message 1346045.  

It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations.

That is the case.
When we had the glitch where VLARs were going to GPUs a few months back, my GTX 460 & 560 Ti had no problems with them, and I had no system slowdowns or sluggish-response issues.
Grant
Darwin NT
ID: 1346046
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1346053 - Posted: 13 Mar 2013, 7:51:09 UTC

I didn't have any GUI unresponsiveness or anything like that whenever VLAR WUs went on NV GPUs, pre-Fermi, but run-times were horribly long. With the current lack of WUs (maybe I'm just unlucky), I've redirected a single VLAR WU to an NV GPU, just to see what happens - also no lack of responsiveness, but the run-time looks to be very long. Judging by current progress, it'll be nearly an hour before it's completed, compared with 2 to 8 minutes for non-VLAR WUs. Ouch.
Soli Deo Gloria
ID: 1346053
Mike
Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34250
Credit: 79,922,639
RAC: 80
Germany
Message 1346071 - Posted: 13 Mar 2013, 8:28:37 UTC - in response to Message 1346014.  

I'm aware of the crashing issue on lower-end GPUs. But obviously the concept of opportunity cost is completely lost. No matter, it's still the case that GPU calculations are not quite as precise as those made on the CPU.


Precision is definitely not a problem.
All current OpenCL apps are at 99.9% accuracy.
I'm almost certain it's the same with all CUDA builds.
With each crime and every kindness we birth our future.
ID: 1346071
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1346081 - Posted: 13 Mar 2013, 9:00:24 UTC

I was referring to the inherent limitations of the hardware, not the applications. But I know a lot of hard work has gone into improving the accuracy of the applications - many thanks to all involved in that.
Soli Deo Gloria
ID: 1346081
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1346113 - Posted: 13 Mar 2013, 10:51:46 UTC

What on earth have

a) VLAR not going to NV

and

b) the accuracy of the different apps

to do with the current limits on tasks in progress?! I.e., with the topic of this thread?!

But since the OP seems to be fine with it...
So, after I've had a good scream in a dedicated place, I'll be with you and try to sift fact from fiction.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1346113
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1346138 - Posted: 13 Mar 2013, 12:22:43 UTC

Ok...

Let me do the easier one first - precision. [on second thought it's not easy at all...]

Disclaimer: I'm neither omniscient nor do I speak ex cathedra. Some points may end up slightly wrong. Don't blame me if your eyes glaze over.

Any of you ever heard of the law of error propagation? Or of the different precisions computers can work at? How simple do I need to put it?
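
For reference, the standard first-order form of that law, for a result f(x_1, ..., x_n) with independent input uncertainties \sigma_{x_i}, is

    \sigma_f^2 = \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_{x_i}^2

i.e. each input's error is weighted by how sensitive the result is to that input, which is exactly how small errors can grow through a long chain of operations.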

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

For a scientist, the rule is to carry as much precision as feasible through the whole calculation and only round to a sensible number of decimal places when presenting the result.

For us, precision is vital: small errors propagate into large ones. So we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires, and for that reason lower precision is sometimes used. Also, higher precision can sometimes result in higher error. And last but not least, older GPUs often don't support double precision in the first place.
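
To see why this matters in code, here's a minimal sketch (plain C, not SETI@home code) of single-precision rounding error accumulating where double precision holds up:

#include <stdio.h>

int main(void) {
    /* Add 0.1 ten million times; the exact answer is 1,000,000. */
    float  sum_f = 0.0f;
    double sum_d = 0.0;
    for (int i = 0; i < 10000000; ++i) {
        sum_f += 0.1f;   /* 0.1 has no exact binary representation */
        sum_d += 0.1;
    }
    /* On IEEE 754 hardware the float total drifts noticeably from
       1,000,000, while the double total is correct to many digits. */
    printf("float : %.2f\n", sum_f);
    printf("double: %.2f\n", sum_d);
    return 0;
}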

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and, due to the hardware available at the time (end of 2008), was single precision.
That invariably leads to small discrepancies. This is most notable in the number of signals reported - when a signal is very close to the reporting threshold it may be reported by one app but not the other. The validator has been designed to cope with the inherent variance of floating-point calculations. In such cases it will get another opinion - and if the mismatch is only one signal (to be precise, if more than 50% of the signals match) both (all three) results will be valid.
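
In code terms, that rule is roughly the following (a hypothetical C sketch; the Signal struct, tolerance, and pairing are made up for illustration - the real validator is more involved):

#include <math.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { double freq; double power; } Signal;  /* illustrative fields */

/* Two signals agree when both fields match within a relative tolerance. */
static bool signals_match(Signal a, Signal b, double tol) {
    return fabs(a.freq  - b.freq)  <= tol * fmax(fabs(a.freq),  fabs(b.freq)) &&
           fabs(a.power - b.power) <= tol * fmax(fabs(a.power), fabs(b.power));
}

/* Results are "weakly similar" when strictly more than half of the signals
   in the shorter list have a counterpart in the other list.  (A real
   matcher would pair signals one-to-one; this sketch skips that.) */
bool results_weakly_similar(const Signal *a, size_t na,
                            const Signal *b, size_t nb, double tol) {
    size_t shorter = na < nb ? na : nb;
    if (shorter == 0)
        return na == nb;          /* both empty: trivially similar */
    size_t matched = 0;
    for (size_t i = 0; i < na; ++i)
        for (size_t j = 0; j < nb; ++j)
            if (signals_match(a[i], b[j], tol)) { ++matched; break; }
    return 2 * matched > shorter; /* the "more than 50%" test */
}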

Obviously this shortcoming is annoying to an excellent engineer and programmer.
That led to improvements in the precision of the optimised app: x32f was an initial release; x38g already came with precision improvements, which were further refined in x41g and x41zc.
The aim was not so much to get better precision as to improve cross-platform consistency. At this point it became clear that any further improvements to bring CPU and GPU results closer had to come from the CPU side. This led to 6.98 and 6.99 going into beta testing. (Well, it looks like 6.99 is finally going to go out as 7.00 or so.)

Sorry, I neglected to mention ATI/OpenCL. Precision improvements similar to those developed for NV were incorporated a little later.

So when V7 finally reaches main, all the applications will be as close in precision as the hardware allows, and we should see far fewer cross-platform inconclusives. Some other issues that lead to inconclusives (such as the processing order) will be looked at in detail and hopefully addressed in the next development cycle.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1346138
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14645
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1346144 - Posted: 13 Mar 2013, 12:45:51 UTC - in response to Message 1346138.  

At risk of being declared off-topic...

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

"BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer." (New Scientist magazine)
ID: 1346144
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14645
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1346153 - Posted: 13 Mar 2013, 13:16:39 UTC - in response to Message 1346138.  

For us, precision is vital: small errors propagate into large ones. So we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires, and for that reason lower precision is sometimes used. Also, higher precision can sometimes result in higher error. And last but not least, older GPUs often don't support double precision in the first place.

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and, due to the hardware available at the time (end of 2008), was single precision.
That invariably leads to small discrepancies.

Actually, my understanding is that the vast majority of the SETI search mathematics can be performed perfectly adequately at single precision: modern CPU floating point units (anyone else remember having to buy a separate 8087 or 80287 chip to do maths?) can all do IEEE 754 double precision maths, but it's slower. Single precision was chosen for speed where it was 'good enough'.

But there are some (small) parts of the code where the errors build up to such a degree that they absolutely have to be done at double precision. No problem on a CPU, but as William says, the early GPUs didn't have double precision hardware support. Instead, an algorithm was used to simulate double precision where needed, but to start with it didn't fully reach the IEEE specification.

Later CUDA optimisations (1) used double precision hardware where available, or (2) used a better (and fully IEEE-compliant) algorithm when emulation was needed on older cards.
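
For the curious, such emulation carries each value as an unevaluated sum of two floats, hi + lo, built from error-free transformations such as Knuth's TwoSum. A minimal sketch (illustrative C, not the actual CUDA code; it must be compiled without fast-math options, which break the trick):

#include <stdio.h>

typedef struct { float hi, lo; } dsfloat;  /* "double-single": value = hi + lo */

/* Knuth's TwoSum: returns s and err such that s + err == a + b exactly. */
static dsfloat two_sum(float a, float b) {
    float s   = a + b;
    float bb  = s - a;
    float err = (a - (s - bb)) + (b - bb);
    return (dsfloat){ s, err };
}

/* Add two double-single numbers and renormalise the hi/lo split. */
static dsfloat ds_add(dsfloat x, dsfloat y) {
    dsfloat s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);
}

int main(void) {
    /* In plain single precision, 1e-9 vanishes next to 1.0... */
    printf("plain float  : %.10g\n", 1.0f + 1e-9f + 2e-9f);
    /* ...but survives in the lo word of a double-single value. */
    dsfloat a = { 1.0f, 1e-9f };
    dsfloat b = { 1.0f, 2e-9f };
    dsfloat c = ds_add(a, b);
    printf("double-single: hi=%g lo=%g\n", c.hi, c.lo);
    return 0;
}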
ID: 1346153
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1346240 - Posted: 13 Mar 2013, 18:13:31 UTC - in response to Message 1346138.  

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

For a scientist, the rule is to carry as much precision as feasible through the whole calculation and only round to a sensible number of decimal places when presenting the result.

In short: precision and accuracy aren't the same thing.
A very rough definition: precision relates to the number of places you measure to; accuracy is how close to the actual answer you are.

E.g. a result of 103456.1 is very low in precision, but if the correct value is 103456.0 it is very accurate. A result of 102137.76532987643 is very precise, but with the correct value actually being 103456.0 it isn't very accurate.
Grant
Darwin NT
ID: 1346240
Mr. Kevvy
Crowdfunding Project Donor, Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1346309 - Posted: 13 Mar 2013, 21:02:49 UTC

For those of us who think visually... :^)

[image not preserved]
ID: 1346309
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1357523 - Posted: 16 Apr 2013, 13:06:59 UTC

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
.
.
.

processor       : 243
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
apicid          : 243
initial apicid  : 243
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr mca pat fxsr ht syscall lm rep_good nopl lahf_lm
bogomips        : 2190.18
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

ID: 1357523
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1357549 - Posted: 16 Apr 2013, 14:30:49 UTC - in response to Message 1357523.  

$ cat /proc/cpuinfo

Your point being?
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1357549
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1357573 - Posted: 16 Apr 2013, 15:41:27 UTC - in response to Message 1357549.  
Last modified: 16 Apr 2013, 15:41:46 UTC

$ cat /proc/cpuinfo

Your point being?

Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...).
ID: 1357573
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1357661 - Posted: 17 Apr 2013, 3:15:56 UTC

I can return GPU tasks as fast as one every 20 seconds, so the queue drains in about 30 minutes!
ID: 1357661
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65690
Credit: 55,293,173
RAC: 49
United States
Message 1357665 - Posted: 17 Apr 2013, 4:01:29 UTC

The 100 WUs I get last me about 2 hours and 35 minutes on this Zotac closed-loop liquid-cooled Infinity Edition GTX 580, and I have equipment for two more of these cards - that would be about 51 minutes each. So I hear ya; if it was 100 WUs per GPU I sure wouldn't mind...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1357665
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1357666 - Posted: 17 Apr 2013, 4:13:21 UTC


I empathise and agree that the limits are too low and need to go up. As for the scheduled maintenance period, I can't get through that without things running dry.
ID: 1357666
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1358039 - Posted: 18 Apr 2013, 12:25:02 UTC - in response to Message 1357573.  

$ cat /proc/cpuinfo

Your point being?

Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...).

In that case you would simply run multiple instances of BOINC, each configured for a specific number of processors, as sketched below. I have considered running two instances on my 24-core box to give it more than a few hours' cache, but this way it gets some PG work done too.
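
For reference, each extra client instance needs its own data directory and the client's --allow_multiple_clients option, and the processor count per instance can be pinned in that instance's cc_config.xml. The <ncpus> value below is only an illustration for splitting a 24-core box in half:

<!-- cc_config.xml for one of two instances: report only 12 of the 24 cores -->
<cc_config>
  <options>
    <ncpus>12</ncpus>
  </options>
</cc_config>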
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1358039
