Please raise the limits... just a little...


trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1346039 - Posted: 13 Mar 2013, 7:10:38 UTC - in response to Message 1346018.

My two cents' worth on this topic...

I think an increase in the number of WUs allotted is a good idea. HOWEVER, I think it should only be implemented for systems and users that meet X criteria, where X is a numerical value based on how many WUs the user and machine have done and how long they have been crunching.

For example: a + b + c = x

a = number of years the user has been actively crunching (if you started crunching in 2000 but have only been actively crunching since 2008, a would be 5)

b = number of years (or months) the machine has been actively crunching

c = number of processors usable for crunching

This way only people with proven records get more work - a rough sketch of the idea is below. Anybody else think this is a good way to go?
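A minimal sketch of how such a score might feed into a per-host limit (the function name, weights and cap are invented purely for illustration; nothing like this exists in the scheduler):

def task_limit(years_active_user, years_active_machine, processors, base_limit=100):
    # Hypothetical per-host task limit built from the a + b + c = x idea above.
    x = years_active_user + years_active_machine + processors
    bonus = 10 * x                         # made-up scaling factor
    return min(base_limit + bonus, 1000)   # arbitrary upper cap

# Example: active since 2008 (a = 5), machine crunching for 2 years, 8 usable cores
print(task_limit(5, 2, 8))  # 100 + 10 * 15 = 250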



____________
I RTFM and it was WYSIWYG, then I found out it was a PEBKAC error

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 356
Credit: 152,977,432
RAC: 72,464
Australia
Message 1346045 - Posted: 13 Mar 2013, 7:28:18 UTC - in response to Message 1346034.
Last modified: 13 Mar 2013, 7:29:06 UTC

In that case GPUs shouldn't be used at all then.

~sigh~ This is not a fair conclusion to come to. GPUs are less precise than CPUs, but in most cases they are still 'good enough'. A problem arises when there are results returned to the server that match fairly closely, but not closely enough for the validator to make a call on their validity. Sometimes this may be due to the reduced precision a GPU may have compared with a CPU. This is all that I meant when I said that CPUs are still valuable in the science of this project.

I've done hundreds of VLARs on my GPUs at every angle that they sent me.
I crunch them there to allow my old slow CPU to crunch the quicker MBs.
That's how I judge my setup to be most efficient as to how I use it.

By 'quicker MBs' I'm guessing you mean the plain WUs not labelled as VLAR. This is certainly an interesting choice. There doesn't appear to be a substantial difference between the run-times for VLAR and non-VLAR MB WUs on a CPU (excluding 'shorties'), yet there is certainly a huge difference in run-times between VLAR and non-VLAR MB WUs on NV GPUs - at least for the ones that you're not using. But if you choose to run the VLAR WUs on the GPU, that's certainly your right.

The problem is in people giving out false information to other people that don't know it's not true. That you and others have not been able to run VLARS on NVIDIA GPUs does not make it true in all instances.

Who says it is false information? I never said that I could not run VLAR WUs on NV GPUs, only that there is a severe performance penalty in doing so. It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations. If so, then that is good news. But it's not fair to flat-out declare that this is false information. Certainly, a great many contributors - including myself - found it beneficial for stability and performance reasons to redirect VLAR WUs to the CPU (despite the hit to the APR count) before the server did this automatically, and even went so far as to write scripts to automate this process.
____________
Soli Deo Gloria

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,701,540
RAC: 25,044
Australia
Message 1346046 - Posted: 13 Mar 2013, 7:39:13 UTC - in response to Message 1346045.

It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations.

That is the case.
When we had the glitch where VLARs were going to GPUs a few months back, my GTX 460 & 560 Ti had no problems with them, and I had no system slowdowns or sluggish response issues.
____________
Grant
Darwin NT.

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 356
Credit: 152,977,432
RAC: 72,464
Australia
Message 1346053 - Posted: 13 Mar 2013, 7:51:09 UTC

I didn't have any GUI unresponsiveness or anything like that whenever VLAR WUs went on NV GPUs, pre-Fermi, but run-times were horribly long. With the current lack of WUs (maybe I'm just unlucky), I've redirected a single VLAR WU to an NV GPU, just to see what happens - also no lack of responsiveness, but the run-time looks to be very long. Judging by current progress, it'll be nearly an hour before it's completed, compared with 2 or 8 minutes for non-VLAR WUs. Ouch.
____________
Soli Deo Gloria

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 24895
Credit: 34,404,954
RAC: 11,229
Germany
Message 1346071 - Posted: 13 Mar 2013, 8:28:37 UTC - in response to Message 1346014.

I'm aware of the crashing issue on lower-end GPUs. But obviously the concept of opportunity cost is completely lost. No matter, it's still the case that GPU calculations are not quite as precise as those made on the CPU.


Precision is definitely not a problem.
All current OpenCL apps are at 99.9% accuracy.
I'm almost certain it's the same with all CUDA builds.


____________

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 356
Credit: 152,977,432
RAC: 72,464
Australia
Message 1346081 - Posted: 13 Mar 2013, 9:00:24 UTC

I was referring to the inherent limitations of the hardware, not the applications. But I know a lot of hard work has gone into improving the accuracy of the applications - many thanks to all involved in that.
____________
Soli Deo Gloria

William
Volunteer tester
Joined: 14 Feb 13
Posts: 1610
Credit: 9,470,168
RAC: 16
Message 1346113 - Posted: 13 Mar 2013, 10:51:46 UTC

What on earth have

a) VLAR WUs not going to NV

and

b) the accuracy of the different apps

to do with the current limits on tasks in progress?! i.e. with the topic of this thread?!

But since the OP seems to be fine with it...
So, after I've had a good scream in a dedicated place, I'll be with you and try to sift fact from fiction.
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

William
Volunteer tester
Joined: 14 Feb 13
Posts: 1610
Credit: 9,470,168
RAC: 16
Message 1346138 - Posted: 13 Mar 2013, 12:22:43 UTC

Ok...

Let me do the easier one first - precision. [on second thought it's not easy at all...]

Disclaimer: I'm neither omniscient nor do I speak ex cathedra. Some points may end up slightly wrong. Don't blame me if your eyes glaze over.

Any of you ever heard of the law of error propagation? Or of the different precisions computers can work at? How simple do I need to put it?

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

For a scientist, the rule is to carry as much precision as feasible through the whole calculation and only round to a sensible number of decimal places when presenting the result.

For us, precision is vital. Tiny errors can grow into large ones, so we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space, so there is a trade-off between the accuracy of a calculation and the memory it requires. For that reason lower precision is sometimes used. Also, higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.
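As a generic illustration of that space/precision trade-off (plain NumPy, nothing SETI-specific):

import numpy as np

x32 = np.float32(16_777_217.0)   # 2**24 + 1
x64 = np.float64(16_777_217.0)

print(x32.nbytes, x64.nbytes)    # 4 8  - double precision really does take double the space
print(x32 == 16_777_216.0)       # True - single precision can no longer tell the two values apart
print(x64 == 16_777_217.0)       # True - double precision still can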

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and, due to the hardware available at the time (end of 2008), was single precision.
That invariably leads to small discrepancies. This is most notable in the number of signals reported - when a signal is very close to the reporting threshold it may be reported by one app but not the other. The validator has been designed to cope with the inherent variance of numerical calculations. In such cases it will get another opinion - and if the mismatch is only one signal (to be precise, if more than 50% of the signals match), both (all three) results will be valid.
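A toy sketch of that 'more than 50% match' rule (the function name, the tolerance and the flat list-of-floats stand-in for signals are all invented for illustration; the real validator is considerably more involved):

def results_agree(signals_a, signals_b, tolerance=1e-4):
    # Hypothetical check: two results count as matching if more than 50% of
    # the reported signals can be paired up within a tolerance.
    matched = 0
    unused = list(signals_b)
    for sig in signals_a:
        for other in unused:
            if abs(sig - other) <= tolerance:
                matched += 1
                unused.remove(other)
                break
    total = max(len(signals_a), len(signals_b))
    return total > 0 and matched / total > 0.5

# One result reports a borderline signal the other missed: 3 of 4 match, so both pass.
print(results_agree([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))   # True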

Obviously this shortcoming is annoying to an excellent engineer and programmer.
That led to improvements in the precision of the optimised app. x32f was an initial release; x38g already came with precision improvements that were further refined in x41g and x41zc.
The aim was not so much to get better precision but to improve cross-platform accuracy. At this point it became clear that any further improvements to bring CPU and GPU results closer had to come from the CPU side. This led to 6.98 and 6.99 going into beta testing. (Well, it looks like 6.99 is finally going to go out as 7.00 or so.)

Sorry I neglected to mention ATI/OpenCL. Precision improvements similar to those developed for NV were incorporated a bit later.

So when V7 finally reaches main, all the applications will be as close in precision as the hardware allows and we should see far fewer cross-platform inconclusives. Some other issues that lead to inconclusives (such as the processing order) will be looked at in detail and hopefully addressed in the next development cycle.
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,106
RAC: 23,928
United Kingdom
Message 1346144 - Posted: 13 Mar 2013, 12:45:51 UTC - in response to Message 1346138.

At risk of being declared off-topic...

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

"BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer." (New Scientist magazine)

Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 32315
Credit: 14,271,948
RAC: 10,052
United Kingdom
Message 1346150 - Posted: 13 Mar 2013, 12:57:38 UTC

I think that William's recent post was very helpful indeed, thank you.

excessive accuracy of numbers.'

@Richard, I can think of some around here .....

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,711,106
RAC: 23,928
United Kingdom
Message 1346153 - Posted: 13 Mar 2013, 13:16:39 UTC - in response to Message 1346138.

For us, precision is vital. Tiny errors can grow into large ones, so we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space, so there is a trade-off between the accuracy of a calculation and the memory it requires. For that reason lower precision is sometimes used. Also, higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and, due to the hardware available at the time (end of 2008), was single precision.
That invariably leads to small discrepancies.

Actually, my understanding is that the vast majority of the SETI search mathematics can be performed perfectly adequately at single precision: modern CPU floating point units (anyone else remember having to buy a separate 8087 or 80287 chip to do maths?) can all do IEEE 754 double precision maths, but it's slower. Single precision was chosen for speed where it was 'good enough'.

But there are some (small) parts of the code where the errors build up to such a degree that they absolutely have to be done at double precision. No problem on a CPU, but as William says, the early GPUs didn't have double precision hardware support. Instead, an algorithm was used to simulate double precision where needed, but to start with it didn't fully reach the IEEE specification.

Later CUDA optimisations (1) used double precision hardware where available, or (2) used a better (and fully IEEE-compliant) algorithm when emulation was needed on older cards.
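For the curious, the emulation Richard describes is broadly in the 'double-single' family, where a higher-precision value is carried as a pair of single-precision floats. A minimal sketch of the core error-free addition step (Knuth's two-sum), here using NumPy float32 to stand in for GPU single precision - this is only an illustration, not the project's actual code:

import numpy as np

def two_sum(a, b):
    # Error-free transformation: returns (s, e) with s = fl(a + b) and e the
    # rounding error, so that a + b == s + e exactly. Works in any IEEE format.
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

a = np.float32(16_777_216.0)   # 2**24
b = np.float32(1.0)
s, e = two_sum(a, b)
print(s, e)   # 16777216.0 1.0 - the 1.0 lost by the single-precision add is recovered in e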

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,701,540
RAC: 25,044
Australia
Message 1346240 - Posted: 13 Mar 2013, 18:13:31 UTC - in response to Message 1346138.

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

For a scientist, the rule is to carry as much precision as feasible through the whole calculation and only round to a sensible number of decimal places when presenting the result.

In short: precision and accuracy aren't the same thing.
A very rough definition: precision relates to the number of decimal places you measure to; accuracy is how close to the actual answer you are.

E.g. a result of 103456.1 is very low in precision, but if the correct value is 103456.0 it is very accurate. A result of 102137.76532987643 is very precise, but with the correct value actually being 103456.0 it isn't very accurate.
____________
Grant
Darwin NT.

Mr. Kevvy
Volunteer tester
Joined: 15 May 99
Posts: 731
Credit: 78,422,444
RAC: 42,288
Canada
Message 1346309 - Posted: 13 Mar 2013, 21:02:49 UTC

For those of us who think visually... :^)


____________
“Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it's the only thing that ever has.”
--- Margaret Mead

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 639
Credit: 146,894,445
RAC: 78,154
United Kingdom
Message 1357523 - Posted: 16 Apr 2013, 13:06:59 UTC

$ cat /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
. . .
processor       : 243
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 1
cpu MHz         : 1090.908
cache size      : 512 KB
physical id     : 0
siblings        : 244
core id         : 60
cpu cores       : 61
apicid          : 243
initial apicid  : 243
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr mca pat fxsr ht syscall lm rep_good nopl lahf_lm
bogomips        : 2190.18
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

____________

William
Volunteer tester
Joined: 14 Feb 13
Posts: 1610
Credit: 9,470,168
RAC: 16
Message 1357549 - Posted: 16 Apr 2013, 14:30:49 UTC - in response to Message 1357523.

$ cat /proc/cpuinfo

Your point being?
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 639
Credit: 146,894,445
RAC: 78,154
United Kingdom
Message 1357573 - Posted: 16 Apr 2013, 15:41:27 UTC - in response to Message 1357549.
Last modified: 16 Apr 2013, 15:41:46 UTC

$ cat /proc/cpuinfo

Your point being?

Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...).
____________

RottenMutt
Joined: 15 Mar 01
Posts: 995
Credit: 208,407,137
RAC: 30,629
United States
Message 1357661 - Posted: 17 Apr 2013, 3:15:56 UTC

I can return GPU tasks as fast as one every 20 seconds - at that rate a 100-task queue drains in a little over 30 minutes!
____________

zoom314
Joined: 30 Nov 03
Posts: 46765
Credit: 36,999,735
RAC: 3,350
United States
Message 1357665 - Posted: 17 Apr 2013, 4:01:29 UTC

The 100 WUs I get last me about 2 hours and 35 minutes on this Zotac closed-loop liquid-cooled Infinity Edition GTX 580, and I have the equipment for 2 more of these cards - with three cards that cache would last about 51 minutes each. So I hear ya; if it was 100 WUs per GPU I sure wouldn't mind...
____________
My Facebook, War Commander, 2015

Lionel
Joined: 25 Mar 00
Posts: 583
Credit: 240,672,800
RAC: 95,396
Australia
Message 1357666 - Posted: 17 Apr 2013, 4:13:21 UTC


I empathise and agree that the limits are too low and need to go up. As for the scheduled maintenance period, I can't get through that without things running dry.
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4595
Credit: 121,582,008
RAC: 49,532
United States
Message 1358039 - Posted: 18 Apr 2013, 12:25:02 UTC - in response to Message 1357573.

$ cat /proc/cpuinfo

Your point being?

Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...).

In that case you would simply run multiple instances of BOINC, with each configured for a specific number of processors. I have considered running two instances on my 24-core box to give it more than a few hours' cache, but this way it gets some PG work done too.
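A rough sketch of that multi-instance idea (the paths and ports are placeholders, and the flags are assumed from the BOINC client documentation - check which options your client version actually supports; the CPU split between instances would be set in each instance's own preferences):

import subprocess

# Hypothetical: two BOINC client instances on one box, each with its own data
# directory and GUI RPC port, so each can hold its own task cache.
instances = [
    {"dir": "/var/lib/boinc-seti",  "port": "31416"},
    {"dir": "/var/lib/boinc-other", "port": "31417"},
]

for inst in instances:
    subprocess.Popen([
        "boinc",                       # assumed client binary name
        "--allow_multiple_clients",    # assumed flag allowing more than one client per host
        "--dir", inst["dir"],
        "--gui_rpc_port", inst["port"],
    ])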
____________
SETI@home classic workunits: 93,865
CPU time: 863,447 hours

Join the BP6/VP6 User Group today!
