Please raise the limits... just a little...


trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1346039 - Posted: 13 Mar 2013, 7:10:38 UTC - in response to Message 1346018.

My two cents' worth on this topic...

I think an increase in the number of WUs allotted is a good idea, HOWEVER! I think it should only be implemented for systems and users that meet X criteria, where X is a numerical value based on how many WUs the user and machine have done and how long they have been crunching.

Example: a + b + c = x

a = number of years the user has been actively crunching (if you started crunching in 2000 but have only been actively crunching since 2008, a would be 5)

b = number of years (or months) the machine has been actively crunching

c = number of processors usable for crunching

This way only people with proven records get more work. Anybody else think this is a good way to go?
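As a back-of-the-envelope illustration, the proposed score could be computed along these lines. This is a minimal Python sketch; the function name, inputs, and the idea of scaling the cap by the score are hypothetical additions, not anything the project implements:

    from datetime import date

    def quota_score(active_since: int, host_years: float, processors: int) -> float:
        """Hypothetical x = a + b + c score from the post above."""
        a = date.today().year - active_since  # years of active crunching
        b = host_years                        # years this host has been crunching
        c = processors                        # processors usable for crunching
        return a + b + c

    # Example: actively crunching since 2008, host running for 3 years,
    # 8 usable cores -> 5 + 3 + 8 = 16 (when evaluated in 2013).
    print(quota_score(2008, 3, 8))

A server could then scale the per-host task limit by x instead of applying a flat cap.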



____________
I RTFM and it was WYSIWYG, then I found out it was a PEBKAC error

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 306
Credit: 130,526,314
RAC: 249,578
Australia
Message 1346045 - Posted: 13 Mar 2013, 7:28:18 UTC - in response to Message 1346034.
Last modified: 13 Mar 2013, 7:29:06 UTC

In that case GPUs shouldn't be used at all then.

~sigh~ This is not a fair conclusion to come to. GPUs are less precise than CPUs, but in most cases they are still 'good enough'. A problem arises when there are results returned to the server that match fairly closely, but not closely enough for the validator to make a call on their validity. Sometimes this may be due to the reduced precision a GPU may have compared with a CPU. This is all that I meant when I said that CPUs are still valuable in the science of this project.

I've done hundreds of VLARs on my GPUs at every angle that they sent me.
I crunch them there to allow my old slow CPU to crunch the quicker MBs.
That's how I judge my setup to be most efficient as to how I use it.

By 'quicker MBs' I'm guessing you mean the plain WUs not labelled as VLAR. This is certainly an interesting choice. There doesn't appear to be a substantial difference between the run-times for VLAR and non-VLAR MB WUs on a CPU (excluding 'shorties'), yet there is certainly a huge difference in run-times between VLAR and non-VLAR MB WUs on NV GPUs - at least on the older cards that you're not using. But if you choose to run the VLAR WUs on the GPU, that's certainly your right.

The problem is in people giving out false information to other people that don't know it's not true. That you and others have not been able to run VLARS on NVIDIA GPUs does not make it true in all instances.

Who says it is false information? I never said that I could not run VLAR WUs on NV GPUs, only that there is a severe performance penalty in doing so. It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations. If so, then that is good news. But it's not fair to flat out declare that this is false information. Certainly, a great many contributors - including myself - found it beneficial for stability and performance reasons to redirect VLAR WUs to the CPU (despite the hit to the APR count) before the server did this automatically, and some even went so far as to write scripts to automate this process.
____________
Soli Deo Gloria

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5697
Credit: 56,418,149
RAC: 48,859
Australia
Message 1346046 - Posted: 13 Mar 2013, 7:39:13 UTC - in response to Message 1346045.

It may well be that Fermi and Kepler GPUs don't suffer as much as previous NV generations.

That is the case.
When we had the glitch where VLARs were going to GPUs a few months back, my GTX 460 & 560 Ti had no problems with them, and I had no system slow-downs or sluggish response issues.
____________
Grant
Darwin NT.

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 306
Credit: 130,526,314
RAC: 249,578
Australia
Message 1346053 - Posted: 13 Mar 2013, 7:51:09 UTC

I didn't have any GUI unresponsiveness or anything like that whenever VLAR WUs went to NV GPUs, pre-Fermi, but run-times were horribly long. With the current lack of WUs (maybe I'm just unlucky), I've redirected a single VLAR WU to an NV GPU, just to see what happens - again, no lack of responsiveness, but the run-time looks to be very long. Judging by current progress, it'll be nearly an hour before it's completed, compared with 2 to 8 minutes for non-VLAR WUs. Ouch.
____________
Soli Deo Gloria

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 23383
Credit: 31,828,283
RAC: 24,029
Germany
Message 1346071 - Posted: 13 Mar 2013, 8:28:37 UTC - in response to Message 1346014.

I'm aware of the crashing issue on lower-end GPUs. But obviously the concept of opportunity cost is completely lost. No matter, it's still the case that GPU calculations are not quite as precise as those made on the CPU.


Precision is definitely not a problem.
All current OpenCL apps are at 99.9% accuracy.
I'm almost certain it's the same with all CUDA builds.


____________

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 306
Credit: 130,526,314
RAC: 249,578
Australia
Message 1346081 - Posted: 13 Mar 2013, 9:00:24 UTC

I was referring to the inherent limitations of the hardware, not the applications. But I know a lot of hard work has gone into improving the accuracy of the applications - many thanks to all involved in that.
____________
Soli Deo Gloria

William
Volunteer tester
Joined: 14 Feb 13
Posts: 1572
Credit: 9,253,455
RAC: 12,773
Message 1346113 - Posted: 13 Mar 2013, 10:51:46 UTC

What on earth have

a) VLAR WUs not going to NV GPUs

and

b) the accuracy of the different apps

to do with the current limits on tasks in progress?! i.e. with the topic of this thread?!

But since the OP seems to be fine with it...
So, after I've had a good scream in a dedicated place, I'll be with you and try to sift fact from fiction.
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

William
Volunteer tester
Joined: 14 Feb 13
Posts: 1572
Credit: 9,253,455
RAC: 12,773
Message 1346138 - Posted: 13 Mar 2013, 12:22:43 UTC

Ok...

Let me do the easier one first - precision. [on second thought it's not easy at all...]

Disclaimer: I'm neither omniscient nor do I speak ex cathedra. Some points may end up slightly wrong. Don't blame me if your eyes glaze over.

Any of you ever heard of the law of error propagation? Or of the different precisions computers can work at? How simple do I need to put it?

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

As a scientist, the rule is to carry as much precision as feasible through the whole calculation and only reduce to a sensible number of decimal points when presenting the result.

For us, precision is vital. Minimal errors expand to large ones. So, we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires. For that reason sometimes lower precision is used. Also higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and due to the hardware available at that time (end of 2008) was single precision.
That invariably leads to small discrepancies. This is most notable in the number of signals reported - when a signal is very close to the reporting threshold it may be reported by one app but not the other. The validator has been designed to cope with the inherent variance of numerical calculations. In such cases it will get another opinion - and if the mismatch is only one signal (to be precise, if more than 50% of the signals match) both (all three) results will be valid.
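To make the near-threshold effect concrete, here is a toy illustration in Python - nothing to do with the actual SETI@home code, just the same reduction run at single and double precision, with a contrived threshold falling between the two results:

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.random(100_000)

    # Naive accumulation at single precision (as on early GPUs)
    # versus double precision (as on CPUs).
    p32 = np.float32(0)
    for v in samples.astype(np.float32):
        p32 = np.float32(p32 + v)
    p64 = float(np.sum(samples))

    print(f"float32 sum: {float(p32):.6f}")
    print(f"float64 sum: {p64:.6f}")

    # Any reporting threshold that happens to fall between the two sums makes
    # one app report a 'signal' and the other stay silent - the validator then
    # has to get another opinion.
    threshold = (float(p32) + p64) / 2  # contrived, purely for illustration
    print("float32 reports:", float(p32) > threshold,
          "| float64 reports:", p64 > threshold)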

Obviously this shortcoming is annoying to an excellent engineer and programmer.
That led to improvements in the precision of the optimised app. x32f was an initial release - x38g already came with precision improvements that were further refined in x41g and x41zc.
The aim was not so much to get better precision as to improve cross-platform accuracy. At this point it became clear that any further improvements to bring CPU and GPU results closer had to come from the CPU side. This led to 6.98 and 6.99 going into beta testing. (Well, it looks like 6.99 is finally going to go out as 7.00 or so.)

Sorry I neglected to mention ATI/OpenCL. Precision improvements similar to those developed for NV were incorporated a bit later.

So when V7 finally reaches main, all the applications will be as close in precision as the hardware allows, and we should see far fewer cross-platform inconclusives. Some other issues that lead to inconclusives (such as the processing order) will be looked at in detail and hopefully addressed in the next development cycle.
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,749,348
RAC: 22,069
United Kingdom
Message 1346144 - Posted: 13 Mar 2013, 12:45:51 UTC - in response to Message 1346138.

At risk of being declared off-topic...

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

"BANKS are under orders to tighten their accounting, but we fear HSBC may be paying too much attention to the small things rather than the gigapounds. Andrew Beggs wanted to know the distance from his home to the bank's nearest branch. He consulted the bank's website, which came up with the answer 0.9904670356841079 miles (1.5940021810076005 kilometres). This is supposedly accurate to around 10-13 metres, much less than the radius of a hydrogen atom. Moisture condensing on the branch's door would bring it several significant digits closer." (New Scientist magazine)

Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 31145
Credit: 11,323,057
RAC: 20,519
United Kingdom
Message 1346150 - Posted: 13 Mar 2013, 12:57:38 UTC

I think that William's recent post was very helpful indeed, thank you.

excessive accuracy of numbers.'

@Richard, I can think of some around here .....

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,749,348
RAC: 22,069
United Kingdom
Message 1346153 - Posted: 13 Mar 2013, 13:16:39 UTC - in response to Message 1346138.

For us, precision is vital. Minimal errors expand to large ones. So, we'd like to have as much precision as possible in the calculations. In computer terms that comes at a cost - double precision takes up double the space. So there is a tradeoff between the accuracy of a calculation and the memory it requires. For that reason sometimes lower precision is used. Also higher precision can sometimes result in higher error. And last but not least, GPUs often don't support double precision in the first place.

Now, let's look at the stock application level. The CPU app was developed first and AFAIK uses varying precision. The NV app was ported from there and due to the hardware available at that time (end of 2008) was single precision.
That invariably leads to small discrepancies.

Actually, my understanding is that the vast majority of the SETI search mathematics can be performed perfectly adequately at single precision: modern CPU floating point units (anyone else remember having to buy a separate 8087 or 80287 chip to do maths?) can all do IEEE 754 double precision maths, but it's slower. Single precision was chosen for speed where it was 'good enough'.

But there are some (small) parts of the code where the errors build up to such a degree that they absolutely have to be done at double precision. No problem on a CPU, but as William says, the early GPUs didn't have double precision hardware support. Instead, an algorithm was used to simulate double precision where needed, but to start with it didn't fully reach the IEEE specification.

Later CUDA optimisations (1) used double precision hardware where available, or (2) used a better (and fully IEEE-compliant) algorithm when emulation was needed on older cards.
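For the curious, the flavour of that emulation can be sketched in a few lines of Python - a toy 'double-single' accumulator built on Knuth's error-free TwoSum transformation. This is illustrative only, under the assumption that two float32 values are used to carry extra precision; it is not the actual CUDA implementation:

    import numpy as np

    f32 = np.float32

    def two_sum(a, b):
        """Knuth's TwoSum: returns (s, e) with s + e == a + b exactly, where s
        is the rounded float32 sum and e the error a plain float32 add discards."""
        s = f32(a + b)
        bb = f32(s - a)
        e = f32(f32(a - f32(s - bb)) + f32(b - bb))
        return s, e

    def ds_sum(values):
        """Sum float32 values in a (hi, lo) pair - roughly twice the effective
        precision of a naive float32 accumulation."""
        hi, lo = f32(0), f32(0)
        for v in values:
            s, e = two_sum(hi, f32(v))
            hi, lo = two_sum(s, f32(lo + e))
        return float(hi) + float(lo)

    vals = [0.1] * 100_000

    naive = f32(0)
    for v in vals:
        naive = f32(naive + f32(v))

    print("naive float32:", float(naive))  # drifts visibly from 10000.0
    print("double-single:", ds_sum(vals))  # much closer to 10000.0
    print("float64 ref:  ", sum(vals))     # Python floats are double precision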

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5697
Credit: 56,418,149
RAC: 48,859
Australia
Message 1346240 - Posted: 13 Mar 2013, 18:13:31 UTC - in response to Message 1346138.

Let me cite Gauss: 'Nothing shows mathematical illiteracy as much as the excessive accuracy of numbers.'

As a scientist, the rule is to carry as much precision as feasible through the whole calculation and only reduce to a sensible number of decimal points when presenting the result.

In short - precision and accuracy aren't the same thing.
A very rough definition: precision relates to the number of places you measure to; accuracy is how close to the actual answer you are.

e.g. a result of 103456.1 is very low in precision, but if the correct value is 103456.0 it is very accurate. A result of 102137.76532987643 is very precise, but with the correct value actually being 103456.0 it isn't very accurate.
____________
Grant
Darwin NT.

Mr. Kevvy
Volunteer tester
Joined: 15 May 99
Posts: 678
Credit: 70,028,497
RAC: 77,962
Canada
Message 1346309 - Posted: 13 Mar 2013, 21:02:49 UTC

For those of us who think visually... :^)

[image not preserved in the archive]
____________
“Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it's the only thing that ever has.”
--- Margaret Mead

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 579
Credit: 131,500,042
RAC: 112,910
United Kingdom
Message 1357523 - Posted: 16 Apr 2013, 13:06:59 UTC

$ cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 11
model : 1
model name : 0b/01
stepping : 1
cpu MHz : 1090.908
cache size : 512 KB
physical id : 0
siblings : 244
core id : 60
cpu cores : 61
.
.
.

processor : 243
vendor_id : GenuineIntel
cpu family : 11
model : 1
model name : 0b/01
stepping : 1
cpu MHz : 1090.908
cache size : 512 KB
physical id : 0
siblings : 244
core id : 60
cpu cores : 61
apicid : 243
initial apicid : 243
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr mca pat fxsr ht syscall lm rep_good nopl lahf_lm
bogomips : 2190.18
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

____________

William
Volunteer tester
Joined: 14 Feb 13
Posts: 1572
Credit: 9,253,455
RAC: 12,773
Message 1357549 - Posted: 16 Apr 2013, 14:30:49 UTC - in response to Message 1357523.

$ cat /proc/cpuinfo

Your point being?
____________
A person who won't read has no advantage over one who can't read. (Mark Twain)

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 579
Credit: 131,500,042
RAC: 112,910
United Kingdom
Message 1357573 - Posted: 16 Apr 2013, 15:41:27 UTC - in response to Message 1357549.
Last modified: 16 Apr 2013, 15:41:46 UTC

$ cat /proc/cpuinfo

Your point being?

Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...).
____________

RottenMutt
Joined: 15 Mar 01
Posts: 992
Credit: 207,654,623
RAC: 1
United States
Message 1357661 - Posted: 17 Apr 2013, 3:15:56 UTC

I can return GPU tasks as fast as one every 20 seconds - the 100-task queue drains in about 30 minutes (100 × 20 s ≈ 33 min)!
____________

zoom314
Joined: 30 Nov 03
Posts: 45793
Credit: 36,420,789
RAC: 7,186
Message 1357665 - Posted: 17 Apr 2013, 4:01:29 UTC

The 100 WUs I get last me about 2 hours and 35 minutes on this Zotac closed-loop liquid-cooled Infinity Edition GTX 580. I have equipment for two more of these cards, which would work out to about 51 minutes each, so I hear ya - if it was 100 WUs per GPU I sure wouldn't mind...
____________

Lionel
Joined: 25 Mar 00
Posts: 544
Credit: 217,399,797
RAC: 210,514
Australia
Message 1357666 - Posted: 17 Apr 2013, 4:13:21 UTC


I empathise and agree that the limits are too low and need to go up. As for the scheduled maintenance period, I can't get through that without things running dry.
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3863
Credit: 107,111,159
RAC: 99,272
United States
Message 1358039 - Posted: 18 Apr 2013, 12:25:02 UTC - in response to Message 1357573.

$ cat /proc/cpuinfo

Your point being?

Just the irony of a 100 WU limit when there are machines that can run 244 at a time (theoretically...).

In that case you would simply run multiple instances of BOINC, each configured for a specific number of processors. I have considered running two instances on my 24-core box to give it more than a few hours' cache, but this way it gets some PG work done too.
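A minimal sketch of what launching two clients could look like - the data-directory paths here are made up, but --dir, --allow_multiple_clients and --gui_rpc_port are real BOINC client options:

    import subprocess

    # Hypothetical data directories - each client instance needs its own.
    instances = ["/var/lib/boinc-a", "/var/lib/boinc-b"]

    for i, data_dir in enumerate(instances):
        # A separate working directory and GUI RPC port per client;
        # --allow_multiple_clients lets them coexist on one host.
        subprocess.Popen([
            "boinc",
            "--dir", data_dir,
            "--allow_multiple_clients",
            "--gui_rpc_port", str(31416 + i),
        ])

Each instance could then be limited to a share of the cores with the <ncpus> option in its own cc_config.xml, so the per-instance task caches add up.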
____________
SETI@home classic workunits: 93,865 | CPU time: 863,447 hours

Join the BP6/VP6 User Group today!
