Average Credit Decreasing?

kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1787883 - Posted: 16 May 2016, 16:12:04 UTC - in response to Message 1787881.  

Every time I look and see my rac dropping I keep thinking there's a problem then I remember Credit Screw


Not the cause anymore... it's now due to your exclusively NVidia farm receiving GUPPI VLAR work units on the GPUs. As noted much elsewhere, they process much more slowly than Arecibo MBs but pay the same credit.

That's the main cause of my current decline....
20 NV GPUs having a nasty time with the Guppies.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1787883 · Report as offensive
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1787890 - Posted: 16 May 2016, 16:21:17 UTC

Is it more of a driver issue (where we would have to wait for Nvidia to get their act together before there is any relief), the way that the Guppies are configured when they are split (something that can be addressed internally, though with the manpower crunch, is that likely anytime soon?), or something else that causes such a penalty for them on Nvidia hardware? Are things much better in the AMD world? I haven't run an AMD card for probably 15+ years, so I basically have no experience with them.

ID: 1787890 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1787891 - Posted: 16 May 2016, 16:22:48 UTC - in response to Message 1787890.  

Not this time Al :), we're digging into computer science territory now :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1787891 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3806
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1787894 - Posted: 16 May 2016, 16:28:41 UTC - in response to Message 1787890.  
Last modified: 16 May 2016, 16:37:49 UTC

Is it more of a driver issue (where we would have to wait for Nvidia to get their act together before there is any relief), the way that the Guppies are configured when they are split (something that can be addressed internally, though with the manpower crunch, is that likely anytime soon?), or something else that causes such a penalty for them on Nvidia hardware? Are things much better in the AMD world? I haven't run an AMD card for probably 15+ years, so I basically have no experience with them.


I haven't checked the source code enough yet to know for sure, but I have a suspicion:

There's something in the CUDA framework that doesn't like that VLAR work units have negligibly small angular size. This should exclude them from even checking for Gaussians because the telescope is not crossing the signal which is what causes one, so any code which checks for Gaussians shouldn't even run in a VLAR.

The fact that it is running really slowly means that something in there is still using that angular size (what else but a Gaussian would need to use it?), and very likely shouldn't be. So the task is to find the code that does it, and not run it if the angular width is below the VLAR threshold.

I'm going to try to bring myself up to speed to fix this thing, but I'm hoping someone will beat me to it... it's a lot of work getting there. :^)
Edit: Also when I get there I am not sure I will even recognize it when I see it...lol.
ID: 1787894 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1787897 - Posted: 16 May 2016, 16:37:49 UTC - in response to Message 1787894.  

it's a lot of work getting there. :^)


Correct, but no shortcuts :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1787897 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1787898 - Posted: 16 May 2016, 16:39:02 UTC - in response to Message 1787890.  

Is it more of a driver issue (where we would have to wait for Nvidia to get their act together before there is any relief), the way that the Guppies are configured when they are split (something that can be addressed internally, though with the manpower crunch, is that likely anytime soon?), or something else that causes such a penalty for them on Nvidia hardware? Are things much better in the AMD world? I haven't run an AMD card for probably 15+ years, so I basically have no experience with them.

I am crunching both SETI@home GPUs and SETI Beta GPUs on an AMD HD 7770 in my Linux box and they take about one hour or less even if they are VLAR.
Tullio
ID: 1787898 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1787902 - Posted: 16 May 2016, 16:51:45 UTC - in response to Message 1787894.  
Last modified: 16 May 2016, 17:00:18 UTC


The fact that it is running really slowly means that something in there is still using that angular size (what else but a Gaussian would need to use it?), and very likely shouldn't be. So the task is to find the code that does it, and not run it if the angular width is below the VLAR threshold.

AR is used not only for Gaussians. It also defines how long the telescope stares at nearly the same point, and so defines the length of time over which data could be accumulated (so all PoT analysis uses it).
For explanations of why VLAR is relatively harder for GPU than for CPU, compared with other ARs, look at a few of my recent posts, for example (actually it has been repeated a few times over the years).
And the effect strongly depends on memory organization. Even at the same memory frequency, memory access on an NV CC1.x device (for example) and on an AMD device is very different. The requirement for so-called coalesced access causes an enormous performance drop on early NV architectures in the case of random (or close to random from the hardware's point of view) access to memory. Later architectures improved this.
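
A rough way to picture the coalescing effect is to count how many memory segments one group of threads touches in a single load. The sketch below is a simplified model, not vendor documentation: the 32-thread group, the 128-byte segment and the 4-byte element size are assumptions chosen for illustration, and `segments_touched` is a hypothetical helper.

```python
def segments_touched(indices, element_bytes=4, segment_bytes=128):
    """Count distinct memory segments covered by one group of element indices."""
    return len({(i * element_bytes) // segment_bytes for i in indices})

GROUP = 32  # assumed number of threads that issue a load together

# Neighbouring threads read neighbouring elements: one segment serves the group.
coalesced = segments_touched(range(GROUP))

# Each thread reads an element 64 places from its neighbour: every read
# lands in a different segment.
strided = segments_touched(i * 64 for i in range(GROUP))

print(coalesced, strided)
```

In this model the strided pattern needs 32 separate segment fetches where the contiguous pattern needs one, which is the kind of gap the post attributes to early architectures.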
ID: 1787902 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3806
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1787907 - Posted: 16 May 2016, 16:58:05 UTC - in response to Message 1787902.  

For explanations of why VLAR is relatively harder for GPU than for CPU, compared with other ARs, look at a few of my recent posts, for example (actually it has been repeated a few times over the years).


Thanks for the details... do you have a link to any of these posts?
ID: 1787907 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1787909 - Posted: 16 May 2016, 17:00:46 UTC - in response to Message 1787907.  
Last modified: 16 May 2016, 17:05:41 UTC

very short version:
low AR => longer time staring at the same point => bigger data array for a single PoT search => failure to fit in cache, failure to get enough parallel data to fill all CUs (a longer single array = fewer such arrays, because the 1M matrix of data points remains constant), decreased computation/memory access ratio (because most of PulseFind is folding (simple additions) and this search has an increased share) => performance drop for devices with massive parallelism and big memory access latencies (which GPUs are).
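
The trade-off in that chain can be put into numbers with a small sketch. Only the constant total of about 1M data points comes from the post. The PoT lengths, the 4-byte element size and the 16 KB cache figure are assumptions for illustration, and `pot_layout` is a hypothetical helper.

```python
TOTAL_POINTS = 1024 * 1024  # the constant "1M matrix" of data points from the post

def pot_layout(pot_length, element_bytes=4, cache_bytes=16 * 1024):
    """Return (number of independent PoT arrays, whether one array fits the cache)."""
    n_arrays = TOTAL_POINTS // pot_length
    fits = pot_length * element_bytes <= cache_bytes
    return n_arrays, fits

for length in (1024, 8192, 65536):
    n_arrays, fits = pot_layout(length)
    print(f"PoT length {length:6d}: {n_arrays:5d} arrays, fits cache: {fits}")
```

As the assumed PoT length grows, the number of arrays available to run in parallel shrinks and a single array stops fitting in the assumed cache, which matches the two failures named above.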

EDIT: to find person's posts:
http://setiathome.berkeley.edu/forum_user_posts.php?userid=7779286
ID: 1787909 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1787919 - Posted: 16 May 2016, 17:20:14 UTC - in response to Message 1787898.  

Is it more of a driver issue (where we would have to wait for Nvidia to get their act together before there is any relief), the way that the Guppies are configured when they are split (something that can be addressed internally, though with the manpower crunch, is that likely anytime soon?), or something else that causes such a penalty for them on Nvidia hardware? Are things much better in the AMD world? I haven't run an AMD card for probably 15+ years, so I basically have no experience with them.

I am crunching both SETI@home GPUs and SETI Beta GPUs on an AMD HD 7770 in my Linux box and they take about one hour or less even if they are VLAR.
Tullio

Hmmmm, are you running 2 at a time...or something?
My ATI 7750 runs them in under 30 minutes, http://setiathome.berkeley.edu/result.php?resultid=4932770085, which is better than the 6850s and about the same as the 150 watt 6870.
You might try adding some settings, try just the basic ones my 7750 is using;
-sbs 256 -oclfft_tune_gr 256 -oclfft_tune_wg 128
ID: 1787919 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3806
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1787922 - Posted: 16 May 2016, 17:30:01 UTC - in response to Message 1787909.  
Last modified: 16 May 2016, 17:39:19 UTC

very short version:
low AR => longer time staring at the same point => bigger data array for a single PoT search => failure to fit in cache, failure to get enough parallel data to fill all CUs (a longer single array = fewer such arrays, because the 1M matrix of data points remains constant)


OK, this helps. Since slewing Arecibo work units don't do this, the AR must be greater than the scope's "aperture". Let's say that the AR is 3x the size of the aperture of the scope.

Then why not break the WU into that ratio of pieces (in this case 3) and run the pulsefind on each piece, then add the results? That way each piece being of the same timebase as an Arecibo Gaussian won't overload the cache.

Edit: This will take some tinkering due to pulses at the edge of each piece that end up in both of them...
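
The bookkeeping for this idea might look like the sketch below, which is only an illustration and not project code. `split_with_overlap` is a hypothetical helper, and the piece count of 3 and overlap of 2 samples are arbitrary. It shows the splitting only, and says nothing about whether the pulse search itself can be divided this way.

```python
def split_with_overlap(data, pieces, overlap):
    """Split data into `pieces` chunks, each extended by `overlap` samples
    into the next chunk so that a pulse on a boundary is seen whole."""
    base = len(data) // pieces
    chunks = []
    for p in range(pieces):
        start = p * base
        # The last piece takes any remainder; others run `overlap` past their end.
        end = len(data) if p == pieces - 1 else min(len(data), start + base + overlap)
        chunks.append((start, data[start:end]))
    return chunks

samples = list(range(12))
for start, chunk in split_with_overlap(samples, pieces=3, overlap=2):
    print(start, chunk)
```

Keeping each chunk's start offset lets a detection be mapped back to its position in the whole work unit, so that a pulse found twice in an overlap region can be counted once.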
ID: 1787922 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1787923 - Posted: 16 May 2016, 17:34:22 UTC - in response to Message 1787919.  

No, I am a total novice on GPUs and run them one at a time, both on the Linux box with AMD and the Windows 10 PC with a GTX 750 Ti OC, which runs mostly Einstein@home tasks, which take much longer but reward me with 4400 credits.
Tullio
ID: 1787923 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1787927 - Posted: 16 May 2016, 17:39:56 UTC - in response to Message 1787923.  

No, I am a total novice on GPUs and run them one at a time, both on the Linux box with AMD and the Windows 10 PC with a GTX 750 Ti OC, which runs mostly Einstein@home tasks, which take much longer but reward me with 4400 credits.
Tullio


Multiply 'Seti Time' and credits by ~3.3 +/- and you will get Einstein time :D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1787927 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1787931 - Posted: 16 May 2016, 17:57:01 UTC - in response to Message 1787927.  

I made a rule-of-thumb calculation. Einstein@home is giving me 900 credits per hour of elapsed time on a GPU task, while SETI@home gives me 100 credits per hour, also on a GPU task. I am not crunching for credits; being a (retired) physicist, I am strongly interested in Einstein@home. Most of the Einstein projects use only CPUs; only the search for binary radio pulsars in Arecibo and Parkes data uses GPUs.
Tullio
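
The rule-of-thumb figures work out as follows. This is a back-of-envelope sketch that uses only numbers quoted in the thread, and it assumes the 4400-credit Einstein@home task and the 900 credits/hour rate refer to the same GPU, which the posts do not state.

```python
einstein_rate = 900          # credits per hour, from the post
seti_rate = 100              # credits per hour, from the post
einstein_task_credit = 4400  # credits per Einstein@home task, from the earlier post

ratio = einstein_rate / seti_rate
einstein_task_hours = einstein_task_credit / einstein_rate

print(f"Einstein pays {ratio:.0f}x the hourly rate")
print(f"One Einstein task is about {einstein_task_hours:.1f} hours")
```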
ID: 1787931 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1787932 - Posted: 16 May 2016, 18:00:30 UTC - in response to Message 1787931.  

Is that before, during, or after the v8 transition, and does it factor in that Guppi tasks run at lower efficiency?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1787932 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1787940 - Posted: 16 May 2016, 18:14:13 UTC

From memory of my walk-through of the credit code a few years back, and from running some simulations, it would appear that the code is all but incapable of correctly resolving the sort of changes that have happened in the last few months. It struggles with the slow increase in performance of both computer and application, but when you have a step change in the application and in the type of work unit coming out, it is all but incapable of working out what is going on. It will default to granting the lowest possible credit for each task, which will result in a drop in credit granted for a "standard" task of between 15 and 30%. Further, this drop will continue for about another 5 to 10% (of the initial credit).
As has been said by some for a long time, CreditNew is far from "fit for purpose". Even if one assumes that the purpose is to allow comparison of performance between systems and applications within a project, it is far too dependent upon the performance of the individual computer, and NOT on the content of the task.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1787940 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1787943 - Posted: 16 May 2016, 18:22:46 UTC - in response to Message 1787940.  

Works for me, so question is what do we do about it ?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1787943 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1787944 - Posted: 16 May 2016, 18:25:04 UTC - in response to Message 1787932.  

All my tasks are V8 now. guppi tasks take a little longer, but not much.
Tullio
ID: 1787944 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1787945 - Posted: 16 May 2016, 18:29:30 UTC - in response to Message 1787944.  

All my tasks are V8 now. guppi tasks take a little longer, but not much.
Tullio


and credit is equal for equal work ?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1787945 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1787946 - Posted: 16 May 2016, 18:35:34 UTC - in response to Message 1787922.  
Last modified: 16 May 2016, 18:42:38 UTC


Then why not break the WU into that ratio of pieces (in this case 3) and run the pulsefind on each piece, then add the results? That way each piece being of the same timebase as an Arecibo Gaussian won't overload the cache.


Well, I will skip any discussion about the origin of the number 3. But in essence this is the way to deal with arrays that are too long: split them into subarrays where possible. This has its own pluses and minuses:
+ a smaller number of data points fits better in a smaller cache,
+ more separate chains of data load a parallel device better,
- the results need to be assembled back, that is, additional synching between the processing of the data parts.

Also, it is not always possible to select even loosely independent parts of the data array.
Consider the folding algorithm (in real MultiBeam PulseFind the actual folding is also done by 3 and 5) in its simplest form (as it is implemented in AstroPulse):
one needs to take 2 numbers separated by an "arbitrary" stride (in real life, computed from, let's say, the data recording params and the wanted period to analyze), add them and put the sum in the next array. Then repeat the same (probably with a different stride) on the new array, and so on [and, of course, check at each iteration whether we have something over threshold and select the best of them, which is another synching point in this reduction process]. To launch a separate kernel for each cycle would be an absolute performance kill. So, one pass. Then one should know that global memory is considered asynchronous between workitems in the same kernel. That is, if CU0 writes something to cellN and CU1 reads from cellN, the order of these operations is undefined. So, when one tries to split the array into parts, one can have synching only inside a workgroup (256 workitems for AMD, up to 2048 for modern NV). Obviously the part of the array handled by a single workgroup should be big enough to include all the data that constitute the last point after folding. That limits the ability of the "divide and conquer" approach in this case.
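
The simplest form of folding described here can be sketched serially as below. This is an illustration under stated assumptions, not the AstroPulse or MultiBeam code: `fold_once` and `fold_search` are hypothetical names, the strides are arbitrary, and comparing raw peaks across fold levels against one fixed threshold is a simplification.

```python
def fold_once(data, stride):
    """Add each element to the one `stride` positions later."""
    return [data[i] + data[i + stride] for i in range(len(data) - stride)]

def fold_search(data, strides, threshold):
    """Fold repeatedly; after each fold record the best value if over threshold."""
    best = None
    current = list(data)
    for level, stride in enumerate(strides, start=1):
        current = fold_once(current, stride)
        peak = max(current)
        # This check is the per-iteration synchronisation point in the post:
        # every element of `current` must be final before the maximum is known.
        if peak > threshold and (best is None or peak > best[1]):
            best = (level, peak)
    return best

signal = [0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0, 0]
print(fold_search(signal, strides=[4, 8], threshold=12))
```

Each output point of the second fold depends on four input samples spread across the whole array, which is why a workgroup that sees only part of the array cannot finish that point on its own.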
ID: 1787946 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.