I've Built a Couple OSX CUDA Apps...

Message boards : Number crunching : I've Built a Couple OSX CUDA Apps...
petri33 (Volunteer tester, Finland)
Message 1749761 - Posted: 16 Dec 2015, 22:50:19 UTC - in response to Message 1749760.  
Last modified: 16 Dec 2015, 22:50:42 UTC

Yes, so far they've all validated except the one that had triplets after a restart. The question is why it had to wait for the other card to finish the AP before it would start the CUDA task.
I just switched over to the stock app I compiled about a week ago to see how it handles the APs. The only change from stock is that all the CC 1.x cards were removed and the cards up to CC 5.0 were added. So far it seems to be working, although much slower than the new code. Hopefully it won't time out on any tasks.

Here's the first shorty completed, http://setiathome.berkeley.edu/result.php?resultid=4602894114
setiathome enhanced x41zc (Sanity Check #3), Cuda 6.50 special ????

A few had validated already, let's see what happens when it hits this AP,
http://setiathome.berkeley.edu/result.php?resultid=4603595614
Will it slow everything down or keep going?

I need more APs.


special refers to my code.


I do not know who handles the desanitation.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
petri33 (Volunteer tester, Finland)
Message 1749762 - Posted: 16 Dec 2015, 22:55:59 UTC

TBar,

there was an important statement in what you posted, regarding those that did not validate immediately. I noticed the same. My count of 'inaccurate' results dropped a few days ago. But now I do not get any invalids; instead, my 'pending/needing another validation' count is climbing. I may have done something right and something wrong.
TBar (Volunteer tester, United States)
Message 1749763 - Posted: 16 Dec 2015, 22:58:41 UTC - in response to Message 1749761.  
Last modified: 16 Dec 2015, 23:09:11 UTC

special refers to my code.

I do not know who handles the desanitation.

I have no idea where the name came from. As I remember it, the only changes were to remove the compute capability 1.x cards so it would work with Toolkit 6.5, and to add the cards up to my 750 Ti. It shouldn't have any of your code as far as I know. It certainly looks to be running at stock speed and stock CPU usage.

This was the change to the compiler line:

NVCCFLAGS = -O3 --use_fast_math --ptxas-options="-v" \
    --compiler-options "$(AM_CXXFLAGS) $(CXXFLAGS) -fno-strict-aliasing" -m64 \
    -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_20,code=compute_20 \
    -gencode arch=compute_32,code=sm_32 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_37,code=sm_37 \
    -gencode arch=compute_50,code=sm_50
petri33 (Volunteer tester, Finland)
Message 1749774 - Posted: 17 Dec 2015, 0:18:10 UTC

Since it is 'special'
I'll have to flush
the toilet.

There are going to be one or more leaks.
jason_gee (Volunteer developer, Australia)
Message 1749776 - Posted: 17 Dec 2015, 0:24:11 UTC - in response to Message 1749763.  

Check the svn logs on the branch. Some weeks back I was doing some minor cleanups, preparing a 'sanity check' baseline reference to lever in and compare some of petri's tweaks that can be adopted generically (along with some of my own), for all older compute capabilities.

The only reason that procedure paused is v8 multibeam demanding a focus shift, along with work commitments piling on at the time.

Probably I'll change the text on the head of that branch to 'Pre-Alpha' and 'Not for use on main', since while I intend to maintain v6-v7-v8 compatibility, some breakage is likely before the polish goes on.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
jason_gee (Volunteer developer, Australia)
Message 1749778 - Posted: 17 Dec 2015, 0:29:56 UTC - in response to Message 1749774.  

Haha, yes cleanup for this is requiring more of a shovel than a plunger.

Verdict is: the build system is too complex for the needs of this branch.
Replacing that with a single unified Makefile is going smoothly so far, but taking a while to get every file compiling. I'll be labelling it mac_build under client first, but from what I can tell it may just work on Linux and other Cuda platforms too (except Windows).
petri33 (Volunteer tester, Finland)
Message 1749780 - Posted: 17 Dec 2015, 0:56:03 UTC

All the hassle is to do with my code not being in accordance with Main.


There will be bugs in my code. But it'll be fast.


And yes: I'm sorry. You'll get it (the whole community will get code/exe sooner or later). The best of them are working for You.
TBar (Volunteer tester, United States)
Message 1749783 - Posted: 17 Dec 2015, 1:15:59 UTC - in response to Message 1749780.  
Last modified: 17 Dec 2015, 1:49:10 UTC

Well, IMHO the hassle is trying to update code from 2007. Most of the stock code was written before my Mac even existed. It would be nice to move beyond CUDA 5 sometime soon.

The AP is now running and it appears the slowdown is present with the Stock code as well. The AP is 20 minutes in and only 44% complete. The CUDA task is 25 mins in and only 65% complete. So, I'd imagine it will have the same problem with starting a CUDA after an AP just as before.
============
Yep, using the Stock code the AP that should have finished in 36 mins took 45 mins, http://setiathome.berkeley.edu/result.php?resultid=4603595614
The Shorty that should have taken 6:50 took 19 mins, http://setiathome.berkeley.edu/result.php?resultid=4603096519

Oh well.
jason_gee (Volunteer developer, Australia)
Message 1749797 - Posted: 17 Dec 2015, 2:56:47 UTC - in response to Message 1749780.  
Last modified: 17 Dec 2015, 3:02:41 UTC

All the hassle is to do with my code not being in accordance with Main.


Not so much that, IMO. More that the project stock needs to support the older-gen cards (and Cuda versions) that have been deprecated by NV, and the makefiles have not been maintained (I didn't own a Mac until a couple of months ago, and I'm not familiar enough with the full autotools suite to polish that).

Now I'm finding the clean handcrafted Makefile approach is *much* more manageable, so I'm working on that steadily; after that, things will proceed much more quickly (with stock), especially since I now have holidays. I haven't isolated the exact cause of the validation variation in the new code yet, though it appears shorties, and therefore autocorrelations, don't see the >20% inconclusive-to-pending ratio, so that's good news for slotting in some of the key refinements.

Adding to the cluster of challenges is v8 transition.

It's totally appropriate, legit and encouraged to make and refine your own fork, which can ultimately test possible refinements, of which there are so many left to look at.

What's 'difficult' is maintaining the wide hardware capability in stock, while also maintaining 3 different build systems, and protecting the science from validation drift. As it happens most of the work here is helping kick things along ... so keep it up :D
TBar (Volunteer tester, United States)
Message 1749825 - Posted: 17 Dec 2015, 6:36:32 UTC

Seems the 'stock' CUDA build doesn't have the problem with starting a CUDA task after both cards have been running APs. I had both cards run APs for a couple of hours, and the first card that finished launched the CUDA task without any problems. It is running slowly again, but didn't give any errors. It will be here once finished: http://setiathome.berkeley.edu/workunit.php?wuid=2004339873
jason_gee (Volunteer developer, Australia)
Message 1749832 - Posted: 17 Dec 2015, 7:07:24 UTC - in response to Message 1749825.  
Last modified: 17 Dec 2015, 7:10:55 UTC

[Edit:] I see that validated already, so looks good.

Yep, will take some digging yet, and going to have to pause until the temps die down (41C in the shade here right now).

The clean (non-autotools) Makefile is building all the Cuda files now; just need to finish injecting all the non-Cuda files into the linking process.

Appropriate compiler flags for clang and nvcc on the Macs (fine tuning), along with some appropriate BOINC libs/api, will be the next adventure after that, though XCode seems to have built some libs already without issues (how suitable they are I don't know yet).

In any case, as soon as the dedicated Makefile is complete, I will commit that (fine tuned or not); this way maintenance of the build becomes a lot simpler for the updates.

In theory the dedicated Makefile approach should work on Linux and various other Cuda-enabled platforms as well. If that turns out to be the case, leaving only Windows as the odd duck, then I'll probably just clean out all the legacy CPU-app-based makefiles. That'll free up all the time wasted trying to hack on the existing makefiles, which weren't really created with a Cuda application in mind in the first place.
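[Editor's note: as a rough illustration of the kind of single hand-crafted Makefile being described here. This is a hypothetical sketch only; the real file names, directory layout, and flags would come from the actual branch, not from this example.]

```makefile
# Hypothetical sketch of a unified Makefile for a Cuda app build.
# Paths, sources and flags are placeholders, not the branch's real layout.
NVCC      := nvcc
CXX       := clang++
NVCCFLAGS := -O3 --use_fast_math -m64 \
             -gencode arch=compute_20,code=sm_20 \
             -gencode arch=compute_50,code=sm_50
CXXFLAGS  := -O3 -fno-strict-aliasing

CU_SRCS  := $(wildcard cuda/*.cu)      # the Cuda files
CPP_SRCS := $(wildcard client/*.cpp)   # the non-Cuda files to inject at link time
OBJS     := $(CU_SRCS:.cu=.o) $(CPP_SRCS:.cpp=.o)

setiathome_cuda: $(OBJS)
	$(NVCC) $(NVCCFLAGS) -o $@ $^ -lcufft

%.o: %.cu
	$(NVCC) $(NVCCFLAGS) -c $< -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@
```

The appeal over autotools is that every rule is visible in one file, so swapping a compiler flag or adding a source file is a one-line change.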

That time then becomes available for update & optimisations into stock proper.
Raistmer (Volunteer developer, Russia)
Message 1749837 - Posted: 17 Dec 2015, 7:50:58 UTC - in response to Message 1749783.  
Last modified: 17 Dec 2015, 7:54:40 UTC


Yep, using the Stock code the AP that should have finished in 36 mins took 45 mins
Oh well.


What "stock code"? The code tree is exactly the same; all the difference you could get is different revision numbers.

And regarding your VM issues: any progress in localizing the issue? If you don't want to test Einstein or any other BOINC project, what "VM memory usage" do the examples from the CUDA SDK show?

EDIT: and did you find any tool that shows real GPU load on the cards? Without one, the chance that both your NV tasks are launched on the same device can't be excluded.
petri33 (Volunteer tester, Finland)
Message 1749838 - Posted: 17 Dec 2015, 8:35:24 UTC

Is there an nvidia-smi command line tool on the Mac?

+-----------------------------------------------------------------------------+
Thu Dec 17 10:33:59 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     On   | 0000:01:00.0      On |                  N/A |
| 43%   64C    P0   121W / 230W |   1828MiB /  4095MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 780     On   | 0000:02:00.0     N/A |                  N/A |
| 54%   62C    P0    N/A /  N/A |   1606MiB /  3071MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     On   | 0000:03:00.0     Off |                  N/A |
| 40%   57C    P0   132W / 230W |   1636MiB /  4095MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 780     On   | 0000:04:00.0     N/A |                  N/A |
| 62%   71C    P0    N/A /  N/A |   1606MiB /  3071MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       880    G   /usr/bin/X                                     158MiB |
|    0      1057    C   ...tiathome_x41zc_x86_64-pc-linux-gnu_cuda65  1618MiB |
|    0      1475    G   compiz                                          33MiB |
|    1                  Not Supported                                         |
|    2      1073    C   ...tiathome_x41zc_x86_64-pc-linux-gnu_cuda65  1618MiB |
|    3                  Not Supported                                         |
+-----------------------------------------------------------------------------+

TBar (Volunteer tester, United States)
Message 1749839 - Posted: 17 Dec 2015, 8:36:56 UTC - in response to Message 1749837.  
Last modified: 17 Dec 2015, 8:50:03 UTC


Yep, using the Stock CUDA code the AP that should have finished in 36 mins took 45 mins
Oh well.


What "stock code"? The code tree is exactly the same; all the difference you could get is different revision numbers.

And regarding your VM issues: any progress in localizing the issue? If you don't want to test Einstein or any other BOINC project, what "VM memory usage" do the examples from the CUDA SDK show?

EDIT: and did you find any tool that shows real GPU load on the cards? Without one, the chance that both your NV tasks are launched on the same device can't be excluded.

There, fixed it for you. Seems there is a problem running a CUDA task on one card and an AP on the other card. This will cause both cards to slow down quite noticeably. Others, including Richard, have commented that they have noticed similar slowdowns when mixing CUDA & APs on the same card. On the Mac it appears the slowdown will happen if you just run one CUDA task on one card and one AP on the other card. No, it would seem CUDA-Z will show the same 'Performance' even if the task is suspended, so there isn't a tool to check usage on different cards. However, it is quite obvious it's running one task on one card when running CUDAs, and one task on one card when running APs, so there isn't any reason to suspect it suddenly decides to run a CUDA & AP on the same card, somehow leaving the second card doing nothing. The slowdown happens with three different Mac CUDA apps, indicating the problem isn't with the particular app.

Nothing more on the VM memory usage, and seeing as how the current systems have unlimited VM, I've placed that mystery on the low-priority list. Top priority is determining why the 'new' CUDA code gives errors after running APs on both cards and then trying to start a CUDA task on the first card finishing its AP. Apparently the 'stock' CUDA code doesn't have that problem.

Is there a nvidia-smi command line tool in MAC?

I haven't heard of one. If you try it you get;
TomsMacPro:~ Tom$ nvidia-smi
-bash: nvidia-smi: command not found
William (Volunteer tester)
Message 1749845 - Posted: 17 Dec 2015, 9:12:14 UTC - in response to Message 1749839.  

Top priority is determining why the 'new' CUDA code gives errors after running APs on both cards and then trying to start a CUDA task on the first card finishing its AP. Apparently the 'stock' CUDA code doesn't have that problem.

Excuse me, but could you tighten your terminology?
As far as I understood, your 'new' code is 'Petri's code' and 'stock' is Jason's.
However, there have been changes to the svn recently, so when you build, always note the svn revision number.
Trying to diff petri gave me a major headache (but not as bad as unraveling boinc code), but I think I saw a dropped synchronisation that worried me at the time.
You'd want to make a thorough comparison of your 'stock' baseline and the changes made in the code to trace where the apparent startup problem might be coming from.
A person who won't read has no advantage over one who can't read. (Mark Twain)
TBar (Volunteer tester, United States)
Message 1749849 - Posted: 17 Dec 2015, 9:31:21 UTC - in response to Message 1749845.  

Top priority is determining why the 'new' CUDA code gives errors after running APs on both cards and then trying to start a CUDA task on the first card finishing its AP. Apparently the 'stock' CUDA code doesn't have that problem.

Excuse me, but could you tighten your terminology?
As far as I understood, your 'new' code is 'Petri's code' and 'stock' is Jason's.
However, there have been changes to the svn recently, so when you build, always note the svn revision number.
Trying to diff petri gave me a major headache (but not as bad as unraveling boinc code), but I think I saw a dropped synchronisation that worried me at the time.
You'd want to make a thorough comparison of your 'stock' baseline and the changes made in the code to trace where the apparent startup problem might be coming from.

Yes, the 'new' code is 'Petri's code' and 'stock' is Jason's. Or you could say the 'new' code came via email and the 'stock' code came from the repository. I'm using my downloaded zipped copies of the repositories, r3164 and r3185. The CUDA builds have been using r3185, and since there might be problems with later versions I think staying with r3185 is a good idea for now.
Raistmer (Volunteer developer, Russia)
Message 1749851 - Posted: 17 Dec 2015, 9:33:25 UTC - in response to Message 1749839.  
Last modified: 17 Dec 2015, 9:46:46 UTC


There, fixed it for you.

Now it makes sense, thanks.


Seems there is a problem running a CUDA task on One card and an AP on the Other card.
...
Others, including Richard, have commented they have noticed similar slowdowns when mixing CUDA & APs on the same card. On the Mac it appears the slowdown will happen if you just run One Cuda task on One card and One AP on the other card.
...
However, it is quite obvious it's running One task on One card when running CUDAs, and One task on One card when running APs, so, there isn't any reason to suspect it suddenly decides to run a CUDA & AP on the same card somehow leaving the second card doing Nothing. The slowdown happens with Three different Mac CUDA Apps, indicating the problem isn't with the particular App.

Mixed statements of facts and your unfounded conclusions from those facts.
Facts: there is a slowdown; the slowdown was noticed running 2 tasks on a single card; the slowdown was noticed running 2 tasks on a host having 2 cards; the slowdown is independent of the app's revision.
Conclusion: "obvious that the tasks are running on 2 different devices". Sorry, for me it's just the opposite. And of course one CAN expect exactly such an issue if both APs and both CUDAs alone show no slowdown but their mix does. It was mentioned in this thread already that OpenCL and CUDA have different device enumerations!!!
Please don't jump to conclusions without solid proof.



Nothing more on the VM memory usage, and seeing as how the current systems have unlimited VM, I've placed that mystery on the low-priority list. Top priority is determining why the 'new' CUDA code gives errors after running APs on both cards and then trying to start a CUDA task on the first card finishing its AP.

As you wish, though proper issue exploration could lead to an actual bug ticket, because the issue exists.


Apparently the 'Stock' CUDA code doesn't have that problem.

Then I would recommend looking very closely into possible differences in device enumeration.
EDIT: actually, you are speaking about a third issue here, so while checking the enumeration difference is still worthwhile, it will hardly apply to this third issue. Enumeration can affect issue 2: the slowdown.
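[Editor's note: one way to check the mapping Raistmer describes. This is a hedged sketch, not code from the thread: CUDA can report each device's PCI bus ID, and the same IDs can be read from the OpenCL side, since the two runtimes are not guaranteed to enumerate devices in the same order.]

```cuda
// Sketch: print CUDA's device order together with PCI bus IDs, so the
// ordering can be compared against what the OpenCL runtime enumerates.
// Requires the CUDA toolkit; compile with: nvcc -o enumdev enumdev.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // PCI location is the stable identifier across runtimes.
        printf("CUDA device %d: %s  PCI %04x:%02x:%02x\n",
               i, prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}
```

If the PCI locations come out in a different order than the OpenCL platform reports them, BOINC's "device 0" for an AP and for a CUDA task would not be the same physical card.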
TBar (Volunteer tester, United States)
Message 1749853 - Posted: 17 Dec 2015, 10:08:52 UTC - in response to Message 1749851.  


There fixed it for you.

Now it has sense, thanks.


Seems there is a problem running a CUDA task on One card and an AP on the Other card.
...
Others, including Richard, have commented they have noticed similar slowdowns when mixing CUDA & APs on the same card. On the Mac it appears the slowdown will happen if you just run One Cuda task on One card and One AP on the other card.
...
However, it is quite obvious it's running One task on One card when running CUDAs, and One task on One card when running APs, so, there isn't any reason to suspect it suddenly decides to run a CUDA & AP on the same card somehow leaving the second card doing Nothing. The slowdown happens with Three different Mac CUDA Apps, indicating the problem isn't with the particular App.

Mixed statements of facts and your unfounded conclusions from these facts.
Facts - there is slowdown, slowdown noticed running 2 tasks on single cards, slowdown noticed running 2 tasks on host having 2 cards, slowdown independed from app's rev.
Conclusion: "obvious that tasks running on 2 different devices". Sorry, for me it's just opposite. And of course one CAN expect such issue exactly if both AP and both CUDA have no slowdown but their mix - does. It was mentioned in this thread already that OpenCL and CUDA have different device enumerations!!!
Please don't jump to conclusions w/o solid proves.



Nothing more on the VM memory usage, and seeing as how the current systems have Unlimited VM I've placed that mystery on the low priority list. Top priority is determining why the 'New' CUDA code gives Errors after running APs on both cards and then trying to start a CUDA task on the first card finishing it's AP.

As you wish though proper issue exploration could lead to bug ticket actually, cause issue exists.


Apparently the 'Stock' CUDA code doesn't have that problem.

Then I would recommend to look into possible differencies inside device enumeration very closely.
EDIT: actually, you speaking about third issue here so while checking enumeration difference still possible hardly it will apply to this third issue. Enumeration can affect issue 2: slowdown.

Well, if you know of something other than BOINC on a Mac to tell which app is running on which card, let me know, because I haven't found anything. Assuming one card does nothing for around 50 minutes, you would think its temperature would drop significantly. I can monitor the I/O fan/temperature and I haven't noticed any change, so it would seem there isn't any change in card temperatures. The apps I'm using have the GPU load near 100% running a single task; you would think there would be a larger slowdown if one card was running two apps both trying to use 100% of the GPU.

The bug ticket would probably fail as soon as I mentioned code from an email ;-)
petri33 (Volunteer tester, Finland)
Message 1749862 - Posted: 17 Dec 2015, 10:55:02 UTC - in response to Message 1749845.  

Top priority is determining why the 'new' CUDA code gives errors after running APs on both cards and then trying to start a CUDA task on the first card finishing its AP. Apparently the 'stock' CUDA code doesn't have that problem.

Excuse me, but could you tighten your terminology?
As far as I understood, your 'new' code is 'Petri's code' and 'stock' is Jason's.
However, there have been changes to the svn recently, so when you build, always note the svn revision number.
Trying to diff petri gave me a major headache (but not as bad as unraveling boinc code), but I think I saw a dropped synchronisation that worried me at the time.
You'd want to make a thorough comparison of your 'stock' baseline and the changes made in the code to trace where the apparent startup problem might be coming from.


I've dropped almost all synchronizations and implemented cudaStreams nearly everywhere, with appropriate events and waits where needed.

For the "out of memory" error with cudaMallocHost, it is possible that a card has its memory fragmented after an OpenCL app. "My" CUDA source needs some huge contiguous blocks of GPU memory for results that are calculated in advance. It is not optimal, but it works for a 3 GB GTX 780 and a 4 GB GTX 980. It is a kind of a miracle that it works with a 2 GB 750 Ti most of the time.

I could try adding a cudaDeviceReset() as the first CUDA call in the CUDA MB app, to see if it would defragment/clean the memory after an OpenCL (AP) app has finished.
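[Editor's note: a minimal sketch of petri's suggestion, hypothetical and untested here. Note that cudaDeviceReset() tears down the primary context for the *calling* process only, so whether it helps after a separate OpenCL (AP) process has exited is exactly what the experiment would show. The 256 MB allocation size is an arbitrary placeholder.]

```cuda
// Sketch: make cudaDeviceReset() the very first CUDA call, before any
// allocation, so the app starts from a freshly created primary context.
// Compile with: nvcc -o resetfirst resetfirst.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;  // in the real app, the device number comes from BOINC
    cudaSetDevice(dev);
    cudaError_t err = cudaDeviceReset();  // destroy and recreate the context
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // From here on, the large allocations the app needs. cudaMallocHost
    // gives pinned *host* memory; cudaMalloc would be the device-side call.
    void *host_buf = nullptr;
    err = cudaMallocHost(&host_buf, 256u * 1024 * 1024);  // placeholder size
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFreeHost(host_buf);
    return 0;
}
```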
William (Volunteer tester)
Message 1749870 - Posted: 17 Dec 2015, 11:49:04 UTC - in response to Message 1749862.  

I've dropped almost all synchronizations and implemented cudaStreams nearly everywhere, with appropriate events and waits where needed.

Might well be; in the code fragment I was checking, nothing replaced the sync. Not that I had any idea why a sync was there in the first place, but I was focusing on other things.

For the "out of memory" error with cudaMallocHost, it is possible that a card has its memory fragmented after an OpenCL app. "My" CUDA source needs some huge contiguous blocks of GPU memory for results that are calculated in advance. It is not optimal, but it works for a 3 GB GTX 780 and a 4 GB GTX 980. It is a kind of a miracle that it works with a 2 GB 750 Ti most of the time.

I could try adding a cudaDeviceReset() as the first CUDA call in the CUDA MB app, to see if it would defragment/clean the memory after an OpenCL (AP) app has finished.

IMO worth a try; see if it clears up TBar's problem.

As to needing that much VRAM, that's a no-go for stock anyway. A 512k card should be good to run.
Unless we find a 256k card somewhere that is in the hands of a competent alpha tester (cough cough), or at minimum somebody who can follow instructions, we don't have a 256k testbed any more [my rig died]. I suppose a suitable card could be found and placed in the hands of a competent person, but it seems excessive for a presumably negligible number of hosts out there.