I've Built a Couple OSX CUDA Apps...

Raistmer
Volunteer developer
Volunteer tester
Message 1763091 - Posted: 7 Feb 2016, 8:16:41 UTC - in response to Message 1763074.  
Last modified: 7 Feb 2016, 8:21:00 UTC


They don't seem to make a difference. The run time is still about twice the CPU time.

Try -use_sleep to reduce CPU time? (If it is implemented on OS X.) The slower the GPU, the less -use_sleep will hurt its performance, and it could save some CPU cycles.

Also, check the GPU counters; I don't see them enabled in the OS X build you use.
They are usually ignored, but they can be a good indication of why CPU time increases a lot.
It is possible that your GPU returns wrong results from some of the GPU search kernels but doesn't damage the data array. In that case the app as a whole will still return valid results, but it will spend much more CPU time than usual, acting as a "semi-CPU" app (CPU processing fixes the errors the GPU made). This shows up as a sharp increase in some of the search-miss counters.

For example, look at this Windows result:
http://setiathome.berkeley.edu/result.php?resultid=4692482310

Here are the counters I mean:

class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0


class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0


class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0


class PC_triplet_find_hit: total=206, N=206, <>=1, min=1 max=1
class PC_triplet_find_miss: total=34, N=34, <>=1, min=1 max=1


class PC_pulse_find_hit: total=237, N=237, <>=1, min=1 max=1
class PC_pulse_find_miss: total=3, N=3, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=1, N=1, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=1, N=1, <>=1, min=1 max=1


class PoT_transfer_not_needed: total=206, N=206, <>=1, min=1 max=1
class PoT_transfer_needed: total=35, N=35, <>=1, min=1 max=1
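
For anyone who wants to watch these counters across many results, here is a rough Python sketch that pulls the totals out of a stderr dump and works out how often a search missed. The function names and the "miss fraction" figure are my own illustration, not something the app reports, and the regex assumes the counter lines look exactly like the ones above.

```python
import re

# Matches lines such as
# "class PC_pulse_find_hit: total=237, N=237, <>=1, min=1 max=1".
COUNTER_RE = re.compile(r"class\s+(\w+):\s+total=(\d+)")


def read_totals(stderr_text):
    """Return {counter_name: total} for every counter line found."""
    return {name: int(total) for name, total in COUNTER_RE.findall(stderr_text)}


def miss_fraction(totals, prefix):
    """Misses divided by hits + misses, e.g. for prefix='PC_triplet_find'.

    This ratio is a hypothetical convenience measure, not an app output.
    """
    hits = totals.get(prefix + "_hit", 0)
    misses = totals.get(prefix + "_miss", 0)
    searched = hits + misses
    return misses / searched if searched else 0.0


# Counter lines copied from the Windows result quoted above.
sample = """
class PC_triplet_find_hit: total=206, N=206, <>=1, min=1 max=1
class PC_triplet_find_miss: total=34, N=34, <>=1, min=1 max=1
class PC_pulse_find_hit: total=237, N=237, <>=1, min=1 max=1
class PC_pulse_find_miss: total=3, N=3, <>=1, min=1 max=1
"""

totals = read_totals(sample)
for prefix in ("PC_triplet_find", "PC_pulse_find"):
    print(prefix, miss_fraction(totals, prefix))
```

A build whose miss fraction is far above what a known-good host shows for similar tasks would be the one to look at more closely.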

TBar
Volunteer tester
Message 1763103 - Posted: 7 Feb 2016, 9:21:54 UTC

Also, when all else fails, look at how similar machines are running: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=77941&offset=60
Note how that machine has much more consistent times than yours. That is usually the result of having a freed CPU core. Without a freed core you get inconsistent times and results: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=77827&offset=20
So, try freeing a CPU core and use the same settings as Beta:
-sbs 128 -oclfft_tune_gr 64 -oclfft_tune_wg 64 -period_iterations_num 32 -no_caching

Chris Adamek
Volunteer tester
Message 1763226 - Posted: 7 Feb 2016, 20:37:56 UTC - in response to Message 1763103.  

Any idea why my max work group size is not getting updated to 256? Here's the top of my stderr for the following WU:

http://setiathome.berkeley.edu/result.php?resultid=4709802139

It looks like the max work group size is still 64. On my Windows machine that line reads 256, as I would expect.

<stderr_txt>
Running on device number: 0
DATA_CHUNK_UNROLL set to:18
oclFFT plan class overrides requested: global radix 256; local radix 16; max workgroup size 256
FFA thread block override value:16384
FFA thread fetchblock override value:8192
TUNE: kernel 1 now has workgroup size of (64,4,1)
TUNE: kernel 2 now has workgroup size of (64,4,1)
OpenCL platform detected: Apple
Number of OpenCL devices found : 2
BOINC assigns slot on device #0.
Info: BOINC provided OpenCL device ID used
Used GPU device parameters are:
Number of compute units: 32
Single buffer allocation size: 256MB
Total device global memory: 6144MB
max WG size: 64
local mem type: Real

TBar
Volunteer tester
Message 1763238 - Posted: 7 Feb 2016, 21:03:11 UTC - in response to Message 1763226.  
Last modified: 7 Feb 2016, 21:07:19 UTC

I'd say there is something in the Code causing that WG setting. If you remember, about a year ago there was a problem compiling the first series of Apple MB Apps because Apple uses 1024 for the WG size. The code didn't allow for a Work Group setting of 1024, only 256. There was some workaround that wasn't included in the code that people such as myself download. I just compile what is present in the Repository; if there is something somewhere else, I don't see or receive it. If you run the CLinfo program in Mountain Lion, which is where the app was compiled, you can see it correctly lists the Work Group size as 1024.

So, I'd say there is some workaround in place *somewhere* that changes the Correct Apple Mountain Lion Work Group Size on ATI GPUs from 1024 to 256 or, in the case of the files in the Repository, to 64.
It would be nice if the SETI Code used the Correct Apple WG size on ATI GPUs, I just compile it.

Raistmer
Volunteer developer
Volunteer tester
Message 1763247 - Posted: 7 Feb 2016, 21:25:04 UTC - in response to Message 1763238.  


It would be nice if the SETI Code used the Correct Apple WG size on ATI GPUs, I just compile it.

Before marking something correct or incorrect, it is worth looking back to see why the additional workaround was needed. If such a restriction was added, then apparently it was needed. It means that the particular config operates correctly only with that WG size.

Chris Adamek
Volunteer tester
Message 1763248 - Posted: 7 Feb 2016, 21:26:39 UTC - in response to Message 1763238.  

Oh yeah, I remember that problem with the WG size. I had totally forgotten about that. Kinda hinders performance compared to Windows but not by a whole lot. Maybe a comparable fix is possible and will work its way into the repository in the not too distant future.:)

Thanks!

Chris

TBar
Volunteer tester
Message 1763251 - Posted: 7 Feb 2016, 21:37:08 UTC - in response to Message 1763247.  
Last modified: 7 Feb 2016, 22:18:27 UTC


It would be nice if the SETI Code used the Correct Apple WG size on ATI GPUs, I just compile it.

Before marking something correct or incorrect, it is worth looking back to see why the additional workaround was needed. If such a restriction was added, then apparently it was needed. It means that the particular config operates correctly only with that WG size.

I think I remember it was 1024 in Mountain Lion, but changed to 256 in Mavericks. The current App was compiled in Mountain Lion, as I still get the Linker Error when trying to compile in any later OS X version.
So, any idea how to fix it? Apparently there is a workaround somewhere, as the Stock App says WG 256. Either it's reporting wrong, or something is being used that isn't in the files I'm downloading from the Repository.

I've got someplace to be, so, I'll have to get back to it later...

Before I go, one last look. It is set for WG 128, but reports WG 256?
http://setiathome.berkeley.edu/result.php?resultid=4707438342
Maximum single buffer size set to:128MB
oclFFT global radix override set to:256
oclFFT max WG size override set to:128

Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 128MB
Total device global memory: 1024MB
max WG size: 256

???

Raistmer
Volunteer developer
Volunteer tester
Message 1763281 - Posted: 7 Feb 2016, 23:23:27 UTC - in response to Message 1763251.  

The oclFFT WG size used for the FFT plan class and the WG size used throughout the app itself are different things.
And the size reported by Apple's runtime is yet a third, distinct thing.
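
To keep the three figures apart when reading a log, a small Python sketch like the one below can pull each one out on its own. It only assumes the stderr wording quoted earlier in this thread; the dictionary keys are made-up labels, and treating the "TUNE: kernel" lines as the app's own WG sizes is my reading of the log, not something confirmed here.

```python
import re


def wg_sizes(stderr_text):
    """Collect the work-group figures found in a stderr dump, kept separate."""
    found = {}

    # 1) oclFFT plan-class override (two wordings appear in the quoted logs).
    m = re.search(r"max workgroup size (\d+)", stderr_text)
    if m is None:
        m = re.search(r"oclFFT max WG size override set to:(\d+)", stderr_text)
    if m:
        found["oclfft_override"] = int(m.group(1))

    # 2) Work-group shapes printed for the app's kernels, e.g. (64,4,1).
    #    Assumption: these lines reflect the sizes the app itself uses.
    found["kernel_tune"] = [
        tuple(int(x) for x in dims.split(","))
        for dims in re.findall(r"workgroup size of \(([\d,]+)\)", stderr_text)
    ]

    # 3) Value listed under "Used GPU device parameters".
    m = re.search(r"max WG size:\s*(\d+)", stderr_text)
    if m:
        found["runtime_reported"] = int(m.group(1))
    return found


# Lines copied from the stderr posted above.
sample = """\
oclFFT plan class overrides requested: global radix 256; local radix 16; max workgroup size 256
TUNE: kernel 1 now has workgroup size of (64,4,1)
TUNE: kernel 2 now has workgroup size of (64,4,1)
Number of compute units: 32
max WG size: 64
"""

print(wg_sizes(sample))
```

Seeing 256, (64,4,1) and 64 side by side makes it clearer that a mismatch between them is not by itself an error.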

TBar
Volunteer tester
Message 1763377 - Posted: 8 Feb 2016, 4:55:08 UTC - in response to Message 1763281.  

Ok, so I shouldn't expect the WG settings to show up in the Used GPU device parameters field.
After looking over the post by Chris I NOW see he is referring to the Stock AP7r2750. Hmmm, I had nothing to do with that one. I do remember seeing the WG 64, but it doesn't seem to be causing any problems. In fact, apparently whatever was causing that was gone by AP7r2934 as the AP App I built has the WG listed as 256, http://setiathome.berkeley.edu/result.php?resultid=4710552601
Used GPU device parameters are:
Number of compute units: 14
Single buffer allocation size: 224MB
Total device global memory: 1024MB
max WG size: 256

Comparing the Builds on My machine, there isn't any noticeable difference between r2750 & r2934; they perform about the same. All it would take, then, is to compile the App with a more recent source version to have it show WG 256. Whether it would make any difference on a different machine is unknown.

TBar
Volunteer tester
Message 1763458 - Posted: 8 Feb 2016, 15:18:06 UTC - in response to Message 1763281.  

There is a Real problem with the OSX ATI AP App though. It seems the signal strength, or whatever, is just far enough off that it misses a signal every now and then. Usually it's only noticeable when only One or Zero signals are found, as in this task, http://setiathome.berkeley.edu/workunit.php?wuid=2054897008. It's been happening Forever and happens in both r2750 and r2934. It is the reason I built r2934 and the ones before it. I see it on other machines as well. Any way to fix that?
single pulses: 0
repetitive pulses: 1
percent blanked: 0.00

single pulses: 0
repetitive pulses: 0
percent blanked: 0.00

Tom Rinehart
Volunteer tester
Message 1763460 - Posted: 8 Feb 2016, 15:33:46 UTC - in response to Message 1762524.  

The opencl_ati_mac app currently being tested on beta finally works properly on a HD4XXX without having to add -no_caching to the command line file.
Tom,
could you try also on your ATI Radeon HD 4670 ? Need to be convinced that the current beta 8.06 works on that lower class GPU, too.
Other testers at beta with HD 4670 seem to have problems to finish work units with valid results.
Maybe there is some other problem ...


Urs -

I have run a few WUs on Beta on my iMac with the ATI Radeon HD 4670. It runs them, but not well. The first one it ran took a long time and got a computation error. After that it seems to run them correctly. The problem is that it creates rectangles of messed up screen image randomly around the iMac's screen. I suspect the GPU just doesn't have enough VRAM to run the app properly. Here are three that have run so far:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606611 - Computation error

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606866

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606854

- Tom

Raistmer
Volunteer developer
Volunteer tester
Message 1763468 - Posted: 8 Feb 2016, 16:08:42 UTC - in response to Message 1763458.  
Last modified: 8 Feb 2016, 16:11:22 UTC

There is a Real problem with the OSX ATI AP App though. It seems the signal strength, or whatever, is just far enough off that it misses a signal every now and then. Usually it's only noticeable when only One or Zero signals are found, as in this task, http://setiathome.berkeley.edu/workunit.php?wuid=2054897008. It's been happening Forever and happens in both r2750 and r2934. It is the reason I built r2934 and the ones before it. I see it on other machines as well. Any way to fix that?
single pulses: 0
repetitive pulses: 1
percent blanked: 0.00

single pulses: 0
repetitive pulses: 0
percent blanked: 0.00


Failed task:
ffa total=1.967E+12 , N=999 , <>=1.969E+09 , min=8.032E+08 , max=1.192E+10

FFA blocks counters:
FFA_fetch total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_tt_build total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_compare total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_coadd total=2.342E+09 , N=159729 , <>=1.466E+04 , min=6.840E+03 , max=5.401E+06
FFA_stride_add total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
T_GPU_buffer_read_backs total=0.0000E+00, N=0 , <>=0 , min=0 , max=0


correct result:
class T_ffa: total=1.32e+012, N=999, <>=1.32e+009, min=5.48e+008, max=7.55e+009

FFA blocks counters:
class T_FFA_fetch: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_FFA_tt_build: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_FFA_compare: total=1.80e+007, N=8, <>=2.26e+006, min=5.30e+005, max=6.25e+006
class T_FFA_coadd: total=5.70e+008, N=124771, <>=4.57e+003, min=2.86e+003, max=3.01e+005
class T_FFA_stride_add: total=9.07e+004, N=7, <>=1.30e+004, min=1.06e+004, max=1.69e+004
class T_GPU_buffer_read_backs: total=0, N=0, <>=0, min=0 max=0

correct result 2:
ffa total=2.838E+12 , N=999 , <>=2.841E+09 , min=1.005E+09 , max=1.665E+10

FFA blocks counters:
FFA_fetch total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_tt_build total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_compare total=9.362E+06 , N=8 , <>=1.170E+06 , min=2.410E+05 , max=3.157E+06
FFA_coadd total=9.261E+09 , N=242209 , <>=3.823E+04 , min=2.521E+04 , max=1.631E+07
FFA_stride_add total=3.891E+05 , N=7 , <>=5.559E+04 , min=5.390E+04 , max=5.771E+04
GPU_buffer_read_backs total=0.0000E+00, N=0 , <>=0 , min=0 , max=0

So the failed task never tried to find a signal after the pre-compute.
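
The same check can be done mechanically. Below is a rough Python sketch that lists the FFA stages whose call count N is zero in each dump and then shows which stages are idle only in the failed task. The helper name is hypothetical, and the regex assumes the two counter layouts quoted above (with and without the "class T_" prefix).

```python
import re

# Handles both layouts seen above:
#   "FFA_compare total=0.000E+00 , N=0 , ..."
#   "class T_FFA_compare: total=1.80e+007, N=8, ..."
LINE_RE = re.compile(
    r"^(?:class\s+)?(?:T_)?(FFA_\w+):?\s+total=\S+\s*,\s*N=(\d+)", re.M
)


def idle_stages(log_text):
    """Names of FFA stages whose call count N is zero."""
    return [name for name, n in LINE_RE.findall(log_text) if int(n) == 0]


# Counter lines copied from the failed task and from "correct result" above.
failed = """\
FFA_fetch total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_tt_build total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_compare total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
FFA_coadd total=2.342E+09 , N=159729 , <>=1.466E+04 , min=6.840E+03 , max=5.401E+06
FFA_stride_add total=0.000E+00 , N=0 , <>=0.000E+00 , min=1.845E+19 , max=0.000E+00
"""

correct = """\
class T_FFA_fetch: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_FFA_tt_build: total=0.00e+000, N=0, <>=0.00e+000, min=1.84e+019, max=0.00e+000
class T_FFA_compare: total=1.80e+007, N=8, <>=2.26e+006, min=5.30e+005, max=6.25e+006
class T_FFA_coadd: total=5.70e+008, N=124771, <>=4.57e+003, min=2.86e+003, max=3.01e+005
class T_FFA_stride_add: total=9.07e+004, N=7, <>=1.30e+004, min=1.06e+004, max=1.69e+004
"""

only_idle_in_failed = sorted(set(idle_stages(failed)) - set(idle_stages(correct)))
print(only_idle_in_failed)
```

FFA_fetch and FFA_tt_build are at N=0 in all three dumps, so they drop out of the comparison; what remains is the pair of stages that ran in the good results but never ran in the failed one.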

TBar
Volunteer tester
Message 1763470 - Posted: 8 Feb 2016, 16:16:00 UTC - in response to Message 1763468.  

...and the fix is?
Note that the cards (I have 3 of them) don't have this problem in Linux or Windows. They only fail on tasks with Zero or 1 signal in OSX.

Raistmer
Volunteer developer
Volunteer tester
Message 1763476 - Posted: 8 Feb 2016, 16:50:56 UTC - in response to Message 1763470.  
Last modified: 8 Feb 2016, 16:51:30 UTC

...and the fix is?
Note that the cards (I have 3 of them) don't have this problem in Linux or Windows. They only fail on tasks with Zero or 1 signal in OSX.

Try to catch a task for offline benchmarking.
In the offline run, try -v 3 -skip_ffa_precompute
Save the log for comparison with another build (Windows/Linux) run with the same parameters. Preferably set -ffa_block and the other values to be the same, to simplify the comparison.

Then provide both logs for analysis.
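
Once both logs exist, a first pass at the comparison could look like the Python sketch below, which uses the standard difflib module. The file names and the two tiny log fragments are hypothetical stand-ins, not real -v 3 output. Timing fields will differ on every run, so in practice the interesting lines are probably the ones where the call counts differ.

```python
import difflib


def diff_logs(log_a, log_b, name_a="osx.log", name_b="linux.log"):
    """Unified diff of two logs, compared line by line.

    The default file names are placeholders for illustration only.
    """
    return list(
        difflib.unified_diff(
            log_a.splitlines(),
            log_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )


# Hypothetical, heavily trimmed stand-ins for the two logs.
osx_log = "FFA_compare N=0\nFFA_stride_add N=0\nFFA_fetch N=0\n"
linux_log = "FFA_compare N=8\nFFA_stride_add N=7\nFFA_fetch N=0\n"

for line in diff_logs(osx_log, linux_log):
    print(line)
```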

TBar
Volunteer tester
Message 1763483 - Posted: 8 Feb 2016, 17:12:06 UTC - in response to Message 1763476.  

I'll keep an eye on the inconclusives and see if another suspect shows up. Usually, though, I only see one every couple hundred tasks or so. It might be a while. Right now I don't see any, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=0&show_names=0&state=3&appid=20

Urs Echternacht
Volunteer tester
Message 1763505 - Posted: 8 Feb 2016, 18:12:14 UTC - in response to Message 1763460.  

The opencl_ati_mac app currently being tested on beta finally works properly on a HD4XXX without having to add -no_caching to the command line file.
Tom,
could you try also on your ATI Radeon HD 4670 ? Need to be convinced that the current beta 8.06 works on that lower class GPU, too.
Other testers at beta with HD 4670 seem to have problems to finish work units with valid results.
Maybe there is some other problem ...


Urs -

I have run a few WUs on Beta on my iMac with the ATI Radeon HD 4670. It runs them, but not well. The first one it ran took a long time and got a computation error. After that it seems to run them correctly. The problem is that it creates rectangles of messed up screen image randomly around the iMac's screen. I suspect the GPU just doesn't have enough VRAM to run the app properly. Here are three that have run so far:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606611 - Computation error

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606866

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606854

- Tom

OK, Tom.
First, thanks for running the beta test app on the ATI HD 4670. The times for the results are what I would expect from these GPUs given their core frequency and RAM type, and they are similar to the run times I get on Linux with a somewhat higher-clocked ATI HD 4670.

Now, could you test how using less RAM affects the screen garbage you reported? Please add only " -sbs 88" to the command line, nothing else, the same way you did before for the no-caching option.
_\|/_
U r s

TBar
Volunteer tester
Message 1763574 - Posted: 9 Feb 2016, 0:48:09 UTC - in response to Message 1763476.  

...and the fix is?
Note that the cards (I have 3 of them) don't have this problem in Linux or Windows. They only fail on tasks with Zero or 1 signal in OSX.

Try to catch a task for offline benchmarking.
In the offline run, try -v 3 -skip_ffa_precompute
Save the log for comparison with another build (Windows/Linux) run with the same parameters. Preferably set -ffa_block and the other values to be the same, to simplify the comparison.

Then provide both logs for analysis.

This is another strange one, http://setiathome.berkeley.edu/result.php?resultid=4698650636
That was a HD 7750 in Linux. The same machine now has a 6850 that spent a couple years in the Mac.
The basic count was the same on All three;
single pulses: 2
repetitive pulses: 0
percent blanked: 4.98
But the 7750 got an Invalid.

Tom Rinehart
Volunteer tester
Message 1763673 - Posted: 9 Feb 2016, 15:22:55 UTC - in response to Message 1763505.  

OK, Tom.
First, thanks for running the beta test app on the ATI HD 4670. The times for the results are what I would expect from these GPUs given their core frequency and RAM type, and they are similar to the run times I get on Linux with a somewhat higher-clocked ATI HD 4670.

Now, could you test how using less RAM affects the screen garbage you reported? Please add only " -sbs 88" to the command line, nothing else, the same way you did before for the no-caching option.


Urs -

I ran the Beta app on both of my iMacs last night with "-sbs 88" in the command line. It worked with no screen glitches.

iMac with the ATI Radeon HD 4670:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606662

It is currently 6.5 hours into a task and at only 25%. I suspect it will be another computation error.

iMac with the ATI Radeon HD 4850:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22664176
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22664113
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663993
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663984
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663954
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22663952

I will keep running tasks on the 4670 and report back.

- Tom

Urs Echternacht
Volunteer tester
Message 1763908 - Posted: 10 Feb 2016, 18:11:59 UTC - in response to Message 1763673.  

iMac with the ATI Radeon HD 4670:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=22606662

It is currently 6.5 hours into a task and at only 25%. I suspect it will be another computation error.

My fear is that the long run time still points to a low-memory problem.

Could you reduce it to " -sbs 80", let some tasks (10+) run one by one, and watch whether another overly long-running WU turns up on your ATI HD 4670?

If it happens again, reduce it further, until the times for WUs with similar angle ranges are always similar. That will be the new default for this app.
_\|/_
U r s
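
One way to judge "similar times for similar angle ranges" without eyeballing the results page is sketched below in Python. Everything in it is an assumption for illustration: the bucket width, the 1.5x threshold, and the sample angle ranges are invented, and only the run times of roughly 10k and 29k seconds echo figures mentioned in this thread.

```python
from collections import defaultdict


def spread_by_angle_range(tasks, bucket=0.1):
    """Group (angle_range, run_time_seconds) pairs into angle-range buckets
    and return {bucket_index: slowest / fastest} for buckets with 2+ tasks.
    """
    groups = defaultdict(list)
    for angle_range, run_time in tasks:
        groups[round(angle_range / bucket)].append(run_time)
    return {
        key: max(times) / min(times)
        for key, times in groups.items()
        if len(times) > 1
    }


def inconsistent_buckets(tasks, max_ratio=1.5, bucket=0.1):
    """Bucket indexes whose run times differ by more than max_ratio.

    max_ratio is an arbitrary illustrative threshold.
    """
    spread = spread_by_angle_range(tasks, bucket)
    return sorted(key for key, ratio in spread.items() if ratio > max_ratio)


# Hypothetical (angle_range, run_time_seconds) samples.
tasks = [(0.42, 9800), (0.44, 9900), (0.41, 29000), (1.20, 5000), (1.22, 5100)]
print(inconsistent_buckets(tasks))
```

If the list comes back empty after a batch of ten or more tasks, the current -sbs value is probably low enough; otherwise the outlier bucket points at the tasks worth checking.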

Tom Rinehart
Volunteer tester
Message 1764119 - Posted: 11 Feb 2016, 18:31:06 UTC

I will try -sbs 80. Adding -sbs 88 made a big difference: the screen glitches went away and it is completing most tasks in just under 10k seconds. I think the 29k-second task was partway through when I added -sbs 88. The results are here:

http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=78060

I will keep running WUs when the computer is available and watch the results with -sbs 80.

- Tom
 
©2023 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.