OpenCL AstroPulse crash after processing completion - write here.

Message boards : Number crunching : OpenCL AstroPulse crash after processing completion - write here.

kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379293 - Posted: 10 Jun 2013, 11:13:13 UTC - in response to Message 1379266.  


Is there a newer version I should try? Most of the discussion in this thread seems to be aimed at ATI cards. I want to be able to set my count back to .5 to properly load these NV GPUs.

Await debug build.

I shall limp along until the new builds are available.
My 1763 WU cache now has 1704 AP WUs in it. So even opting out of AP would take me a long time to clear.

Thank you for your reply, I shall watch this thread for further news.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379293
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1379295 - Posted: 10 Jun 2013, 11:25:34 UTC - in response to Message 1379290.  
Last modified: 10 Jun 2013, 11:32:59 UTC

BOINC is latest version. I even have 2 cores "free" of work for a single GPU.

If there isn't any special need to run a certain video driver, Version 266.58 WHQL should work fine on your host. It DOESN'T use a full CPU core to process Astropulse tasks and is also good up to and including CUDA 3.2. I use it on both my NV 8800 & GTS 250 to run APs when I want to. From my tests, there isn't any slowdown from using 266.58 instead of a driver that uses a full CPU core. The release notes say it's good for GeForce GTX 580s and below. You would need to use the Clean Install option...

GeForce/ION Driver Release 266.58 WHQL
ID: 1379295
Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1379301 - Posted: 10 Jun 2013, 11:43:08 UTC - in response to Message 1379295.  

BOINC is latest version. I even have 2 cores "free" of work for a single GPU.

If there isn't any special need to run a certain video driver, Version 266.58 WHQL should work fine on your host. It DOESN'T use a full CPU core to process Astropulse tasks and is also good up to and including CUDA 3.2. I use it on both my NV 8800 & GTS 250 to run APs when I want to. From my tests, there isn't any slowdown from using 266.58 instead of a driver that uses a full CPU core. The release notes say it's good for GeForce GTX 580s and below. You would need to use the Clean Install option...

GeForce/ION Driver Release 266.58 WHQL


That would be worth a try for Mark also then.



With each crime and every kindness we birth our future.
ID: 1379301
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379302 - Posted: 10 Jun 2013, 11:45:31 UTC - in response to Message 1379301.  

BOINC is latest version. I even have 2 cores "free" of work for a single GPU.

If there isn't any special need to run a certain video driver, Version 266.58 WHQL should work fine on your host. It DOESN'T use a full CPU core to process Astropulse tasks and is also good up to and including CUDA 3.2. I use it on both my NV 8800 & GTS 250 to run APs when I want to. From my tests, there isn't any slowdown from using 266.58 instead of a driver that uses a full CPU core. The release notes say it's good for GeForce GTX 580s and below. You would need to use the Clean Install option...

GeForce/ION Driver Release 266.58 WHQL


That would be worth a try for Mark also then.

Except that I need higher driver revisions to run CUDA 4.2 and 5.0, which are the versions best suited to the MB apps on my GPUs.
Meowsigh.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379302
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379307 - Posted: 10 Jun 2013, 12:03:21 UTC - in response to Message 1379290.  

BOINC is latest version. I even have 2 cores "free" of work for a single GPU.

It's not related to free cores. At least, it shouldn't be.
As an experiment, try downgrading BOINC.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1379307
Ibo
Joined: 22 Mar 00
Posts: 6
Credit: 6,075,931
RAC: 0
Austria
Message 1379319 - Posted: 10 Jun 2013, 12:34:21 UTC - in response to Message 1379307.  

As an experiment, try downgrading BOINC.

Back on 7.0.28.
It might take a while until I get some AP units (one is in the queue). So far, 2 out of 3 have ended in an error.
ID: 1379319
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379416 - Posted: 10 Jun 2013, 16:13:07 UTC

As an experiment, I raised the priority of boinc.exe to high along with the app on the 3 problem rigs I have. Nope, that didn't stop the errors either. Oh well, it was a thought.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379416
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379432 - Posted: 10 Jun 2013, 16:44:11 UTC - in response to Message 1379430.  

As an experiment, I raised the priority of boinc.exe to high along with the app on the 3 problem rigs I have. Nope, that didn't stop the errors either. Oh well, it was a thought.


Man, are you trashing my beloved AP WUs, Mark? Tell me it isn't so, please...

Sorry....I have been trying to mitigate the damage. But it's still more than I am willing to accept.

Hoping that Raistmer can come up with a build that will work better on those rigs.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379432
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379436 - Posted: 10 Jun 2013, 16:48:14 UTC

I have one idea... but it doesn't explain why you see issues on older BOINC too.
What about trying older NV drivers for testing? I know it will hurt your CUDA MB performance, and I don't consider this a solution or even a workaround, but it's worth trying to see whether it reduces the number of failures.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1379436
Mike · Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1379442 - Posted: 10 Jun 2013, 16:53:16 UTC - in response to Message 1379436.  

I have one idea... but it doesn't explain why you see issues on older BOINC too.
What about trying older NV drivers for testing? I know it will hurt your CUDA MB performance, and I don't consider this a solution or even a workaround, but it's worth trying to see whether it reduces the number of failures.


What I've suggested already.



With each crime and every kindness we birth our future.
ID: 1379442
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379444 - Posted: 10 Jun 2013, 16:56:54 UTC - in response to Message 1379436.  
Last modified: 10 Jun 2013, 17:19:58 UTC

I have one idea... but it doesn't explain why you see issues on older BOINC too.
What about trying older NV drivers for testing? I know it will hurt your CUDA MB performance, and I don't consider this a solution or even a workaround, but it's worth trying to see whether it reduces the number of failures.

I had older drivers on them....304.79, I believe.
Mike had suggested updating them as one of the first things to try to get the errors to stop.

I also just had another thought.
Until Eric patches the servers to stop sending VLAR work to NV hosts (which he hoped to do today or tomorrow), I am still getting VLAR GPU work.
Now, there has been much documentation about how badly VLARs on NV CUDA hosts can tie up the whole system, and how they can in fact slow down another non-VLAR task running at the same time. I am wondering if a VLAR running whilst the AP task is trying to finish could be having some impact here. They can and do cause system hangs for some people.

If the fix is implemented and my caches clear the VLAR tasks, it would be interesting to see if THAT might be having an impact here. I suspect it's not the answer, but...

And the odd thing is that the app is running fine on my top host with 3 580s. Not a single AP error turned in. Same BOINC, same OS, still on 304.79 drivers.
Ditto for my number 2 and 3 rigs....no AP errors.
Now, the rigs having the problems are older mobos, processors, and are slower. One is my only dual core.

Just more thinking out loud.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379444
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379448 - Posted: 10 Jun 2013, 17:04:31 UTC - in response to Message 1379444.  
Last modified: 10 Jun 2013, 17:05:52 UTC

VLARs are very useful for pulse-type signals because they allow better pulse accumulation (better sensitivity) and wider pulse sizes (a bigger parameter space for the search). Unfortunately, they don't map nicely to current GPU hardware (more precisely, our current algorithm implementation doesn't map to it too well). This can and will be improved over time. But there's no need to perceive VLAR tasks as some kind of trash tasks; they aren't trash at all.
And because some tapes contain mostly VLAR tasks, it would be good if GPUs were able to process them too. AFAIK, ATI cards take a smaller performance hit than NV ones, and newer NV cards take a smaller hit than pre-Fermi ones. So it was interesting to try and see whether modern GPUs can handle VLARs well enough.
The flaming on the boards shows that it's "still not the right time". Maybe we could indeed get a separate checkbox for VLARs on GPU, to enable VLARs on the GPU when no other GPU work is available.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1379448
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379457 - Posted: 10 Jun 2013, 17:23:33 UTC - in response to Message 1379448.  
Last modified: 10 Jun 2013, 17:24:04 UTC

VLARs are very useful for pulse-type signals because they allow better pulse accumulation (better sensitivity) and wider pulse sizes (a bigger parameter space for the search). Unfortunately, they don't map nicely to current GPU hardware (more precisely, our current algorithm implementation doesn't map to it too well). This can and will be improved over time. But there's no need to perceive VLAR tasks as some kind of trash tasks; they aren't trash at all.
And because some tapes contain mostly VLAR tasks, it would be good if GPUs were able to process them too. AFAIK, ATI cards take a smaller performance hit than NV ones, and newer NV cards take a smaller hit than pre-Fermi ones. So it was interesting to try and see whether modern GPUs can handle VLARs well enough.
The flaming on the boards shows that it's "still not the right time". Maybe we could indeed get a separate checkbox for VLARs on GPU, to enable VLARs on the GPU when no other GPU work is available.

For now, I think Eric is just going to stop sending VLAR to NV. Same as he did for v6.
I would not mind having the other approaches available as well.
Of course, the ultimate would be an app for NV MB that does not choke on VLAR work so it would no longer be an issue.
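The interim fix (the server withholding VLARs from NV hosts) and Raistmer's opt-in checkbox can be combined into one task-selection rule. The Python below is a minimal, hypothetical sketch of such a rule, not the SETI@home scheduler's code: the `Task` type, `pick_gpu_task`, the vendor strings, and the 0.12 angle-range cutoff for "VLAR" are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: the thread never gives a numeric cutoff; 0.12 is a hypothetical
# angle-range threshold below which a task is treated as a VLAR.
VLAR_AR_THRESHOLD = 0.12


@dataclass
class Task:
    name: str
    angle_range: float


def is_vlar(task: Task) -> bool:
    return task.angle_range < VLAR_AR_THRESHOLD


def pick_gpu_task(queue: list[Task], gpu_vendor: str, vlar_opt_in: bool) -> Task | None:
    """Choose the next task for a GPU host under a hypothetical VLAR policy.

    - Non-NVIDIA GPUs: no VLAR restriction, take tasks in queue order.
    - NVIDIA GPUs: prefer any non-VLAR task; hand out a VLAR only when the
      (hypothetical) opt-in checkbox is set and nothing else is left.
    """
    if not queue:
        return None
    if gpu_vendor != "NVIDIA":
        return queue[0]
    for task in queue:
        if not is_vlar(task):
            return task
    # Only VLARs remain: send one only if the host opted in.
    return queue[0] if vlar_opt_in else None
```

Falling back to a VLAR only when nothing else is queued mirrors the "enable VLARs on GPU when no other work for GPU available" idea; a real scheduler would also have to weigh deadlines and host preferences, which this sketch ignores.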
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379457
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379630 - Posted: 11 Jun 2013, 0:17:04 UTC

Try this build for NV, and put the pdb file alongside the exe.
https://dl.dropboxusercontent.com/u/60381958/AP6_win_x86_SSE2_OpenCL_NV_r1857.7z
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1379630
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379714 - Posted: 11 Jun 2013, 6:25:24 UTC - in response to Message 1379630.  
Last modified: 11 Jun 2013, 6:34:22 UTC

Try this build for NV, and put the pdb file alongside the exe.
https://dl.dropboxusercontent.com/u/60381958/AP6_win_x86_SSE2_OpenCL_NV_r1857.7z

Tried it on one host, did not work so well.
Copied everything over except the readme, Authors, Copying, and Copyright files.
Ran aimerge. Rebooted.

It did start up and run 1857 instead of 1843. Right after startup I noticed in Task Manager that it was using a whole CPU core (it might just have been getting things ready to run), but then it went into a loop restarting the task that had already been running, and after a few seconds it came back with something like "exited but no finished file" and "you might have to reset the project", yada yada. It just kept looping like that.
Re-ran the installer to get back to 1843.

Should there have been a problem restarting a task that was already 60% completed with 1843?
Or might I have done something wrong?

Meow?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379714
Ibo
Joined: 22 Mar 00
Posts: 6
Credit: 6,075,931
RAC: 0
Austria
Message 1379723 - Posted: 11 Jun 2013, 6:49:59 UTC - in response to Message 1379319.  

As an experiment, try downgrading BOINC.

Back on 7.0.28.
It might take a while until I get some AP units (one is in the queue).


7.0.28 did no good:
http://setiathome.berkeley.edu/result.php?resultid=3033287726
I just started one using build 1857.
ID: 1379723
TBar
Volunteer tester
Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1379725 - Posted: 11 Jun 2013, 6:53:37 UTC - in response to Message 1379714.  

Yes, there is a problem swapping apps mid-task. You might want to finish the ongoing one first. The program creates binary files when it runs; if it finds that the app and the binaries don't match, it can cause problems. You need to start with a fresh task so it places the correct files in the slots. Since you have trashed so many, I don't see a problem trashing another. You might want to suspend all the tasks, stop BOINC, install the new app, start BOINC, then resume one new task and see how it goes.
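TBar's explanation can be illustrated with a kernel-binary cache that records which app build produced it. The sketch below assumes the cached files are compiled OpenCL kernels and that a build revision, device name, and driver version are enough to identify them; the JSON file layout, `cache_key`, and `load_or_build` are hypothetical and are not the AstroPulse app's actual mechanism. The point is only that binaries left behind by r1843 should be rebuilt, not reused, once r1857 is running.

```python
import json
from pathlib import Path
from typing import Callable


def cache_key(build_rev: str, device: str, driver: str) -> str:
    # Hypothetical key: anything that changes the compiled kernels belongs in it.
    return f"{build_rev}|{device}|{driver}"


def load_or_build(cache_file: Path, key: str, compile_kernels: Callable[[], bytes]) -> bytes:
    """Reuse cached kernel binaries only if they were built for this exact key."""
    if cache_file.exists():
        try:
            entry = json.loads(cache_file.read_text())
        except ValueError:
            entry = {}  # unreadable cache file: treat it as a miss
        if entry.get("key") == key:
            return bytes.fromhex(entry["binary"])
        # Key mismatch, e.g. binaries left by r1843 after switching to r1857:
        # fall through and rebuild instead of loading stale binaries.
    binary = compile_kernels()
    cache_file.write_text(json.dumps({"key": key, "binary": binary.hex()}))
    return binary
```

Whether the real app performs a check like this isn't stated in the thread; if it doesn't, starting a fresh task is the safe route, since the app then writes matching files into the slot.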
ID: 1379725
kittyman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379726 - Posted: 11 Jun 2013, 6:56:53 UTC - in response to Message 1379725.  
Last modified: 11 Jun 2013, 6:58:36 UTC

Yes, there is a problem swapping apps mid-task. You might want to finish the ongoing one first. The program creates binary files when it runs; if it finds that the app and the binaries don't match, it can cause problems. You need to start with a fresh task so it places the correct files in the slots. Since you have trashed so many, I don't see a problem trashing another. You might want to suspend all the tasks, stop BOINC, install the new app, start BOINC, then resume one new task and see how it goes.

I'll not get that fancy.
I'll just recopy the aistub and run aimerge again, and then manually abort the WU that is underway.
We'll see if it starts a new one successfully.

It's getting late here, but I'll give it a go.
I would like to have it running whilst I get some sleep to see what's up in the morning.

And thanks for that bit of info, TBar.

Meowf.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1379726
Ibo
Joined: 22 Mar 00
Posts: 6
Credit: 6,075,931
RAC: 0
Austria
Message 1379727 - Posted: 11 Jun 2013, 6:58:45 UTC - in response to Message 1379723.  

I just started one using build 1857.

It's getting even stranger. I just saw that it set the elapsed time back from 1:53 to 1:30 several times. The following appears several times in stderr.txt. No idea if it is related. Is there anything else that might be of interest?

### Restart at 0.00 percent.
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
s0=0;s1=0,s2=0
ERROR: some exception inside short FFA, probably video-driver restart, restarting app...
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 0
Info: BOINC provided device ID used
Used GPU device parameters are:
	Number of compute units: 2
	Single buffer allocation size: 128MB
	max WG size: 512
	FERMI path used: no

Build features: Non-graphics	OpenCL	USE_OPENCL_NV	OCL_ZERO_COPY	COMBINED_DECHIRP_KERNEL	FFTW	USE_INCREASED_PRECISION	USE_SSE2	x86	
     CPUID: Intel(R) Core(TM) i7 CPU       Q 740  @ 1.73GHz 

     Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2
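To tell whether a task is stuck in this restart loop rather than progressing, one could count the restart markers and short-FFA exceptions in its stderr.txt. A minimal Python sketch, keyed only on the two message strings in the excerpt above; `summarize_stderr` is a hypothetical helper, and any other log lines or error messages the app may write are not handled.

```python
import re

# Marker strings copied from the stderr.txt excerpt above; other log details
# (and any other error messages the app may print) are not covered.
RESTART_RE = re.compile(r"^### Restart at ([0-9.]+) percent\.", re.MULTILINE)
FFA_ERROR = "ERROR: some exception inside short FFA"


def summarize_stderr(text: str) -> dict:
    """Count restarts and short-FFA exceptions in the text of a stderr.txt."""
    points = [float(p) for p in RESTART_RE.findall(text)]
    return {
        "restarts": len(points),
        "restart_points": points,
        "ffa_errors": text.count(FFA_ERROR),
    }
```

Feed it the text of the task's stderr.txt: several restarts at 0.00 percent together with a matching number of short-FFA errors would be consistent with the looping described here, while a single restart after a reboot would not.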
ID: 1379727
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379728 - Posted: 11 Jun 2013, 7:04:17 UTC

OK, stop using this one and wait for the next attempt.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1379728


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.