GBT ('guppi') .vlar tasks will be sent to GPUs: what do you think about this?

Message boards : Number crunching : GBT ('guppi') .vlar tasks will be sent to GPUs: what do you think about this?

Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 . . . 10 · Next

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1786751 - Posted: 11 May 2016, 15:43:16 UTC - in response to Message 1786742.  

I was finally able to compile Petri's code in Yosemite, but I'm back to getting a number of these "SIGBUS: bus error" crashes. It's a little faster with ToolKit 7.5; it's just strange that I'm getting what appears to be a BOINC error. The task is finished and the results are printed, then it gives an error:
...
setiathome v8 enhanced x41p_zi r3452-64, Cuda 7.50 special
Compiled with NVCC 7.5. Modifications done by petri33.

Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.008975
Sigma 372
Sigma > GaussTOffsetStop: 372 > -308
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE.
13
Flopcounter: 31398940702413.851562

Spike count:    1
Autocorr count: 1
Pulse count:    5
Triplet count:  3
Gaussian count: 0
SIGBUS: bus error

Crashed executable name: setiathome_x41p_zi_x86_64-apple-darwin_cuda75
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.10.5 build 14F1713
Wed May 11 10:59:49 2016
...

I've switched back to BOINC 7.4.36, but I think I've been here and done this before. I've been using <no_priority_change> for years.
ID: 1786751
Profile jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1786753 - Posted: 11 May 2016, 15:48:47 UTC - in response to Message 1786751.  
Last modified: 11 May 2016, 15:49:07 UTC

Assuming you changed maxregisters up to 64 as you stated elsewhere, try dialling it back to 32. The long pulsefinds in both baseline and Petri's code are greedy, and will dominate in guppi vlars, potentially tripping up OS failsafes.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1786753
Profile Gianfranco Lizzio
Volunteer tester

Joined: 5 May 99
Posts: 39
Credit: 28,049,113
RAC: 87
Italy
Message 1786756 - Posted: 11 May 2016, 15:57:05 UTC - in response to Message 1786753.  

Assuming you changed maxregisters up to 64 as you stated elsewhere, try dialling it back to 32.


I'm using maxrregcount=64 in El Capitan without any problem.

Gianfranco
I don't want to believe, I want to know!
ID: 1786756
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1786757 - Posted: 11 May 2016, 15:58:53 UTC - in response to Message 1786753.  

I think part of the compiling problem I was having before came from trying it with maxregisters=32; I believe it's coded to use maxregisters=64. I don't have this problem using maxregisters=64 with the ToolKit 6.5 app compiled in ML. Also, the science part of the app isn't the problem: the app has finished and the results are printed before the SIGBUS error.
ID: 1786757
Profile jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1786758 - Posted: 11 May 2016, 15:59:33 UTC - in response to Message 1786756.  

Assuming you changed maxregisters up to 64 as you stated elsewhere, try dialling it back to 32.


I'm using maxrregcount=64 in El Capitan without any problem.

Gianfranco


Hmmmm, TBar's build/BOINC suspicions remain then :). I have no other ideas.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1786758
Profile jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1786759 - Posted: 11 May 2016, 16:01:57 UTC - in response to Message 1786757.  

I think part of the compiling problem I was having before came from trying it with maxregisters=32; I believe it's coded to use maxregisters=64. I don't have this problem using maxregisters=64 with the ToolKit 6.5 app compiled in ML. Also, the science part of the app isn't the problem: the app has finished and the results are printed before the SIGBUS error.


IOW, is it dying after the boinc_finish() call, or being killed by the client (if there's any way to tell the difference)?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1786759
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1786761 - Posted: 11 May 2016, 16:13:43 UTC - in response to Message 1786759.  

I think part of the compiling problem I was having before came from trying it with maxregisters=32; I believe it's coded to use maxregisters=64. I don't have this problem using maxregisters=64 with the ToolKit 6.5 app compiled in ML. Also, the science part of the app isn't the problem: the app has finished and the results are printed before the SIGBUS error.


IOW, is it dying after the boinc_finish() call, or being killed by the client (if there's any way to tell the difference)?

All I know is the last 30 seconds seems to take much longer than 30 secs ;-)
So far no errors since going back to 7.4.36.
ID: 1786761
Profile jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1786763 - Posted: 11 May 2016, 16:18:25 UTC - in response to Message 1786761.  
Last modified: 11 May 2016, 16:22:44 UTC

One thing that comes to mind is a change at some point in the shared memory arrangement. This may not affect Gianfranco or me, but may (possibly) affect you. When using app_info, iirc you may need to include an explicit <api_version> tag somewhere in it, containing the API version number you built with. That's one item that could depend on both the API and client versions you chose (though I didn't have to bother adding the entry on my system).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1786763
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14531
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1786765 - Posted: 11 May 2016, 16:32:41 UTC - in response to Message 1786763.  

One thing that comes to mind is a change at some point in the shared memory arrangement. This may not affect Gianfranco or me, but may (possibly) affect you. When using app_info, iirc you may need to include an explicit <api_version> tag somewhere in it, containing the API version number you built with. That's one item that could depend on both the API and client versions you chose (though I didn't have to bother adding the entry on my system).

In reality, it doesn't have to be the exact build version: the only tests are for BOINC/API 6.0 (PID instead of heartbeat), and 7.5 (something to do with a Bitcoin Utopia command line). There was also the shared memory segment issue (the old SpyHill bug) which was specific to multi-core, multi-instance Macs, but that would blow up at the start of the run, not at the end.

It would be good practice to get into the habit of using <api_version> tags in app_info.xml, just in case somebody starts using one for real and we miss it, but I doubt they're implicated here.
ID: 1786765
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1786777 - Posted: 11 May 2016, 17:09:01 UTC - in response to Message 1786763.  

One thing that comes to mind is a change at some point in the shared memory arrangement. This may not affect Gianfranco or me, but may (possibly) affect you. When using app_info, iirc you may need to include an explicit <api_version> tag somewhere in it, containing the API version number you built with. That's one item that could depend on both the API and client versions you chose (though I didn't have to bother adding the entry on my system).

Hmmm, that doesn't seem to work very well for me. Adding the line <api_version>7.7.0</api_version> results in many trashed tasks:
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.006956
Sigma 513
Sigma > GaussTOffsetStop: 513 > -449
plan autocorr R2C batched FFT failed 5
Not enough VRAM for Autocorrelations...
setiathome_CUDA: CUDA runtime ERROR in device memory allocation, attempt 1 of 6
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE.
13 waiting 5 seconds...
 Reinitialising Cuda Device...
Cuda error 'Couldn't get cuda device count
' in file 'cuda/cudaAcceleration.cu' in line 161 : invalid resource handle.

</stderr_txt>
...
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.005559
Cuda error 'cudaMalloc((void**) &dev_WorkData' in file 'cuda/cudaAcceleration.cu' in line 439 : out of memory.
ID: 1786777
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1786780 - Posted: 11 May 2016, 17:26:12 UTC - in response to Message 1786777.  
Last modified: 11 May 2016, 17:27:04 UTC

plan autocorr R2C batched FFT failed 5
Not enough VRAM for Autocorrelations...
setiathome_CUDA: CUDA runtime ERROR in device memory allocation, attempt 1 of 6


It has failed at the start...

Sometimes a reboooooot helps. GPU memory can get into a fragmented state: there is plenty of RAM available, but not in one contiguous block.
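To make the fragmentation point concrete, here is a toy first-fit model in C++. Everything in it (`Block`, `first_fit`, `make_fragmented_pool` and the sizes) is hypothetical and chosen for illustration; it is not a description of how the CUDA driver actually manages VRAM. It builds a pool whose free space adds up to 2048 MiB while the largest single hole is only 512 MiB, so a 1024 MiB request fails even though "plenty of RAM" is free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a fragmented memory pool (sizes in MiB). Hypothetical:
// NOT how the CUDA driver manages VRAM, it only illustrates "enough free
// memory in total, but no single block large enough".
struct Block {
    std::size_t offset;  // start of the block
    std::size_t size;    // length of the block
    bool free;           // true if unallocated
};

// Sum of all free blocks.
std::size_t total_free(const std::vector<Block>& pool) {
    std::size_t sum = 0;
    for (const Block& b : pool)
        if (b.free) sum += b.size;
    return sum;
}

// Size of the largest single free block.
std::size_t largest_free(const std::vector<Block>& pool) {
    std::size_t best = 0;
    for (const Block& b : pool)
        if (b.free && b.size > best) best = b.size;
    return best;
}

// First-fit allocation: succeeds only if ONE free block can hold the
// whole request; free space in separate holes is never combined.
bool first_fit(std::vector<Block>& pool, std::size_t request) {
    assert(request > 0);
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (!pool[i].free || pool[i].size < request) continue;
        if (pool[i].size > request) {
            // Split: the tail of the hole stays free.
            Block rest{pool[i].offset + request, pool[i].size - request, true};
            pool[i].size = request;
            pool[i].free = false;
            pool.insert(pool.begin() + static_cast<std::ptrdiff_t>(i) + 1, rest);
        } else {
            pool[i].free = false;
        }
        return true;
    }
    return false;
}

// Example pool: four 512 MiB holes separated by 64 MiB long-lived
// allocations (e.g. left behind by earlier tasks).
std::vector<Block> make_fragmented_pool() {
    std::vector<Block> pool;
    std::size_t offset = 0;
    for (int i = 0; i < 4; ++i) {
        if (i > 0) {
            pool.push_back({offset, 64, false});
            offset += 64;
        }
        pool.push_back({offset, 512, true});
        offset += 512;
    }
    return pool;
}
```

On `make_fragmented_pool()`, `total_free` reports 2048 and `largest_free` reports 512, so `first_fit(pool, 1024)` returns false while `first_fit(pool, 256)` succeeds. In this model a reboot corresponds to clearing every allocation, which leaves one large free block again.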

How about running the older build? Among the top hosts there is one running it happily, without any extra inconclusives. (I have trashed my version and need to revert because of too many inconclusives, though no errors.)
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1786780
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14531
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1786783 - Posted: 11 May 2016, 17:31:12 UTC - in response to Message 1786777.  

Hmmm, that doesn't seem to work very well for me. Adding the line <api_version>7.7.0</api_version> results in many trashed tasks:

That might mess with the command line passed to your application at startup:

    if (!app_version->api_version_at_least(7, 5)) {
        int rt = app_version->gpu_usage.rsc_type;
        if (rt) {
            coproc_cmdline(rt, result, app_version->gpu_usage.usage, cmdline, sizeof(cmdline));
        }
    }

Try knocking it back to 7.3.0 and see if that changes anything.

But you should be getting info about the device to use from init_data.xml these days, not from the command line at all.
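A minimal C++ sketch of the version gating described above, assuming the `<api_version>` string is reduced to a major.minor pair before comparing. The names `ApiVersion`, `parse_api_version`, `uses_pid_instead_of_heartbeat` and `client_passes_gpu_cmdline` are hypothetical and are not BOINC's own API; only the 6.0 and 7.5 thresholds and the `api_version_at_least(7, 5)` test come from the posts in this thread. It shows why 7.3.0 and 7.7.0 can behave differently: only values below 7.5 keep the GPU device on the command line.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical model of how an <api_version> string might be compared.
// Only the 6.0 and 7.5 thresholds come from the thread; the names here
// are invented for illustration.
struct ApiVersion {
    int major_num = 0;
    int minor_num = 0;
    int release_num = 0;
};

// Parse "major.minor.release". The return value of sscanf is ignored:
// fields that are missing keep their 0 defaults, so an empty tag behaves
// like a very old API (an assumption, not confirmed BOINC behaviour).
ApiVersion parse_api_version(const std::string& s) {
    ApiVersion v;
    std::sscanf(s.c_str(), "%d.%d.%d", &v.major_num, &v.minor_num, &v.release_num);
    return v;
}

// True if v >= want_major.want_minor; the release number is ignored,
// matching the two-argument api_version_at_least(7, 5) call quoted above.
bool api_version_at_least(const ApiVersion& v, int want_major, int want_minor) {
    assert(want_major >= 0 && want_minor >= 0);
    if (v.major_num != want_major) return v.major_num > want_major;
    return v.minor_num >= want_minor;
}

// API 6.0 and later: liveness checked via PID rather than heartbeat.
bool uses_pid_instead_of_heartbeat(const ApiVersion& v) {
    return api_version_at_least(v, 6, 0);
}

// Below API 7.5: the client still appends the GPU device to the command
// line (the coproc_cmdline branch); from 7.5 on, the device info is
// expected to come from init_data.xml instead.
bool client_passes_gpu_cmdline(const ApiVersion& v) {
    return !api_version_at_least(v, 7, 5);
}
```

Under these assumptions `7.3.0` still gets the device on the command line, while `7.5.0` and `7.7.0` do not, which is consistent with the suggestion to read the device from init_data.xml rather than the command line.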
ID: 1786783
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1786790 - Posted: 11 May 2016, 17:52:51 UTC

@Mark

2 hours seems a bit long for those cards.

I ran both my computers with 3 tasks on their 750 Tis (started at the same time) and ended up with run times of 1:29 to 1:39.

A bit strange, since I was getting 1:05 with 2 tasks.

My mbcuda.cfg

processpriority = abovenormal
pfblockspersm = 16
pfperiodsperlaunch = 400
ID: 1786790
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1786792 - Posted: 11 May 2016, 17:55:49 UTC - in response to Message 1786790.  

@Mark

2 hours seems a bit long for those cards.

I ran both my computers with 3 tasks on their 750 Tis (started at the same time) and ended up with run times of 1:29 to 1:39.

A bit strange, since I was getting 1:05 with 2 tasks.

My mbcuda.cfg

processpriority = abovenormal
pfblockspersm = 16
pfperiodsperlaunch = 400

Well, with work keeping me busy, I have not done any optimizing yet.
This weekend, I'll have to see if I can find time to make sure I am running the right apps and add some opti parameters.
Hopefully that will get the kitties back on the right track.
I'll be posting in the team forum for some help and tips.
Thanks for the input.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1786792
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1786793 - Posted: 11 May 2016, 17:55:53 UTC - in response to Message 1786780.  
Last modified: 11 May 2016, 18:16:50 UTC

Well, rebooting didn't help. I had to remove the line <api_version>7.7.0</api_version> to stop it immediately trashing all the GPU tasks. Changing the line to <api_version>7.3.0</api_version> allows it to start again. The current build uses the code in Jason's folder, and it seems to be quite a bit faster than the exact same code compiled in ML with ToolKit 6.5. It would be nice if it weren't for the crash after finish. I might try building it again with boinc-master 7.5; the app compiled in ML with ToolKit 6.5 was using boinc-master 7.5.

Adding <api_version>7.3.0</api_version> didn't help. Still get the Crash After Finish with 1 out of 3 tasks.

Why does the validator mark some tasks Invalid when they are run a second time? The task was only reported once. As you can see, the results are the same: http://setiathome.berkeley.edu/workunit.php?wuid=2155978136 The way I see it, the results you report are the ones that should be used, not the ones caught up in a mass trashing and later run and reported.
ID: 1786793
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1786794 - Posted: 11 May 2016, 18:19:45 UTC - in response to Message 1786789.  

Running Raistmer's SoG on my Titan Xs

Running 3 at a time on each GPU with command lines from Mike,

Each is taking just under 1 hour (56-58 minutes)

http://setiathome.berkeley.edu/result.php?resultid=4923517925

http://setiathome.berkeley.edu/result.php?resultid=4923517967

http://setiathome.berkeley.edu/result.php?resultid=4923517729

That was a long time. Running 3 at a time with SoG on my 980, each one takes 28-29 minutes.

http://setiathome.berkeley.edu/result.php?resultid=4924826717
http://setiathome.berkeley.edu/result.php?resultid=4924819417
http://setiathome.berkeley.edu/result.php?resultid=4924799847


Those have ARs of 0.009, 0.01 and 0.009. Other reporters may have ARs at around 0.005 or so. Nice times though (three under 30 minutes).
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1786794
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1786795 - Posted: 11 May 2016, 18:23:26 UTC

Just as someone said: let the creditscrew settle for a week or three.
I was looking at the credit granted for host http://setiathome.berkeley.edu/results.php?hostid=7939003&offset=0&show_names=0&state=4&appid= and saw a huge difference in credit granted depending on which kind of host the task validated against.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1786795
Profile Zalster Special Project $250 donor
Volunteer tester

Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1786797 - Posted: 11 May 2016, 18:37:01 UTC - in response to Message 1786789.  

Running Raistmer's SoG on my Titan Xs

Running 3 at a time on each GPU with command lines from Mike,

Each is taking just under 1 hour (56-58 minutes)

http://setiathome.berkeley.edu/result.php?resultid=4923517925

http://setiathome.berkeley.edu/result.php?resultid=4923517967

http://setiathome.berkeley.edu/result.php?resultid=4923517729

That was a long time. Running 3 at a time with SoG on my 980, each one takes 28-29 minutes.

http://setiathome.berkeley.edu/result.php?resultid=4924826717
http://setiathome.berkeley.edu/result.php?resultid=4924819417
http://setiathome.berkeley.edu/result.php?resultid=4924799847


Are you dividing by 3, or is that the exact time? 3 in 54 minutes for me averages 18 minutes.
ID: 1786797
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1786799 - Posted: 11 May 2016, 18:44:55 UTC - in response to Message 1786797.  
Last modified: 11 May 2016, 18:50:32 UTC

Running Raistmer's SoG on my Titan Xs

Running 3 at a time on each GPU with command lines from Mike,

Each is taking just under 1 hour (56-58 minutes)

http://setiathome.berkeley.edu/result.php?resultid=4923517925

http://setiathome.berkeley.edu/result.php?resultid=4923517967

http://setiathome.berkeley.edu/result.php?resultid=4923517729

That was a long time. Running 3 at a time with SoG on my 980, each one takes 28-29 minutes.

EDIT: A retake: all were run on GPU 0, so 29 min for each. Not so good.

http://setiathome.berkeley.edu/result.php?resultid=4924826717
http://setiathome.berkeley.edu/result.php?resultid=4924819417
http://setiathome.berkeley.edu/result.php?resultid=4924799847


Are you dividing by 3, or is that the exact time? 3 in 54 minutes for me averages 18 minutes.


I clicked the links and saw about 29 min for each of the three. So, running one at a time, that would be about 10 min each.

EDIT: but they'd have to run on the same GPU simultaneously.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1786799
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1786802 - Posted: 11 May 2016, 18:53:54 UTC - in response to Message 1786801.  
Last modified: 11 May 2016, 18:54:26 UTC

Running Raistmer's SoG on my Titan Xs

Running 3 at a time on each GPU with command lines from Mike,

Each is taking just under 1 hour (56-58 minutes)

http://setiathome.berkeley.edu/result.php?resultid=4923517925

http://setiathome.berkeley.edu/result.php?resultid=4923517967

http://setiathome.berkeley.edu/result.php?resultid=4923517729

That was a long time. Running 3 at a time with SoG on my 980, each one takes 28-29 minutes.

http://setiathome.berkeley.edu/result.php?resultid=4924826717
http://setiathome.berkeley.edu/result.php?resultid=4924819417
http://setiathome.berkeley.edu/result.php?resultid=4924799847


Are you dividing by 3, or is that the exact time? 3 in 54 minutes for me averages 18 minutes.

Not dividing as you can see from the results. It is the exact times. Each of them is done in less than 30 minutes.

Name blc2_2bit_guppi_57451_24929_HIP63406_0019.20556.831.17.26.163.vlar_0
Run time 29 min 10 sec
CPU time 27 min 50 sec
--------------------------------------------------------------------------
Name blc2_2bit_guppi_57451_25284_HIP63406_OFF_0020.20361.831.17.26.177.vlar_1
Run time 28 min 25 sec
CPU time 27 min 48 sec
------------------------------------------------------------------------------
Name blc2_2bit_guppi_57451_24929_HIP63406_0019.17306.831.18.27.176.vlar_1
Run time 28 min 32 sec
CPU time 25 min 40 sec
----------------------------------------------------------------------------


Yes, but the times they were uploaded are not the same; they were somewhat interlaced. (They were not running the whole time together, and may have had other kinds of tasks running at the same time.)
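The arithmetic behind the "are you dividing by 3?" question can be sketched in C++ using the three run times listed above. The helper names are hypothetical. The mean run time is the per-task figure quoted in the post; dividing it by the number of concurrent tasks gives an effective GPU time per task, but, as the post itself notes, that division only holds if all three really ran together on the same GPU for their whole duration.

```cpp
#include <cassert>
#include <vector>

// Worked arithmetic for the run times listed above. Helper names are
// hypothetical. Dividing by the number of concurrent tasks is only valid
// if all of them shared the same GPU for their whole run.

// "29 min 10 sec" -> 1750 seconds.
constexpr int to_seconds(int minutes, int seconds) {
    return minutes * 60 + seconds;
}

// Mean wall-clock run time of a batch of tasks, in seconds.
double mean_seconds(const std::vector<int>& runs) {
    assert(!runs.empty());
    double sum = 0.0;
    for (int r : runs) sum += r;
    return sum / static_cast<double>(runs.size());
}

// GPU time "spent" per task when `concurrent` tasks run side by side:
// the GPU finishes `concurrent` tasks per run time.
double effective_seconds_per_task(double run_seconds, int concurrent) {
    assert(concurrent > 0);
    return run_seconds / concurrent;
}
```

For 29:10, 28:25 and 28:32 the mean is about 1722 s (roughly 28.7 minutes), and dividing by 3 gives roughly 574 s, or about 9.6 minutes per task. The same helper turns "3 in 54 minutes" into 1080 s, i.e. 18 minutes per task.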
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1786802
 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.