SETI orphans

Message boards : Number crunching : SETI orphans


Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2072911 - Posted: 9 Apr 2021, 3:57:02 UTC

Thanks for the confirmation of the observed behavior, Alan. I don't know of any tools offhand to figure out what is going on with the app and work units.

We need someone like Raistmer or Petri to chime in, I think. Otherwise we are just guessing and speculating.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2072911
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2072928 - Posted: 9 Apr 2021, 11:49:57 UTC - in response to Message 2072906.  

P.S. I have got some strace output from both machines to plough through at some point, but without access to the source code it's not immediately obvious why there are these delays.

P.P.S. if anyone knows how to get better granularity for GPU usage without writing one's own tools against the NVIDIA libraries, please tell!
I struck a particular problem during the Beta testing of, specifically, the Windows OpenCL deployment on Intel iGPUs. I found and reported my specific problem, and my machines can now run the iGPU tasks without problem. But in the process, I learned some more general trouble-shooting procedures which may be of interest to curious minds here.

First, I wanted to be able to run (and re-run) a sample task offline. That's easy:
0) First download your task! (That's the hardest bit at the moment)
1) Suspend it, so it doesn't have a chance to get away.
2) Open client_state.xml with an editor. You're not going to change it, so no particular precautions are needed.
3) Find the WCG project, and within it, find the current <app_version> segment and the <workunit> segment for your selected task. Those are likely to be adjacent to each other, at the end of the version list.
4) Select those two segments, copy them, and paste them into an empty work folder. Close client_state - we're finished with it.
5) Open your copied <app_version> and <workunit> file so you can read it. Find every file mentioned in both segments, and copy them from the WCG project folder to your work folder. Mostly, the app files will have the oldest datestamps in the project folder, and the workunit files will have the youngest - except one.
6) The files all have complex names, but are each given a simpler <open_name> in client_state. Rename all files to their simpler form (you should have copied 8 files, and 7 of them will need renaming).
7) You'll need an init_data.xml file - ideally a simplified one, such as the ones provided in the Lunatics MBbench. The only bits you really need are the lines which specify which device to run your test on.
8) Look back at your <workunit> file. You'll see a very long command line for the workunit, starting "-jobs ...". Make a startup file (why not a batch file?) containing the name of your main program, followed by the command_line. Remember the space between them.

Launch your batch file, and watch the process unfold before your eyes. By the end of the process, you'll have something like 100 extra files in your work folder - unpacked data files, work files, checkpoint files, result files.
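Steps 5 and 6 can be scripted if you prefer. Here's a minimal sketch of pulling the physical-name/<open_name> pairs out of the copied segments - the fragment and file names below are invented, so check the element names against your own client_state.xml before trusting it:

```python
import xml.etree.ElementTree as ET

# A toy <workunit> fragment. Real client_state.xml entries follow this
# <file_ref>/<file_name>/<open_name> pattern, but the names here are made up.
fragment = """
<workunit>
    <name>wu_sample</name>
    <file_ref>
        <file_name>autodockgpu_data_000123_456</file_name>
        <open_name>receptor.maps.fld</open_name>
    </file_ref>
    <file_ref>
        <file_name>autodockgpu_job_000123_456</file_name>
        <open_name>dpf_list.txt</open_name>
    </file_ref>
</workunit>
"""

root = ET.fromstring(fragment)
# Pair each complex physical file name with the simpler open_name the app expects.
pairs = [(fr.findtext("file_name"), fr.findtext("open_name"))
         for fr in root.iter("file_ref")]

for physical, simple in pairs:
    print(f"copy {physical} -> {simple}")
```

Run against your real segments, that list tells you exactly which renames step 6 needs.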

The highlights for me were:
a) The pause between sub-tasks is very clear, but the length of the pause depends on how busy or reactive your CPU is.
b) There's also a continuous variation between 0% and 100% GPU load during each sub-task, but you have to slow things down to see it. Clearly visible at iGPU speeds.
c) The main program files for NVidia and Intel are bitwise identical. I think it's fair to call that a wrapper.
d) You'll have renamed a program file "stringify.h". During the run, a second copy called "winstringify.h" will have been created. The only differences between them are (i) the original has Linux line endings, and (ii) the original has every line enclosed in double quotes.
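The "bitwise identical" observation in (c) is easy to verify yourself by hashing both program files. A sketch - the file names are throwaway placeholders, so point it at the real NVidia and Intel app binaries from your project folder:

```python
import hashlib

def file_digest(path):
    """SHA-256 of a file's raw bytes - equal digests mean bitwise-identical files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo with two throwaway files; substitute the real NVidia and Intel
# app binaries from your work folders.
with open("app_a.bin", "wb") as f:
    f.write(b"same bytes")
with open("app_b.bin", "wb") as f:
    f.write(b"same bytes")

print(file_digest("app_a.bin") == file_digest("app_b.bin"))
```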

It's easier to look at winstringify.h. To me, it looks like the entire source code of the OpenCL part of the calculation. It's well commented, and it's freely licensed under the GNU Lesser General Public License.
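As a rough sketch of the relationship described in (d): stripping the per-line double quotes and switching to Windows line endings should turn a stringify.h-style file back into something like winstringify.h. The input below is a toy; a real stringified header would also need escaped inner quotes handled, which this deliberately ignores:

```python
# Toy "stringified" header: each source line wrapped in double quotes,
# Linux (\n) line endings - as described for stringify.h above.
stringified = '"// toy kernel"\n"__kernel void noop() {}"\n'

lines = stringified.split("\n")
# Drop the enclosing quotes on each quoted line; leave other lines alone.
unquoted = [ln[1:-1] if ln.startswith('"') and ln.endswith('"') and len(ln) >= 2
            else ln
            for ln in lines]
# Rejoin with Windows (\r\n) endings, as in the generated winstringify.h.
plain = "\r\n".join(unquoted)
print(plain)
```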

OpenCL programmers, have at it!
ID: 2072928
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2072954 - Posted: 9 Apr 2021, 19:21:08 UTC - in response to Message 2072928.  

Thanks for the process, Richard. It would be most interesting to follow that source code file through the iterations of the job crunching.
I have done offline crunching before on other projects for benchmarking hardware/software changes, just never one with a wrapper app. Seems similar though.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2072954
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24876
Credit: 3,081,182
RAC: 7
Ireland
Message 2073012 - Posted: 10 Apr 2021, 13:55:59 UTC

The team is 1 year old.
00:00:00 10th April 2020 - 23:59:59 9th April 2021
Total Run Time: 47 years 48 days 21 hours 31 minutes 0 seconds
Results Returned: 159,142
Points Generated: 86,822,385
Boinc Credit: 12,403,197.86

June was the best month for run time.
October the best for points.
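For scale, the totals above work out to a rough average per returned result (assuming 365.25-day years for the conversion):

```python
# Team totals quoted above: 47 years 48 days 21 hours 31 minutes of run time
# across 159,142 returned results.
hours = (47 * 365.25 + 48) * 24 + 21 + 31 / 60
results = 159_142

# Average run time per result, in hours (roughly 2.6).
print(round(hours / results, 2))
```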
ID: 2073012
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 2073057 - Posted: 11 Apr 2021, 8:40:58 UTC

\o/
ID: 2073057
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073108 - Posted: 12 Apr 2021, 8:08:02 UTC

Happy Cosmonautics Day to everyone!
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073108
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073109 - Posted: 12 Apr 2021, 8:22:58 UTC - in response to Message 2072711.  
Last modified: 12 Apr 2021, 8:30:41 UTC


GPU load looked like a fine-tooth comb - switching between 0% and 100% every second, with some longer runs at the start of an Autodock run. Anyone know of a visualisation tool with a resolution better than 1 second?

GPU profiling tools like AMD's CodeXL (perhaps called something different these days) can provide better resolution, plus much more info about what is loading the GPU and how.
For example, GPU-Z could show the GPU as busy when actually only a few SMs/compute units are under load...


Very strange spike! Where does that energy go???
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073109
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073110 - Posted: 12 Apr 2021, 8:35:31 UTC - in response to Message 2072796.  

It’s def paralleled if it’s maxing out the GPU to 100% at times with thousands of GPU cores active

Not so fast!
First, one should be sure that the tool (GPU-Z perhaps) can distinguish between a single busy SM and all busy SMs. Load % can be just the % of busy time, unrelated to the % of SMs actually loaded.
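A toy illustration of this point, with entirely made-up numbers: a "load %" defined as busy time can read 100% while almost all SMs sit idle:

```python
# Made-up numbers: a kernel that occupies only 2 of 80 SMs, but has
# work in flight for the whole sampling window.
total_sms = 80
busy_sms = 2
busy_time_fraction = 1.0   # GPU busy with *something* the entire window

# What a GPU-Z-style busy-time counter would report:
load_percent = busy_time_fraction * 100
# What actually matters for parallel utilisation:
occupancy_percent = busy_sms / total_sms * 100

print(load_percent, occupancy_percent)   # 100% "load", 2.5% of SMs working
```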
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073110
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073111 - Posted: 12 Apr 2021, 8:44:43 UTC - in response to Message 2072906.  
Last modified: 12 Apr 2021, 8:45:02 UTC


P.P.S. if anyone knows how to get better granularity for GPU usage without writing one's own tools against the NVIDIA libraries, please tell!


For nVidia GPUs one could use their profiling software - something like this: https://developer.nvidia.com/nvidia-visual-profiler
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073111
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073112 - Posted: 12 Apr 2021, 8:51:07 UTC - in response to Message 2072928.  
Last modified: 12 Apr 2021, 8:59:08 UTC


It's easier to look at winstringify.h. To me, it looks like the entire source code of the OpenCL part of the calculation. It's well commented, and it's freely licensed under the GNU Lesser General Public License.

OpenCL programmers, have at it!

Could you just attach it somewhere, to save readers all those manipulations? ;)
BTW, if they keep the OpenCL code in a header file (as oclFFT did, for example), they are doomed to compile it at each app launch. Perhaps it's worth applying the same caching I did for oclFFT - of course, only if the OpenCL build takes noticeable time.
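The caching described here boils down to keying the compiled program on a hash of its source, so the build runs only on first launch (or when the source changes). In real OpenCL you would persist the output of clGetProgramInfo(CL_PROGRAM_BINARIES) and reload it with clCreateProgramWithBinary; the sketch below just shows the cache logic with a stand-in compiler:

```python
import hashlib

_cache = {}        # stands in for binaries persisted on disk
build_count = 0    # counts how often the "compiler" actually runs

def expensive_compile(source):
    """Stand-in for clBuildProgram: pretend this takes seconds."""
    global build_count
    build_count += 1
    return f"binary-for-{len(source)}-bytes"

def get_program(source):
    # Key the cache on a hash of the source, so any source change rebuilds.
    key = hashlib.sha256(source.encode()).hexdigest()
    if key not in _cache:              # compile only on a cache miss
        _cache[key] = expensive_compile(source)
    return _cache[key]

src = "__kernel void noop() {}"
get_program(src)
get_program(src)      # second "app launch": served from cache, no rebuild
print(build_count)    # -> 1
```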
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073112
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073118 - Posted: 12 Apr 2021, 9:45:46 UTC - in response to Message 2073112.  

Probably better to get the whole thing properly structured:

https://github.com/ccsb-scripps/AutoDock-GPU

The distributed code looks like it is multiple files concatenated into one, but I don't pretend to be a GPU programmer.
ID: 2073118
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073121 - Posted: 12 Apr 2021, 10:06:11 UTC - in response to Message 2073118.  
Last modified: 12 Apr 2021, 10:12:47 UTC

Probably better to get the whole thing properly structured:

https://github.com/ccsb-scripps/AutoDock-GPU

The distributed code looks like it is multiple files concatenated into one, but I don't pretend to be a GPU programmer.

@It leverages its embarrassingly parallelizable LGA by processing ligand-receptor poses in parallel over multiple compute units.@
So, it should be well parallelized [and my previous fears regarding SM underusage are void]...
But perhaps there is too much CPU processing between kernel launches, and no use of overlapping - just a single, synchronous CPU thread.

And another important point: @The Cuda version was developed in collaboration with Nvidia to run AutoDock-GPU....@
So, a separate version for NV? A CUDA one, not OpenCL?

EDIT:
https://github.com/ccsb-scripps/AutoDock-GPU/tree/develop/device
Hm... there are many CL files here, not a single header. Are you sure it's the source for the binary you're running under BOINC??
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073121
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073122 - Posted: 12 Apr 2021, 10:12:28 UTC - in response to Message 2073121.  

One particular thing we'd like help with, please:

On slow devices (my Intel i5's HD 4600, Sten Arne's GTX 660M), the kernels can run >2 seconds - either triggering the watchdog, or causing horrible screen lag. I think Scripps possibly didn't test on a wide enough range of devices before releasing. Shades of VLARs in January 2009.
ID: 2073122
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073123 - Posted: 12 Apr 2021, 10:18:30 UTC - in response to Message 2073122.  
Last modified: 12 Apr 2021, 10:23:37 UTC

One particular thing we'd like help with, please:

On slow devices (my Intel i5's HD 4600, Sten Arne's GTX 660M), the kernels can run >2 seconds - either triggering the watchdog, or causing horrible screen lag. I think Scripps possibly didn't test on a wide enough range of devices before releasing. Shades of VLARs in January 2009.

It's very possible. A fast solution could be to divide the launch space into smaller parts, launching a few kernels instead of one. VLAR was a more complex case - a single thread computed for too long. But there was a loop, so the kernel launch space was extended into a new (parameter) dimension, and now each thread computes only part of that loop's iterations.
That's the second relatively easy opportunity (if the iterations are independent enough).

EDIT: and to be more specific, one should identify which kernel triggers the watchdog.
That's possible to do by profiling, even on a fast device. The longest kernel on a fast device will most probably cause issues on a slower one. One also needs to look at the launch space size, because fast GPUs are not so much faster as more parallel. Hence a kernel that is not the longest on a "fast" (better say big) GPU could be the longest on a GPU with a smaller number of SMs/compute units.
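The "divide the launch space" idea, as a sketch: instead of one NDRange launch over the full global size, enqueue several smaller ones (via a global work offset) so each kernel execution stays under the watchdog limit. The numbers and names below are illustrative, not from AutoDock-GPU:

```python
def split_launch(global_size, max_per_launch):
    """Break one big 1-D NDRange into (offset, size) chunks, so each
    enqueue - and thus each kernel execution - stays short enough not
    to trip the display watchdog on slow GPUs."""
    chunks = []
    offset = 0
    while offset < global_size:
        size = min(max_per_launch, global_size - offset)
        chunks.append((offset, size))
        offset += size
    return chunks

# e.g. 100k work-items total, but at most 32k per enqueue on a slow iGPU;
# each (offset, size) pair would become one clEnqueueNDRangeKernel call.
print(split_launch(100_000, 32_768))
```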
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073123
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073124 - Posted: 12 Apr 2021, 10:26:51 UTC - in response to Message 2073121.  

And another important point: @The Cuda version was developed in collaboration with Nvidia to run AutoDock-GPU....@
So, a separate version for NV? A CUDA one, not OpenCL?
WCG aren't distributing a CUDA version (yet?), only the OpenCL one to all platforms. People are asking why, but there hasn't been an answer yet. I have sympathy with that: they're trying to manage a slow, careful, safe roll-out of their first GPU app in years, to the biggest and most hyped-up audience since Gone with the Wind. No wonder he retreats to his ranch at the weekend.

EDIT:
https://github.com/ccsb-scripps/AutoDock-GPU/tree/develop/device
Hm... there are many CL files here, not a single header. Are you sure it's the source for the binary you're running under BOINC??
Well, it's the right organisation, and the right name - I got the link from discussions on their forums. A user, I think, rather than staff, but I'll check. WCG operates a two-stage system: the researchers (Scripps, in this instance) code the science, and WCG (i.e. IBM) code the BOINC library into the source for release.

The talk is of there having been months of co-working before this was made available. My guess (but it is only a guess) is that there might have been some fine tuning in-house that hasn't been committed to GitHub yet - but I haven't checked the commit history for datestamps. I'll try and take a look over the next few days.
ID: 2073124
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073125 - Posted: 12 Apr 2021, 10:30:24 UTC - in response to Message 2073123.  

EDIT: and to be more specific, one should identify which kernel triggers the watchdog.
That's possible to do by profiling, even on a fast device. The longest kernel on a fast device will most probably cause issues on a slower one. One also needs to look at the launch space size, because fast GPUs are not so much faster as more parallel. Hence a kernel that is not the longest on a "fast" (better say big) GPU could be the longest on a GPU with a smaller number of SMs/compute units.
Since I've got a working benchtest, I can try to fish for that. But I may need some usage hints.

Benchtest is Windows, can run on either (fast) NVidia, or (slow) iGPU.
ID: 2073125
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073126 - Posted: 12 Apr 2021, 10:39:45 UTC - in response to Message 2073123.  

https://github.com/ccsb-scripps/AutoDock-GPU/blob/develop/host/src/performdocking.cpp.OpenCL
#ifdef DOCK_DEBUG
printf("%-25s %10s %8u %10s %4u\n", "K_GA_GENERATION", "gSize: ", kernel4_gxsize, "lSize: ", kernel4_lxsize); fflush(stdout);
#endif

So, debugging is possible.
One needs to build with DOCK_DEBUG enabled (then the launch space size will be visible without a profiler).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073126
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073127 - Posted: 12 Apr 2021, 10:40:47 UTC - in response to Message 2073125.  
Last modified: 12 Apr 2021, 10:42:02 UTC

EDIT: and to be more specific, one should identify which kernel triggers the watchdog.
That's possible to do by profiling, even on a fast device. The longest kernel on a fast device will most probably cause issues on a slower one. One also needs to look at the launch space size, because fast GPUs are not so much faster as more parallel. Hence a kernel that is not the longest on a "fast" (better say big) GPU could be the longest on a GPU with a smaller number of SMs/compute units.
Since I've got a working benchtest, I can try to fish for that. But I may need some usage hints.

Benchtest is Windows, can run on either (fast) NVidia, or (slow) iGPU.


OK, you need a profiler then.
I'll check whether the iGPU has its own.

EDIT:
https://software.intel.com/content/www/us/en/develop/articles/profiling-opencl-applications-with-system-analyzer-and-platform-analyzer.html
So you need to install the Intel SDK.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073127
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073128 - Posted: 12 Apr 2021, 10:50:22 UTC - in response to Message 2073121.  

Hm... there are many CL files here, not a single header. Are you sure it's the source for the binary you're running under BOINC??
Well, looking at the activity (a whole page of commits since 1 March 2021), it must be pretty close. But remember the final 'compilation for BOINC' is done by IBM, not Scripps.

Looking at my local folder, I downloaded their last Beta app on 26 March, and the current live app on 6 April.
ID: 2073128
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 2073129 - Posted: 12 Apr 2021, 10:51:13 UTC - in response to Message 2073127.  
Last modified: 12 Apr 2021, 11:17:02 UTC

It seems VTune has OpenCL support now. It's a very venerable tool - dating from, I would say, the Pentium era or even earlier...
https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html?operatingsystem=window&distributions=webdownload&options=offline

3.5GB installer :)
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073129
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.