Message boards : Number crunching : SETI orphans
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

Thanks for the confirmation of the observed behavior, Alan. I don't know of any tools offhand to figure out what is going on with the app and work units. We need someone like Raistmer or Petri to chime in, I think; otherwise we are just guessing and postulating.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14677 · Credit: 200,643,578 · RAC: 874

> P.S. I have got some strace output from both machines to plough through at some point, but without access to the source code it's not immediately obvious why there are these delays.

I struck a particular problem during the Beta testing of, specifically, the Windows OpenCL deployment on Intel iGPUs. I found and reported my specific problem, and my machines can now run the iGPU tasks without problems. But in the process, I learned some more general trouble-shooting procedures which may be of interest to curious minds here.

First, I wanted to be able to run (and re-run) a sample task offline. That's easy:

0) First, download your task! (That's the hardest bit at the moment.)
1) Suspend it, so it doesn't have a chance to get away.
2) Open client_state.xml with an editor. You're not going to change it, so no particular precautions are needed.
3) Find the WCG project, and within it, find the current <app_version> segment and the <workunit> segment for your selected task. Those are likely to be adjacent to each other, at the end of the version list.
4) Select those two segments, copy them, and paste them into an empty work folder. Close client_state - we're finished with it.
5) Open your copied <app_version> and <workunit> file so you can read it. Find every file mentioned in both segments, and copy them from the WCG project folder to your work folder. Mostly, the app files will have the oldest datestamps in the project folder, and the workunit files will have the youngest - except one.
6) The files all have complex names, but each is given a simpler <open_name> in client_state. Rename all files to their simpler form (you should have copied 8 files, and 7 of them will need renaming).
7) You'll need an init_data.xml file - ideally a simplified one, such as the ones provided in the Lunatics MBbench. The only bits you really need are the lines which specify which device to run your test on.
8) Look back at your <workunit> file. You'll see a very long command line for the workunit, starting "-jobs ...". Make a startup file (why not a batch file?) containing the name of your main program, followed by the command_line. Remember the space between them.

Launch your batch file, and watch the process unfold before your eyes. By the end of the process, you'll have something like 100 extra files in your work folder - unpacked data files, work files, checkpoint files, result files.

The highlights for me were:

a) The pause between sub-tasks is very clear, but the length of the pause depends on how busy or reactive your CPU is.
b) There's also a continuous variation between 0% and 100% GPU load during each sub-task, but you have to slow things down to see it. Clearly visible at iGPU speeds.
c) The main program files for NVidia and Intel are bitwise identical. I think it's fair to call that a wrapper.
d) You'll have renamed a program file "stringify.h". During the run, a second copy called "winstringify.h" will have been created. The only differences between them are (i) the original has Linux line endings, and (ii) the original has every line enclosed in double quotes. It's easier to look at winstringify.h. To me, it looks like the entire source code of the OpenCL part of the calculation. It's well commented, and it's freely licensed under the GNU Lesser General Public License. OpenCL programmers, have at it!
Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873

Thanks for the process, Richard. That source code file would be most interesting to follow through the process iteration for the job crunching. I have done offline crunching before on other projects for benchmarking hardware/software changes, just never one with a wrapper app. Seems similar though.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)
Sirius B · Joined: 26 Dec 00 · Posts: 24909 · Credit: 3,081,182 · RAC: 7

The team is 1 year old: 00:00:00 10th April 2020 to 23:59:59 9th April 2021.
Total Run Time: 47 years 48 days 21 hours 31 minutes 0 seconds
Results Returned: 159,142
Points Generated: 86,822,385
BOINC Credit: 12,403,197.86
June was the best month for run time, October the best for points.
Kissagogo27 · Joined: 6 Nov 99 · Posts: 716 · Credit: 8,032,827 · RAC: 62

\o/
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

Happy Cosmonautics Day to everyone!

SETI apps news · We're not gonna fight them. We're gonna transcend them.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

GPU profiling tools like AMD's CodeXL (perhaps called something different these days) can provide better resolution, plus much more info about what loads the GPU and how. For example, GPU-Z could show the GPU as busy when actually only a few SMs/compute units are under load.

Very strange spike! Where does that energy go???

SETI apps news · We're not gonna fight them. We're gonna transcend them.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

> It’s def paralleled if it’s maxing out the GPU to 100% at times with thousands of GPU cores active

Not so fast! First, one should be sure that the tool (GPU-Z, perhaps) can distinguish between a single busy SM and all SMs busy. The load % can be just the % of busy time, and irrelevant to the % of loaded SMs.

SETI apps news · We're not gonna fight them. We're gonna transcend them.
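For what it's worth, the coarse counter most monitoring tools report can be queried directly through NVIDIA's NVML, and it behaves exactly as Raistmer cautions: "utilization" is the fraction of time at least one kernel was resident, not the fraction of SMs in use. A minimal sketch, assuming the NVML headers and library (libnvidia-ml) are installed:

/* Sketch only: query the coarse "GPU utilization" counter via NVML.
   Assumes the CUDA/NVML SDK is installed; link against libnvidia-ml. */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    nvmlDevice_t dev;
    nvmlUtilization_t util;

    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS) {
        /* util.gpu is the % of time in the sample period during which
           at least one kernel was executing - a single busy SM counts
           the same as a fully loaded GPU. It says nothing about how
           many SMs/compute units were actually occupied. */
        printf("GPU busy time: %u%%, memory controller: %u%%\n",
               util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}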
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

For nVidia GPUs one could use their profiling software. Something like this: https://developer.nvidia.com/nvidia-visual-profiler

SETI apps news · We're not gonna fight them. We're gonna transcend them.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

Could you just attach it somewhere, to save readers from all those manipulations? ;)

BTW, if they have the OpenCL code in a header file (as it was in oclFFT, for example) they are doomed to compile it at each app launch. Perhaps it would be worth applying the same caching I did for oclFFT - of course, only if the OpenCL source is long enough to take noticeable build time.

SETI apps news · We're not gonna fight them. We're gonna transcend them.
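For anyone curious, the kind of caching Raistmer mentions can be done with the stock OpenCL API alone: after the first clBuildProgram() you pull the device binary out with clGetProgramInfo() and write it to disk, and on later launches you rebuild the program from that file with clCreateProgramWithBinary(). A rough single-device sketch follows; the cache path and the skipped error handling are mine, not anything from the SETI or WCG apps, and a real cache would also key the file on driver and device version.

/* Sketch: cache a built OpenCL program binary so later launches can
   skip the source compile. Single device, minimal error handling. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* After a successful clBuildProgram(), save the device binary. */
static void save_binary(cl_program prog, const char *path)
{
    size_t size = 0;
    clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL);

    unsigned char *bin = malloc(size);
    clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof(bin), &bin, NULL);

    FILE *f = fopen(path, "wb");
    if (f) { fwrite(bin, 1, size, f); fclose(f); }
    free(bin);
}

/* On the next launch, rebuild from the cached binary instead of source.
   Returns NULL if there is no cache yet (then build from source as usual). */
static cl_program load_binary(cl_context ctx, cl_device_id dev, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);
    size_t size = (size_t)ftell(f);
    fseek(f, 0, SEEK_SET);

    unsigned char *bin = malloc(size);
    fread(bin, 1, size, f);
    fclose(f);

    cl_int status, err;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size,
                                                (const unsigned char **)&bin,
                                                &status, &err);
    free(bin);
    if (err != CL_SUCCESS) return NULL;

    /* Still needs a build call, but building from a binary is fast. */
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    return prog;
}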
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14677 · Credit: 200,643,578 · RAC: 874

Probably better to get the whole thing properly structured: https://github.com/ccsb-scripps/AutoDock-GPU

The distributed code looks like it is multiple files concatenated into one, but I don't pretend to be a GPU programmer.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

> Probably better to get the whole thing properly structured:

"It leverages its embarrassingly parallelizable LGA by processing ligand-receptor poses in parallel over multiple compute units."

So it should be well parallelized [and my previous fears regarding SM under-usage are void]... But perhaps there is too much CPU processing between kernel launches, with no overlapping and a single, synchronous CPU thread.

And another important point:

"The Cuda version was developed in collaboration with Nvidia to run AutoDock-GPU..."

So, a separate version for NV? A CUDA one, not OpenCL?

EDIT: https://github.com/ccsb-scripps/AutoDock-GPU/tree/develop/device
Hm... there are many CL files here, not a single header. Are you sure it's the source for the binary you are running under BOINC??

SETI apps news · We're not gonna fight them. We're gonna transcend them.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14677 · Credit: 200,643,578 · RAC: 874

One particular thing we'd like help with, please: on slow devices (my Intel i5's HD 4600, Sten Arne's GTX 660M), the kernels can run >2 seconds - either triggering the watchdog, or causing horrible screen lag. I think Scripps possibly didn't test on a wide enough range of devices before releasing. Shades of VLARs in January 2009.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

> One particular thing we'd like help with, please:

It's very possible. And a fast solution could be to divide the launch space into smaller parts, so that a few kernels are launched instead of one.

In VLAR there was a more complex case - a single thread computed for too long. But there was a cycle, so the kernel launch space was extended into a new (parameter) dimension, and now each thread computes only part of that cycle's iterations. That is the second, relatively easy, opportunity (if the cycle iterations are independent enough).

EDIT: and to be more specific, one should identify which kernel causes the watchdog trigger. That can be done by profiling, even on a fast device - the longest kernel on a fast device will most probably be the one that causes issues on slower ones. One also needs to look at the launch space sizes, because fast GPUs are not so much faster per thread as more parallel. Hence, a kernel that is not the longest on a "fast" (better to say big) GPU could be the longest on a GPU with a smaller number of SMs/compute units.

SETI apps news · We're not gonna fight them. We're gonna transcend them.
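A sketch of that first suggestion, as I read it (the queue, kernel and sizes below are placeholders, not names from AutoDock-GPU): instead of one clEnqueueNDRangeKernel() call covering the whole global range, enqueue the same kernel several times with a global work offset, so no single submission runs long enough to trip the display watchdog.

/* Sketch: split one long NDRange launch into several shorter ones.
   "queue", "kernel" and the sizes are placeholders for illustration.
   Assumes total_items and chunk_items are multiples of local_size
   (an OpenCL 1.x requirement for the global work size). */
#include <CL/cl.h>

void enqueue_in_chunks(cl_command_queue queue, cl_kernel kernel,
                       size_t total_items, size_t chunk_items, size_t local_size)
{
    for (size_t offset = 0; offset < total_items; offset += chunk_items) {
        size_t remaining = total_items - offset;
        size_t global = (remaining < chunk_items) ? remaining : chunk_items;

        /* Each enqueue covers only a slice of the original launch space;
           get_global_id() inside the kernel still sees the full range
           because of the offset, so the kernel code needs no changes. */
        clEnqueueNDRangeKernel(queue, kernel, 1,
                               &offset,     /* global_work_offset */
                               &global,     /* global_work_size   */
                               &local_size, /* local_work_size    */
                               0, NULL, NULL);
    }
    clFinish(queue); /* wait for the last slice */
}

Whether this is enough on something like an HD 4600 depends on how small the slices need to be before the per-launch overhead starts to hurt.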
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14677 · Credit: 200,643,578 · RAC: 874

> And another important point:
> "The Cuda version was developed in collaboration with Nvidia to run AutoDock-GPU..."

WCG aren't distributing a CUDA version (yet?), only the OpenCL one to all platforms. People are asking why, but there hasn't been an answer yet. I have sympathy with that: they're trying to manage a slow, careful, safe roll-out of their first GPU app in years, to the biggest and most hyped-up audience since Gone with the Wind. No wonder he retreats to his ranch at the weekend.

EDIT: Well, it's the right organisation, and the right name - I got the link from discussions on their forums. A user, I think, rather than staff, but I'll check. WCG operates a two-stage system: the researchers (Scripps, in this instance) code the science, and WCG (i.e. IBM) code the BOINC library into the source for release. The talk is of there having been months of co-working before this was made available. My guess (but it is only a guess) is that there might have been some fine tuning in-house that hasn't been committed to GitHub yet - but I haven't checked the commit history for datestamps. I'll try and take a look over the next few days.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14677 · Credit: 200,643,578 · RAC: 874

> EDIT: and to be more specific, one should identify which kernel causes the watchdog trigger.

Since I've got a working benchtest, I can try to fish for that. But I may need some usage hints. The benchtest is Windows, and can run on either the (fast) NVidia or the (slow) iGPU.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

https://github.com/ccsb-scripps/AutoDock-GPU/blob/develop/host/src/performdocking.cpp.OpenCL

#ifdef DOCK_DEBUG
printf("%-25s %10s %8u %10s %4u\n", "K_GA_GENERATION", "gSize: ", kernel4_gxsize, "lSize: ", kernel4_lxsize); fflush(stdout);
#endif

So, debugging is possible. One needs to build with DOCK_DEBUG enabled (then the launch space sizes will be visible without a profiler).

SETI apps news · We're not gonna fight them. We're gonna transcend them.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

> EDIT: and to be more specific, one should identify which kernel causes the watchdog trigger.
> Since I've got a working benchtest, I can try to fish for that. But I may need some usage hints.

OK, you need a profiler then. I'll look whether the iGPU has its own.

EDIT: https://software.intel.com/content/www/us/en/develop/articles/profiling-opencl-applications-with-system-analyzer-and-platform-analyzer.html
So you need to install the Intel SDK.

SETI apps news · We're not gonna fight them. We're gonna transcend them.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14677 · Credit: 200,643,578 · RAC: 874

> Hm... there are many CL files here, not a single header. Are you sure it's the source for the binary you are running under BOINC??

Well, looking at the activity (a whole page of commits since 1 March 2021), it must be pretty close. But remember the final 'compilation for BOINC' is done by IBM, not Scripps. Looking at my local folder, I downloaded their last Beta app on 26 March, and the current live app on 6 April.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121

It seems VTune has OpenCL support now. It's a very venerable tool, dating from, I would say, the Pentium era or even earlier...
https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html?operatingsystem=window&distributions=webdownload&options=offline
A 3.5 GB installer :)

SETI apps news · We're not gonna fight them. We're gonna transcend them.