SETI orphans

Message boards : Number crunching : SETI orphans

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4086
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2072800 - Posted: 7 Apr 2021, 21:36:02 UTC - in response to Message 2072798.  

Yeah I agree with that
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2072800
Profile StFreddy
Joined: 4 Feb 01
Posts: 35
Credit: 14,080,356
RAC: 26
Hungary
Message 2072886 - Posted: 8 Apr 2021, 19:24:58 UTC

You can compare your WCG OPNG results with your wingman's: under your account, click Results Status, then click on the name of the workunit in the Result Name column. You will see your wingman there. Click on the Valid link in the Status column.
ID: 2072886
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 12865
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2072888 - Posted: 8 Apr 2021, 19:52:19 UTC - in response to Message 2072886.  

But that is all you can do unfortunately. No conventional BOINC stats page, no server status page, no member host pages. Very limited and an outlier project compared to BOINC project norms.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2072888
alanb1951 Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Joined: 25 May 99
Posts: 8
Credit: 6,904,127
RAC: 34
United Kingdom
Message 2072906 - Posted: 9 Apr 2021, 2:42:18 UTC - in response to Message 2072796.  

It's definitely parallelized if it's maxing out the GPU to 100% at times with thousands of GPU cores active. I just think they could make better use of it by not having so much up-and-down behaviour. But it could be that each run to 100% corresponds to each instance number listed in the task report. I'd have to count the spikes to know for sure, though. Not as easy to do on Linux.

I can confirm that the activity burst seems to be a single job from within the work-unit.

During the Beta I was trying to work out why my 1050Ti jobs (on an i7-7700K) used CPU for 98% of the elapsed time whilst my 1660Ti jobs (on a Ryzen 3700X) only used CPU for about 60% of the elapsed time(!); as a side-effect of looking into that I was able to tie GPU usage to one-second time-slices (which was also the most accurate timing statistic I could easily get for the jobs, of course...), and it was quite obvious that there was "quiet time" on the GPU starting before one job finished and continuing until the next job had started.

The only assumption I made to get GPU usage was that nvidia-smi dmon streaming processors usage (sm %) was a reasonably accurate estimate of what had happened in the preceding 1 or 2 second interval. I do realize that the non-percentages are point-in-time status snapshots :-) but a percentage ought to be what it says it is, as should the data transfer numbers.
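Alan's one-second sampling approach can be post-processed mechanically. A minimal sketch of the busy-fraction calculation over `nvidia-smi dmon` output (the capture below is illustrative, and the column index assumes dmon's default layout, which may differ across driver versions):

```python
# Sketch: estimate GPU "quiet time" from `nvidia-smi dmon` samples.
# Assumes dmon's default column layout, where "sm" (streaming
# multiprocessor utilisation %) is the 5th column; adjust SM_COL
# if your driver prints a different layout.

SM_COL = 4  # 0-based index of the "sm" column in default dmon output

def busy_stats(dmon_lines, sm_col=SM_COL):
    """Return (samples, busy_samples, busy_fraction) from dmon text lines."""
    samples = busy = 0
    for line in dmon_lines:
        if line.lstrip().startswith("#"):   # dmon header lines start with '#'
            continue
        fields = line.split()
        if len(fields) <= sm_col or not fields[sm_col].isdigit():
            continue
        samples += 1
        if int(fields[sm_col]) > 0:         # any SM activity in this 1 s slice
            busy += 1
    return samples, busy, (busy / samples if samples else 0.0)

# Illustrative dmon capture (three one-second samples):
sample = """\
# gpu   pwr  gtemp  mtemp    sm   mem   enc   dec
    0    75     62      -   100    24     0     0
    0    74     62      -     0     0     0     0
    0    76     63      -    98    22     0     0
"""
print(busy_stats(sample.splitlines()))   # → (3, 2, 0.6666666666666666)
```

A long tail of zero-`sm` samples between jobs shows up directly as a low busy fraction.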

OPN tasks have a fairly small data footprint compared with some other GPU projects we might run(!) so there's not likely to be a lot of data movement; hence low PCIe numbers. The power draw can get to be higher than for either Einstein or Milkyway jobs...

By the way, I found that running two at a time on the Ryzen got more use out of the GPU, but there were still too many intervals with several seconds of GPU inactivity -- I put this down to whatever is causing the CPU-to-elapsed oddity in the first place, possibly a revised task scheduler (or I/O scheduler?) in the 5.4 kernel on the Ryzen as against the 4.15 kernel on the Intel. I see other oddities with BOINC-related stuff other than OPNG as well, including times when jobs finish but still haven't updated their status in client_state.xml several seconds later (my monitoring software gives up if it takes 5 seconds; it should not take anywhere near that long, I'd've thought...)

And I've tried various things to see the effects -- fewer concurrent CPU tasks, suspending tasks that are heavy on I/O, and so on -- and nothing seems to make much difference... Once we've got a reliable flow of production work I can get back to this, but I can't sit glued to a screen 24/7 waiting for work to turn up!

Hope the above is of interest - Al.

P.S. I have got some strace output from both machines to plough through at some point, but without access to the source code it's not immediately obvious why there are these delays.

P.P.S. if anyone knows how to get better granularity for GPU usage without writing one's own tools against the NVIDIA libraries, please tell!
ID: 2072906
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 12865
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2072911 - Posted: 9 Apr 2021, 3:57:02 UTC

Thanks for the confirmation of the observed behavior Alan. I don't know of any tools offhand to figure out what is going on with the app and work units.

We need someone like Raistmer or Petri to chime in I think. Else we are just guessing and postulating.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2072911
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14391
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2072928 - Posted: 9 Apr 2021, 11:49:57 UTC - in response to Message 2072906.  

P.S. I have got some strace output from both machines to plough through at some point, but without access to the source code it's not immediately obvious why there are these delays.

P.P.S. if anyone knows how to get better granularity for GPU usage without writing one's own tools against the NVIDIA libraries, please tell!
I struck a particular problem during the Beta testing of, specifically, the Windows OpenCL deployment on Intel iGPUs. I found and reported my specific problem, and my machines can now run the iGPU tasks without problem. But in the process, I learned some more general trouble-shooting procedures which may be of interest to curious minds here.

First, I wanted to be able to run (and re-run) a sample task offline. That's easy:
0) First download your task! (That's the hardest bit at the moment)
1) Suspend it, so it doesn't have a chance to get away.
2) Open client_state.xml with an editor. You're not going to change it, so no particular precautions are needed.
3) Find the WCG project, and within it, find the current <app_version> segment and the <workunit> segment for your selected task. Those are likely to be adjacent to each other, at the end of the version list.
4) Select those two segments, copy them, and paste them into an empty work folder. Close client_state - we're finished with it.
5) Open your copied <app_version> and <workunit> file so you can read it. Find every file mentioned in both segments, and copy them from the WCG project folder to your work folder. Mostly, the app files will have the oldest datestamps in the project folder, and the workunit files will have the youngest - except one.
6) The files all have complex names, but are each given a simpler <open_name> in client_state. Rename all files to their simpler form (you should have copied 8 files, and 7 of them will need renaming).
7) You'll need an init_data.xml file - ideally a simplified one, such as the ones provided in the Lunatics MBbench. The only bits you really need are the lines which specify which device to run your test on.
8) Look back at your <workunit> file. You'll see a very long command line for the workunit, starting "-jobs ...". Make a startup file (why not a batch file?) containing the name of your main program, followed by the command_line. Remember the space between them.

Launch your batch file, and watch the process unfold before your eyes. By the end of the process, you'll have something like 100 extra files in your work folder - unpacked data files, work files, checkpoint files, result files.
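Steps 2) to 4) above lend themselves to scripting. A minimal sketch of pulling the two segments out of client_state.xml (the file content below is a tiny synthetic stand-in, and the naive regex assumes the segments are not nested inside themselves, which holds for BOINC's layout):

```python
import re

def extract_segments(client_state_text, tags=("app_version", "workunit")):
    """Pull whole <tag>...</tag> segments out of client_state.xml text.

    Naive approach: assumes each tag is not nested inside itself,
    which is true of BOINC's client_state.xml layout.
    """
    out = []
    for tag in tags:
        out += re.findall(rf"<{tag}>.*?</{tag}>", client_state_text, re.S)
    return out

# Tiny synthetic example standing in for a real client_state.xml:
state = """<client_state>
<app_version>
  <app_name>opng</app_name>
  <file_ref><file_name>wcgrid_opng_1.23</file_name><open_name>main_app</open_name></file_ref>
</app_version>
<workunit>
  <name>OPNG_0001234</name>
  <command_line>-jobs job.list</command_line>
</workunit>
</client_state>"""

for seg in extract_segments(state):
    print(seg.splitlines()[0])   # prints "<app_version>" then "<workunit>"
```

The `<file_name>`/`<open_name>` pairs inside the extracted segments are exactly the rename list needed for step 6).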

The highlights for me were:
a) The pause between sub-tasks is very clear, but the length of the pause depends on how busy or reactive your CPU is.
b) There's also a continuous variation between 0% and 100% GPU load during each sub-task, but you have to slow things down to see it. Clearly visible at iGPU speeds.
c) The main program files for NVidia and Intel are bitwise identical. I think it's fair to call that a wrapper.
d) You'll have renamed a program file "stringify.h". During the run, a second copy called "winstringify.h" will have been created. The only differences between them are (i) the original has Linux line endings, and (ii) the original has every line enclosed in double quotes.

It's easier to look at winstringify.h. To me, it looks like the entire source code of the OpenCL part of the calculation. It's well commented, and it's freely licensed under the GNU Lesser General Public License.

OpenCL programmers, have at it!
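The quoted-line relationship described in d) is easy to undo mechanically, if you'd rather read the stringified original directly. A sketch (the header content here is illustrative, not the real AutoDock-GPU source):

```python
def unstringify(header_text):
    """Recover readable OpenCL source from a stringify-style header.

    Each line of the header is assumed to be wrapped in double quotes
    (so the build can paste it into a C string literal); strip them.
    """
    out = []
    for line in header_text.splitlines():
        s = line.strip()
        if s.startswith('"') and s.endswith('"') and len(s) >= 2:
            s = s[1:-1]
        out.append(s)
    return "\n".join(out)

# Illustrative stringified header, not the actual project file:
header = (
    '"// AutoDock-GPU kernel (illustrative line)"\n'
    '"__kernel void k(__global float* x) {"\n'
    '"}"'
)
print(unstringify(header))
```

Running the same stripping over stringify.h should reproduce the generated winstringify.h, modulo line endings.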
ID: 2072928
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 12865
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2072954 - Posted: 9 Apr 2021, 19:21:08 UTC - in response to Message 2072928.  

Thanks for the process Richard. That source code file would be most interesting to follow through the process iteration for the job crunching.
I have done offline crunching before on other projects for benchmarking hardware/software changes, just never one with a wrapper app. Seems similar though.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2072954
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 23992
Credit: 3,081,182
RAC: 7
Ireland
Message 2073012 - Posted: 10 Apr 2021, 13:55:59 UTC

The team is 1 year old.
00:00:00 10th April 2020 - 23:59:59 9th April 2021
Total Run Time: 47 years 48 days 21 hours 31 minutes 0 seconds
Results Returned: 159,142
Points Generated: 86,822,385
Boinc Credit: 12,403,197.86

June was the best month for run time.
October the best for points.
ID: 2073012
Grumpy Swede (Democratic Socialist)
Volunteer tester
Joined: 1 Nov 08
Posts: 8627
Credit: 49,849,242
RAC: 65
Sweden
Message 2073013 - Posted: 10 Apr 2021, 13:59:41 UTC - in response to Message 2073012.  

Hip hip hooray!!
ID: 2073013
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 709
Credit: 8,032,827
RAC: 62
France
Message 2073057 - Posted: 11 Apr 2021, 8:40:58 UTC

\o/
ID: 2073057
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073108 - Posted: 12 Apr 2021, 8:08:02 UTC

Happy Cosmonautics Day to everyone!
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073108
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073109 - Posted: 12 Apr 2021, 8:22:58 UTC - in response to Message 2072711.  
Last modified: 12 Apr 2021, 8:30:41 UTC


GPU load looked like a fine-tooth comb - switching between 0% and 100% every second, with some longer runs at the start of an Autodock run. Anyone know of a visualisation tool with a resolution better than 1 second?

GPU profiling tools like AMD's CodeXL (perhaps named differently these days) can provide better resolution, plus much more information about what is loading the GPU, and how.
For example, GPU-Z could show the GPU as busy when actually only a few SMs/compute units are under load...


Very strange spike! Where does that energy go???
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073109
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073110 - Posted: 12 Apr 2021, 8:35:31 UTC - in response to Message 2072796.  

It's definitely parallelized if it's maxing out the GPU to 100% at times with thousands of GPU cores active

Not so fast!
First, one should be sure that the tool (GPU-Z perhaps) can distinguish between a single busy SM and all SMs being busy. The load % may simply be the % of time the GPU was busy, and say nothing about the % of SMs under load.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073110
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073111 - Posted: 12 Apr 2021, 8:44:43 UTC - in response to Message 2072906.  
Last modified: 12 Apr 2021, 8:45:02 UTC


P.P.S. if anyone knows how to get better granularity for GPU usage without writing one's own tools against the NVIDIA libraries, please tell!


For nVidia GPUs one could use their profiling software - something like this: https://developer.nvidia.com/nvidia-visual-profiler
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073111
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073112 - Posted: 12 Apr 2021, 8:51:07 UTC - in response to Message 2072928.  
Last modified: 12 Apr 2021, 8:59:08 UTC


It's easier to look at winstringify.h. To me, it looks like the entire source code of the OpenCL part of the calculation. It's well commented, and it's freely licensed under the GNU Lesser General Public License.

OpenCL programmers, have at it!

Could you just attach it somewhere, to save readers from all those manipulations? ;)
BTW, if they keep the OpenCL code in a header file (as oclFFT did, for example) they are doomed to compile it at each app launch. It might be worth applying the same caching I did for oclFFT - provided, of course, the OpenCL source is long enough to take a noticeable build time.
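The caching Raistmer describes boils down to keying a saved binary on the kernel source plus the device identity. A sketch of just that cache logic (Python for brevity; `fake_compile` and the device name are stand-ins for the real clBuildProgram / clGetProgramInfo(CL_PROGRAM_BINARIES) round-trip):

```python
import hashlib
import os
import tempfile

def cached_build(source, device_name, compile_fn, cache_dir):
    """Compile-once cache keyed on kernel source + device identity.

    compile_fn(source) stands in for the real OpenCL build; on a cache
    hit the (possibly slow) JIT build is skipped entirely.
    """
    key = hashlib.sha256((device_name + "\0" + source).encode()).hexdigest()
    path = os.path.join(cache_dir, f"clcache_{key}.bin")
    if os.path.exists(path):            # hit: reuse the saved binary
        with open(path, "rb") as f:
            return f.read(), True
    binary = compile_fn(source)         # miss: build once, then save
    with open(path, "wb") as f:
        f.write(binary)
    return binary, False

# Demo with a stand-in "compiler" (not a real OpenCL build):
fake_compile = lambda src: b"BIN:" + src.encode()
cache = tempfile.mkdtemp()
src = "__kernel void k(__global float* x) { }"
bin1, hit1 = cached_build(src, "gfx1030", fake_compile, cache)
bin2, hit2 = cached_build(src, "gfx1030", fake_compile, cache)
print(hit1, hit2)   # → False True
```

Hashing the source into the key means an updated app invalidates the cache automatically; keying on the device name keeps binaries from leaking between GPUs in a mixed rig.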
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073112
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14391
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073118 - Posted: 12 Apr 2021, 9:45:46 UTC - in response to Message 2073112.  

Probably better to get the whole thing properly structured:

https://github.com/ccsb-scripps/AutoDock-GPU

The distributed code looks like it is multiple files concatenated into one, but I don't pretend to be a GPU programmer.
ID: 2073118
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073121 - Posted: 12 Apr 2021, 10:06:11 UTC - in response to Message 2073118.  
Last modified: 12 Apr 2021, 10:12:47 UTC

Probably better to get the whole thing properly structured:

https://github.com/ccsb-scripps/AutoDock-GPU

The distributed code looks like it is multiple files concatenated into one, but I don't pretend to be a GPU programmer.

"It leverages its embarrassingly parallelizable LGA by processing ligand-receptor poses in parallel over multiple compute units."
So it should be well parallelized [and my earlier fears about SM under-usage are unfounded]...
But perhaps there is too much CPU processing between kernel launches, and no overlapping - just a single, synchronous CPU thread.

And another important point: "The Cuda version was developed in collaboration with Nvidia to run AutoDock-GPU...."
So there is a separate version for NVidia? A CUDA one, not OpenCL?

EDIT:
https://github.com/ccsb-scripps/AutoDock-GPU/tree/develop/device
Hm... there are many CL files here, not a single header. Are you sure this is the source for the binary you are running under BOINC?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073121
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14391
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073122 - Posted: 12 Apr 2021, 10:12:28 UTC - in response to Message 2073121.  

One particular thing we'd like help with, please:

On slow devices (my Intel i5's HD 4600, Sten Arne's GTX 660M), the kernels can run >2 seconds - either triggering the watchdog, or causing horrible screen lag. I think Scripps possibly didn't test on a wide enough range of devices before releasing. Shades of VLARs in January 2009.
ID: 2073122
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6321
Credit: 106,370,077
RAC: 121
Russia
Message 2073123 - Posted: 12 Apr 2021, 10:18:30 UTC - in response to Message 2073122.  
Last modified: 12 Apr 2021, 10:23:37 UTC

One particular thing we'd like help with, please:

On slow devices (my Intel i5's HD 4600, Sten Arne's GTX 660M), the kernels can run >2 seconds - either triggering the watchdog, or causing horrible screen lag. I think Scripps possibly didn't test on a wide enough range of devices before releasing. Shades of VLARs in January 2009.

It's very possible. A quick solution could be to divide the launch space into smaller parts, so that a few kernels are launched instead of one. VLAR was a more complex case - a single thread computed for too long, but there was a loop, so the kernel launch space was extended into a new (parameter) dimension and each thread now computes only part of that loop's iterations.
That is the second, relatively easy option (if the loop iterations are independent enough).

EDIT: and to be more specific, one should identify which kernel triggers the watchdog.
That can be done by profiling, even on a fast device: the longest kernel on a fast device will most probably cause issues on a slower one. One also needs to look at the launch space size, because fast GPUs are not so much faster per thread as more parallel. Hence a kernel that is not the longest on a "fast" (better to say big) GPU could become the longest on a GPU with fewer SMs/compute units.
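The "divide the launch space" idea amounts to turning one big NDRange into several smaller ones. A sketch of the offset arithmetic (the chunk size is a tunable; real code would pass each pair to clEnqueueNDRangeKernel via its global_work_offset argument):

```python
def chunk_launches(global_size, max_chunk):
    """Split a 1-D NDRange into (offset, size) sub-launches.

    Each pair would be one clEnqueueNDRangeKernel call using
    global_work_offset, keeping any single kernel short enough
    to stay under the display watchdog on slow GPUs.
    """
    launches = []
    offset = 0
    while offset < global_size:
        size = min(max_chunk, global_size - offset)
        launches.append((offset, size))
        offset += size
    return launches

print(chunk_launches(10000, 4096))   # → [(0, 4096), (4096, 4096), (8192, 1808)]
```

The cost is a little extra launch overhead per chunk, which is usually a fair trade against a watchdog reset or visible screen lag.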
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 2073123
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14391
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2073124 - Posted: 12 Apr 2021, 10:26:51 UTC - in response to Message 2073121.  

And another important point: @The Cuda version was developed in collaboration with Nvidia to run AutoDock-GPU.... @
So, separate version for NV? CUDA, not OpenCL , one?
WCG aren't distributing a CUDA version (yet?), only the OpenCL one to all platforms. People are asking why, but there hasn't been an answer yet. I have sympathy with that: they're trying to manage a slow, careful, safe roll-out of their first GPU app in years, to the biggest and most hyped-up audience since Gone with the Wind. No wonder he retreats to his ranch at the weekend.

EDIT:
https://github.com/ccsb-scripps/AutoDock-GPU/tree/develop/device
Hm... there are many CL files here, not single header. Are you sure it's the source for binary you running under BOINC ??
Well, it's the right organisation, and the right name - I got the link from discussions on their forums. A user, I think, rather than staff, but I'll check. WCG operates a two-stage system: the researchers (Scripps, in this instance) code the science, and WCG (i.e. IBM) code the BOINC library into the source for release.

The talk is of there having been months of co-working before this was made available. My guess (but it is only a guess) is that there might have been some fine tuning in-house that hasn't been committed to GitHub yet - but I haven't checked the commit history for datestamps. I'll try and take a look over the next few days.
ID: 2073124

Message boards : Number crunching : SETI orphans


 
©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.