Posts by Raistmer

1) Message boards : Number crunching : SETI orphans (Message 2078251)
Posted 19 Jun 2021 by Profile Raistmer
Post:
I am trying a new HP laptop with a Ryzen 5 4000 CPU and integrated Radeon graphics. The Einstein@home gamma-ray GPU tasks take about 1095 s on my GTX 1650 board, of which 1076 s are CPU time. The same tasks take about 3385 s on the Radeon chip, of which only 107 s are CPU time. I was going to test the new laptop on the OpenPandemics-COVID-19 GPU tasks, but I am not getting any of them.
Tullio

You could check what the Radeon part's time will be with the CPU part fully loaded (no CPU cores reserved).
2) Message boards : Number crunching : SETI orphans (Message 2077952)
Posted 15 Jun 2021 by Profile Raistmer
Post:
There is extra power consumption from the non-CPU parts of the system.
So IMHO running 12/7 will give better computation per joule.
If the host is completely unattended, consider running 24/3.5 instead, to reduce boot/shutdown losses and overall hardware wear.

BTW, don't you get discounted electricity rates at night?... Maybe it's better to run at night only...
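To see why the two schedules are interchangeable in compute terms, here is a quick sketch (the boot/shutdown counts are assumed figures, just to illustrate the wear argument):

```python
# Compare two run schedules with equal weekly compute hours but
# different numbers of boot/shutdown cycles per week.
def weekly_hours(hours_per_day, days_per_week):
    return hours_per_day * days_per_week

h_12_7 = weekly_hours(12, 7)     # run 12 h/day, 7 days a week
h_24_35 = weekly_hours(24, 3.5)  # run 24 h/day, 3.5 days a week

print(h_12_7, h_24_35)  # both give 84 h/week of compute

# Assumed cycle counts: 12/7 needs one boot and one shutdown per day
# (7 cycles/week), while 24/3.5 needs roughly one per run block
# (~2 cycles/week), so it loses less time and wears hardware less.
cycles_12_7, cycles_24_35 = 7, 2
assert h_12_7 == h_24_35
```

Same weekly output either way; the difference is only in how often the machine cycles power.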
3) Message boards : Number crunching : SETI orphans (Message 2074434)
Posted 28 Apr 2021 by Profile Raistmer
Post:
Stress test? Would love to know what parameters they use.
Results returned practically doubled, points practically quintupled, yet total run time didn't change.

I wouldn't expect runtime to change much. Most users are still running 24/7, and that hasn't changed with the new tasks.

Personally, I think that whole metric is pointless. It just adds up the combined runtime of all jobs to imagine how long they would take on some hypothetical single-core computer, and it only serves to glorify slow CPUs, which accumulate much more runtime per result. GPUs are rather left out of that metric: one job runs much faster because it is massively threaded across thousands of GPU cores, yet it only gets counted as the runtime of one. I could add over a year every day if I enabled CPU processing for WCG (over 380 threads across my systems).

So what's more impressive: the person with "10 years" of runtime, or the person with "30 days" of runtime who processed 10x the work? In WCG logic, the former gets the higher 'rank'.


Add to that the penalizing of code-optimization work :)
If the same work gets done faster, who's interested in that? :))))
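The ranking complaint above can be shown with invented numbers: under a pure accumulated-runtime metric, the slower contributor outranks the one who returned ten times the work.

```python
# Two hypothetical volunteers (all numbers invented for illustration).
a = {"runtime_days": 3650, "results": 1000}   # slow CPUs: "10 years" of runtime
b = {"runtime_days": 30,   "results": 10000}  # fast GPU: 10x the work returned

# Rank by accumulated runtime (the metric criticized above):
by_runtime = sorted([a, b], key=lambda h: h["runtime_days"], reverse=True)
# Rank by actual work returned:
by_results = sorted([a, b], key=lambda h: h["results"], reverse=True)

print(by_runtime[0] is a)  # True: the slower contributor "wins"
print(by_results[0] is b)  # True: the productive contributor wins
```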
4) Message boards : Number crunching : SETI orphans (Message 2073637)
Posted 17 Apr 2021 by Profile Raistmer
Post:
Actually, more data just blurs the main point (just as our rulers like to do when speaking about average salary and other average benefits).
TDR will not be triggered by the average kernel length; it will be triggered by the longest one. And while the first picture clearly shows there are at least some kernels longer than 4 s, the new one "averages" them down to 2.8 s...
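The averaging point takes only a few lines to demonstrate: a watchdog with a fixed per-launch timeout reacts to the longest launch, so a mean can look safe while individual launches still exceed the limit (the per-launch times below are invented to match the ~2.8 s average mentioned above):

```python
# Invented per-launch kernel times (seconds). The average looks safe,
# but a TDR-style watchdog reacts to each individual launch.
kernel_times = [2.1, 2.3, 4.4, 2.0, 3.2, 2.4, 4.1, 2.1]
tdr_limit = 4.0  # Windows' default TdrDelay is ~2 s; 4 s used here for illustration

mean_time = sum(kernel_times) / len(kernel_times)
longest = max(kernel_times)

print(f"mean {mean_time:.2f} s, max {longest:.2f} s")  # mean 2.83 s, max 4.40 s
print(any(t > tdr_limit for t in kernel_times))        # True: watchdog would fire
```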
5) Message boards : Number crunching : SETI orphans (Message 2073355)
Posted 14 Apr 2021 by Profile Raistmer
Post:
@SETI we at least got exactly the same result from run to run on the same task.
Results could differ a little between devices/compilers/OSes, but they were reproducible (at least once an app moved from the alpha to the beta stage of development).
That's because SETI data processing is deterministic.
The genetic algorithms this app uses to minimize the interaction energy have a random component.
It's possible to get reproducible results even in this case (if the pseudo-random generator always produces the same sequence from the same initial seed). But... perhaps they don't use the same seed, or their random number generator is more "random" :)

EDIT: also, a simple error like reading an uninitialized memory cell will give unreproducible results too. Hard to tell which case this is without diving deeply into the code.
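The seed point is easy to demonstrate: a pseudo-random generator started from the same seed yields an identical sequence, so even a stochastic search, seeded deterministically, stays run-to-run reproducible. A minimal sketch (a toy random walk, not the app's actual algorithm or RNG):

```python
import random

def stochastic_search(seed, steps=1000):
    """Toy stochastic minimization: random-walk toward the minimum of x^2."""
    rng = random.Random(seed)  # private generator with a fixed seed
    x, best = 5.0, float("inf")
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)  # random "mutation"
        if candidate * candidate < best:        # keep only improving moves
            best = candidate * candidate
            x = candidate
    return best

# Same seed -> bit-identical result, every time:
assert stochastic_search(42) == stochastic_search(42)
# Different seeds almost certainly walk a different trajectory:
print(stochastic_search(42), stochastic_search(160279976))
```

If the app re-seeded from, say, the wall clock on every run, this reproducibility would be lost, which would match the behavior observed in this thread.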
6) Message boards : Number crunching : SETI orphans (Message 2073351)
Posted 14 Apr 2021 by Profile Raistmer
Post:
If you're under Linux, you don't need to know about TDR :))

Regarding invalids: as I said before, I don't know how they can do validation at all with such different results from run to run on the same dataset.
7) Message boards : Number crunching : SETI orphans (Message 2073349)
Posted 14 Apr 2021 by Profile Raistmer
Post:
I don't know what "TDR" refers to.
TDR is a Windows specific concept, but I think it accounts for a large proportion of the iGPU 'time limit exceeded' errors under Windows.

http://developer.download.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Timeout_Detection_Recovery.htm
8) Message boards : Number crunching : SETI orphans (Message 2073336)
Posted 14 Apr 2021 by Profile Raistmer
Post:
I'm going to go even older, I'll see if a GTX 295 will run LOL


it took some fiddling but it just picked up a couple OPNG tasks on it.

GPU usage is nearly pegged at 99-100% on both cores (two tasks, dual GPU); I haven't noticed many dips.
~145MB VRAM used
temps also fine at about 62C on an open air test bench
looks like it'll take an hour to run

we'll see if it completes and/or validates, an absolute waste of power though LOL.

not bad for an OpenCL 1.0 card (online specs list 1.1, driver reports 1.0)

And no driver restarts due to TDR? Or did you disable it already?
9) Message boards : Number crunching : SETI orphans (Message 2073318)
Posted 14 Apr 2021 by Profile Raistmer
Post:
FYI, you can run these GPU tasks on OpenCL 1.1; 1.2 doesn't seem to be absolutely necessary (maybe it's just an artificial limit on the project side).

Jobs ran fine on my GTX 550 Ti, which is OpenCL 1.1.


I ran it (until the driver crashed) on a GTX 460 SE :)
Haven't managed to capture a profile log so far, though.

Only Nsight for Visual Studio is able to trace (in theory) OpenCL on NV, so I had to install VS 2019 as well.
Nsight Compute is CUDA-only...
[And of course the latest build dropped support for pre-Win10 OSes, so I had to use an older one.]
10) Message boards : Number crunching : SETI orphans (Message 2073317)
Posted 14 Apr 2021 by Profile Raistmer
Post:
Richard, even if you managed to capture a full run, it would be hardly usable. Even with a 2-3 min log I wait 10-20 seconds on each display update...
Also, the bigger the run, the coarser the resolution.
If you want to catch the gap between jobs, it's better to start paused, hit "play" only just before one job finishes, and hit pause or stop once the second job has done a few gradient steps.
11) Message boards : Number crunching : SETI orphans (Message 2073245)
Posted 13 Apr 2021 by Profile Raistmer
Post:
I think this picture is enough to point to the main problem: there is a kernel that runs too long on low-end hardware.
And there are good chances the issue can be solved very easily, just by splitting the single call into a few. A global size of 640,000 easily allows that: at 640k / 128 there are 5000 workgroups, enough to fill almost any GPU, even the biggest ones. So the kernel could even be split unconditionally (not only for low-end devices).
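The split itself is just index arithmetic. A sketch of how a 640,000-item launch could be chopped into watchdog-friendly pieces while keeping each piece a multiple of the 128-item workgroup (the per-piece cap is an assumed tuning parameter; a real fix would pass these offsets via clEnqueueNDRangeKernel's global_work_offset):

```python
def split_launch(global_size, workgroup, max_chunk):
    """Split one NDRange launch into (offset, size) pieces whose
    sizes stay multiples of the workgroup size."""
    chunk = (max_chunk // workgroup) * workgroup  # round down to a wg multiple
    launches = []
    offset = 0
    while offset < global_size:
        size = min(chunk, global_size - offset)
        launches.append((offset, size))
        offset += size
    return launches

# 640,000 items, 128-wide workgroups, at most ~130k items per piece:
pieces = split_launch(640_000, 128, 130_000)
print(len(pieces))  # 5 shorter launches instead of 1 long one
assert all(size % 128 == 0 for _, size in pieces)
assert sum(size for _, size in pieces) == 640_000
```

Each piece then runs well under the watchdog limit, at the cost of a few extra enqueue calls.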
12) Message boards : Number crunching : SETI orphans (Message 2073179)
Posted 12 Apr 2021 by Profile Raistmer
Post:
Seems I found adequate metrics and a config:

That gradient_minAD kernel takes 4 s per launch.
Why TDR tolerates it, no idea; maybe I disabled it some time ago...
The launch space (global size) is huge: 640,000 work-items per kernel call.

Maybe it's worth decreasing that to avoid crashes.
That's all for today, Richard; at least something to start with.
There are 19 launches captured in the log, going one after another. Maybe the dev can launch twice as many, but smaller ones... (19 is an arbitrary number, just the length of the data collection, while 4 s per launch and 640,000 items per launch characterize the kernel itself.)
13) Message boards : Number crunching : SETI orphans (Message 2073171)
Posted 12 Apr 2021 by Profile Raistmer
Post:


Some unbelievably low numbers ://
Sampling overhead too big?...
14) Message boards : Number crunching : SETI orphans (Message 2073170)
Posted 12 Apr 2021 by Profile Raistmer
Post:
OK, got some initial profile data


separate spikes....
15) Message boards : Number crunching : SETI orphans (Message 2073155)
Posted 12 Apr 2021 by Profile Raistmer
Post:
all those faulty drivers polluting the database here by cross-validating.


And non-reproducible results add to that complexity.
16) Message boards : Number crunching : SETI orphans (Message 2073154)
Posted 12 Apr 2021 by Profile Raistmer
Post:
What if I cut the jobs list down to just a single entry?
Haven't tried to find out exactly how to do that. Feel free; it would certainly be quicker.

But CPU load between sub-jobs needs optimising too.

It works as expected: it just optimizes the first atom config and exits.

But.... different energies! Quite different!

Finished evaluation after reaching -6.68 +/- 0.13 kcal/mol combined.
45 samples, best energy -6.94 kcal/mol.


I'll try to run the test a few times and will edit this post with the values.

Richard's run:
Finished evaluation after reaching -8.37 +/- 0.06 kcal/mol combined.
51 samples, best energy -8.49 kcal/mol.

My second run:
Finished evaluation after reaching -6.92 +/- 0.07 kcal/mol combined.
48 samples, best energy -7.09 kcal/mol.

One more run:
Finished evaluation after reaching -6.91 +/- 0.15 kcal/mol combined.
23 samples, best energy -7.07 kcal/mol.

So, they DEFINITELY didn't reach the real minimum in those runs.

An interesting task: validating two results against each other when each new run, even on the very same host/device/initial setup, gives different data :))))

EDIT:
Genetic algorithms tend to give different results between runs because of the random "mutations" involved, but usually the final value is approximately the same; only the length of the run and the trajectory to the minimum differ. Here the situation is quite different: the final optimized energy value differs between runs, and quite strongly. [Hence, assuming they know what they are doing, the result value is not the final energy minimum, and some additional stage to find the true minimum will be done on the servers.]

But from the "number crunching" point of view, it's not quite clear how to correctly validate such results. Especially since -cl-mad-enable can add subtle errors to the calculations, and such errors can be much smaller than the deviation between runs...
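One conceivable approach to validating such stochastic results (whether WCG actually does anything like this, I don't know) is a tolerance check instead of a bit-exact comparison. A hypothetical sketch, using the reported combined energy and its standard deviation, fed with the numbers from the runs above:

```python
def energies_agree(e1, sd1, e2, sd2, k=3.0):
    """Hypothetical fuzzy validator: accept two stochastic results if
    their energies differ by less than k combined standard deviations.
    (Invented criterion for illustration, not WCG's actual rule.)"""
    return abs(e1 - e2) <= k * (sd1 + sd2)

# My two runs roughly agree with each other...
print(energies_agree(-6.68, 0.13, -6.92, 0.07))  # True
# ...but neither agrees with Richard's run:
print(energies_agree(-6.68, 0.13, -8.37, 0.06))  # False
```

With run-to-run spreads this large, even such a fuzzy rule would reject pairs of honest results, which is exactly the validation problem described above.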
17) Message boards : Number crunching : SETI orphans (Message 2073151)
Posted 12 Apr 2021 by Profile Raistmer
Post:


And it just finished the first job. Seems my HD 500 is powerful enough to get through without the watchdog triggering.
18) Message boards : Number crunching : SETI orphans (Message 2073150)
Posted 12 Apr 2021 by Profile Raistmer
Post:
-cl-mad-enable and an iGPU... dangerous :))))
19) Message boards : Number crunching : SETI orphans (Message 2073149)
Posted 12 Apr 2021 by Profile Raistmer
Post:
C:\__Test>wcgrid_beta29_autodockgpu_7.28_windows_x86_64__opencl_intel_gpu_102 -jobs OPNG_0000025_00056.job -input OPNG_0000025_00056.zip -seed 160279976 -wcgruns 1700 -wcgdpf 34
Running 1 jobs in pipeline mode
Warning: value of -devnum argument ignored. Value must be an integer between 1 and 65536.
AutoDock-GPU version: 51800118f2c7e78ac6794e087a956e50737c5d85-dirty

Kernel source file: ./device/calcenergy.cl
Kernel compilation flags: -I ./device -I ./common -DN128WI -cl-mad-enable
OpenCL device: Intel(R) HD Graphics 500
(Thread 0 is setting up Job #1)

Running Job #1:
Fields from: receptor.maps.fld
Ligands from: ZINC000309335454-ACR2.13_RX1--fr2266benz_001--CYS114.pdbqt
Using heuristics: (capped) number of evaluations set to 2818671
Local-search chosen method is: ADADELTA (ad)

Executing docking runs, stopping automatically after either reaching 0.15 kcal/mol standard deviation of
the best molecules of the last 4 * 5 generations, 27000 generations, or 2818671 evaluations:

Generations | Evaluations | Threshold | Average energy of best 10% | Samples | Best energy
------------+--------------+------------------+------------------------------+---------+-------------------
0 | 100 | 2.90 kcal/mol | 0.30 +/- 1.04 kcal/mol | 4 | -0.82 kcal/mol
5 | 34051 | 2.90 kcal/mol | -1.73 +/- 1.33 kcal/mol | 978 | -5.86 kcal/mol
10 | 63164 | -1.71 kcal/mol | -4.94 +/- 0.41 kcal/mol | 70 | -5.87 kcal/mol
15 | 90893 | -4.88 kcal/mol | -5.76 +/- 0.18 kcal/mol | 23 | -6.28 kcal/mol
20 | 118377 | -5.68 kcal/mol | -6.01 +/- 0.23 kcal/mol | 19 | -6.78 kcal/mol

It runs offline, at least.
What if I cut the jobs list down to just a single entry?
20) Message boards : Number crunching : SETI orphans (Message 2073139)
Posted 12 Apr 2021 by Profile Raistmer
Post:
It seems VTune has OpenCL support now. It's a very venerable tool, dating back to, I would say, the Pentium era or even earlier...
https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html?operatingsystem=window&distributions=webdownload&options=offline

3.5GB installer :)


Huh? The Intel C++ compiler included? In the free download? Remember that scandal about using the Intel optimizing compiler for free for the SETI project?...
Well... but from this pack we currently need only VTune.

EDIT: so many years have passed and they still can't provide an adequate installer. Mighty Intel...
If one selects only VTune (and the interface allows it!), the installation fails, complaining about the compiler...
Attempt two...


©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.