Posts by RueiKe

1) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1969975)
Posted 6 days ago by RueiKe
Post:
I just released a new version of benchMT:
https://github.com/Ricks-Lab/benchMT/releases/tag/v1.2.0

Changes include the following:
    Fixed a problem with when the lock_file was created and checked. The check is now placed before slot initialization.
    Fixed an issue where the program would exit if the reference file didn't exist. Now an error message is printed and no comparison results are written to the summary files.
    Added command line option --no_ref, which skips creating reference results when selected. This is useful for characterizing potential reference WUs.
    Added color to status display.
    Modified so that status display will not show skipped jobs (Reference data already exists).
    Updated reference WUs in the WU_test/safe directory. Still need a WU with a Gaussian signal.
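For readers curious how a lock file of this kind can work, here is a minimal sketch. It is not taken from benchMT: the function names and the use of an exclusive create are assumptions, and only the file name lock_file comes from the notes above.

```python
import os

LOCK_NAME = "lock_file"  # name taken from the release notes; the rest is assumed


def acquire_lock(work_dir):
    """Create the lock file atomically; return its path, or None if it exists."""
    lock_path = os.path.join(work_dir, LOCK_NAME)
    try:
        # O_EXCL makes the create fail if another instance already made the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as lock:
        lock.write(str(os.getpid()))  # record the owner for troubleshooting
    return lock_path


def release_lock(lock_path):
    """Remove the lock file when the run is finished."""
    os.remove(lock_path)
```

Taking the lock before any slot directories are touched means a second instance stops before it can disturb the first one's slots.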

2) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1969465)
Posted 9 days ago by RueiKe
Post:
I just released a new version of benchMT:
https://github.com/Ricks-Lab/benchMT/releases/tag/v1.1.0

Changes include the following:
    Command line options can now be specified in mode lines of the BenchCFG file. Options given on the command line will override modes specified in the CFG file.
    An alternative CFG file can now be specified as a command line option.
    Signal Counts and Angle Range are now included in the psv and txt summary files.
    Removed the app's -device N argument if specified, since -device is added automatically based on slot assignment.
    Added --gpu_devices x,y command line option to specify which GPU devices the user would like to include in the benchmark run.
    Added a lock_file in the working directory to prevent a second occurrence of benchMT from using the same directory.
    Updated reference WUs in the WU_test/safe directory.
    Changed --ref_signals option to --std_signals for clarity.
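One simple way to get the "command line overrides CFG file" behavior is sketched below. This is an illustration only, not benchMT's implementation: the parser, the function names, and the assumption that a mode line can be split into option tokens are all hypothetical. The option names --gpu_devices and --debug come from this thread.

```python
import argparse


def build_parser():
    """Parser with two options named in this thread (semantics assumed)."""
    parser = argparse.ArgumentParser(prog="benchMT-sketch")
    parser.add_argument("--gpu_devices", default=None,
                        help="comma-separated GPU device numbers, e.g. 0,2")
    parser.add_argument("--debug", action="store_true")
    return parser


def merge_options(cfg_tokens, cli_tokens):
    """Parse CFG tokens first, then command-line tokens, so the latter win."""
    # argparse keeps the last value seen for a repeated option, so placing
    # the command-line tokens last gives them priority.
    return build_parser().parse_args(list(cfg_tokens) + list(cli_tokens))


opts = merge_options(["--gpu_devices", "0,1", "--debug"], ["--gpu_devices", "2"])
print(opts.gpu_devices, opts.debug)  # prints: 2 True
```

Options that appear only in the CFG tokens are kept, and options that appear in both take the command-line value.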

3) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1969283)
Posted 10 days ago by RueiKe
Post:
I am working on an updated version of the tool that will also output angle range and signal counts. I have used this feature to assess the current sample WUs included with the package. Here is a summary of the current work units included. Seems like it could be optimized. Let me know of any recommendations on what would be an ideal set of sample WUs for benchmarking.

4) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1968287)
Posted 16 days ago by RueiKe
Post:
I have just released benchRP, which parses the output of the Windows-based MB_bench_213 benchmark utility. It splits each job's command line arguments into individual data columns: the header holds the argument names, and each row holds the corresponding values for one job. benchRP can also be used to expand the argument field of the benchMT psv file. This format is useful for importing into analytics tools to analyze how sensitive processing time and similarity are to command line tuning parameters. benchRP can be downloaded here:
https://github.com/Ricks-Lab/benchRP
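As a rough illustration of the column expansion described above (not benchRP's actual code), the sketch below turns argument strings into a header and one row per job. The function names and the sample argument names are hypothetical, and it assumes that values never start with a dash, so negative numbers would be misread as flags.

```python
import shlex


def expand_args(arg_string):
    """Turn '-a 1 -flag -b 2' into {'-a': '1', '-flag': True, '-b': '2'}."""
    tokens = shlex.split(arg_string)
    columns = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        # A following token that does not start with '-' is taken as the value.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            columns[name] = tokens[i + 1]
            i += 2
        else:
            columns[name] = True
            i += 1
    return columns


def to_table(arg_strings):
    """Build a sorted header and one row per job from a list of argument strings."""
    rows = [expand_args(s) for s in arg_strings]
    header = sorted({name for row in rows for name in row})
    return header, [[row.get(name, "") for name in header] for row in rows]
```

Jobs that do not use a given argument get an empty cell in that column, which keeps every row the same width for import into a spreadsheet.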

5) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1968118)
Posted 17 days ago by RueiKe
Post:
Initial release of the benchMT tool is now available on GitHub:
https://github.com/Ricks-Lab/benchMT

I will monitor this thread for any feedback or reported issues.

6) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967507)
Posted 20 days ago by RueiKe
Post:
Here is my comparison of all of the Linux CPU apps that I have. I ran 7 Arecibo and 8 GBT WUs twice each on each system using only 30 threads on each system. The 2990WX had SMT disabled and the 1950X had it enabled. The MB, cooling solution, memory, BIOS, OS are all the same between the 2 systems. BIOS settings are also the same with the exception of manual Vcore and CPU Core ratio. LLC is -L2 on 2990WX and -L1 on the 1950X.


Based on these results, the r3711_sse41 app is fastest, though the 2 newer apps have a noticeable reduction in Similarity. Not sure if that difference is significant though.

7) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967242)
Posted 22 days ago by RueiKe
Post:
I'm currently running the r3711 SSE41 app against the AVX2 app, since that one wasn't included in Rick's set of default apps for some reason. I'm also seeing some anomalous behavior in the number of GPU instances that can be invoked.

benchMT currently only allows 1 task per GPU. Number of GPUs is determined by
lshw -short | grep display

Does the log file indicate the correct number of GPUs?

I'm only running one task per GPU since I am running the CUDA92 app. But if I only invoke 3 instances of the application, it only runs two tasks on two GPUs and keeps the third instance pending until the first two complete; then the pending task runs on the first GPU. I always see all 3 GPUs.

This is the entry in benchCFG

setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92
setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92
setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92
#setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92

This is what the benchmark is going to execute

Only 0 CPU jobs and 3 GPU jobs. Max Threads reduced to 3
List of Initialized Slots
SlotNum | platform | device | state | job | SlotDir
-0------| GPU | 0 | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/0
-1------| GPU | 1 | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/1
-2------| CPU | NA | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/2
##### 3 total slots
Pending jobs (CPU/GPU): 0 / 3
Pending reference jobs: 0
Execute listed jobs? [y/N]

With this benchCFG file entry

setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92
setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92
setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92
setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92

This is what the benchmark is going to execute

Only 0 CPU jobs and 4 GPU jobs. Max Threads reduced to 4
List of Initialized Slots
SlotNum | platform | device | state | job | SlotDir
-0------| GPU | 0 | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/0
-1------| GPU | 1 | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/1
-2------| GPU | 2 | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/2
-3------| CPU | NA | EMPTY | None| /home/keith/Downloads/Utils/benchMT/Slots/3
##### 4 total slots
Pending jobs (CPU/GPU): 0 / 4
Pending reference jobs: 0
Execute listed jobs? [y/N]

Looks like a bug. Let me try to reproduce it on my system this evening.


When I run this on my system, it all appears normal. Can you post or send me your complete BenchCFG file and the hostname*.txt file in the run subdir of testData? Also, running it with the --debug option might give more insight. Does this app require a .cl file? If so, it needs to be in the APPS_GPU directory. Thanks!


I was just able to reproduce the problem. It happens when there are fewer GPU jobs than GPUs. Working on it...

8) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967241)
Posted 22 days ago by RueiKe
Post:
The r3711 SSE41 app is the default app installed by the TBar BOINC All-in-One packages.
http://www.arkayn.us/lunatics/BOINC-7.8.3.7z


I just downloaded and extracted. Did not find the r3711 SSE41 app.


I found it in a different download on the site. I will include it in my next run.

9) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967237)
Posted 22 days ago by RueiKe
Post:
The r3711 SSE41 app is the default app installed by the TBar BOINC All-in-One packages.
http://www.arkayn.us/lunatics/BOINC-7.8.3.7z


I just downloaded and extracted. Did not find the r3711 SSE41 app.

10) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967236)
Posted 22 days ago by RueiKe
Post:
When I run this on my system, it all appears normal. Can you post or send me your complete BenchCFG file and the hostname*.txt file in the run subdir of testData? Also, running it with the --debug option might give more insight. Does this app require a .cl file? If so, it needs to be in the APPS_GPU directory. Thanks!

11) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967205)
Posted 23 days ago by RueiKe
Post:
This is the output of the r3711SSE41 app versus the r3712AVX2 app. I ran 4 instances of each app.
https://www.dropbox.com/s/wjgz56tqmrn1zi1/Screenshot%20from%202018-11-25%2018-52-54.png?dl=0

The SSE41 app is up to 10% faster than the AVX2 app. That is what I found on my old Gen. 1700X and 1800X cpus. So not seeing any improvement on Ryzen+ 2700X cpus. Might be something different on Threadrippers.

I will be able to test on TR once I get my TR platform built.


For analysis, I suggest using the .psv file in the testData directory. It is pipe delimited, so it is easy to import into Excel and summarize with a pivot table.
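For anyone who prefers a script to a spreadsheet, the same kind of pivot summary can be sketched in Python. This is only an illustration: the column names App and Elapsed are hypothetical, so check the header line of the actual .psv file before using it.

```python
import csv
import io
from collections import defaultdict


def mean_by_column(psv_text, key_col, value_col):
    """Average value_col for each distinct key_col in pipe-delimited text."""
    reader = csv.DictReader(io.StringIO(psv_text), delimiter="|")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        key = row[key_col].strip()
        sums[key] += float(row[value_col])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical data in the assumed layout: one header line, one row per job.
sample = "App|Elapsed\nsse41|100\nsse41|110\navx2|120\n"
print(mean_by_column(sample, "App", "Elapsed"))
```

To run it on a real file, read the file contents into a string first and pass the actual column names.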

12) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967204)
Posted 23 days ago by RueiKe
Post:
Looks like a bug. Let me try to reproduce it on my system this evening.

13) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967193)
Posted 23 days ago by RueiKe
Post:
I'm currently running the r3711 SSE41 app against the AVX2 app, since that one wasn't included in Rick's set of default apps for some reason. I'm also seeing some anomalous behavior in the number of GPU instances that can be invoked.

benchMT currently only allows 1 task per GPU. Number of GPUs is determined by
lshw -short | grep display

Does the log file indicate the correct number of GPUs?
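The GPU count described above (lines of lshw -short output that match "display") can be reproduced in Python roughly as follows. This is a sketch, not benchMT's code: the function names are hypothetical, and it assumes lshw is installed and on the PATH.

```python
import subprocess


def count_display_lines(lshw_short_output):
    """Count lines containing 'display', as 'grep display' would select them."""
    return sum(1 for line in lshw_short_output.splitlines() if "display" in line)


def count_gpus():
    """Run 'lshw -short' and count the display-class entries."""
    result = subprocess.run(["lshw", "-short"], capture_output=True,
                            text=True, check=True)
    return count_display_lines(result.stdout)
```

Comparing the value from count_gpus() with the number of GPU slots in the benchMT log is a quick way to answer the question above.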

14) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967191)
Posted 23 days ago by RueiKe
Post:
I'm currently running the r3711 SSE41 app against the AVX2 app, since that one wasn't included in Rick's set of default apps for some reason. I'm also seeing some anomalous behavior in the number of GPU instances that can be invoked.

Can you provide a link of where a set of 3711 apps can be found? I will plan to include them in the benchmark run planned during the outage.

15) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1967167)
Posted 23 days ago by RueiKe
Post:
Here are the 1950X results for a benchmark run identical to the one I did for the 2990WX. The 1950X has SMT enabled, while the 2990WX has it disabled, so both runs used 30 of a total of 32 available threads.

16) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1966974)
Posted 23 days ago by RueiKe
Post:
Curious, did you run each app individually for a day, then the next, etc.?
Or run a mixture of apps all at once?


I ran 30 iterations of all apps with a single WU, which is 180 tasks. These 180 tasks were loaded across 30 cores until complete.
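The loading scheme described above, a fixed list of tasks fed to a limited number of workers, can be sketched like this. It is an illustration only: the app names are placeholders, run_job stands in for whatever launches one task, and the count of 6 apps is inferred from 180 tasks divided by 30 iterations.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def build_jobs(apps, repetitions):
    """One (app, repetition) pair per task."""
    return list(itertools.product(apps, range(repetitions)))


def run_all(jobs, run_job, max_workers=30):
    """Run every job, keeping at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))


apps = ["app_%d" % n for n in range(6)]  # placeholder names
jobs = build_jobs(apps, 30)
print(len(jobs))  # prints: 180
```

With this layout the final batch can run with idle workers, which is the low-load effect on the last jobs mentioned elsewhere in this thread.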

17) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1966971)
Posted 23 days ago by RueiKe
Post:
Also, probably would be better to use 30 different WUs rather than 30 repetitions of the same WU.

A mix of Arecibo & GBT tasks would be interesting.
One of the issues with running 2 GPU WUs at a time under CUDA50 was that when an Arecibo & a GBT WU were running on the same GPU, the runtime for the Arecibo task would generally triple.
I don't see that happening on the CPU, but I wouldn't be surprised if there were some performance impact there.


Good idea. I will set up a new benchmark run to execute during this week's outage.

18) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1966945)
Posted 24 days ago by RueiKe
Post:
I have run my first benchmark of CPU app performance. It uses 30 cores and does 30 repetitions for each app.


The high variability is likely an effect of the 2990WX having only half its cores with direct memory access, and of the last batch of jobs running under low load. I should re-run on my 1950X system in the future. Also, it would probably be better to use 30 different WUs rather than 30 repetitions of the same WU.

19) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1966150)
Posted 29 days ago by RueiKe
Post:
I have figured out this issue. I had to add a space before the "-device n". Not sure why it is needed since I don't put a space in front of the cpu app argument string.

Concerning the starting of GPU tasks on a specified device, I have checked the MB app source code and found that "--device n" is correct, so maybe there is something else wrong with my code. I am using this chunk of code to start both GPU and CPU tasks:
    # Execute job: copy the work unit into the slot directory
    shutil.copy2(slots.list[slot_num].job.wu_path + slots.list[slot_num].job.wu_name,
                 slots.list[slot_num].slot_dir + "/" + mb_const.activeWU)
    if plat == "GPU":
        # GPU apps may need their .cl files in the slot directory
        for file_str in glob.glob(env.gpu_app_path + "*.cl"):
            shutil.copy2(file_str, slots.list[slot_num].slot_dir + "/")
    if plat == "CPU":
        cmd_str_list = [env.cpu_app_path + slots.list[slot_num].job.app_name,
                        slots.list[slot_num].job.app_args]
    else:
        # The device flag and the app arguments are passed as one string here
        device_arg = "--device " + str(slots.list[slot_num].device)
        cmd_str_list = [env.gpu_app_path + slots.list[slot_num].job.app_name,
                        device_arg + " " + slots.list[slot_num].job.app_args]
    if mb_const.DEBUG:
        print(cmd_str_list)
    # Start the app from inside the slot directory, then return
    os.chdir(slots.list[slot_num].slot_dir)
    slots.list[slot_num].job.cmd = subprocess.Popen(cmd_str_list, shell=False,
                                                    stdout=subprocess.PIPE)
    slots.list[slot_num].job.start_time = datetime.utcnow()
    os.chdir(env.current_dir)

It works fine for CPU tasks, but still not working for GPU. Here is what cmd_str_list looks like for a CPU task:
['/home/rick/PyDev/benchMB/APPS_CPU/MBv8_8.05r3345_avx_linux64', '--nographics'] 

Maybe there is still something I have not prepared properly in the working directory for a GPU job.
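One thing worth checking in the snippet above: with shell=False, subprocess.Popen hands each list element to the program as a single argument, so a string such as "--device 0 --nographics" arrives as one argv entry rather than three. Whether that was the cause here is not confirmed in the thread. The sketch below shows one way to build the list with one element per token; build_cmd is a hypothetical helper, and the flag spelling is copied from the post, not verified against the app.

```python
import shlex


def build_cmd(app_path, app_args, device=None):
    """Return an argv list with one element per argument."""
    cmd = [app_path]
    if device is not None:
        # Flag spelling copied from the post; check it against the app.
        cmd += ["--device", str(device)]
    # shlex.split breaks the argument string on whitespace, honoring quotes.
    cmd += shlex.split(app_args)
    return cmd


print(build_cmd("/path/to/gpu_app", "--nographics", device=0))
# prints: ['/path/to/gpu_app', '--device', '0', '--nographics']
```

The resulting list can be passed to subprocess.Popen unchanged, and an empty argument string simply adds nothing to it.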

20) Message boards : Number crunching : Developing a Multi-Threaded Benchmarking App for Linux (Message 1966140)
Posted 29 days ago by RueiKe
Post:
@RueiKe
Re: downloadable stock apps...
I don't remember how I discovered it, but useful for those of us "anonymous platform" hosts that don't get the automatic downloads of the science apps. Even if we only want to run them as benchmarks to compare to optimized apps.
The following list gives the application name, as found on the ->Computing ->Applications web page, followed by the full application file name.
Linux/x86_64    8.00
setiathome_8.00_x86_64-pc-linux-gnu
---
Linux/x86_64    8.22(opencl_nvidia_sah)
setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_sah
---
Linux/x86_64   8.22(opencl_nvidia_SoG)
setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_SoG
---

Point a browser to:
http://boinc2.ssl.berkeley.edu/sah/download_fanout/<file name> #without the < > of course and it will download to you.
These are the only stock apps that I have in hand and could verify this method. But, no doubt, you will see a pattern in the morphing from Applications description to the downloadable file name, in case you want additional stock apps.
(I hope I'm not revealing any Berkeley secrets here; I have not found any on-line index, or "How-To" , for fetching the stock apps.)

I will follow this thread for progress reports as I am frequently motivated to do benchmark experiments of my own.

AZGene;


Thanks Gene, that worked!
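For convenience, the download pattern Gene describes can be written out as a small sketch. The base URL and the three file names are the ones given in his post. The helper name is hypothetical, and nothing here verifies that the files are still being served.

```python
from urllib.parse import quote

BASE_URL = "http://boinc2.ssl.berkeley.edu/sah/download_fanout/"

STOCK_APPS = [
    "setiathome_8.00_x86_64-pc-linux-gnu",
    "setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_sah",
    "setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_SoG",
]


def download_url(file_name):
    """Append the application file name to the download directory URL."""
    return BASE_URL + quote(file_name)


for name in STOCK_APPS:
    print(download_url(name))
```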

I hope to have a functional beta by the end of next weekend. Things I need to complete:
1) Write meaningful data to the testData directory. I will likely leverage the comparison routine used in the previous package.
2) Run and store results for reference apps.
3) Figure out how to make previously compiled kernels available for new runs. I am using a slots approach, similar to BOINC. It looks like BOINC writes compiled kernels to a common directory, but in my case, the job running in a slot expects the kernels in the slot's directory. Still some digging to do on this one.


©2018 University of California