Intel® iGPU AP bench test run (e.g. @ J1900)

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1664462 - Posted: 12 Apr 2015, 10:29:49 UTC - in response to Message 1664442.  

http://www.pcworld.com/article/240016/idf_day_1_recap_ivy_bridge_and_the_x79_factor_in_photos.html


The designers also added an L3 cache to the GPU itself. In addition to performance improvements, the cache also helps power efficiency, since anything located in the cache means the CPU ring bus doesn’t need to be fired up.
ID: 1664462 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1666643 - Posted: 18 Apr 2015, 1:51:54 UTC
Last modified: 18 Apr 2015, 1:53:33 UTC

If I added an NV GPU card to the J1900 PC ...

I set -cpu_lock for the Intel iGPU AP app.
I also set -cpu_lock for the NV GPU AP app.

Does this mean both apps are pinned to CPU core #0?
Or does each GPU app get its own CPU core, #0 & #1?

Thanks.
ID: 1666643 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1666693 - Posted: 18 Apr 2015, 5:07:39 UTC - in response to Message 1666643.  

Code for -cpu_lock is probably the same in all Raistmer apps.
So any app will see (at start) which cores are already in use by other running apps (if they also use -cpu_lock) and choose the first unused core (if unused core exists).
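
As a rough illustration of the "first unused core" idea described above, here is a minimal Python sketch. It is not the actual -cpu_lock code; the function name `pick_core` and the set of claimed cores are hypothetical and only model the selection rule.

```python
def pick_core(n_cores, cores_in_use):
    """Return the lowest-numbered core nobody has claimed yet, or None if all are taken."""
    for core in range(n_cores):
        if core not in cores_in_use:
            return core
    return None


# Two GPU apps starting one after another on a 4-core CPU such as the J1900.
claimed = set()

first = pick_core(4, claimed)   # first app finds nothing claimed
claimed.add(first)

second = pick_core(4, claimed)  # second app skips the core already in use
claimed.add(second)

print(first, second)
```

Under this rule the two apps end up on different cores as long as a free core exists. What the real apps do when no core is free is not modelled here.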
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1666693 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1666981 - Posted: 18 Apr 2015, 20:35:44 UTC - in response to Message 1666693.  

Code for -cpu_lock is probably the same in all Raistmer apps.
So any app will see (at start) which cores are already in use by other running apps (if they also use -cpu_lock) and choose the first unused core (if unused core exists).

Yep, they should do so... but this was never really checked with a few different types of GPU installed. It's quite easy to test, though.
Just open Task Manager while both apps have running instances, then look at the affinity menu item to see which CPUs are checked for which process.

But even if both GPU apps happen to be pinned to the same core, that can still be better than letting them float freely between cores. Also, -cpu_lock is known to have a great impact on the ATi app. It's worth checking directly and showing what its impact is on the iGPU one.

-cpu_lock : Enables CPUlock feature. Results in CPUs number limitation for particular app instance. Also attempt to bind different instances to different CPU cores will be made.
Can be used to increase performance under some specific conditions. Can decrease performance in other cases though. Experimentation required.
Now this option allows GPU app to use only single logical CPU.
Different instances will use different CPUs as long as there is enough of CPU in the system.
To use CPUlock in round-robin mode GPUlock feature will be enabled. Use -instances_per_device N option if few instances per GPU device are needed.


So, maybe -instances_per_device 2 will be needed in this case too. Check that.
ID: 1666981 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1667041 - Posted: 18 Apr 2015, 22:54:01 UTC - in response to Message 1666981.  

So, maybe -instances_per_device 2 will be needed in this case too. Check that.

@Dirk
To clarify: this doesn't mean you have to run 2 instances on one GPU

You may even use -instances_per_device 4 (just to 'reserve' counter for max 4 apps running)
- since BOINC starts the apps it will still start only 2 apps (one per GPU) unless you change this by app_info.xml / app_config.xml
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1667041 · Report as offensive
Profile ivan
Volunteer tester
Avatar

Send message
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1668132 - Posted: 21 Apr 2015, 16:36:39 UTC
Last modified: 21 Apr 2015, 16:37:47 UTC

I've noticed lately that some GPU tasks on my Win10 J1900 are stalling at some point (e.g. the first one I noted was at a suspicious 66.677%) while the CPU total keeps marching on. If I restart BOINC then the Progress counter drops back a little but the CPU total drops back a lot (presumably to the time the Progress stalled) and again the Progress ticks up until it hits the previous stall point and stops there as CPU keeps building.
Has anyone else seen that? Or know a cure? I'm just running bare Lunatics with no tweaks.
ID: 1668132 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1668167 - Posted: 21 Apr 2015, 22:17:28 UTC - in response to Message 1668132.  

I've noticed lately that some GPU tasks on my Win10 J1900 are stalling at some point (e.g. the first one I noted was at a suspicious 66.677%) while the CPU total keeps marching on. If I restart BOINC then the Progress counter drops back a little but the CPU total drops back a lot (presumably to the time the Progress stalled) and again the Progress ticks up until it hits the previous stall point and stops there as CPU keeps building.
Has anyone else seen that? Or know a cure? I'm just running bare Lunatics with no tweaks.

I've noticed that on two of my XP machines with GTX 750 Ti cards - specifically, the two I upgraded to driver 347.88 to be able to run cuda65 tasks for GPUGrid (where it can be more of a problem, with tasks estimated at up to 22 hours, but return requested within 24 hours). It didn't seem to be a problem when running cuda60 with, IIRC, driver 335.28
ID: 1668167 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1668205 - Posted: 21 Apr 2015, 23:59:39 UTC - in response to Message 1668132.  

I've noticed lately that some GPU tasks on my Win10 J1900 are stalling at some point (e.g. the first one I noted was at a suspicious 66.677%) while the CPU total keeps marching on. If I restart BOINC then the Progress counter drops back a little but the CPU total drops back a lot (presumably to the time the Progress stalled) and again the Progress ticks up until it hits the previous stall point and stops there as CPU keeps building.
Has anyone else seen that? Or know a cure? I'm just running bare Lunatics with no tweaks.

I'm not seeing that on my J1900 with the same driver release 4061.
It could be OS related, but maybe it is MB related? The ASRock board I have had a BIOS update that listed:
1. Improve integrated graphics compatibility.
2. Improve add-on card compatibility.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1668205 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1668263 - Posted: 22 Apr 2015, 3:29:38 UTC
Last modified: 22 Apr 2015, 3:32:40 UTC

@ BilBg, Raistmer
Currently I don't have a GPU card connected to my mobo.
I'm in contact with the manufacturer to find out whether it's possible (there is just a PCIe 2.0 x1 slot).

@ ivan
ASRock Q1900DC-ITX mobo with J1900 CPU. IIRC, bought in the middle of last year. No BIOS update.
Win8.1 x64. Intel driver v10.18.10.3408.
SETI & AstroPulse (CPU & iGPU) apps without problems (Lunatics Installer v0.43a x64).
BOINC Client v7.4.42 with BoincTasks Manager v1.67.
ID: 1668263 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1669672 - Posted: 25 Apr 2015, 4:26:01 UTC

In view of the latest findings in the other thread, I would suggest checking the parameter area (unroll, ffa_block) around your current best config under full load.
The loaded state makes a non-negligible difference.
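
One way to organise such a check is to list the settings surrounding the current best config and bench each one under full load. The Python sketch below only builds the command-line strings. The option names -unroll, -ffa_block and -ffa_block_fetch and the values 5 / 1472 / 368 are the ones quoted in this thread; the step sizes and the fixed 4:1 ratio between -ffa_block and -ffa_block_fetch are assumptions made for illustration, not documented constraints of the app.

```python
from itertools import product


def neighbor_configs(unroll, ffa_block, unroll_step=1, block_step=64, fetch_ratio=4):
    """Build command-line strings for the settings around a known-best config.

    unroll_step, block_step and fetch_ratio are illustrative assumptions.
    """
    configs = []
    for du, db in product((-1, 0, 1), repeat=2):
        u = unroll + du * unroll_step
        b = ffa_block + db * block_step
        if u < 1 or b < fetch_ratio:
            continue  # skip values that make no sense
        configs.append(
            f"-unroll {u} -ffa_block {b} -ffa_block_fetch {b // fetch_ratio}"
        )
    return configs


grid = neighbor_configs(5, 1472)
for line in grid:
    print(line)
```

Each printed line can then be benched with the CPU cores loaded and compared against the centre config.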
ID: 1669672 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1669683 - Posted: 25 Apr 2015, 5:34:01 UTC - in response to Message 1669672.  

In view of last findings in other thread I would suggest to check parameters (unroll, ffa_block) area around your current best config under full load.
Loaded state is not negligible difference.

iGPU slows ~19% with all 4 CPU loaded in the bench test I did using default iGPU config.
http://hal6000.com/seti/test/apbench_test_celeron_j1900.htm
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1669683 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1669691 - Posted: 25 Apr 2015, 6:03:22 UTC - in response to Message 1669683.  

Does this mean you also didn't use -hp (high priority) for the Intel iGPU OpenCL app?
If it is used, I guess the calculation time will decrease.
ID: 1669691 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1669697 - Posted: 25 Apr 2015, 6:23:00 UTC - in response to Message 1669691.  

Does this mean you also not used -hp for priority high for the Intel iGPU OpenCL app?
If used, I guess the calculation time will decrease.

True. -hp was not used for the bench test. It is also not being used for normal running.

It is late for me now, 2:17 AM. So tomorrow, or rather today after I have slept, I can run more tests. Similar to your test, but with CPU loaded. I will start with -hp only, to have baseline comparison with my other data. Then try the "best" config you found while running iGPU solo. Each test config will take me just under 2 hours to complete.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1669697 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1669740 - Posted: 25 Apr 2015, 11:26:38 UTC - in response to Message 1669691.  

In the benchmark, the app's process uses the same priority as under BOINC. That is, CPU apps run at IDLE priority, GPU apps at BELOW_NORMAL.
And BOINC's influence is excluded (it runs at NORMAL and hence competes with the GPU app for CPU). So I would not expect a big difference. Worth checking though.
ID: 1669740 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1670873 - Posted: 27 Apr 2015, 21:29:02 UTC

I have completed the benchmarks on my J1900 iGPU with just the -hp switch while the CPU cores were active. I found little change compared with not using it: 2-3 seconds plus or minus. However, in the case of 2 CPU cores + iGPU, performance was worse for both CPU and iGPU, with run times ~3% longer for the CPU and ~2% longer for the iGPU.

Next I'm running
-hp -unroll 5 -ffa_block 1472 -ffa_block_fetch 368
After which I'm going to run
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368
to see if it makes any difference with tuned settings.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1670873 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1671012 - Posted: 28 Apr 2015, 4:38:14 UTC
Last modified: 28 Apr 2015, 4:45:24 UTC

I saw something strange on my J1900 board.

Normally, AFAIK, in the past with WinXP x86:
Intel Core2 Extreme QX6700 (Intel's 1st 4-core CPU, with 2x 2-core chips inside):
I suspend all tasks in BOINC and let just one task run.
Task Manager says 25% CPU usage - OK.

Intel Core2 Duo E7600 (2-core CPU):
I suspend all tasks in BOINC and let just one task run.
Task Manager says 50% CPU usage - OK.

Now the strange part ... with Intel Celeron J1900 (4-core CPU (2x L2 cache, so maybe 'like' QX6700?)) and Win8.1 x64:
I suspend all tasks in BOINC and let just one task run.
Task Manager says ~30% CPU usage (30% of the CPU) - strange.
With 2 tasks running, 2x ~30% usage (60% of the CPU) - strange?
With 3 tasks running, 3x ~30% usage (90% of the CPU) - strange! ;-)

The stock AP 7.09 (r2742) iGPU app uses 0-3% CPU (up & down).
So in the end up to ~93% CPU usage with 3x CPU + 1x iGPU tasks (on a J1900 CPU).

Is this normal or strange?


I tested live with and without the -use_sleep setting (r2742):
Both times with: -v 0 -unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 256 -cpu_lock -instances_per_device 1
Both times with 4x AP CPU tasks running simultaneously.
Without -use_sleep:
Run time 16 hours 55 min 28 sec (60,928 sec)
CPU time 34 min 2 sec (2,042 sec)
single pulses: 4
repetitive pulses: 1
percent blanked: 18.39
With -use_sleep:
Run time 18 hours 15 min 34 sec (65,734 sec)
CPU time 26 min 8 sec (1,568 sec)
single pulses: 6
repetitive pulses: 0
percent blanked: 15.48

I don't know if the results and % blanked values are similar enough to draw a conclusion, but:
With -use_sleep (compared to without -use_sleep):
Calculation time: +4,806 sec (1 hour 20 min 6 sec)
CPU time usage: -474 sec (7 min 54 sec)
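
The differences can be recomputed from the run times and CPU times reported above. The short Python sketch below does only that arithmetic; the helper name `hms_to_seconds` is made up for this example. Both runs were on different tasks with different blanking, so the percentages are indicative at best.

```python
def hms_to_seconds(hours, minutes, seconds):
    """Convert an h/m/s triple to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Values as reported in the post above.
run_without = hms_to_seconds(16, 55, 28)  # run time without -use_sleep
run_with = hms_to_seconds(18, 15, 34)     # run time with -use_sleep
cpu_without = hms_to_seconds(0, 34, 2)    # CPU time without -use_sleep
cpu_with = hms_to_seconds(0, 26, 8)       # CPU time with -use_sleep

run_delta = run_with - run_without
cpu_delta = cpu_with - cpu_without
run_pct = 100 * run_delta / run_without
cpu_pct = 100 * cpu_delta / cpu_without

print(f"run time: {run_delta:+d} sec ({run_pct:+.1f}%)")
print(f"CPU time: {cpu_delta:+d} sec ({cpu_pct:+.1f}%)")
```

So with -use_sleep the run time grew by roughly 8% while the CPU time dropped by roughly 23% in this single pair of tasks.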

That is before we start comparing what this means (longer iGPU calculation times but less CPU time usage, so 'faster' CPU task calculations) for the best settings for max whole-PC RAC.

What does the CPU time usage mean (in OS/BOINC terms)?
Taking the above example with -use_sleep:
Is it 1,568 sec of the whole CPU, or (x4) 6,272 sec on one CPU core?

OK, enough until now.

Thanks.
ID: 1671012 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1671015 - Posted: 28 Apr 2015, 4:50:14 UTC - in response to Message 1671012.  
Last modified: 28 Apr 2015, 4:52:14 UTC

Because of the methods windows task manager uses to sample CPU usage, they don't always show on the correct core exactly, but sometimes as a split between two, and in varying amounts. The sampling rate of task manager can also be changed, which may show a different total usage, depending on if set to low, normal or high 'update speed'

Comparing with Process Explorer, at different sampling speeds, may give you a clearer indication depending on the situation and update speed.

I mention this because I found with some unrelated test pieces last week, that I can draw really cool patterns in task manager, and also in eVGA precision, by making code with timers that 'beat' with the tool sample rate.

So if you see weird numbers, always check with different tools at different sampling speeds. That's because it can be purely measurement 'artefacts', and not the pure data you were looking for.
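
The "beating" effect can be shown with a toy simulation in Python. This is a simplified model and an assumption throughout: it treats the monitoring tool as taking instantaneous busy/idle snapshots, which is not necessarily how Task Manager measures usage, and the 100 ms period and 30 ms busy time are invented numbers.

```python
def sampled_usage(sample_interval_ms, n_samples=1000, period_ms=100, busy_ms=30):
    """Fraction of snapshots that catch a periodic load in its busy phase.

    The load is busy for the first busy_ms of every period_ms, so its true
    usage is busy_ms / period_ms (30% with the defaults).
    """
    busy = 0
    for k in range(n_samples):
        t = k * sample_interval_ms
        if t % period_ms < busy_ms:
            busy += 1
    return busy / n_samples


in_step = sampled_usage(100)     # sampler locked to the load's period
out_of_step = sampled_usage(37)  # sampler drifting through the period

print(in_step, out_of_step)
```

When the sampler runs in step with the load it always lands in the busy phase and reports 100%, while a sampler that drifts through the period recovers the true 30%. Same load, different tool settings, very different numbers.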
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1671015 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1671019 - Posted: 28 Apr 2015, 4:58:30 UTC - in response to Message 1671012.  
Last modified: 28 Apr 2015, 4:59:51 UTC

1,568 sec of the whole CPU, or (x4) 6,272 sec on one CPU-Core?


1568 seconds of CPU time would be the sum over all the threads in that process. There will be one main worker thread, and a few little ones that don't do much (but can scatter onto other cores at the same time). So without information on all the threads in the process, the CPU time can be only a little useful, or it can add up to much more than the elapsed time, making it harder to use. Elapsed time would be near enough wall clock (with some accuracy limitations).
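
The difference between CPU time and elapsed time can be seen with a small Python sketch using the standard `time` module: `time.process_time()` counts CPU time of the whole process and does not include time spent sleeping, while `time.perf_counter()` tracks elapsed time. This only illustrates the two clocks; it says nothing about how the AP app implements -use_sleep.

```python
import time


def measure(work):
    """Run work() and return (elapsed seconds, process CPU seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def sleeper():
    time.sleep(0.2)  # waits without using the CPU


def spinner():
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:
        pass  # busy-waits, burning CPU the whole time


sleep_wall, sleep_cpu = measure(sleeper)
spin_wall, spin_cpu = measure(spinner)

print(f"sleeping: elapsed {sleep_wall:.2f} s, CPU {sleep_cpu:.2f} s")
print(f"spinning: elapsed {spin_wall:.2f} s, CPU {spin_cpu:.2f} s")
```

Both waits take about the same elapsed time, but only the busy-wait accumulates CPU time, which is the same pattern as the longer run time and lower CPU time seen with -use_sleep.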
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1671019 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1671039 - Posted: 28 Apr 2015, 5:57:08 UTC - in response to Message 1671012.  
Last modified: 28 Apr 2015, 5:58:52 UTC

Now the strange ... with Intel Celeron J1900 (4-Core-CPU (2x L2-Cache, so maybe 'like' QX6700?)) and Win8.1 x64:
I suspend all tasks in BOINC, just let run one task.
Task-Manager say ~30% CPU usage (30% of the CPU) - strange.
Let run 2 tasks, 2x ~30% usage (60% of the CPU) - strange?
Let run 3 tasks, 3x ~30% usage (90% of the CPU) - strange! ;-)


I see the same with my:
GenuineIntel
Intel(R) Core(TM) i5-4460S CPU @ 2.90GHz [Family 6 Model 60 Stepping 3]
4 Cores, Win 8.1


I thought at first it was Hyper Threading, but no, it says I don't have that.

When running 3 tasks, I have seen each CPU go as high as 30-34% while one is not doing much.
ID: 1671039 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1671043 - Posted: 28 Apr 2015, 6:32:05 UTC - in response to Message 1671012.  
Last modified: 28 Apr 2015, 6:33:20 UTC

I think there was a change in Windows 8 Task Manager that adjusts/accounts for the MHz changes of the CPU (I don't remember where I read this at the time)
I also don't know/remember which MHz is used - the current, the max/turbo, or the base.
But this 'adjustment' makes it show not what the user will expect.

E.g.:
http://superuser.com/questions/495699/windows-8-task-manager-shows-49-cpu-process-explorer-shows-100

So I also suggest using Process Explorer
https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1671043 · Report as offensive