Posts by HAL9000


21) Message boards : Number crunching : Intel® iGPU AP bench test run (e.g. @ J1900) (Message 1671126)
Posted 6 days ago by Profile HAL9000
http://ark.intel.com/it/products/80818/Intel-Core-i5-4460S-Processor-6M-Cache-up-to-3_40-GHz

http://ark.intel.com/it/products/78867/Intel-Celeron-Processor-J1900-2M-Cache-up-to-2_42-GHz


Wow, I did not expect Intel's marketing department to be SO bad... They can't even compare 2 devices with a common set of options... What instruction sets does the J1900 support, for example?

What could be easier than taking one set of options, forming a table, and filling in that table for both devices, rather than inventing a new table for each one?

There have always been lots of holes in the data on the Intel site. You can really tell when you do a compare: http://ark.intel.com/compare/75048,78867
22) Message boards : Number crunching : Loading APU to the limit: performance considerations - ongoing research (Message 1670999)
Posted 6 days ago by Profile HAL9000
-ffa_block defines how many periods will be processed together in all FFA stages.
-ffa_block_fetch defines how many periods will be handled together in the most time-consuming part, where the initial folding from the linear data file occurs.
In the FFA fetch stage the input data file is the same for all periods, but it is folded differently, forming a separate new data array for each period. Those folded arrays are then processed further.
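To make that block/fetch distinction concrete, here is a rough, purely illustrative sketch of how the two options batch the work (Python pseudocode with invented helper names; this is not the actual AstroPulse code):

# Illustrative only: invented structure mirroring the description above,
# not the real AstroPulse OpenCL implementation.

def initial_fold(data, period):
    # Stand-in "fetch" stage: co-add samples of the linear data modulo the
    # trial period, producing one folded array per period.
    bins = [0.0] * period
    for i, sample in enumerate(data):
        bins[i % period] += sample
    return bins

def later_ffa_stages(folded):
    # Stand-in for the remaining FFA stages; the real app does far more here.
    return max(folded)

def ffa_search(data, periods, ffa_block=1024, ffa_block_fetch=512):
    results = []
    # -ffa_block: how many trial periods go through all FFA stages together.
    for b in range(0, len(periods), ffa_block):
        block = periods[b:b + ffa_block]
        folded_arrays = []
        # -ffa_block_fetch: how many periods are folded together during the
        # lengthy initial fold from the linear data file.
        for f in range(0, len(block), ffa_block_fetch):
            for period in block[f:f + ffa_block_fetch]:
                folded_arrays.append(initial_fold(data, period))
        # The per-period folded arrays are then processed further as a block.
        results.extend(later_ffa_stages(arr) for arr in folded_arrays)
    return results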

Thanks. From that it made sense to me to try higher ratios such as 4:1 & 8:1 on Haswell.
After a few runs it seems a 4:1 ratio is the "sweet spot" for the iGPU. It happens that Dirk also found that to be the best ratio in their iGPU AP tuning thread with the J1900 in stand-alone testing.
I decided to do some quick tests just running 4CPU+iGPU. I started with -unroll 4 -ffa_block 512 -ffa_block_fetch 128 & -unroll 4 -ffa_block 256 -ffa_block_fetch 64 with pretty good results. CPU run times were about 15% lower than with the default config.
Then, just to be silly, I ran -unroll 4 -ffa_block 32 -ffa_block_fetch 8. That ran the iGPU time up very high, from the normal 90-100 sec to ~250 sec, but it also lowered the CPU time, from a 280 sec average to a 210 sec average. It's not the 148 sec average CPU time of the baseline without the iGPU, but it is getting closer.
GPU-Z shows the iGPU load running at ~39% with spikes to 90% with that config, so longer iGPU times & faster CPU times would clearly be expected, since the iGPU is only being used about half the time.

I am not sure whether the iGPU running at ~39% with the CPU times still that high is a bad sign, or whether it points to something else to try. In the meantime I plan to try more configs to see if there is one that will make everything magically work together as well as BayTrail:
-unroll 4 -ffa_block 64 -ffa_block_fetch 16
-unroll 4 -ffa_block 128 -ffa_block_fetch 32
Then play around with unroll again.
23) Message boards : Number crunching : Loading APU to the limit: performance considerations - ongoing research (Message 1670893)
Posted 7 days ago by Profile HAL9000
I think there is no real limit; even 1 might work, but it needs checking (maybe the "foolproof system" will not allow such values ;) )
And yes, it is worth checking with lower values.

What I would suggest then:
1) Run on an unloaded PC and find the sizes at which elapsed time stops decreasing quickly.
2) Try values around, or slightly less than, those on the loaded system.
That way improved performance could be achieved, especially if that slowdown really is mostly caused by memory accesses (including page faults, cache misses, bus saturation and so on).
I have had no opportunity to check the Intel engineer's suggestions regarding his theory (he thinks such slowdown is caused by a power limitation that lowers the CPU frequency when the GPU is active; see the Intel forum thread for details). Maybe it is worth carefully checking that too.

I ran 2 more configurations with lower settings & updated the data in:
http://hal6000.com/seti/test/apbench_test_i5-4670k_btcfg.htm
The first config didn't see much improvement in CPU times except when running 3CPU+iGPU.
-unroll 4 -ffa_block 512 -ffa_block_fetch 256

The next config also showed an improvement in CPU times with 3CPU+iGPU, but it didn't go as well when running 4CPU+iGPU.
-unroll 2 -ffa_block 512 -ffa_block_fetch 256
I reran the 4CPU+iGPU test with this config 3 times to be sure it wasn't an odd result, but each time it was similar.

Page faults were also reduced with the lower values. I forgot to take screen shots but they were ~200,000 for the 1st config & ~160,000 for the 2nd.

Based on the results of the 2nd config, it looks like -unroll should go back up. I will probably try -unroll 3 -ffa_block 512 -ffa_block_fetch 256 to see what happens before modifying block & fetch further. Perhaps I should jump right to a -unroll 4 -ffa_block 2 -ffa_block_fetch 1 config to see if there is a great improvement in CPU times?

I have been sticking with the default 2:1 ratio of block to fetch. Part of me thinks a 1:1 ratio might perform synchronous operations & somehow be better. Then part of me says that is silly & to lower -ffa_block_fetch to a 4:1, 5:1, or higher ratio.
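As an aside, a trivial sketch to enumerate candidate settings at those ratios, so the test matrix stays consistent (illustrative only; the unroll/block values are just the ones discussed in this thread, not recommendations):

# Illustrative helper to print -unroll/-ffa_block/-ffa_block_fetch combos
# at the block:fetch ratios mentioned above. Not part of the app itself.

def candidate_configs(unrolls=(2, 3, 4), blocks=(256, 512), ratios=(1, 2, 4, 5)):
    for unroll in unrolls:
        for block in blocks:
            for ratio in ratios:
                fetch = block // ratio
                if fetch >= 1:
                    yield "-unroll %d -ffa_block %d -ffa_block_fetch %d" % (unroll, block, fetch)

if __name__ == "__main__":
    for line in candidate_configs():
        print(line)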
24) Message boards : Number crunching : Intel® iGPU AP bench test run (e.g. @ J1900) (Message 1670873)
Posted 7 days ago by Profile HAL9000
Having completed the benchmarks on my J1900 iGPU with just the -hp switch while the CPU cores were active, I found little change from not using it: 2-3 seconds either way. However, in the case of 2 CPU cores + iGPU the performance for both CPU & iGPU was lower, with run times ~3% longer for the CPU & ~2% longer for the iGPU.

Next I'm running
-hp -unroll 5 -ffa_block 1472 -ffa_block_fetch 368
After which I'm going to run
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368
to see if it makes any difference with tuned settings.
25) Message boards : Number crunching : i7 4790K low performance (Message 1670552)
Posted 7 days ago by Profile HAL9000
Thanks for the replies guys. :)

Grant, the Intel tuning utility is a program from Intel that lets you view processor properties, settings, & various other bits.
http://www.intel.com/content/www/us/en/motherboards/desktop-motherboards/desktop-boards-software-extreme-tuning-utility.html

It also has a simple benchmark ability, and that's where I was comparing my setup to other people's setups with the same board & CPU. This board by default overvolts and overclocks the CPU right out of the gate, where it hits the mid-60s C while idling. I've gone into the BIOS, adjusted a handful of settings to a more sane level, and disabled the auto-overclock (which netted only marginal gains at best and lots of thermal throttling).

The 4790K runs hot, hotter than expected. Thermal throttle is at 100C, but under load 70-80 is about 'normal'. Some chips are better because of the TIM behind the IHS (integrated heat spreader), but it's not tied to a specific batch. It seems my chip is one of the hot ones. Running BOINC 24x7 gives me about 78-85C at full load in a normal room (not a cold environment) on stock cooling.

I do have a water cooler, but I had to order the Intel bracket and a couple more fittings. It's not here yet.

The rated turbo speed of the 4790K is 4.4GHz, *but* it follows this chart:
1-2 cores under load: 4.4GHz
3 cores under load: 4.3GHz
4 cores under load: 4.2GHz

My CPU follows this fine. I can bump everything up to 4.4, but thermal throttling is your overly-attached friend when using the stock cooler in this configuration; I will do that when my cooling is changed back to liquid. Running the Intel diagnostics says my CPU is fine ("passed").

I suspected the better scores were from using optimized apps for BOINC, which would explain the better BOINC performance. Downloading SP1 right now. Yes, all of my relevant drivers are installed and up to date (onboard video and the 'Killer' NIC are disabled)... I actually wouldn't be surprised if it's because I don't have SP1. I'll report back.

I use Gigabyte's EasyTune app on my GA-Z87X-D3H boards to adjust my fan cooling profile settings rather than the Intel tool.

I know an i7 running HT will run a bit warmer than an i5, but both of my i5-4670K boxes run under 60°C while under full load in an ~80°F room. I went overkill with a Noctua NH-D14 on my gaming machine, but went smaller with a NZXT Respire T20 on my HTPC.

Your processing times also seem VERY high to me. My i5s, running at 3.5GHz, only take about 1 hour for normal-AR MB tasks when running 4 at a time. With an i7 running 8 at a time I would only expect them to take an extra 15-20 min at the same clock speed. With the i7-4790K running at 4.0GHz by default, that may be enough to compensate for the 4 additional threads and still come in around 1 hr as well.
I'm using the Lunatics apps. The AVX app did show a slight performance gain vs. the SSE3, SSE3x, & SSE4.1 apps. I'm not sure how much difference there is vs. stock these days, but I wouldn't expect your tasks to be taking around 3 hours, as they are, if everything were working normally.

Also, is your memory configuration single or dual channel? A single-channel configuration will show a noticeable speed difference when running SETI@home & other projects.
26) Message boards : Number crunching : Intel® iGPU AP bench test run (e.g. @ J1900) (Message 1669697)
Posted 9 days ago by Profile HAL9000
Does this mean you also did not use -hp for high priority with the Intel iGPU OpenCL app?
If used, I would guess the calculation time will decrease.

True. For the bench test -hp was not used. It is also not being used for normal running.

It is late for me now, 2:17 AM, so tomorrow, or rather today after I have slept, I can run more tests similar to yours, but with the CPU loaded. I will start with -hp only, to have a baseline comparison with my other data, then try the "best" config you found while running the iGPU solo. Each test config will take me just under 2 hours to complete.
27) Message boards : Number crunching : SETI & AP apps/tasks checkpoints & HDD (Message 1669686)
Posted 9 days ago by Profile HAL9000
24 GPU tasks? I'm thinking even the largest GPUs should only be running 4 tasks at a time, and even then you should probably free up 1 core per GPU, or at least 2 in total.

There was a lot of discussion about SSD here.
http://setiathome.berkeley.edu/forum_thread.php?id=76413

The consensus was basically that you couldn't burn out an SSD running SETI in about 4 times (or even much more than) the life of your computer.

As I mentioned in that thread, I am running about 1% wear a year on my SSD, but it isn't a machine that only runs BOINC. Both my gaming PC & HTPC use the SSD for OS, swap, & BOINC.

At work I have a 24-core server where I run BOINC on a 2GB RAM disk, which writes the RAM disk to its image file once an hour. That machine has a great deal of disk activity, so I don't want to add BOINC's writes to its already heavy load, even if it is only once a minute. However, I have not checked the delta when it writes the RAM disk to its image file.

EDIT: For Windows there is a command line to check disk usage. I think it shows totals since the PC was started/rebooted:
C:\>fsutil fsinfo statistics c:
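Since those counters are cumulative, something like the following could sample the command twice and report the change over an interval (just a sketch: it assumes Windows with fsutil on the PATH, possibly from an elevated prompt, and it parses whatever "name : number" lines the command prints rather than hard-coding field names):

# Sketch: run "fsutil fsinfo statistics c:" twice and print the delta of
# each counter over the interval. Counter names are taken from the output.

import re
import subprocess
import time

def read_stats(volume="c:"):
    out = subprocess.run(["fsutil", "fsinfo", "statistics", volume],
                         capture_output=True, text=True, check=True).stdout
    return {m.group(1).strip(): int(m.group(2))
            for m in re.finditer(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(\d+)\s*$",
                                 out, re.MULTILINE)}

before = read_stats()
time.sleep(60)          # sample interval in seconds; adjust as needed
after = read_stats()

for name, value in sorted(after.items()):
    delta = value - before.get(name, 0)
    if delta:
        print("%s: +%d" % (name, delta))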
28) Message boards : Number crunching : Intel® iGPU AP bench test run (e.g. @ J1900) (Message 1669683)
Posted 9 days ago by Profile HAL9000
In view of the latest findings in the other thread, I would suggest checking the parameter area (unroll, ffa_block) around your current best config under full load.
The loaded state is not a negligible difference.

The iGPU slows by ~19% with all 4 CPU cores loaded in the bench test I did using the default iGPU config.
http://hal6000.com/seti/test/apbench_test_celeron_j1900.htm
29) Message boards : Number crunching : what is meant by forthcoming "large" tasks? (Message 1669682)
Posted 9 days ago by Profile HAL9000
Some time ago we were testing SETI@home v7 "large tasks", which were ~1.5MB IIRC. For whatever reason that development was stopped & it looks like they will pick it back up with SETI@home v8. It could be that v8 does away with the 377k tasks altogether, or maybe there will be "standard" tasks & separate "large" ones.

At one point I also recall hearing that the SETI@home & AP analysis could possibly be combined into one app, so we would only download a single file to do both types of work at once, since the data source for both is the same anyway.

EDIT: Reading some of the old threads on Beta it looks like we started with 1.5MB & then moved to 4MB tasks.
30) Message boards : Number crunching : Loading APU to the limit: performance considerations - ongoing research (Message 1669483)
Posted 10 days ago by Profile HAL9000
Would using a smaller config than the BayTrail default be worth trying on Haswell?
It looks like using the BayTrail values on Haswell reduced CPU time, & even iGPU time was slightly reduced too.
Haswell default: -unroll 18 -ffa_block 5120 -ffa_block_fetch 2560
BayTrail default: -unroll 4 -ffa_block 1024 -ffa_block_fetch 512
There is a minimum limit of 2 for -unroll, but is there also a minimum limit for block & block_fetch? The ReadMe doesn't mention one.
31) Message boards : Number crunching : Updated OpenCL AstroPulse coming main (Message 1669424)
Posted 10 days ago by Profile HAL9000
http://www.tomshardware.com/news/HD-4000-OpenCL-Drivers-Power-Consumption-Increased-Performance,21763.html

http://www.geeks3d.com/20120427/intel-ivy-bridge-hd-graphics-4000-gpu-opengl-and-opencl-tests/

Sorry for the confusion, but I was referring to Radeon HD GPUs, not Intel HD GPUs.
32) Message boards : Number crunching : PCIe x1 GPU cards? (Message 1669395)
Posted 10 days ago by Profile HAL9000
Thanks.
Yes, maybe the max 19W (80% under OpenCL = 15.2W) of the '1024MB Club 3D Radeon R5 230' card would be 'better'. ;-)



From ASRock Taiwan:
Because the DC board uses a DC adapter, there is a power limitation when installing a VGA card, so we added the caution to the spec.

On the Q1900-ITX the power is provided by the PSU, so there is no power limitation on it.
However, the Q1900-ITX supports 1 x PCI Express 2.0 x1 slot.
The PCI Express 2.0 x1 slot is not designed for a VGA card; if you install a VGA card in it, the performance will not be better than in an x16 slot, so we recommend using the onboard VGA. If the user really wants to install a VGA card, the user's VGA card can be installed on this model.



English isn't my first language and my English is poor...
So I don't really understand. ;-)

I have a 90W laptop power supply connected to my DC board; currently the whole PC (full load, 4x CPU and 1x iGPU OpenCL) uses ~31W at the wall plug.
Also, they (Europe/NL support) said before that this mobo/PCIe slot can deliver 25W.
So the 23W Zotac card will, can, must, should, could be ?? ;-) work?
AFAIK, GPU cards use on average 80% of their max under CUDA/OpenCL, which would mean in this case the Zotac card would use 18.4W. ;-)

And on the board 'without DC', a GPU card will work.

It would be good if they explained "there is power limitation when installing VGA card". Are they thinking in terms of the DC supply to the MB being the limit, or is there a limit built into the MB so that it cannot supply much power to the PCIe slot?
I would guess they consider the external DC supply to be the "limit", expecting users to maybe only have small 25-30W DC supplies to run the MB.
33) Message boards : Number crunching : Updated OpenCL AstroPulse coming main (Message 1669390)
Posted 10 days ago by Profile HAL9000
Maybe some have already noticed that we got an AstroPulse OpenCL app update for Windows and Linux.
These binaries are the best currently available, so I would recommend that those who use an anonymous platform config update to these stock ones.

thx, will check if it works on a laptop's HD 3450m... ;)

Unfortunately the HD 3000 series doesn't support OpenCL & the HD 4000 series only had "beta" OpenCL support.
34) Message boards : Number crunching : Best sub $200 video card for SETI? (Message 1669105)
Posted 11 days ago by Profile HAL9000
With NV does running Display Driver Uninstaller help clear out issues when switching cards?
35) Message boards : Number crunching : Installing BOINC 7.2.42 on Ubuntu 14.10 instead of 7.4.8 (Message 1668635)
Posted 12 days ago by Profile HAL9000
Can you not copy the boincmgr from 7.2.42 over 7.4.8? BOINC Manager is just the GUI interface to BOINC & doesn't have to match the version of BOINC itself.
Personally I'm sticking with BOINC Manager 6.10.48 to retain the Messages tab, rather than having to open the "Event Log" view that is used in newer versions.
36) Message boards : Number crunching : Panic Mode On (97) Server Problems? (Message 1668552)
Posted 12 days ago by Profile HAL9000
While it seems rather chaotic, things are more or less under control. Still lots of database massaging happening in the background, during which we stop the assimilators (or they stop themselves). This shouldn't affect normal operations.

And yes, I reconfigured a bunch of things over the weekend to get 12 assimilators going and speed up the backlogs when they come.

The assimilators might be off for a while again (more index rebuilding) but no worries (yet).

- Matt

It is good to hear that the automated checks & balances kicked in & did their thing, rather than a catastrophic issue bringing everything to a sudden halt.

Does everyone have "Professional Database Masseur" on their resume around there at this point? :P
37) Message boards : Number crunching : Panic Mode On (97) Server Problems? (Message 1668457)
Posted 12 days ago by Profile HAL9000
So there are 12 sah assimilator (v7) processes now? I don't recall there being so many previously. I wonder if that is related.
38) Message boards : Number crunching : Intel Compute Stick? (Message 1668236)
Posted 12 days ago by Profile HAL9000
According to Intel, the Compute Stick is "Available Now" in the Windows, 32GB storage configuration, with the Linux, 8GB storage configuration to come in June.
Most of the sites still say it isn't shipping until May 5th, but someone at my work ordered one today.
39) Message boards : News : "Dan Werthimer: Are We Alone?" Talk Video (Message 1668231)
Posted 12 days ago by Profile HAL9000
It is always fun to find out the history of how things like this got started.

I always kind of figured it was a bunch of guys standing in a field like this: http://i.imgur.com/FfwAbsv.png
40) Message boards : Number crunching : Loading APU to the limit: performance considerations - ongoing research (Message 1668222)
Posted 12 days ago by Profile HAL9000

I noticed that Haswell has Max WG size of 512 vs 256 for BayTrail. So would it make sense to test -tune to adjust WG size? Maybe -tune 64 4 1 or -tune 32 8 1.


The currently available tune options will not cover all kernel calls anyway. Some of them use the default (NULL) local size, so the runtime is free to use whatever it needs.
But WG size would hardly affect the difference under consideration. The iGPU has a wave size of 32 or 64 (it's not clear what the actual value is), both less than the WG size, so switching between waves will happen on both devices. And the global buffer size (the memory that needs to be accessed) will be the same as long as -unroll and -ffa_block are set to the same values for both devices.

It would be interesting to get page fault data for the new config.

Memory use & page faults on Haswell are similar to Bay Trail when using the BayTrail config: -unroll 4 -ffa_block 1024 -ffa_block_fetch 512
http://www.hal6000.com/seti/test/apbench_procexp_i5-4670k_cpu0_igpu1-2_baytrail_param.png
CPU time changes slightly with the different config: around 10-15% for the 4 & 3 CPU runs, and 3-5% for the 2 & 1 CPU runs.
http://hal6000.com/seti/test/apbench_test_i5-4670k_btcfg.htm
Note: this is all new data, as I changed the iGPU driver to the newest version to see if it had any effect. It looks like little to none when comparing to the previous test.

Associated APBench log files.
http://hal6000.com/seti/test/APBench_iGPU_Data-2.7z
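As a side note to the work-group discussion above, a quick way to see what the OpenCL runtime reports for each device is something like this (assumes the pyopencl package is installed; purely a diagnostic sketch, unrelated to the AstroPulse binaries — the per-kernel "wave" hint, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, would additionally need a built kernel to query):

# Diagnostic sketch: list each OpenCL device's reported work-group limit.

import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(device.name)
        print("  max work-group size:", device.max_work_group_size)
        print("  max compute units:  ", device.max_compute_units)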

