Posts by Raistmer


1) Message boards : Number crunching : Problems with PC. (Message 1564993)
Posted 1 day ago by Raistmer
I use this on several Windows 7 machines, but think I used it on XP as well.

Not XP Home, I fear.

C:\test>ver

Microsoft Windows XP [Version 5.1.2600]

C:\test>timeout /?
'timeout' is not recognized as an internal or external command,
operable program or batch file.

Nor in Win XP Pro SP3.
But in Vista:

P:\bin\BoincData>timeout /?

TIMEOUT [/T] <timeout> [/NOBREAK]

Description:
Accepts a timeout value that sets a fixed waiting period
(in seconds), or waits until a key is pressed. There is also a
parameter that makes it ignore key presses.

Parameters:
/T <timeout> Waiting time in seconds.
Valid range: -1 to 99999 seconds.

/NOBREAK Ignore key presses and wait the specified time.

/? Show usage help.

Note: a timeout value of -1 means waiting indefinitely
for a key press.

Examples:
TIMEOUT /?
TIMEOUT /T 10
TIMEOUT /T 300 /NOBREAK
TIMEOUT /T -1
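
Since TIMEOUT is missing on XP but present on Vista, a script that has to run on both could check for the command before using it. Below is a minimal Python sketch, assuming Python is installed on the host; `build_timeout_args` and `wait_seconds` are hypothetical helper names, and the range check simply mirrors the help text above.

```python
import os
import shutil
import subprocess
import time


def build_timeout_args(seconds, nobreak=False):
    """Build the TIMEOUT argument list using the range from its help text."""
    if not -1 <= seconds <= 99999:
        raise ValueError("timeout must be between -1 and 99999 seconds")
    args = ["timeout", "/T", str(seconds)]
    if nobreak:
        args.append("/NOBREAK")
    return args


def wait_seconds(seconds):
    """Wait with TIMEOUT on Windows when it exists, else use time.sleep()."""
    if seconds < 0:
        raise ValueError("this helper only supports non-negative waits")
    if os.name == "nt" and shutil.which("timeout"):
        subprocess.call(build_timeout_args(seconds, nobreak=True))
    else:
        # Fallback for systems without TIMEOUT, such as the XP boxes above.
        time.sleep(seconds)


if __name__ == "__main__":
    print(build_timeout_args(300, nobreak=True))
```

The fallback branch is the part that matters for XP: the wait still happens, just without the keypress handling that TIMEOUT offers.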
2) Message boards : Number crunching : Will the optimised apps still be faster with my old C2Q? (Message 1563771)
Posted 3 days ago by Raistmer
Yea I know but it's a right pain looking at dozen+ like that.

You could use a script that parses the client_state.xml file instead.
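
A rough idea of what such a script might look like, as a Python sketch. The element names used here (`result`, `name`, `final_cpu_time`, `final_elapsed_time`) are assumptions about the file layout and should be checked against a real client_state.xml; `list_results` and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real client_state.xml contains far more fields.
SAMPLE = """<client_state>
  <result>
    <name>ap_example_0</name>
    <final_cpu_time>120.5</final_cpu_time>
    <final_elapsed_time>3600.0</final_elapsed_time>
  </result>
  <result>
    <name>ap_example_1</name>
    <final_cpu_time>0.000000</final_cpu_time>
    <final_elapsed_time>0.000000</final_elapsed_time>
  </result>
</client_state>"""


def list_results(xml_text):
    """Return (name, cpu_seconds, elapsed_seconds) for each <result> block."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter("result"):
        name = result.findtext("name", default="?")
        cpu = float(result.findtext("final_cpu_time", default="0"))
        elapsed = float(result.findtext("final_elapsed_time", default="0"))
        rows.append((name, cpu, elapsed))
    return rows


if __name__ == "__main__":
    for name, cpu, elapsed in list_results(SAMPLE):
        print(f"{name}: cpu={cpu:.1f}s elapsed={elapsed:.1f}s")
```

For a real host, read the file from the BOINC data directory and pass its text to `list_results` instead of `SAMPLE`.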
3) Message boards : Number crunching : blanked AP tasks (Message 1563764)
Posted 3 days ago by Raistmer

I would guess that the change was only recently devised.


Not quite. Joe's initial BLANKIT mod was introduced quite a long time ago.
It needs repeating that SETI@home is currently a purely volunteer endeavor, I would say on Berkeley's side too, not only on ours. What does that mean?
It means that, despite the string at the bottom of the beta forum pages, the SETI team has had no funds from the US government for a long time already. It also means that team members can't dedicate long enough stretches of time to SETI development; even day-to-day support has severely deteriorated because of the constant need for non-SETI activity.
All this incurs long delays in everything we do here. In each step.
Nobody can afford to live without money in this mad world...
4) Message boards : Number crunching : blanked AP tasks (Message 1563730)
Posted 3 days ago by Raistmer
That's why I am asking the questions. Trying to figure if it's the way the new app handles things, or if the AP splitting errors shown on the SSP are related to the blanking issue.



I'm probably going to stir up a hornet's nest with this but.....
I actually don't mind too much (outside of the WOW challenge) when I get work units that eventually get highly blanked. Why, you ask? After reading all the discussions on just what is going on when the data gets analyzed, I came to appreciate just what is going on. There is talk that v7 "will fix" the blanking problem. Well, not really. It actually just "replaces" the problem. I understand why it is being done. But sacrificing sensitivity for improved specificity leaves a lot of potential signals out of the mix. I know I'm late to the party on this and v7 is well underway in testing in Beta. I guess it just rubs me the wrong way when someone says that v7 "fixes" the problem. It doesn't fix it; rather than trying to find a signal in a mess of noise, it replaces that entire area with blanket noise that is easier to skip over. I really appreciate all the effort and time that has been put in over the years by everyone that has worked on this project. It's not easy to find people willing to donate their time and expertise in helping this project. I guess I posted this just so people are aware that in order to get a "Faster" AP we do so with the knowledge that we might be "throwing out the baby with the bath water." Just my 2 pesos...lol

Zalster


I would just say "wrong approach".
Maybe wait for Joe to describe it in more detail. AP v7 has better "science", not just better speed on GPU.
5) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1563715)
Posted 3 days ago by Raistmer
@those who reported issues in OpenCL AstroPulse v6 from this Lunatic installer release:

Please test OpenCL APv7.03 on beta site and check if it works OK with your setups.
Doing so will allow us to release an AP v7 that works "out of the box" on your own hardware, without additional effort.
6) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1563713)
Posted 3 days ago by Raistmer
So is the "problem" that there's no CUDA AP app?

Please see AP6_win_x86_SSE2_OpenCL_NV_r2399.exe from the v0.42 installer. While not CUDA it is the app that will give you the best use of NV cards for AP.

Yes, I am using that, just wondered if there was a CUDA version and whether that would be better.


Perhaps. Both SETI apps spend a lot of time doing FFTs. And the proprietary cuFFT was significantly better than the open-source oclFFT on nVidia hardware when I tested them a year or more ago.
I'm working to make oclFFT more flexible and tunable now, but I still expect cuFFT to be faster.
So, by my estimate, even a direct port of the AstroPulse OpenCL code to CUDA would improve performance on nVidia cards. Porting takes time though.
Any volunteers?
7) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1561454)
Posted 7 days ago by Raistmer

Considering some have 4 top GPU`s running with a slow CPU and running at least 2 instances on each card would require at least a 8 core CPU to cope with high blanked AP`s.

And that's why we want the AP v7 release ASAP.
8) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1561226)
Posted 8 days ago by Raistmer
Thanks. It does help if you bring questions and problems here to the public forums, where everyone can mull over the examples you've provided.

I know that, but I don't like to keep posting too much on the public forums since, as you kindly explained, we use highly specialized configurations which could cause a lot of confusion for most normal volunteers and could easily lead to wrong assumptions. That's why I normally raise those questions on the team forum only.

[Translated from Russian] A bad approach, unless your team has its own developer.
I have no high-performance hardware to test new versions on. Most debugging and tuning is done on relatively old hardware. So without regular reports about problems on high-performance hardware, you should not expect any improvements.
9) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1561204)
Posted 8 days ago by Raistmer

Thanks in advance and please any help is welcomed.


[Translated from Russian] It is worth checking whether the TWIN_FFA option showed up in the log of the old version.
The new build uses TWIN_FFA, so the optimal set of FFA parameters has changed.
This concerns the -ffa_block and -ffa_block_fetch options.
It may be worth finding new optimal values. As a first approximation, halving both parameters should do.
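
For anyone who wants to apply that first approximation mechanically, here is a small Python sketch that halves the values of -ffa_block and -ffa_block_fetch in an existing command line. `halve_ffa_options` is a hypothetical helper, and the numbers in the example are invented, not recommended settings.

```python
FFA_OPTIONS = ("-ffa_block", "-ffa_block_fetch")


def halve_ffa_options(cmdline):
    """Return cmdline with the values of the two FFA options divided by 2."""
    tokens = cmdline.split()
    for i in range(len(tokens) - 1):
        # Only touch a token that directly follows one of the FFA options.
        if tokens[i] in FFA_OPTIONS and tokens[i + 1].isdigit():
            tokens[i + 1] = str(max(1, int(tokens[i + 1]) // 2))
    return " ".join(tokens)


if __name__ == "__main__":
    # Invented values, used only to show the transformation.
    print(halve_ffa_options("-use_sleep -ffa_block 8192 -ffa_block_fetch 4096"))
```

The halved values are only a starting point; the optimum still has to be found by testing on the actual host.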
10) Message boards : Number crunching : CUDA Versions (Message 1558299)
Posted 14 days ago by Raistmer

Maybe this parameter would be more useful if it were set to on by default.


[Translated from Russian] By default, 1 task runs per GPU.
Running 2 already requires operator intervention.
As far as I know, BOINC gives no information about the number of running copies of the program. So an operator who changes this number has to specify it for the application as well.
11) Message boards : Number crunching : CUDA Versions (Message 1558178)
Posted 14 days ago by Raistmer


I always wonder why the number of running WUs does not appear in the log too.


[Translated from Russian] If you specify the corresponding option, the number of instances is written to stderr as well.

-instances_per_device N :Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances).
N - integer number of allowed instances.
12) Message boards : Number crunching : CUDA Versions (Message 1558130)
Posted 14 days ago by Raistmer

Could you recommend a commandline with those options for my card?

I'm afraid not. These options offer possibilities, and some testing is needed to find the best config. With my own NV GPUs (GSO9600 and GTX260) I chose a different approach: staying with the older 263.06 drivers, because the GPUs are installed in non-gaming PCs. That driver is free from the 100% CPU usage "bug/feature" of newer NV drivers.
Also, I prefer to crunch CUDA MultiBeam on NV instead of OpenCL AstroPulse. CUDA (at least in some of its sync modes) doesn't have that 100% CPU usage "feature". nVidia's OpenCL implementation lacks CUDA's versatility in choosing sync modes.
This could partly explain why the ATi AP world is a little more "explored" than the NV or iGPU ones.

P.S. I can understand Howard, Bernadette is very cute :D
13) Message boards : Number crunching : CUDA Versions (Message 1558122)
Posted 15 days ago by Raistmer
Thx Raistmer, but I'm afraid I don't understand this language. I tried to translate with google and it looks like you recommend me to test those other options instead of "use sleep"?


"This language" is Russian. Worth getting acquainted with at least, if not learning.
And Sheldon could ask Wolowitz, he knows this language well ;) :D

Yep, those options allow finer tuning of the sleep time.
14) Message boards : Number crunching : CUDA Versions (Message 1558115)
Posted 15 days ago by Raistmer
[Translated from Russian] Another way to avoid leaving the CPU underloaded is to use the -cpu_lock option.
For ATi this option lets other applications load the CPU to 100% at the cost of only a small loss in GPU performance (at least in some configurations). For nVidia (as far as I know) this option has not been fully tested. Nevertheless, it is available in all OpenCL AstroPulse builds, including iGPU.

For fast devices it should be complemented by specifying how many instances of the app run simultaneously (so that processes are distributed correctly across the logical processors in the system).

From the ReadMe:
-cpu_lock : Enables CPUlock feature. Results in CPUs number limitation for particular app instance. Also attempt to bind different instances to different CPU cores will be made.
Can be used to increase performance under some specific conditions. Can decrease performance in other cases though. Experimentation required.
Now this option allows GPU app to use only single logical CPU.
Different instances will use different CPUs as long as there is enough of CPU in the system.
To use CPUlock in round-robin mode GPUlock feature will be enabled. Use -instances_per_device N option if few instances per GPU device are needed.

-cpu_lock_fixed_cpu N : Will enable CPUlock too but will bind all app instances to the same N-th CPU (N=0,1,.., number of CPUs-1).

-instances_per_device N :Sets allowed number of simultaneously executed GPU app instances per GPU device (shared with MultiBeam app instances).
N - integer number of allowed instances.
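
To show why the app needs to be told the instance count, here is a toy Python sketch of a round-robin mapping from app instances to logical CPUs. This is not the app's actual code, only an illustration of the idea in the ReadMe excerpt above; `assign_cpus` and its parameters are hypothetical.

```python
def assign_cpus(num_devices, instances_per_device, num_logical_cpus):
    """Map (device, instance slot) to a logical CPU in round-robin order."""
    mapping = {}
    slot = 0
    for device in range(num_devices):
        for instance in range(instances_per_device):
            # Wrap around once every logical CPU has been handed out.
            mapping[(device, instance)] = slot % num_logical_cpus
            slot += 1
    return mapping


if __name__ == "__main__":
    # 2 GPUs, 2 instances each, 4 logical CPUs: every instance gets its own CPU.
    for key, cpu in sorted(assign_cpus(2, 2, 4).items()):
        print(key, "-> CPU", cpu)
```

If the instance count is unknown or assumed to be 1, a scheme like this has no way to give the second instance on a device its own CPU, which is why -instances_per_device has to be passed alongside -cpu_lock.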

15) Message boards : Number crunching : CUDA Versions (Message 1558096)
Posted 15 days ago by Raistmer
Can anybody pls explain to me what the use sleep option does?


[Translated from Russian] After queuing several GPU kernel calls, the app calls Sleep(), yielding the CPU to the system for other needs.
-use_sleep is the simplest way to free up the processor a little.
For those who like to experiment I added other options:

From the ReadMe:

-initial_ffa_sleep N M: In PC-FFA will sleep N ms for short and M ms for large one before looking for results. Can decrease CPU usage.
Affects performance. Experimentation required for particular CPU/GPU/GPU driver combo. N and M should be integer non-negative numbers.
Approximation of useful values can be received via running app with -v 2 and -use_sleep switches enabled and analyzing stderr.txt log file.

-initial_single_pulse_sleep N : In SingleFind search will sleep N ms before looking for results. Can decrease CPU usage.
Affects performance. Experimentation required for particular CPU/GPU/GPU driver combo. N should be integer positive number.
Approximation of useful values can be received via running app with -v 2 and -use_sleep switches enabled and analyzing stderr.txt log file.


These options let you tune how long the app yields the CPU. That time should be roughly equal to the time the GPU needs to finish its work. Then freeing the CPU will not cause a significant drop in overall performance.

This matters most for the fastest devices (where GPU kernels need to be enlarged further via additional options) and for the slowest ones, where the GPU cannot finish processing within the interval set by -use_sleep.

P.S.
The options discussed apply to AstroPulse, which is written in OpenCL, not CUDA. So the thread title may somewhat confuse readers on this point.
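
The trade-off behind choosing a sleep value can be put into a few lines of Python. This is a simplified model, not the app's logic: it assumes one wait per GPU batch and ignores timer granularity; `sleep_tradeoff` and `suggest_sleep_ms` are hypothetical names, and taking the median of measured times is just one possible heuristic.

```python
import statistics


def sleep_tradeoff(gpu_ms, sleep_ms):
    """Return (cpu_busy_wait_ms, gpu_idle_ms) for one wait on the GPU."""
    # Waking too early: the CPU polls until the GPU finishes.
    busy_wait_ms = max(0.0, gpu_ms - sleep_ms)
    # Waking too late: the GPU sits idle until the CPU comes back.
    gpu_idle_ms = max(0.0, sleep_ms - gpu_ms)
    return busy_wait_ms, gpu_idle_ms


def suggest_sleep_ms(measured_gpu_ms):
    """Pick an integer sleep near the typical measured GPU time."""
    return int(statistics.median(measured_gpu_ms))


if __name__ == "__main__":
    print(sleep_tradeoff(10.0, 4.0))
    print(sleep_tradeoff(10.0, 15.0))
    print(suggest_sleep_ms([9.0, 10.5, 11.0]))
```

A sleep shorter than the GPU time costs CPU, a longer one costs GPU throughput, so the value passed to -initial_ffa_sleep or -initial_single_pulse_sleep should sit close to what the -v 2 log reports for the host.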
16) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1557943)
Posted 15 days ago by Raistmer

Sorry, but I don't know of any profiling application which runs standalone and has a user interface suitable for laymen. :/


CodeXL for ATi, Nsight for NV.
Still have to find a good free profiler for iGPU though.

EDIT: perhaps not for laymen though. But then продвинутые опции ("advanced options") come into play. Perhaps I need to write in my own native language instead of English next time, to make it clearer, in case the English word "advanced" is not comprehensible :-/

I think instead of "advanced" it is more "developer" option. Unless you are developer, or reading the code like one, you probably do not understand what the option is actually doing.

[Translated from Russian] No problem, I'll change it to "for developers".
17) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1557913)
Posted 15 days ago by Raistmer

Sorry, but I don't know of any profiling application which runs standalone and has a user interface suitable for laymen. :/


CodeXL for ATi, Nsight for NV.
Still have to find a good free profiler for iGPU though.

EDIT: perhaps not for laymen though. But then продвинутые опции ("advanced options") come into play. Perhaps I need to write in my own native language instead of English next time, to make it clearer, in case the English word "advanced" is not comprehensible :-/
18) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1557911)
Posted 15 days ago by Raistmer
I want to thank everyone for assisting me in understanding and applying the -tune option to the AP cmdline file. I did correct the argument for the -tune option and I've learned a very important lesson when copy & paste is involved for suggestions - copy & paste, but verify before applying. I do have 5 AP tasks in the queue, but were not affected by the error because of their deadline, so we will wait and see what happens.

Note to all developers - Attempt to ensure that parameters, options, arguments, etc. are described in more layman's terms so that those of us who are not proficient in C++ or any other PC language, but who still want to attempt more advanced options, can understand them. Does not have to be a tutorial, but should be enough to comprehend.

For example -
Advanced level options (some app code reading and understanding of algorithms used is recommended before use, not fool-proof even in same degree as
options above):
-tune N Mx My Mz : to make app more tunable this param allows user to fine tune kernel launch sizes of most important kernels.
N - kernel ID (see below)
Mxyz - workgroup size of kernel. For 1D workgroups Mx will be size of first dimension and My=Mz=1 should be 2 other ones.
N should be one of values from this list:
FFA_FETCH_WG=1,
FFA_COMPARE_WG=2
For best tuning results its recommended to launch app under profiler to see how particular WG size choice affects particular kernel.
This option mostly for developers and hardcore optimization enthusiasts wanting absolute max from their setups.
No big changes in speed expected but if you see big positive change over default please report.
Usage example: -tune 2 32 1 1 (set workgroup size of 32 for 1D FFA comparison kernel).


I would probably win a bet that most of us have not heard of the profiler mentioned above, but would understand better if the following were inserted in the text -

The tune param will define the kernel size of the GPU into junks.
Your GPU has work group size of 256 for example.

So tune 1 64 4 means it will load 64 4 times until 256 is reached.

Possible values are 128 2 or 32 8 or 16 16.
Using unequal number results in lower speed because value of 256 can`t be reached in this case.

Without tune params the kernel will be loaded without any structure.


Or even mentioning something from the following would help -

The tune param will define the kernel size of the GPU into chunks.
Your GPU has work group size of 256 for example.


Mike, please explain HOW you know his GPU has a work group size of 256. How can someone determine their own GPU parameters?



Look at the stderr output of a GPU task.
Our apps list WG sizes for all GPUs in the system, and the WG size for the GPU in use too.



I still don't know what the profiler is or how to apply it.


Sorry, I'd better stick to coding, I'm not a writer ;)
And having something new to learn, what could be better for the Homo sapiens species? ;) (joke).
Actually, if you feel uncomfortable with a description, and ESPECIALLY if you see an "advanced" warning given, it's better to do as the manual describes... or just leave it at the default. And report a bug if the default fails.

General note regarding advanced options: I think it's better to have some tool for tweaking than to have hardwired values and no way to change them besides writing your own code, right? But with a tool comes responsibility for using that tool. Yes, better fool-proofing can and perhaps will be added. And that will cost time otherwise spent on coding new options/optimizations. Make your choice wisely.
19) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1557910)
Posted 15 days ago by Raistmer
-tune N Mx My Mz
FFA_FETCH_WG=1,
FFA_COMPARE_WG=2


It is not 100% clear to me but is work group size calculated by Mx * My * Mz or is Mz for another use? It would make sense to me if -tune 1 8 8 8 = WG size 512.
Setting value for N sets the function to use fetch vs compare? This would be more "Try each and see if one is better for your hardware"?


As stated in the ReadMe, I classified this param as "advanced". That is, better to consult the app code before use. Yes, 8x8x8 would give a total WG size of 512 (not supported by ATi GPUs), but even 8x8x4 has no meaning for a 2D kernel. Not all combos are possible for the supported kernels. Perhaps more kernels will be supported in future, but again, one needs to understand where most of the load goes. For now it's the fetch kernel, then the FFT ones. Hence there is not much sense in "optimizing" something that takes only a fraction of a percent of total execution time.
The version included in the installer has a 1D fetch kernel, hence both My and Mz should be 1 for this type of kernel. Previous builds that support this key, and APv7, include a 2D fetch kernel; there Mz should be 1, and Mx x My gives the workgroup size that kernel will be called with.
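
The constraints above can be written down as a small sanity check to run before putting a -tune line into the cmdline file. This Python sketch is only an illustration: `check_tune` is a hypothetical helper, and both the kernel dimensionality and the device limit have to be supplied by the user (from the ReadMe and from the stderr device listing).

```python
# Kernel IDs as listed in the ReadMe excerpt for -tune.
KERNEL_IDS = {1: "FFA_FETCH_WG", 2: "FFA_COMPARE_WG"}


def check_tune(n, mx, my, mz, max_wg_size, dims):
    """Return the total workgroup size for -tune N Mx My Mz, or raise ValueError."""
    if n not in KERNEL_IDS:
        raise ValueError("N must be one of %s" % sorted(KERNEL_IDS))
    sizes = (mx, my, mz)
    if any(s < 1 for s in sizes):
        raise ValueError("all sizes must be at least 1")
    # Dimensions the kernel does not use must stay at 1.
    if any(s != 1 for s in sizes[dims:]):
        raise ValueError("unused dimensions must be 1 for a %dD kernel" % dims)
    total = mx * my * mz
    if total > max_wg_size:
        raise ValueError("total size %d exceeds device limit %d" % (total, max_wg_size))
    return total


if __name__ == "__main__":
    # The ReadMe usage example: 1D compare kernel with workgroup size 32.
    print(check_tune(2, 32, 1, 1, max_wg_size=256, dims=1))
```

With a device limit of 256, `-tune 1 8 8 8` is rejected because 512 is too large, and `-tune 1 8 8 4` is rejected for a 2D kernel because Mz is not 1.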
20) Message boards : Number crunching : Lunatics Windows Installer v0.42 Release Notes (Message 1557840)
Posted 15 days ago by Raistmer
The tune param will define the kernel size of the GPU into chunks.
Your GPU has work group size of 256 for example.

Mike, please explain HOW you know his GPU has a work group size of 256. How can someone determine their own GPU parameters?


Look at the stderr output of a GPU task.
Our apps list WG sizes for all GPUs in the system, and the WG size for the GPU in use too.

EDIT: example from one of your own hosts:

Number of OpenCL platforms: 2


OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 1
Max compute units: 20
Max work group size: 512
Max clock frequency: 400Mhz
Max memory allocation: 460587008
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 1842348032
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
Out-of-Order: No
Name: Intel(R) HD Graphics 4600
Vendor: Intel(R) Corporation
Driver version: 9.18.10.3089
Version: OpenCL 1.2
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_depth_images cl_khr_gl_depth_images cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_image2d_from_buffer


OpenCL Platform Name: NVIDIA CUDA
Number of devices: 1
Max compute units: 4
Max work group size: 1024
Max clock frequency: 719Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 128
Cache size: 65536
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Queue properties:
Out-of-Order: Yes
Name: GeForce GTX 760M
Vendor: NVIDIA Corporation
Driver version: 311.30
Version: OpenCL 1.1 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
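
If reading the listing by eye gets tedious, the "Max work group size" values can be pulled out with a few lines of Python. This is a sketch that assumes the stderr text has the same layout as the listing above (a "Max work group size:" line followed later by a "Name:" line for the same device); `max_wg_sizes` is a hypothetical helper and the sample is trimmed from that listing.

```python
import re

SAMPLE = """OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 1
Max work group size: 512
Name: Intel(R) HD Graphics 4600
OpenCL Platform Name: NVIDIA CUDA
Number of devices: 1
Max work group size: 1024
Name: GeForce GTX 760M
"""


def max_wg_sizes(stderr_text):
    """Return {device name: max work group size} from a device listing."""
    sizes = {}
    pending = None
    for line in stderr_text.splitlines():
        m = re.match(r"\s*Max work group size:\s*(\d+)", line)
        if m:
            pending = int(m.group(1))
            continue
        # Matches "Name: ..." but not "OpenCL Platform Name: ...".
        m = re.match(r"\s*Name:\s*(.+?)\s*$", line)
        if m and pending is not None:
            sizes[m.group(1)] = pending
            pending = None
    return sizes


if __name__ == "__main__":
    for name, size in max_wg_sizes(SAMPLE).items():
        print(f"{name}: max work group size {size}")
```

The number reported for the GPU in use is the upper bound for the product Mx * My * Mz in a -tune line.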



Copyright © 2014 University of California