Posts by petri33


1) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1720169)
Posted 5 days ago by petri33 (Project donor)
petri33, I suppose you tested this yourself before posting here, so how much did you gain by this?


The time for one task was 1100 and after the change it was 900.

Now I'm running two at a time.

It's quite a big speedup.
When you have time, please provide data from the profiler. Which kernels were affected most?
Regarding AMD: it's possible to leave the IL inside the bin file. I'm not sure, though, whether the machine-code part would be recompiled if the IL were changed manually, or whether the IL part would just be ignored and the binary loaded again.

P.S. Try rebuilding without the -fno-bin-amdil option in the build string for the CL compiler.


Thanks, Raistmer, for the info. I'll try to find time to test with the profiler and do the suggested rebuilds.
2) Message boards : Number crunching : Work units processing performance has collapsed!! (Message 1713684)
Posted 20 days ago by petri33 (Project donor)
About the only scenario that makes sense is if your laptop is overheating. That does happen with laptops when crunching if steps aren't taken to cool them. Usually raising the laptop onto something that allows airflow underneath and a fan blowing on them helps. The GPU times are very long also, so something is slowing it down. A similar laptop that you can compare with is here, http://setiathome.berkeley.edu/results.php?hostid=7159696&offset=220
Note that he is using the apps I compiled, so his tasks are labeled as Anonymous platform. An easy way to cool down the CPUs is to just have them run fewer tasks. You can set the 'On multiprocessor systems, use at most' setting to 50% and it will run 4 CPU tasks instead of 7. See if that helps.


I second the advice on air circulation. Laptops need to grow legs: mine has four CDs under its corners. Raising it by 10 mm (about half an inch) will do miracles. Please do not forget to vacuum the laptop's air intakes, too.
3) Message boards : Number crunching : Panic Mode On (99) Server Problems? (Message 1713678)
Posted 20 days ago by petri33 (Project donor)
Strangely, the BOINC Manager still shows an increase in credits and RAC.

My Linux1 host has hit 95 000 RAC on screen but shows 94 797 in the stats.
4) Message boards : Number crunching : Q: How much time for a 0.42 MB task? (Message 1713611)
Posted 20 days ago by petri33 (Project donor)
4:11
5) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1713559)
Posted 20 days ago by petri33 (Project donor)
We can't bother Eric with everything.
He is busy enough, and some people do know what they are doing.

I take it that Cuda 6.50, run by Petri here on Main, is also approved by Eric, then? Then I can start running it too, full bore.

Strange that it hasn't been released as stock, though...

Also a bit strange is how much work Petri is trashing here on Main (again, why on Main with, at best, a Beta app?), mainly with the unreleased "setiathome enhanced x41zc, Cuda 6.50 special" and, to a lesser extent, with the AP GPU app.

http://setiathome.berkeley.edu/results.php?hostid=7475713

So, in other words, Beta is not for Beta testing; Main is for Beta testing. OK, good to know...



The trashing hurts only me (lost processing time). The system takes care of faulty results and gives the work to others to process.

The amount of 'trash' is quite small, though, compared to the validated work.

The invalids are mainly 30/30 APs (a different processing order returns different results). For MB there is an accuracy problem in both my version and the one on Main: the last digits of a float sum or average (which carries only about 7 significant digits) depend on the order of the float additions. The same numbers added in a different order give a different (but valid) result.

2.7% invalids. The errors are due to the heat from 4 GPUs.
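To make the order effect concrete, here is a minimal Python sketch. It assumes the GPU accumulates in single precision (float32) and emulates float32 rounding with the standard struct module; the numbers are made up to expose the effect and are not taken from any SETI@home task.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (an IEEE-754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Add values left to right, rounding to float32 after every addition,
    as a single-precision accumulator would."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


# Hypothetical data: 2**24 is where float32 stops representing every integer.
big, small = 2.0**24, 1.0
forward = sum_f32([big, small, small])    # each +1 is rounded away (ties-to-even)
reordered = sum_f32([small, small, big])  # 1 + 1 = 2 survives, then 2**24 + 2
print(f"{forward:.0f} {reordered:.0f}")   # 16777216 16777218
```

Both answers are correctly rounded float32 sums of the same three numbers; they differ only in the last digit, purely because of the order of the additions.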
6) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1713557)
Posted 20 days ago by petri33 (Project donor)
petri33, I suppose you tested this yourself before posting here, so how much did you gain by this?


The time for one task was 1100 and after the change it was 900.

Now I'm running two at a time.
7) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1713284)
Posted 20 days ago by petri33 (Project donor)
For those with NV cards: an optimisation (or "optimization", as my ear says).

1) Open your AP_clFFTplan_GeForceGTX*bin* file with a text editor. {a * means any text}
2) Find the lines saying bar.sync (the file is plain-English assembler code).
3) Of those bar.sync lines, comment out (add // at the beginning of the line) the ones that are not followed by any more ld.s... lines. There should be 2 such bar.sync lines, each followed by a return (ret) with no ld.s in between.

I have commented out (//) two lines: L 1049 and L 1565.

Feel free to try.

... like this ...
ld.shared.f32 %f161, [%rd4+12336];
ld.shared.f32 %f163, [%rd4+13364];
ld.shared.f32 %f165, [%rd4+14392];
ld.shared.f32 %f167, [%rd4+15420];
// bar.sync 0;
add.s32 %r70, %r10, %r1;
add.s32 %r71, %r70, %r4;
mul.wide.s32 %rd15, %r71, 8;
add.s64 %rd16, %rd5, %rd15;
st.global.v2.f32 [%rd16], {%f104, %f137};
st.global.v2.f32 [%rd16+2048], {%f106, %f139};
st.global.v2.f32 [%rd16+4096], {%f108, %f141};
st.global.v2.f32 [%rd16+6144], {%f110, %f143};
st.global.v2.f32 [%rd16+8192], {%f112, %f145};
st.global.v2.f32 [%rd16+10240], {%f114, %f147};
st.global.v2.f32 [%rd16+12288], {%f116, %f149};
st.global.v2.f32 [%rd16+14336], {%f118, %f151};
st.global.v2.f32 [%rd16+16384], {%f120, %f153};
st.global.v2.f32 [%rd16+18432], {%f122, %f155};
st.global.v2.f32 [%rd16+20480], {%f124, %f157};
st.global.v2.f32 [%rd16+22528], {%f126, %f159};
st.global.v2.f32 [%rd16+24576], {%f128, %f161};
st.global.v2.f32 [%rd16+26624], {%f130, %f163};
st.global.v2.f32 [%rd16+28672], {%f132, %f165};
st.global.v2.f32 [%rd16+30720], {%f134, %f167};
ret;
}
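The steps above are mechanical enough to script. Below is a minimal Python sketch of that rule, not petri33's own procedure: it reports every bar.sync that reaches a ret; with no ld.shared or st.shared in between (petri33 later states the rule in that form), and writes a patched copy instead of touching the original. The helper names and the ".patched" output name are hypothetical. It only reads the text and does not follow branches or labels, so check each reported line by hand and keep a backup of the bin file.

```python
import re
import sys

RET = re.compile(r"^ret(\.uni)?\s*;")


def _code(line: str) -> str:
    """Return the instruction text, or '' for blank or already commented lines."""
    text = line.strip()
    return "" if text.startswith("//") else text


def find_trailing_barriers(lines):
    """Indices of bar.sync lines followed by ret; with no ld.shared or
    st.shared in between. Purely textual: branches and labels are ignored."""
    hits = []
    for i, line in enumerate(lines):
        if "bar.sync" not in _code(line):
            continue
        for later in lines[i + 1:]:
            text = _code(later)
            if "ld.shared" in text or "st.shared" in text:
                break  # the barrier still guards shared-memory traffic
            if RET.match(text):
                hits.append(i)
                break
    return hits


def comment_out(lines, indices):
    """Return a copy of lines with the chosen ones prefixed by '// '."""
    chosen = set(indices)
    return ["// " + line if i in chosen else line for i, line in enumerate(lines)]


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. your AP_clFFTplan_GeForceGTX*bin* file
    # latin-1 maps every byte to one character, so the bytes round-trip unchanged
    with open(path, encoding="latin-1") as f:
        lines = f.read().splitlines()
    hits = find_trailing_barriers(lines)
    print("candidate lines:", [i + 1 for i in hits])
    with open(path + ".patched", "w", encoding="latin-1") as f:
        f.write("\n".join(comment_out(lines, hits)) + "\n")
```

The line numbers it prints are 1-based, so they can be compared directly with the ones a text editor shows.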


For my AMD and Intel GPUs the bin file is a real binary, not text. I'd do the same there if I knew how.
(For whoever maintains the generating code: please leave out the last OpenCL barrier, or whatever it is called.)



Petri, what does this do?


It may give some speed.

It lets a GPU core continue its calculations after reading from shared memory, without waiting for the others.
These loads are preceded by a bar.sync (a wait), and no writes are made to shared memory after them, so there is no need to wait for all reads to finish before continuing: nothing can alter the state of shared memory while all cores are only doing loads.
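The same reasoning can be illustrated with CPU threads. This is only an analogy, assuming Python's threading.Barrier as a stand-in for bar.sync and plain lists as stand-ins for shared and global memory; GPU memory semantics differ in detail, and this is not the AstroPulse kernel.

```python
import threading

N = 4                 # hypothetical work-group size
shared = [0.0] * N    # stands in for on-chip shared memory
out = [0.0] * N       # stands in for global memory, one slot per thread
barrier = threading.Barrier(N)


def worker(tid: int) -> None:
    shared[tid] = float(tid * tid)  # st.shared: each thread writes its own slot
    barrier.wait()                  # bar.sync: every write lands before any read
    value = shared[(tid + 1) % N]   # ld.shared: read a neighbour's slot
    # A second barrier.wait() here would be the trailing bar.sync. Nothing
    # writes to `shared` after this point, so it could not change any value
    # read above; it would only make fast threads wait for slow ones.
    out[tid] = value                # st.global: private slot, no sharing


threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [1.0, 4.0, 9.0, 0.0]
```

The first barrier is essential because reads follow writes by other threads; the would-be second one protects nothing, which is the case the edit removes.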
8) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1713098)
Posted 21 days ago by petri33 (Project donor)
// bar.sync 0;
st.shared.f32 [%r7], %f7;
st.shared.f32 [%r7+512], %f8;
st.shared.f32 [%r7+1024], %f9;
st.shared.f32 [%r7+1536], %f10;
st.shared.f32 [%r7+2048], %f11;
st.shared.f32 [%r7+2560], %f12;
st.shared.f32 [%r7+3072], %f13;
st.shared.f32 [%r7+3584], %f14;

Would that be correct? (Spaces are not showing.)



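Definitely not.

Only the two places where there are no ld.shared... or st.shared... lines between the bar.sync and the "ret;". In your snippet the bar.sync is followed by st.shared stores, so it has to stay: without it, a thread could overwrite shared memory that another thread has not finished reading yet.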
9) Message boards : Number crunching : Optimizing GPUs (Message 1713097)
Posted 21 days ago by petri33 (Project donor)
How about this: if you have two different types of, let's say, NVIDIA cards, like a 780 and a 980...

1) Is there a way of telling BOINC to give different attributes/parameters to a task depending on the kind of card it runs on, or on a given PCIe slot,
2) or to run a different executable for a different *NVIDIA* GPU?
10) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1713048)
Posted 21 days ago by petri33 (Project donor)
For those with NV cards: an optimisation (or "optimization", as my ear says).

1) Open your AP_clFFTplan_GeForceGTX*bin* file with a text editor. {a * means any text}
2) Find the lines saying bar.sync (the file is plain-English assembler code).
3) Of those bar.sync lines, comment out (add // at the beginning of the line) the ones that are not followed by any more ld.s... lines. There should be 2 such bar.sync lines, each followed by a return (ret) with no ld.s in between.

I have commented out (//) two lines: L 1049 and L 1565.

Feel free to try.

... like this ...
ld.shared.f32 %f161, [%rd4+12336];
ld.shared.f32 %f163, [%rd4+13364];
ld.shared.f32 %f165, [%rd4+14392];
ld.shared.f32 %f167, [%rd4+15420];
// bar.sync 0;
add.s32 %r70, %r10, %r1;
add.s32 %r71, %r70, %r4;
mul.wide.s32 %rd15, %r71, 8;
add.s64 %rd16, %rd5, %rd15;
st.global.v2.f32 [%rd16], {%f104, %f137};
st.global.v2.f32 [%rd16+2048], {%f106, %f139};
st.global.v2.f32 [%rd16+4096], {%f108, %f141};
st.global.v2.f32 [%rd16+6144], {%f110, %f143};
st.global.v2.f32 [%rd16+8192], {%f112, %f145};
st.global.v2.f32 [%rd16+10240], {%f114, %f147};
st.global.v2.f32 [%rd16+12288], {%f116, %f149};
st.global.v2.f32 [%rd16+14336], {%f118, %f151};
st.global.v2.f32 [%rd16+16384], {%f120, %f153};
st.global.v2.f32 [%rd16+18432], {%f122, %f155};
st.global.v2.f32 [%rd16+20480], {%f124, %f157};
st.global.v2.f32 [%rd16+22528], {%f126, %f159};
st.global.v2.f32 [%rd16+24576], {%f128, %f161};
st.global.v2.f32 [%rd16+26624], {%f130, %f163};
st.global.v2.f32 [%rd16+28672], {%f132, %f165};
st.global.v2.f32 [%rd16+30720], {%f134, %f167};
ret;
}


For my AMD and Intel GPUs the bin file is a real binary, not text. I'd do the same there if I knew how.
(For whoever maintains the generating code: please leave out the last OpenCL barrier, or whatever it is called.)
11) Message boards : Number crunching : Typical time for a GTX 980 to crunch a work unit? (Message 1713036)
Posted 21 days ago by petri33 (Project donor)
To get an idea of run times take a look at my results. I run one at a time with all my GPUs (2 x GTX780 + 2 x GTX980).

Linux1 with 2 x GTX 780 and 2 x GTX980


A correction: APs run two at a time.
12) Message boards : Number crunching : Typical time for a GTX 980 to crunch a work unit? (Message 1712561)
Posted 22 days ago by petri33 (Project donor)
To get an idea of run times take a look at my results. I run one at a time with all my GPUs (2 x GTX780 + 2 x GTX980).

Linux1 with 2 x GTX 780 and 2 x GTX980
13) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703811)
Posted 21 Jul 2015 by petri33 (Project donor)
My version still has some accuracy problems.


I still haven't found the complete story there, though I have the full team winding up to put each bit in and see what breaks (watch that GitHub soon). The Chirp explains a little, but not all of the issue. It'll be interesting to see what falls out over the next few weeks in the background.


My answer is off topic (not a server problem). I'll PM you.
14) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703799)
Posted 21 Jul 2015 by petri33 (Project donor)
Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could give them a try? (2000 seconds, one at a time)
(Fake that they are ATI/AMD...?)


You could pretend they are CPUs instead,
and revive my old @teammod@ to process both CPU and GPU apps via CPU-only BOINC scheduling. Good old days, before BOINC even knew what a GPU was :DDD


I thought of that. I remember seeing a bit of code that determined which GPU to use by looking at which one had the most free memory, or something similar.

I once coded a solution that kept a counter variable in shared (CPU) memory, incremented modulo N (N = number of GPUs), to send each new task to the next GPU.
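A minimal Python sketch of that round-robin idea, assuming a multiprocessing.Value as the variable in shared (CPU) memory. next_gpu is a hypothetical helper name; this is not the code petri33 wrote, nor how BOINC itself assigns devices.

```python
import multiprocessing as mp


def next_gpu(counter, num_gpus: int) -> int:
    """Return the GPU index for a new task and advance the shared counter.

    The lock makes the read-and-increment atomic, so two tasks starting at
    the same moment cannot both read the same counter value."""
    with counter.get_lock():
        gpu = counter.value
        counter.value = (gpu + 1) % num_gpus
    return gpu


if __name__ == "__main__":
    # An int living in shared memory; every task process would receive this
    # same object (for example as a Process argument) and call next_gpu() once.
    counter = mp.Value("i", 0)
    print([next_gpu(counter, 4) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
```

Keeping the counter modulo N means it never grows without bound and always holds a valid device index.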

But none of that is necessary, since the normal MB work is flowing in again.

I used the down-time to upgrade my Linux (Ubuntu) version to a newer one.
15) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703486)
Posted 20 Jul 2015 by petri33 (Project donor)
Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could give them a try? (2000 seconds, one at a time)
(Fake that they are ATI/AMD...?)

No idea. But maybe you can use the Rescheduler to move them from CPU to GPU?


I could reschedule one task by hand. How? Editing some file?
I will not try to find a rescheduler for my Linux machine.

Just edit the client state file, changing the <result> entry to the version number and plan class of your CUDA app in your app_info. It works for ATIs.
Just watch the estimated time; it may be too short after the change.
The best way is to suspend the VLAR, stop BOINC, then edit the entry containing the suspended line. In my case I would change:
<version_num>700</version_num>
to
<version_num>708</version_num>
<plan_class>opencl_ati5_sah</plan_class>

Don't make a mistake...or you Lose All your cache...
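For anyone who prefers to script this edit, here is a minimal Python sketch under stated assumptions: each task sits in its own <result> block in client_state.xml, with <name> and <version_num> children as described above. retarget_result and patch_file are hypothetical helpers, and the result name, version number and plan class must be your own. It only edits text, so the same warnings apply: suspend the task, stop BOINC first, and keep a backup. petri33 also had to make the same change in client_state_prev.xml.

```python
import re
import shutil

RESULT = re.compile(r"<result>.*?</result>", re.DOTALL)


def retarget_result(xml_text, result_name, version_num, plan_class):
    """Point the <result> named result_name at another app version by
    rewriting its <version_num> and setting <plan_class> (added if absent).
    Everything outside that one block is returned unchanged."""
    for m in RESULT.finditer(xml_text):
        block = m.group(0)
        if f"<name>{result_name}</name>" not in block:
            continue
        version_tag = f"<version_num>{version_num}</version_num>"
        block, n = re.subn(r"<version_num>\d+</version_num>",
                           lambda _: version_tag, block, count=1)
        if n != 1:
            raise ValueError(f"{result_name!r} has no <version_num>")
        plan_tag = f"<plan_class>{plan_class}</plan_class>"
        if "<plan_class>" in block:
            block = re.sub(r"<plan_class>.*?</plan_class>",
                           lambda _: plan_tag, block, count=1)
        else:
            block = block.replace(version_tag, version_tag + "\n    " + plan_tag, 1)
        return xml_text[:m.start()] + block + xml_text[m.end():]
    raise KeyError(f"no <result> named {result_name!r}")


def patch_file(path, result_name, version_num, plan_class):
    """Back up path to path + '.bak', then write the patched text back.
    Only run this while BOINC is stopped."""
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(retarget_result(text, result_name, version_num, plan_class))
```

Working on the raw text rather than re-serializing the XML leaves every other line of the file byte-for-byte as it was.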



Thanks, TBar, that is what I did. I had to do it in both the client_state.xml and client_state_prev.xml files.

Now I'm going to check how it did (WU). It may require a third run by someone else. My version still has some accuracy problems.

Hmmm, not much difference, except that mine didn't overflow:
Yours, Run time: 28 min 46 sec
Mine, Run time: 28 min 37 sec
I think the Grump found the nVidia OpenCL App was a little faster on MBs, but it ate a whole CPU core.
Or, maybe it was someone else...


Here is one that did not overflow. The run time is a bit high. (WU)
16) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703442)
Posted 20 Jul 2015 by petri33 (Project donor)
Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could give them a try? (2000 seconds, one at a time)
(Fake that they are ATI/AMD...?)

No idea. But maybe you can use the Rescheduler to move them from CPU to GPU?


I could reschedule one task by hand. How? Editing some file?
I will not try to find a rescheduler for my Linux machine.

Just edit the client state file, changing the <result> entry to the version number and plan class of your CUDA app in your app_info. It works for ATIs.
Just watch the estimated time; it may be too short after the change.
The best way is to suspend the VLAR, stop BOINC, then edit the entry containing the suspended line. In my case I would change:
<version_num>700</version_num>
to
<version_num>708</version_num>
<plan_class>opencl_ati5_sah</plan_class>

Don't make a mistake...or you Lose All your cache...



Thanks, TBar, that is what I did. I had to do it in both the client_state.xml and client_state_prev.xml files.

Now I'm going to check how it did (WU). It may require a third run by someone else. My version still has some accuracy problems.
17) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703428)
Posted 20 Jul 2015 by petri33 (Project donor)
I managed to do the edit manually...

It started to crunch ...

root@Linux1:~/Downloads/BOINC# cat slots/6/stderr.txt
setiathome_CUDA: Found 4 CUDA device(s):
  Device 1: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12
     pciBusID = 2, pciSlotID = 0
  Device 2: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12
     pciBusID = 1, pciSlotID = 0
  Device 3: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12
     pciBusID = 3, pciSlotID = 0
  Device 4: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12
     pciBusID = 4, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 3
setiathome_CUDA: CUDA Device 3 specified, checking...
   Device 3: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
Using pfb = 4 from command line args
Using pfp = 192 from command line args
setiathome enhanced x41zc, Cuda 6.50 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.012579
Sigma 127
Thread call stack limit is: 1k


Now I'm waiting for it to finish. The (bad) estimate was 7 minutes.


EDIT:
elapsed (mm:ss)   progress
4:00 10%
5:00 12.57%
7:00 17.67%
8:00 20.32%
9:00 23.16%
10:00 26.16%
14:00 40.39%
15:00 43.61%
16:00 46.59%
17:00 49.81%
18:00 52.98%
19:00 55.00%
20:00 59.07%
21:00 62.30%
22:00 65.00%
23:00 68.10%
24:00 71.18%
25:00 73.90%
26:00 76.75%
27:00 79.60%
28:00 82.25%
28:46 100.00% (sudden jump, maybe 30/30)

End of test.
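One way to read the table: a linear extrapolation (elapsed time divided by fraction done) from a few of the logged points, sketched in Python. It assumes progress grows linearly, which is a simplification. At the 28-minute mark it predicts about 34 minutes in total, while the task actually ended at 28:46; that is consistent with, but does not prove, the early-finish guess above.

```python
# (elapsed minutes, percent done) pairs copied from the log above
samples = [(4, 10.0), (10, 26.16), (20, 59.07), (28, 82.25)]
finished_at = 28 + 46 / 60  # 28:46, when the task jumped to 100%


def linear_total(elapsed_min: float, percent_done: float) -> float:
    """Naive estimate of total run time: elapsed time / fraction done."""
    return elapsed_min / (percent_done / 100.0)


for elapsed, pct in samples:
    print(f"{elapsed:2d} min, {pct:5.2f}% -> ~{linear_total(elapsed, pct):.1f} min total")
print(f"actual finish: {finished_at:.1f} min")
```

The early samples overestimate the total because progress was slower in the first minutes; from about 20 minutes on, the estimate settles near 34 minutes.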
18) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703398)
Posted 20 Jul 2015 by petri33 (Project donor)
Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could give them a try? (2000 seconds, one at a time)
(Fake that they are ATI/AMD...?)

No idea. But maybe you can use the Rescheduler to move them from CPU to GPU?


I could reschedule one task by hand. How? Editing some file?
I will not try to find a rescheduler for my Linux machine.
19) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703392)
Posted 20 Jul 2015 by petri33 (Project donor)
I have a CUDA MB app built from the source (modified to use CUDA streams to achieve 95% GPU occupancy when running just one MB at a time). This is not the beta OpenCL app.

I'd need the right information for the app_info.xml to make the server think this is an OpenCL app even though it is not.

A user selectable option "compute VLAR" y/n in the preferences would be good too.
20) Message boards : Number crunching : Panic Mode On (98) Server Problems? (Message 1703383)
Posted 20 Jul 2015 by petri33 (Project donor)
Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could give them a try? (2000 seconds, one at a time)
(Fake that they are ATI/AMD...?)



Copyright © 2015 University of California