Posts by petri33


1) Message boards : Number crunching : Low RAC with GTX 1080 (Message 1812477)
Posted 20 hours ago by petri33 (Project Donor)
Thanks to petri33, who kindly sent me his own binaries, my GTX 1080 rocks now!
It takes around 4 minutes to complete a WU (blc or classic :)

Thanks for your help :)


Thank you for volunteering to test a Linux version.
2) Message boards : Number crunching : Thought(s) on changing S@h limit of: 100tasks/mobo ...to a: ##/CPUcore (Message 1811924)
Posted 2 days ago by petri33 (Project Donor)
Hi,

We've hit the point where 100 tasks per GPU is not enough for 300 minutes (5 hr), even if all of them are GBT tasks.

Just my $25/month (even though it is not showing properly).
3) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1811912)
Posted 2 days ago by petri33 (Project Donor)
Also, very quickly before the outage: a preliminary list of the reasons why tasks were sent to tie-breakers.

anon linux Petri v anon cuda50
Opti ATi_HD5 v mac intel_gpu
Petri v opt intel_gpu v opt nvidia SoG
Stock CPU v stock Apple CPU
stock ati5_SoG v mac intel_gpu
Stock CPU v intel_gpu
Stock CPU v stock cuda overflow
stock CPU v stock cuda32 overflow
Stock CPU v stock cuda50 overflow
Stock CPU v stock intel_gpu
Stock CPU v stock mac CPU overflow
Stock CPU v stock mac intel_gpu
Stock CPU v stock nvidia_mac
Stock CPU v stock nvidia_mac
Stock CPU v stock nvidia_mac
Stock CPU v stock nvidia_mac
Stock CPU v stock nvidia_mac v petri special
Stock CPU v stock nvidia_SoG
Stock CPU v stock nvidia_SoG
Stock CPU v stock SoG - SoG only overflow
Stock CPU vs cuda50 late overflow
Stock linux CPU v cuda42
Stock linux CPU v stock nvidia_mac
Stock nvidia_mac v stock ati5_mac
stock nvidia_mac v stock mac intel_gpu v anon SoG v stock
Stock nvidia_sah v stock nvidia_mac
Stock nvidia_sah v stock_cuda42
Stock nvidia_SoG v stock cuda42
Stock nvidia_SoG v stock intel_gpu
Stock v stock nvidia_mac

I'll normalise those descriptions and count them while we're off.


And the outcome % or the deemed-invalid rate % would be nice too.
4) Message boards : Number crunching : Philosophy: To CPU or NOT to CPU (Message 1811884)
Posted 2 days ago by petri33 (Project Donor)
Hi,

I have an i7-3930K @ 3.8 GHz with HT enabled (6+6 cores) and 4x GTX 1080.

I run the GPUs with 4 cores and one WU per GPU.
I run CPU tasks on 6 cores.
I leave 2 cores for the OS, surfing and other work.

The system is responsive and productive.

If I didn't have that many CPU cores, I'd have to run a different configuration, reducing the number of CPU tasks first.

The guppi WUs take an hour or so on CPU and under 3 minutes on GPU. No need for queue management here.

Petri
5) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1811774)
Posted 3 days ago by petri33 (Project Donor)
I may have found it. It could be the autocorrelation blocking sync added a few days ago. I thought it started before then, but maybe not. I removed it from the AC section and for now it seems all right, but it's working GUPPIs now, and the problem seems to occur on the Arecibo tasks AFAIK.
For now the settings below seem to be working:
cudaEventCreateWithFlags(&chirpDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&fftDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&summaxDoneEvent, cudaEventDisableTiming|(blockingSync ? cudaEventBlockingSync : 0));
cudaEventCreateWithFlags(&powerspectrumDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&autocorrelationDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&autocorrelationRepackDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&ac_reduce_partialEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&tripletsDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&tripletsDoneEvent1, cudaEventDisableTiming|cudaEventBlockingSync);
cudaEventCreateWithFlags(&pulseDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&pulseDoneEvent1, cudaEventDisableTiming|cudaEventBlockingSync);
cudaEventCreateWithFlags(&gaussDoneEvent, cudaEventDisableTiming);
cudaEventCreateWithFlags(&gaussDoneEvent2, cudaEventDisableTiming|cudaEventBlockingSync);

CPU use is up a little on the Arecibo tasks but about the same on the GUPPIs.
This will work, IF it keeps working.


I'll check that I don't use the same event in two places...
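
For anyone following along, here is a minimal sketch (not taken from the SETI source; the helper names are made up) of what the cudaEventBlockingSync flag changes: with the flag set, cudaEventSynchronize puts the calling thread to sleep until the event fires, which is what drops the CPU use; without it, the runtime spin-waits, which reacts faster but keeps a CPU core busy.

#include <cuda_runtime.h>

// Hypothetical helper mirroring the flag combinations used above.
static cudaEvent_t make_done_event(bool blockingSync)
{
    unsigned int flags = cudaEventDisableTiming |
                         (blockingSync ? cudaEventBlockingSync : 0);
    cudaEvent_t ev;
    cudaEventCreateWithFlags(&ev, flags);
    return ev;
}

// Record the event at the current point in the stream and wait for it.
// Whether the wait sleeps or spins depends on the flags used at creation.
static void wait_for_done(cudaEvent_t ev, cudaStream_t stream)
{
    cudaEventRecord(ev, stream);
    cudaEventSynchronize(ev);
}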
6) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1811762)
Posted 3 days ago by petri33 (Project Donor)
@TBar
I did change the chirp and powerspectrum code. That may explain the reappearance of the autocorr error.
7) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1811761)
Posted 3 days ago by petri33 (Project Donor)
Jason, could you email me the problematic WU and the correct result file to compare against, so I can start debugging?
Both cases: the extra pulse and the autocorr error.
8) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1811556)
Posted 3 days ago by petri33 (Project Donor)
Hi,

I've now got dozens of guppi tasks in the queue, so soon they'll start flowing into the database.

I'll be waiting for the results. (A guess: 1-4 bit errors in the LSB of the mantissa of a float 'peak' SNR/thresh.)

EDIT: And now they are coming. (Those 164 seconds/task are at ar 0.07.)
http://setiathome.berkeley.edu/results.php?hostid=7475713&offset=0&show_names=0&state=2&appid=29

Petri
9) Message boards : Number crunching : 1080 underclocking (Message 1811472)
Posted 3 days ago by petri33 (Project Donor)
Now this is weird: on this screen, it says the GPU core clock is running at 2062.5 and the memory clock at 2256, still with a temp around 30 at 100% fan.


I don't know what to make of this, but I am considering tossing in the two 1060s that I have running in a rig upstairs, so I can use the proper Precision software to control them correctly. Kind of frustrating, unless it actually is running at the higher clock speed. Confused.


The power consumption tells you that the GPU is not fully utilized. That's why it has low temps. So tweak parameters, run multiple units, free a core...

The GPU load is measured from the first SMX. Your card has 20 of them. Most of them are idle.

Well, I am running 4 tasks concurrently, with one core per task. What parameters do you suggest setting the card for? And are you running Precision X OC or 16? Are you mixing series in your system as well? I couldn't find a way to get the new version to be happy with the 9-series cards, and EVGA was useless. Suggestions are appreciated.

980Ti that I Hybrid'ized, and one that is still running on air:


Do these appear to be working up to their full potential? I don't want to take anything to the bleeding edge, but I do want them all to be working and not loafing.. ;-)


These pictures and the needed command line options are from the Windows world. The last two pictures seem OK; 60-70% TDP shows that they are running quite well.

I'll leave it up to Mike or another Windows guru to help you.
10) Message boards : Number crunching : Monitoring inconclusive GBT validations and harvesting data for testing (Message 1811451)
Posted 3 days ago by petri33 (Project Donor)
And one other thing to consider: both triplets and pulses need an average. To calculate the average you must sum all the values and divide by the count.

a) With CUDA fast_math the division is an approximation. However, you can specify that this single division should use a precise variant (fdiv_prec or something; I'll try that next when I get guppi work).

b) The sum for the average can be done sequentially, pairwise, in small batches, in a tree-like parallel sum, ... all producing a different 'sum' due to accumulating rounding errors and loss of precision (especially in a sequential sum).

But isn't the original data in the reported packet? The Berkeley guys can run a double precision version on that. An error in the power at the Nth decimal should not matter. Completely missed/extraneous pulses should trigger a third-opinion run though.
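
To make point (b) concrete, here is a minimal, self-contained sketch (illustration only, not from the SETI code; the data is made up) showing how a sequential and a pairwise single-precision sum of the same values disagree in the last bits of the mantissa, which is exactly the kind of difference that can push a borderline pulse over or under its threshold:

#include <cstdio>
#include <vector>

// Plain left-to-right accumulation: the rounding error grows roughly with N.
static float sum_sequential(const std::vector<float>& v)
{
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Pairwise (tree-like) accumulation: the error grows only with log N.
static float sum_pairwise(const std::vector<float>& v, size_t lo, size_t hi)
{
    if (hi - lo == 1) return v[lo];
    size_t mid = lo + (hi - lo) / 2;
    return sum_pairwise(v, lo, mid) + sum_pairwise(v, mid, hi);
}

int main()
{
    std::vector<float> power(1 << 20, 1.0001f);   // stand-in "power" values
    float a = sum_sequential(power);
    float b = sum_pairwise(power, 0, power.size());
    // The two averages typically differ in the last bits of the mantissa.
    printf("sequential avg = %.9g\n", a / power.size());
    printf("pairwise   avg = %.9g\n", b / power.size());
    return 0;
}

Compile it with any C++11 compiler; the exact figures depend on the platform, but the two averages are generally not bit-identical.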
11) Message boards : Number crunching : 1080 underclocking (Message 1811426)
Posted 3 days ago by petri33 (Project Donor)
Now this is weird: on this screen, it says the GPU core clock is running at 2062.5 and the memory clock at 2256, still with a temp around 30 at 100% fan.


I don't know what to make of this, but I am considering tossing in the two 1060s that I have running in a rig upstairs, so I can use the proper Precision software to control them correctly. Kind of frustrating, unless it actually is running at the higher clock speed. Confused.


The power consumption tells you that the GPU is not fully utilized. That's why it has low temps. So tweak parameters, run multiple units, free a core...

The GPU load is measured from the first SMX. Your card has 20 of them. Most of them are idle.
12) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1811422)
Posted 3 days ago by petri33 (Project Donor)
I should add that the GPU data is per card -- I can't tell which card did which task (part of why I have to exclude multi-gpu setups from my scans).

Your quad 1080 rig is more like 12871 cr/h in GPU throughput -- pretty amazing.


Thanks, I think it is one of a kind.

The cr/Wh is not too bad either. The 750 does well in that too.
13) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1811393)
Posted 3 days ago by petri33 (Project Donor)
You first, then Mr Kevvy's Linux Hosts, -= Vyper =- last.

C:\Scripts\SETI>aggregate.pl -anon 7475713 7985986 7939003 7961370 7957088 7945042 8018193 7944247 8053171
Host, API, Device, Credit, Seconds, Credit/Hour, Work Units

7475713, cpu, Intel Core i7-3930K @ 3.20GHz, 3886.42, 13523.6966666667, 1034.56268983653, 39
7475713, gpu, [4] NVIDIA GeForce GTX 1080, 166123.22, 185855.74, 3217.78381447891, 1961

7985986, cpu, Intel Core i7-5820K @ 3.30GHz, 33539.71, 66401.3341666667, 1818.38147554289, 456
7985986, gpu, [4] NVIDIA GeForce GTX 980, 126160.17, 393070.560000001, 1155.45822612611, 1544

7939003, cpu, Intel Core i7-3930K @ 3.20GHz, 32818.28, 61109.2008333333, 1933.35547493455, 395
7939003, gpu, [3] NVIDIA GeForce GTX 980, 125373.4, 525530.36, 858.835710271811, 1432

7961370, cpu, Intel Core 2 Quad Q6600 @ 2.40GHz, 6153.29, 46733.015, 474.008449914905, 72
7961370, gpu, [2] NVIDIA GeForce GTX 970, 83152.92, 296877.1, 1008.3314341187, 990

7957088, cpu, Intel Core 2 Quad @ 2.40GHz, 6647.92, 38441.1175, 622.575865542931, 80
7957088, gpu, NVIDIA GeForce GTX 970, 40557.12, 275369.12, 530.217883544821, 490

7945042, cpu, AMD FX(tm)-8350 Eight-Core Processor, 16051.1, 74594.60125, 774.639974364097, 216
7945042, gpu, NVIDIA GeForce GTX 970, 29634.02, 86945.7199999999, 1227.0008460451, 349

8018193, cpu, AMD Phenom(tm) 9500 Quad-Core Processor, 7523.62, 42741.99, 633.686732882582, 99
8018193, gpu, NVIDIA GeForce GTX 970, 38106.68, 185526.88, 739.42949938036, 456

7944247, cpu, AMD Phenom(tm) II X6 1045T Processor, 21680.56, 69026.7316666666, 1130.72159314897, 280
7944247, gpu, NVIDIA GeForce GTX 970, 24682.34, 177843.93, 499.631469007685, 311

8053171, gpu, [4] NVIDIA GeForce GTX 750 Ti, 93908.0399999999, 388460.45, 870.278928009272, 1060


Thank you! The GTX 750 Ti is putting out some serious numbers!
14) Message boards : Number crunching : GPU FLOPS: Theory vs Reality (Message 1811364)
Posted 3 days ago by petri33 (Project Donor)
Hi Shaggie,

Would it be possible to run a targeted scan of my (petri33) host and Mr Kevvy's Linux hosts, to get some insight into Arecibo performance with the special CUDA app? -=Vyper=- is running the special app on his Linux box too.

All of the specials run one task at a time.

My power draw is between 120 and 170 W, mostly 136 W, with an Arecibo workload.


That is of interest, since I wonder if the productivity comes with a cost.
15) Message boards : Number crunching : 1080 underclocking (Message 1811139)
Posted 4 days ago by petri33 (Project Donor)
Hi,
I've got 4 GTX 1080 reference design cards. I run the fans at 95% and the temps are 57-65 C. nvidia-smi reports each card drawing 130-140 W most of the time and 170 W under high load. I run the P2 state, 2020 MHz for the GPU and 10126 MHz for the memory.

How much can the PCIe slot + 8-pin connector supply together?

Petri

Supposedly, 75 W through the PCIe motherboard slot and 150 W (corrected from 100 W) through the 8-pin PCIe power supply cable. At the original 100 W figure that was only 175 W total, which looked 5 watts shy of the maximum TDP. I see the nvsmi document states the maximum power for the 1080 card is 225 W. Not sure how that works; it must be 225 W instantaneous load and more likely 180 watts for sustained load.

[Edit] corrected power delivery


Thank you. So I'm getting near max but there is still something to improve.
16) Message boards : Number crunching : 1080 underclocking (Message 1811101)
Posted 4 days ago by petri33 (Project Donor)
Hi,
I've got 4 GTX 1080 reference design cards. I run the fans at 95% and the temps are 57-65 C. nvidia-smi reports each card drawing 130-140 W most of the time and 170 W under high load. I run the P2 state, 2020 MHz for the GPU and 10126 MHz for the memory.

How much can the PCIe slot + 8-pin connector supply together?

Petri
17) Message boards : Number crunching : 1080 underclocking (Message 1810835)
Posted 5 days ago by petri33 (Project Donor)
If the card is well under its maximum possible thermal load, and well under its maximum possible power load, then there is no reason for it to drop its clock speed.


But you actually don't know that.

As I indicated in my msg 1807934, the GPU might have reached its limit on one or more power input pins, while other parts, like the video output (which we don't use in our BOINC/SETI crunching), are drawing the minimum power necessary.

And that would indicate a design flaw, and under Australian Consumer legislation people would (after a lot of time & effort, naturally) be able to get a refund on their purchase as it doesn't meet the claims of the sales literature.
If the reported power & temperature readings are well below the rated limit for the card, there is no good reason for it not to sustain its rated Boost speed.

I noticed in the opening post that Zalster has modified the cooling on the card; one thought that comes to mind is whether the onboard regulators are receiving enough cooling as a result of this.


Also on temps: there is probably more than one sensor, probably all connected to the same circuit that interrupts overclocking. These sensors act immediately, much faster than the one that measures and reports the GPU temperature. So again, you don't know that one small part of the GPU has reached its temperature limit, except for the fact that the GPU has reduced its clock speed.

Possible, but unlikely IMHO.
In the case of the Intel CPUs you mentioned, as well as the case temperature there are also sensors for each core, and these are readable by most hardware monitoring software.
If there were multiple GPU temperature sensors, I would expect them to be displayed by such software.


Hi,

The GTX 1080 comes with a handy installation manual. There is a statement that says you can and should use a screwdriver to remove part of the backplate of the adjacent GTX 1080 to enable better cooling.

So, heat may be a problem. The high-end Pascal cards (the Titan and the workstation cards) run at much lower clock speeds.
18) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1810814)
Posted 5 days ago by petri33 (Project Donor)
From here, Message 1809468
A new thingy to try: when creating events in cudaAcceleration.cu,
use a new flag pair
cudaEventDisableTiming|cudaEventBlockingSync
instead of the old cudaEventDisableTiming alone.

1) Apply this at least to pulseDoneEvent. Not the ones with a number at the end.
2) And probably to gaussDoneEvent, tripletsDoneEvent, autocorrelationDoneEvent, and maybe summaxDoneEvent. Not the ones with a number at the end.

It will drop CPU usage but may slow things down. The GPU usage drops too, but if you have enough RAM you can try running 2 instances at a time. Watch out for the system going into constant swap state (running out of available RAM).

This actually works on the Arecibo tasks: CPU use is reduced with little change in run time. It doesn't work so well with the GUPPIs though; the CPU usage begins around 60-70% and then, about a third of the way through, increases to around 95%. Any ideas on which events might produce better results on the GUPPIs?


Hi,

this is something where I have to say "no can do / can't guess". I have not had time to investigate.

The guppi tasks spend most of their time in pulse finding. The ar 0.08 variants may do something else. If you have time, place some fprintf(stderr, "%s:%d: %s", __FILE__, __LINE__, "I'm going here\n"); statements in the code to see where it is going.

The best way would be to enable timers on the events and copy the code from the NV CUDA examples to see how long an event is being waited for.
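
For anyone who wants to try that, this is a minimal sketch of the event-timing pattern used in the NVIDIA CUDA samples (illustration only, not the SETI source; the kernel to be measured is left as a placeholder). Note that timed events must be created without the cudaEventDisableTiming flag:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);            // default flags: timing enabled
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);          // mark the start of the section on stream 0
    // ... enqueue the kernel(s) you want to measure here ...
    cudaEventRecord(stop, 0);           // mark the end of the section
    cudaEventSynchronize(stop);         // the host wait you are trying to measure

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed GPU time between the events
    printf("section took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}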
19) Message boards : Number crunching : SETI/BOINC Milestones [ v2.0 ] - XXIX (Message 1810778)
Posted 5 days ago by petri33 (Project Donor)
Average processing rate 1,237.60 GFLOPS
20) Message boards : Number crunching : 1080 underclocking (Message 1810770)
Posted 5 days ago by petri33 (Project Donor)
Hi,

My settings are a workaround to make P2 performance equal to P0.
The latest driver does not copy the P0 settings to P2, but the one from the time of the initial release (late May/early June) does. That is why I do not use the latest driver.

A couple of years ago I bought a 780 and it did the same thing, i.e. P2 on compute. Later drivers fixed that. I'm waiting for a new driver that will allow P0 for compute workloads.

Not sure how I am able to achieve my P2 settings then, as I am using the latest Nvidia Windows 7 driver, 372.54.


To start with, you could experiment with an earlier driver version available from NVIDIA. I'm not sure whether nvidia-settings.exe and nvidia-smi.exe are available in Windows, or whether you can use a more advanced tool to set the NV parameters.

A Windows guru could help here.

