have more GPUs than actually exist

Message boards : Number crunching : have more GPUs than actually exist
Profile bloodrain
Volunteer tester
Joined: 8 Dec 08
Posts: 231
Credit: 28,112,547
RAC: 1
Antarctica
Message 1991713 - Posted: 27 Apr 2019, 16:33:44 UTC - in response to Message 1991706.  

I'm missing something on this. What is it relating to?
ID: 1991713
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991714 - Posted: 27 Apr 2019, 16:50:11 UTC - in response to Message 1991713.  
Last modified: 27 Apr 2019, 16:51:09 UTC

I'm missing something on this. What is it relating to?

Petri says a version 0.99 of the special app is working and close to release. That would enable you to set <ngpus>0.5</ngpus> or <count>0.5</count> for the app and run two tasks at the same time on the card. It won't really run both simultaneously like you could do with the older CUDA42 or CUDA50 apps or the SoG app. What it will do is load two tasks and prep both for running by pre-initializing the FFT search mechanism. As soon as the current task finishes, it will immediately start crunching the other staged task on the card and then preload another task to be ready to start when the current task again finishes. Because of the large reduction in memory usage that the 0.98 special app achieved, there is plenty of room to load two tasks in GPU memory, even on the lesser cards with only 3 or 4GB of memory.

This will save 2-4 seconds of idle time in the task loading transaction for each task. Over an hour or a day, that will allow more tasks to be crunched and production will go up.
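For anyone who wants to try it once 0.99 is out, the half-GPU setting is just the standard BOINC mechanism. A minimal app_config.xml sketch could look like the fragment below (the plan class is only illustrative - match it to whatever your installed app_version actually uses); on an anonymous-platform setup you can instead put <count>0.5</count> inside the <coproc> block of the relevant <app_version> in app_info.xml.

<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda101</plan_class>  <!-- illustrative; use your app's plan class -->
    <avg_ncpus>1</avg_ncpus>
    <ngpus>0.5</ngpus>                <!-- half a GPU per task = two tasks per card -->
  </app_version>
</app_config>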
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1991714
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1991721 - Posted: 27 Apr 2019, 18:00:57 UTC - in response to Message 1991714.  

I'm missing something on this. What is it relating to?

Petri says a version 0.99 of the special app is working and close to release. That would enable you to set <ngpus>0.5</ngpus> or <count>0.5</count> for the app and run two tasks at the same time on the card. It won't really run both simultaneously like you could do with the older CUDA42 or CUDA50 apps or the SoG app. What it will do is load two tasks and prep both for running by pre-initializing the FFT search mechanism. As soon as the current task finishes, it will immediately start crunching the other staged task on the card and then preload another task to be ready to start when the current task again finishes. Because of the large reduction in memory usage that the 0.98 special app achieved, there is plenty of room to load two tasks in GPU memory, even on the lesser cards with only 3 or 4GB of memory.

This will save 2-4 seconds of idle time in the task loading transaction for each task. Over an hour or a day, that will allow more tasks to be crunched and production will go up.

What about cards with 2GB VRAM? (EVGA GTX-1050, 2GB GDDR5 VRAM.)


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1991721
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991730 - Posted: 27 Apr 2019, 20:15:31 UTC - in response to Message 1991721.  

That might be cutting it too close. Only Petri knows for sure. I am seeing about 1600MB of GPU memory in use for a GPU task on the 0.98b1 CUDA 10.1 app.

I think two tasks loaded on a 2GB card is unlikely - at roughly 1600MB each, a pair would need over 3GB.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1991730
Profile Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 1991731 - Posted: 27 Apr 2019, 20:21:01 UTC - in response to Message 1991714.  
Last modified: 27 Apr 2019, 20:26:49 UTC

This will save 2-4 seconds of idle time in the task loading transaction for each task. Over an hour or a day, that will allow more tasks to be crunched and production will go up.


Not sure how useful this would be, but I have a Windows program that reads the BoincTasks history file and shows idle time for various projects and systems. It is at https://github.com/BeemerBiker/Gridcoin/tree/master/BTHistoryReader and would have to be built with VS2017.

Here is a sample output that shows an idle problem on milkyway

https://github.com/BeemerBiker/Gridcoin/blob/master/BTHistoryReader/BTHistory_Demo3.png
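(For a rough idea of what the idle-time arithmetic boils down to, here is a hypothetical sketch - not BTHistoryReader itself, which parses BoincTasks' own history format and needs VS2017. It just assumes each task's start and end time has been exported as "start,end" epoch seconds, one pair per line, and sums the gaps where nothing was running.)

// Hypothetical sketch only, not BTHistoryReader.
// Input: CSV lines of "start,end" epoch seconds for each finished task.
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s history.csv\n", argv[0]); return 1; }
    std::FILE* f = std::fopen(argv[1], "r");
    if (!f) { std::perror("fopen"); return 1; }

    std::vector<std::pair<long, long>> runs;          // (start, end) per task
    long start = 0, end = 0;
    while (std::fscanf(f, "%ld,%ld", &start, &end) == 2)
        runs.emplace_back(start, end);
    std::fclose(f);

    std::sort(runs.begin(), runs.end());              // chronological order
    long idle = 0;
    for (size_t i = 1; i < runs.size(); ++i)
        if (runs[i].first > runs[i - 1].second)       // gap between consecutive tasks
            idle += runs[i].first - runs[i - 1].second;

    std::printf("total idle: %ld s across %zu tasks\n", idle, runs.size());
    return 0;
}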
ID: 1991731
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1991911 - Posted: 29 Apr 2019, 18:42:28 UTC - in response to Message 1991706.  

I guess you managed to wrangle the code snippet oddbjornik threw in here for pre-initialization.
I have turned that code snippet inside out and upside down several times since I first aired it here. The current version has run for two weeks without incident on my Linux hosts, and as Petri indicates, it should be in the pipeline for release with 0.99. As usual, Petri and TBar will handle the full testing/compiling/packaging/release of 0.99 to the public.
ID: 1991911
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991913 - Posted: 29 Apr 2019, 19:10:21 UTC - in response to Message 1991911.  

Hi Oddbjornik, so does the extra code for the mutex lock handle error conditions gracefully? Or have neither you nor Petri run into that experimental condition yet?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1991913
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1991914 - Posted: 29 Apr 2019, 19:17:44 UTC - in response to Message 1991913.  

Hi Oddbjornik, so does the extra code for the mutex lock handle error conditions gracefully? Or have neither you nor Petri run into that experimental condition yet?
That's what the turning inside out of the code has been for. I believe it is pretty near bulletproof by now. PM me if you want a link to the source.
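For anyone curious about the shape of it: the actual snippet isn't posted in this thread, so treat the following purely as a hypothetical sketch of the general idea - stage the next task without holding the lock, serialize only the heavy crunching, and make sure the lock is released on every exit path. The Task type and the semaphore name are invented for illustration; the real 0.98/0.99 source may do it quite differently.

// Hypothetical illustration only - not the actual special-app source.
#include <fcntl.h>
#include <semaphore.h>
#include <stdexcept>

struct Task {
    void preinitialize() { /* allocate buffers, plan FFTs, copy data (stub) */ }
    void crunch()        { /* run the search kernels on the GPU (stub) */ }
};

// RAII guard: whatever happens inside the locked region - normal exit, error,
// exception - the semaphore is posted again when the guard is destroyed, so a
// failing task cannot leave the GPU permanently locked for its neighbour.
class GpuLock {
public:
    explicit GpuLock(const char* name) {
        sem_ = sem_open(name, O_CREAT, 0644, 1);
        if (sem_ == SEM_FAILED) throw std::runtime_error("sem_open failed");
        sem_wait(sem_);                        // wait for the other instance to finish crunching
    }
    ~GpuLock() { sem_post(sem_); sem_close(sem_); }
private:
    sem_t* sem_;
};

void run_task(Task& task) {
    task.preinitialize();                      // cheap staging, done without the lock
    GpuLock lock("/seti_special_gpu0");        // invented per-GPU lock name
    task.crunch();                             // exclusive use of the GPU compute path
}                                              // lock released here; the staged neighbour starts immediately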
ID: 1991914
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991947 - Posted: 30 Apr 2019, 1:04:40 UTC - in response to Message 1991914.  

Thanks, I'll wait for the official release. I'm in no hurry. My time right now is trying to get a gpu app working on my new Jetson Nano.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1991947
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1991948 - Posted: 30 Apr 2019, 1:11:58 UTC - in response to Message 1991947.  

My time right now is trying to get a gpu app working on my new Jetson Nano.


He tasks me. He tasks me, and I shall have him.

I'll chase him round the Moons of Nibia and round the Antares Maelstrom and round Perdition's flames before I give him up!
ID: 1991948
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991952 - Posted: 30 Apr 2019, 2:46:34 UTC - in response to Message 1991948.  

My time right now is trying to get a gpu app working on my new Jetson Nano.


He tasks me. He tasks me, and I shall have him.

I'll chase him round the Moons of Nibia and round the Antares Maelstrom and round Perdition's flames before I give him up!

Love the quote! Ha ha. LOL.

I'm close. I got tasks this time but they errored out on the amount of disk space required. Bumped up everything I could think of, plus suspended Seti for the time being, so hopefully just Einstein is able to run unobstructed.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1991952
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991959 - Posted: 30 Apr 2019, 4:06:20 UTC

Back to the penalty box again for another 24 hours.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1991959
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1991984 - Posted: 30 Apr 2019, 9:49:24 UTC - in response to Message 1991948.  

My time right now is trying to get a gpu app working on my new Jetson Nano.


He tasks me. He tasks me, and I shall have him.

I'll chase him round the Moons of Nibia and round the Antares Maelstrom and round Perdition's flames before I give him up!

Hi Zalster,

ST: The Wrath of Khan. :) Love that movie!

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 1991984
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1991986 - Posted: 30 Apr 2019, 9:53:09 UTC - in response to Message 1991952.  

I'm close. I got tasks this time but they errored out on the amount of disk space required. Bumped up everything I could think of, plus suspended Seti for the time being, so hopefully just Einstein is able to run unobstructed.
Diagnosis and suggestion posted at BOINC message 91274
ID: 1991986
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1992062 - Posted: 1 May 2019, 1:13:33 UTC - in response to Message 1991986.  

Thanks for explaining the misleading error message. I see now that the message has nothing to do with BOINC disk usage limits. My third 24-hour delay period just expired and then BOINC set another 24-hour delay. So I'm still unable to test whether my present app_info will work until I can finally get some work to test with tomorrow.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1992062
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1992095 - Posted: 1 May 2019, 7:17:58 UTC - in response to Message 1992062.  

Thanks for explaining the misleading error message. I see now that the message has nothing to do with BOINC disk usage limits. My third 24-hour delay period just expired and then BOINC set another 24-hour delay. So I'm still unable to test whether my present app_info will work until I can finally get some work to test with tomorrow.
There must be a reason behind that delay. I'll go and look for it.
ID: 1992095
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1992103 - Posted: 1 May 2019, 10:10:24 UTC

Does CUDA V0.98b1 make some errors on overflow WUs?


3452554813

Lan Computer here
7632732562

Task 7632732562
Name 26ap19aa.18849.20517.10.37.5_0
Workunit 3452554813
Created 27 Apr 2019, 14:11:31 UTC
Sent 27 Apr 2019, 19:40:45 UTC
Report deadline 18 May 2019, 6:50:27 UTC
Received 28 Apr 2019, 5:29:57 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 8589036
Run time 2 sec
CPU time 2 sec
Validate state Valid
Credit 20.80
Device peak FLOPS 7,948.80 GFLOPS
Application version SETI@home v8
Anonymous platform (NVIDIA GPU)
Peak disk usage 0.03 MB
Stderr output
<core_client_version>7.15.0</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 10 CUDA device(s):
Device 1: GeForce RTX 2070, 7949 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 1, pciSlotID = 0
Device 2: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 2, pciSlotID = 0
Device 3: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 3, pciSlotID = 0
Device 4: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 4, pciSlotID = 0
Device 5: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 5, pciSlotID = 0
Device 6: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 130, pciSlotID = 0
Device 7: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 131, pciSlotID = 0
Device 8: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 132, pciSlotID = 0
Device 9: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 133, pciSlotID = 0
Device 10: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 134, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce RTX 2070 is okay
SETI@home using CUDA accelerated device GeForce RTX 2070
Unroll autotune 36. Overriding Pulse find periods per launch. Parameter -pfp set to 36

setiathome v8 enhanced x41p_V0.98b1, Cuda 10.00 special
Compiled with NVCC, using static libraries. Modifications done by petri33 and released to the public by TBar.



Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 207.829614
Sigma 0
Thread call stack limit is: 1k
Spike: peak=24.19673, time=13, d_freq=1420051978.83, chirp=0, fft_len=8k
Spike: peak=25.12381, time=54.53, d_freq=1420046203.14, chirp=0, fft_len=16k
Spike: peak=54.97297, time=1.678, d_freq=1420048765.24, chirp=0, fft_len=32k
Spike: peak=34.74489, time=3.355, d_freq=1420048808.6, chirp=0, fft_len=64k
Spike: peak=25.78017, time=16.78, d_freq=1420051674.54, chirp=0, fft_len=64k
Spike: peak=29.55749, time=23.49, d_freq=1420046203.14, chirp=0, fft_len=64k
Autocorr: peak=80.70203, time=6.711, delay=0.04608, d_freq=1420048828.12, chirp=0, fft_len=128k
Autocorr: peak=36.14323, time=20.13, delay=0.04608, d_freq=1420048828.12, chirp=0, fft_len=128k
Autocorr: peak=19.7078, time=33.55, delay=0.04608, d_freq=1420048828.12, chirp=0, fft_len=128k
Autocorr: peak=20.54296, time=46.98, delay=0.04608, d_freq=1420048828.12, chirp=0, fft_len=128k
Spike: peak=74.41464, time=6.711, d_freq=1420048765.17, chirp=0, fft_len=128k
Spike: peak=53.57521, time=20.13, d_freq=1420046203.14, chirp=0, fft_len=128k
Spike: peak=24.59947, time=46.98, d_freq=1420049546.81, chirp=0, fft_len=128k
Autocorr: peak=80.85102, time=6.711, delay=0.04608, d_freq=1420048828.13, chirp=0.00092426, fft_len=128k
Autocorr: peak=36.72198, time=20.13, delay=0.04608, d_freq=1420048828.14, chirp=0.00092426, fft_len=128k
Autocorr: peak=20.53432, time=33.55, delay=0.04608, d_freq=1420048828.16, chirp=0.00092426, fft_len=128k
Autocorr: peak=20.73564, time=46.98, delay=0.04608, d_freq=1420048828.17, chirp=0.00092426, fft_len=128k
Spike: peak=76.2974, time=6.711, d_freq=1420048765.17, chirp=0.00092426, fft_len=128k
Spike: peak=49.81211, time=20.13, d_freq=1420046203.16, chirp=0.00092426, fft_len=128k
Spike: peak=25.69483, time=46.98, d_freq=1420049112.56, chirp=0.00092426, fft_len=128k
Autocorr: peak=80.42193, time=6.711, delay=0.04608, d_freq=1420048828.12, chirp=-0.00092426, fft_len=128k
Autocorr: peak=35.71114, time=20.13, delay=0.04608, d_freq=1420048828.11, chirp=-0.00092426, fft_len=128k
Autocorr: peak=20.01222, time=33.55, delay=0.04608, d_freq=1420048828.09, chirp=-0.00092426, fft_len=128k
Autocorr: peak=20.68423, time=46.98, delay=0.04608, d_freq=1420048828.08, chirp=-0.00092426, fft_len=128k
Spike: peak=70.3869, time=6.711, d_freq=1420048765.16, chirp=-0.00092426, fft_len=128k
Spike: peak=39.86379, time=20.13, d_freq=1420052325.93, chirp=-0.00092426, fft_len=128k
Autocorr: peak=80.84055, time=6.711, delay=0.04608, d_freq=1420048828.14, chirp=0.0018485, fft_len=128k
Autocorr: peak=36.33689, time=20.13, delay=0.04608, d_freq=1420048828.16, chirp=0.0018485, fft_len=128k
Autocorr: peak=19.46532, time=33.55, delay=0.04608, d_freq=1420048828.19, chirp=0.0018485, fft_len=128k
Autocorr: peak=20.77025, time=46.98, delay=0.04608, d_freq=1420048828.21, chirp=0.0018485, fft_len=128k
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

Best spike: peak=76.2974, time=6.711, d_freq=1420048765.17, chirp=0.00092426, fft_len=128k
Best autocorr: peak=80.85102, time=6.711, delay=0.04608, d_freq=1420048828.13, chirp=0.00092426, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=2.474142, time=24.45, period=0.003891, d_freq=1420050048.83, score=0.5972, chirp=0, fft_len=8
Best triplet: peak=0, time=-2.124e+11, period=0, d_freq=0, chirp=0, fft_len=0
Spike count: 14
Autocorr count: 16
Pulse count: 0
Triplet count: 0
Gaussian count: 0
01:28:26 (8595): called boinc_finish(0)

</stderr_txt>
]]>


then mine
7632732563

Task 7632732563
Name 26ap19aa.18849.20517.10.37.5_1
Workunit 3452554813
Created 27 Apr 2019, 14:11:31 UTC
Sent 27 Apr 2019, 19:40:53 UTC
Report deadline 18 May 2019, 6:50:35 UTC
Received 29 Apr 2019, 16:41:05 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 7019416
Run time 15 min 17 sec
CPU time 4 min 18 sec
Validate state Valid
Credit 20.80
Device peak FLOPS 2,048.00 GFLOPS
Application version SETI@home v8
Anonymous platform (ATI GPU)
Peak working set size 588.50 MB
Peak swap size 613.93 MB
Peak disk usage 0.02 MB
Stderr output
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<stderr_txt>
CPU affinity adjustment disabled
High-performance path selected. If GUI lags occur consider to remove -high_perf option from tuning line
System timer will be set in high resolution mode
Number of period iterations for PulseFind set to:1
Target kernel sequence time set to 3400ms
Maximum single buffer size set to:512MB
TUNE: kernel 1 now has workgroup size of (64,4,1)
TUNE: kernel 2 now has workgroup size of (64,4,1)
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW AMD specific USE_SSE2 x86
CPUID: AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

Cache: L1=64K L2=1024K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
OpenCL-kernels filename : MultiBeam_Kernels_r3557.cl
ar=207.829614 NumCfft=99183 NumGauss=0 NumPulse=7627642274 NumTriplet=7627642274
Currently allocated 585 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768

Windows optimized setiathome_v8 application
Based on Intel, Core 2-optimized v8-nographics V5.13 by Alex Kan
SSE2xj Win32 Build 3557 , Ported by : Raistmer, JDWhale

SETI8 update by Raistmer

OpenCL version by Raistmer, r3557

AMD HD5 version by Raistmer

Number of OpenCL platforms: 1


OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 8
Max work group size: 256
Max clock frequency: 800Mhz
Max memory allocation: 603193344
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: Capeverde
Vendor: Advanced Micro Devices, Inc.
Driver version: 1573.4 (VM)
Version: OpenCL 1.2 AMD-APP (1573.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event


Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 207.829614
Used GPU device parameters are:
Number of compute units: 8
Single buffer allocation size: 512MB
Total device global memory: 1024MB
max WG size: 256
local mem type: Real
LotOfMem path: yes
LowPerformanceGPU path: no
HighPerformanceGPU path: yes
period_iterations_num=1
Spike: peak=24.19674, time=13, d_freq=1420051978.83, chirp=0, fft_len=8k
Spike: peak=25.1238, time=54.53, d_freq=1420046203.14, chirp=0, fft_len=16k
Spike: peak=54.97303, time=1.678, d_freq=1420048765.24, chirp=0, fft_len=32k
Spike: peak=34.74492, time=3.355, d_freq=1420048808.6, chirp=0, fft_len=64k
Spike: peak=25.78018, time=16.78, d_freq=1420051674.54, chirp=0, fft_len=64k
Spike: peak=29.55744, time=23.49, d_freq=1420046203.14, chirp=0, fft_len=64k
Spike: peak=74.41463, time=6.711, d_freq=1420048765.17, chirp=0, fft_len=128k
Autocorr: peak=80.7021, time=6.711, delay=0.04608, d_freq=1420048828.13, chirp=0, fft_len=128k
Spike: peak=53.57516, time=20.13, d_freq=1420046203.14, chirp=0, fft_len=128k
Autocorr: peak=36.14322, time=20.13, delay=0.04608, d_freq=1420048828.13, chirp=0, fft_len=128k
Autocorr: peak=19.70778, time=33.55, delay=0.04608, d_freq=1420048828.13, chirp=0, fft_len=128k
Spike: peak=24.59947, time=46.98, d_freq=1420049546.81, chirp=0, fft_len=128k
Autocorr: peak=20.54297, time=46.98, delay=0.04608, d_freq=1420048828.13, chirp=0, fft_len=128k
Spike: peak=76.29741, time=6.711, d_freq=1420048765.17, chirp=0.00092426, fft_len=128k
Autocorr: peak=80.85107, time=6.711, delay=0.04608, d_freq=1420048828.13, chirp=0.00092426, fft_len=128k
Spike: peak=49.81207, time=20.13, d_freq=1420046203.16, chirp=0.00092426, fft_len=128k
Autocorr: peak=36.72198, time=20.13, delay=0.04608, d_freq=1420048828.14, chirp=0.00092426, fft_len=128k
Autocorr: peak=20.53431, time=33.55, delay=0.04608, d_freq=1420048828.16, chirp=0.00092426, fft_len=128k
Spike: peak=25.69483, time=46.98, d_freq=1420049112.56, chirp=0.00092426, fft_len=128k
Autocorr: peak=20.73565, time=46.98, delay=0.04608, d_freq=1420048828.17, chirp=0.00092426, fft_len=128k
Spike: peak=70.38692, time=6.711, d_freq=1420048765.16, chirp=-0.00092426, fft_len=128k
Autocorr: peak=80.42195, time=6.711, delay=0.04608, d_freq=1420048828.12, chirp=-0.00092426, fft_len=128k
Spike: peak=39.86383, time=20.13, d_freq=1420052325.93, chirp=-0.00092426, fft_len=128k
Autocorr: peak=35.71115, time=20.13, delay=0.04608, d_freq=1420048828.11, chirp=-0.00092426, fft_len=128k
Autocorr: peak=20.01221, time=33.55, delay=0.04608, d_freq=1420048828.09, chirp=-0.00092426, fft_len=128k
Autocorr: peak=20.68424, time=46.98, delay=0.04608, d_freq=1420048828.08, chirp=-0.00092426, fft_len=128k
Spike: peak=75.61421, time=6.711, d_freq=1420048765.18, chirp=0.0018485, fft_len=128k
Autocorr: peak=80.84062, time=6.711, delay=0.04608, d_freq=1420048828.14, chirp=0.0018485, fft_len=128k
Spike: peak=38.15005, time=20.13, d_freq=1420051674.58, chirp=0.0018485, fft_len=128k
Autocorr: peak=36.3369, time=20.13, delay=0.04608, d_freq=1420048828.16, chirp=0.0018485, fft_len=128k
OpenCL queue synchronized
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

Best spike: peak=78.94281, time=3.355, d_freq=1420048765.16, chirp=-0.025879, fft_len=64k
Best autocorr: peak=80.85107, time=6.711, delay=0.04608, d_freq=1420048828.13, chirp=0.00092426, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+011, d_freq=0,
score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=2.474142, time=24.45, period=0.003891, d_freq=1420050048.83, score=0.5972, chirp=0, fft_len=8
Best triplet: peak=0, time=-2.124e+011, period=0, d_freq=0, chirp=0, fft_len=0


Flopcounter: 936152860.907576

Spike count: 16
Autocorr count: 14
Pulse count: 0
Triplet count: 0
Gaussian count: 0
Wallclock time elapsed since last restart: 909.3 seconds

class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0


class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0


class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0


class PC_triplet_find_hit: total=0, N=0, <>=0, min=0 max=0
class PC_triplet_find_miss: total=0, N=0, <>=0, min=0 max=0


class PC_pulse_find_hit: total=0, N=0, <>=0, min=0 max=0
class PC_pulse_find_miss: total=0, N=0, <>=0, min=0 max=0
class PC_pulse_find_early_miss: total=0, N=0, <>=0, min=0 max=0
class PC_pulse_find_2CPU: total=1, N=1, <>=1, min=1 max=1


class PoT_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class PoT_transfer_needed: total=1, N=1, <>=1, min=1 max=1

class SleepQuantum: total=0, N=0, <>=0, min=0 max=0

GPU device sync requested... ...GPU device synched
18:38:22 (3872): called boinc_finish(0)

</stderr_txt>
]]>


then the third wingman
7638624591

Task 7638624591
Name 26ap19aa.18849.20517.10.37.5_2
Workunit 3452554813
Created 29 Apr 2019, 16:41:29 UTC
Sent 29 Apr 2019, 22:20:29 UTC
Report deadline 20 May 2019, 9:30:11 UTC
Received 30 Apr 2019, 10:26:35 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 8501448
Run time 21 sec
CPU time 19 sec
Validate state Valid
Credit 20.80
Device peak FLOPS 3.56 GFLOPS
Application version SETI@home v8 v8.00
x86_64-pc-linux-gnu
Peak working set size 104.84 MB
Peak swap size 106.38 MB
Peak disk usage 0.27 MB
Stderr output
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_v8 8.00 Revision: 3290 g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is : 207.829614
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_avxGetPowerSpectrum 0.000058 0.00000
avx_ChirpData_d 0.002066 0.00000
v_vTranspose4x16ntw 0.000977 0.00000
BH SSE folding 0.000228 0.00000
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.

Flopcounter: 5394573636.899149

Spike count: 16
Autocorr count: 14
Pulse count: 0
Triplet count: 0
Gaussian count: 0
09:28:32 (26769): called boinc_finish(0)

</stderr_txt>
]]>

ID: 1992103
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1992106 - Posted: 1 May 2019, 10:31:15 UTC - in response to Message 1992103.  

Does CUDA V0.98b1 make some errors on overflow WUs?
Not errors - all three tasks are 'similar enough' to have validated and been granted credit.

But I believe it has been acknowledged that with these short-running overflow tasks, there is some imprecision in the process of selecting the 'first' 30 signals to report (processing is performed in a different order, I think), resulting in the inconclusive validation between the first two results returned. Here, for instance, the CUDA host reported 14 spikes and 16 autocorrelations while the OpenCL host reported 16 and 14 - the same 30-signal limit, just filled in a different order.

This makes no difference to end users, but does place some additional strain on the project (servers have to create an extra task replication in the database, network has to support an additional data file download). Whether that matters depends on whether you are thinking as a user, or as a project administrator.
ID: 1992106
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1994249 - Posted: 18 May 2019, 23:30:05 UTC
Last modified: 19 May 2019, 0:00:59 UTC

Did we ever get a final fix for this issue?

Just bought myself a late birthday present, which required upgrading the video driver. I tried the other suggestions for downgrading the driver, but I'm still stuck with 2 OpenCL lines in the BOINC event log for each video card, resulting in only one of them processing work.

19/05/2019 08:33:08 |  | Starting BOINC client version 7.6.33 for windows_x86_64
19/05/2019 08:33:08 |  | log flags: file_xfer, sched_ops, task
19/05/2019 08:33:08 |  | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
19/05/2019 08:33:08 |  | Data directory: C:\ProgramData\BOINC
19/05/2019 08:33:08 |  | Running under account USER
19/05/2019 08:33:09 |  | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 418.81, CUDA version 10.1, compute capability 7.5, 4096MB, 3556MB available, 14054 GFLOPS peak)
19/05/2019 08:33:09 |  | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 418.81, CUDA version 10.1, compute capability 6.1, 4096MB, 3556MB available, 6852 GFLOPS peak)
19/05/2019 08:33:09 |  | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 418.81, device version OpenCL 1.2 CUDA, 6144MB, 3556MB available, 14054 GFLOPS peak)
19/05/2019 08:33:09 |  | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 418.81, device version OpenCL 1.2 CUDA, 6144MB, 3556MB available, 14054 GFLOPS peak)
19/05/2019 08:33:09 |  | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 418.81, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)
19/05/2019 08:33:09 |  | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 418.81, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)
19/05/2019 08:33:09 | SETI@home | Found app_info.xml; using anonymous platform
19/05/2019 08:33:09 |  | Host name: Grant-PC
19/05/2019 08:33:09 |  | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [Family 6 Model 158 Stepping 10]
19/05/2019 08:33:09 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
19/05/2019 08:33:09 |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17134.00)
19/05/2019 08:33:09 |  | Memory: 15.95 GB physical, 18.33 GB virtual
19/05/2019 08:33:09 |  | Disk: 930.56 GB total, 850.84 GB free
19/05/2019 08:33:09 |  | Local time is UTC +9 hours
19/05/2019 08:33:09 | SETI@home | Found app_config.xml
19/05/2019 08:33:09 | SETI@home Beta Test | Found app_config.xml
19/05/2019 08:33:09 |  | Config: use all coprocessors


Even with the oldest driver I can use, the coproc_info.xml gets overwritten with the extra entries, so I've used BeemerBiker's workaround for now (many thanks for that BTW).


Interestingly, checking out the results of work processed on my account, the new results show double entries for the OpenCL as well.

eg
SETI8 update by Raistmer

OpenCL version by Raistmer, r3557

Number of OpenCL platforms:				 2


 OpenCL Platform Name:					 NVIDIA CUDA
Number of devices:				 2
  Max compute units:				 30
  Max work group size:				 1024
  Max clock frequency:				 1830Mhz
  Max memory allocation:			 1610612736
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 491520
  Global memory size:				 6442450944
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 Yes
  Name:						 GeForce RTX 2060
  Vendor:					 NVIDIA Corporation
  Driver version:				 418.81
  Version:					 OpenCL 1.2 CUDA
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer
  Max compute units:				 15
  Max work group size:				 1024
  Max clock frequency:				 1784Mhz
  Max memory allocation:			 2147483648
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 245760
  Global memory size:				 8589934592
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 Yes
  Name:						 GeForce GTX 1070
  Vendor:					 NVIDIA Corporation
  Driver version:				 418.81
  Version:					 OpenCL 1.2 CUDA
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer


 OpenCL Platform Name:					 NVIDIA CUDA
Number of devices:				 2
  Max compute units:				 30
  Max work group size:				 1024
  Max clock frequency:				 1830Mhz
  Max memory allocation:			 1610612736
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 491520
  Global memory size:				 6442450944
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 Yes
  Name:						 GeForce RTX 2060
  Vendor:					 NVIDIA Corporation
  Driver version:				 418.81
  Version:					 OpenCL 1.2 CUDA
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer
  Max compute units:				 15
  Max work group size:				 1024
  Max clock frequency:				 1784Mhz
  Max memory allocation:			 2147483648
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 245760
  Global memory size:				 8589934592
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 Yes
  Name:						 GeForce GTX 1070
  Vendor:					 NVIDIA Corporation
  Driver version:				 418.81
  Version:					 OpenCL 1.2 CUDA
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer


Work Unit Info:

Grant
Darwin NT
ID: 1994249
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1994250 - Posted: 18 May 2019, 23:49:36 UTC - in response to Message 1989298.  

In the previous thread, Juha suggested that you inspect HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors, but I can't see any reply to that particular question. When I look at that key on my machine here, I see

[HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors]
"IntelOpenCL64.dll"=dword:00000000
"C:\\Windows\\System32\\nvopencl.dll"=dword:00000000
showing how two OpenCL libraries can co-exist. It would be worth checking that, since you seem to have found a workaround for the problem but not yet isolated the root cause.


On my system
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors
C:\WINDOWS\System32\DriverStore\FileRepository\igdlh64.inf_amd64_9929e26743d53831\IntelOpenCL64.dll
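(If anyone else wants to check theirs, the quickest way to dump that key and see whether a vendor's OpenCL ICD is registered more than once is the stock Windows reg tool, e.g. reg query "HKLM\SOFTWARE\Khronos\OpenCL\Vendors" from a command prompt - nothing SETI-specific about it.)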

Grant
Darwin NT
ID: 1994250