AstroPulse v6 v6.04 (opencl_nvidia_100)


Author Message
Roger Clark
Joined: 6 Dec 12
Posts: 5
Credit: 2,990,609
RAC: 0
United States
Message 1409423 - Posted: 29 Aug 2013, 17:30:28 UTC

I just happened to look at running task 3136251711 and saw that it had been stuck at 55.885% completion for several minutes while running on 0.1xx CPU + 1 NVIDIA GPU. When I brought up the GPU monitor, the Power% was fluttering between 70 and 95%, whereas other GPU tasks from SETI@home usually sit at ~80% power and burn through in about 10 minutes. I aborted the task because I wasn't sure why it was stuck there.

Anyone else seen this kind of issue?

Profile petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 375
Credit: 67,039,088
RAC: 14,511
Finland
Message 1409440 - Posted: 29 Aug 2013, 17:57:02 UTC - in response to Message 1409423.
Last modified: 29 Aug 2013, 17:57:34 UTC

On GPU the 10 min tasks are Multibeam tasks.
The Astropulse tasks take about 1000-3000 seconds on GPU, or more if
a) they have high blanking and thus must use the CPU to do some calculations
b) you run your 3930K with hyper-threading enabled (12 logical cores) and you have "use 100% of the processors" set.

The remedy - "use 50% of the processors".

--
petri33
____________

Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1409456 - Posted: 29 Aug 2013, 18:38:11 UTC - in response to Message 1409423.
Last modified: 29 Aug 2013, 18:41:59 UTC

I just happened to look at running task 3136251711 and saw that it had been stuck at 55.885% completion for several minutes while running on 0.1xx CPU + 1 NVIDIA GPU. When I brought up the GPU monitor, the Power% was fluttering between 70 and 95%, whereas other GPU tasks from SETI@home usually sit at ~80% power and burn through in about 10 minutes. I aborted the task because I wasn't sure why it was stuck there.

Anyone else seen this kind of issue?


Also, while progress on MB tasks is relatively linear, i.e., the "progress bar" will continue to advance regularly throughout processing, the same is not the case with AP tasks. IMX, it isn't unusual for "freezes" in progress of upwards of 30 seconds or so on a high-end (I use 580's and 590's) GPU. I suspect this could be longer on lesser or less powerful cards. I'd say you shouldn't have aborted the WU, it most likely would have finished normally.

EDIT: Your GPU finished this task normally, so you don't have a problem with AP's, or your GPU.
____________

Roger Clark
Joined: 6 Dec 12
Posts: 5
Credit: 2,990,609
RAC: 0
United States
Message 1409480 - Posted: 29 Aug 2013, 19:08:34 UTC - in response to Message 1409456.

Thanks for the quick feedback. The previous Astropulse unit took 2258sec, this one hit the 55.xx% at 1991sec.

I'm running a GTX670 with 4GB GDDR5 and 1344 CUDA cores, and the i7-3930 is running 50% of the processors at 100% of CPU time.

Might have finished fine; I guess I was just a little jumpy with a 2-week-old rig that I'm "burning in" to check everything out, and I noticed the power fluttering rather than cycling... It's also the first time I'd seen an AstroPulse WU come down with GPU capabilities (others have been v6 6.01 running on the main processor).

Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1409494 - Posted: 29 Aug 2013, 19:45:59 UTC - in response to Message 1409480.

Thanks for the quick feedback. The previous Astropulse unit took 2258sec, this one hit the 55.xx% at 1991sec.

I'm running a GTX670 with 4GB GDDR5 and 1344 CUDA cores, and the i7-3930 is running 50% of the processors at 100% of CPU time.

Might have finished fine; I guess I was just a little jumpy with a 2-week-old rig that I'm "burning in" to check everything out, and I noticed the power fluttering rather than cycling... It's also the first time I'd seen an AstroPulse WU come down with GPU capabilities (others have been v6 6.01 running on the main processor).


The completed AP was only 2.44% blanked. Most likely the aborted one had a higher blanking % and therefore would run longer. While most of my AP's complete in under an hour, some highly blanked ones will take upwards of 2-2.5 hours.
____________

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3422
Credit: 46,774,180
RAC: 21,164
Russia
Message 1409751 - Posted: 30 Aug 2013, 12:02:46 UTC - in response to Message 1409494.

It's inevitable with the current app design.
Blanking uses the Mersenne Twister random number generator in double precision on the CPU.
If someone wants to implement it on the GPU, they're welcome to.

____________

Roger Clark
Joined: 6 Dec 12
Posts: 5
Credit: 2,990,609
RAC: 0
United States
Message 1409774 - Posted: 30 Aug 2013, 14:07:15 UTC - in response to Message 1409751.

I'm curious what the app design looks like; NVIDIA CUDAZone says it's got a "drop-in" library, cuRAND, to do the twister on GPU???

Profile petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 375
Credit: 67,039,088
RAC: 14,511
Finland
Message 1409775 - Posted: 30 Aug 2013, 14:08:35 UTC - in response to Message 1409751.

How about creating an eight megabyte random file just once and storing to disk and blanking using that data on all subsequent AP tasks?
____________

Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 612
Credit: 139,938,981
RAC: 154,535
United Kingdom
Message 1409777 - Posted: 30 Aug 2013, 14:10:16 UTC - in response to Message 1409774.
Last modified: 30 Aug 2013, 14:24:04 UTC

I'm curious what the app design looks like; NVIDIA CUDAZone says it's got a "drop-in" library, cuRAND, to do the twister on GPU???

The Mersenne Twister in that is single-precision (i.e. float rather than double). Not all CUDA devices can handle doubles a) natively or b) efficiently.
[Edit] Although, looking at the header file, the generator can return doubles, the engine appears to be 32-bit, CURAND_RNG_PSEUDO_MTGP32. [/Edit]
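
For illustration only: a 32-bit Mersenne Twister engine can still hand back doubles by packing two 32-bit outputs into one 53-bit mantissa. The sketch below uses CPython's built-in `random` module (an MT19937 implementation), not cuRAND or the AstroPulse sources, and the helper name `double_from_two_words` is hypothetical. It shows the reference MT19937 conversion, which is not necessarily what either of those code bases does.

```python
import random

def double_from_two_words(rng):
    """Build one double in [0, 1) from two 32-bit Mersenne Twister outputs."""
    a = rng.getrandbits(32) >> 5   # keep the top 27 bits of the first word
    b = rng.getrandbits(32) >> 6   # keep the top 26 bits of the second word
    return (a * 67108864 + b) / 9007199254740992  # (a * 2**26 + b) / 2**53

# Two generators with the same (arbitrary) seed, so both see the same words.
rng_words = random.Random(12345)
rng_float = random.Random(12345)

x = double_from_two_words(rng_words)
y = rng_float.random()             # CPython's own 53-bit conversion
print(x, y)
```

So "32-bit engine" and "returns doubles" are compatible; the cost is two engine outputs per double.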
____________

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1286
Credit: 48,687,408
RAC: 112,643
United States
Message 1409784 - Posted: 30 Aug 2013, 14:30:46 UTC

It appears we are being set up for another multiweek AP outage. There is now a large number of 'tapes' and a large disparity in 'total channels to do:'. This usually means a long wait for the channel number to equalize. We could go another couple of weeks without APs since most semi-fast machines run out in a few days. Is there some reason this is being done? I would much rather see the AP outage last for just a few days versus weeks.

Is there some reason so many files are being loaded at once?
Server status page

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11601
Credit: 14,350,917
RAC: 13,477
United States
Message 1409787 - Posted: 30 Aug 2013, 14:46:45 UTC - in response to Message 1409751.

It's inevitable with the current app design.
Blanking uses the Mersenne Twister random number generator in double precision on the CPU.
If someone wants to implement it on the GPU, they're welcome to.

Does that mean that an AP will use the CPU for some of its work even if the host is set to use GPU only?

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3422
Credit: 46,774,180
RAC: 21,164
Russia
Message 1409826 - Posted: 30 Aug 2013, 16:08:26 UTC - in response to Message 1409775.

How about creating an eight megabyte random file just once and storing to disk and blanking using that data on all subsequent AP tasks?

Could work if the seed is always the same...
____________

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3422
Credit: 46,774,180
RAC: 21,164
Russia
Message 1409828 - Posted: 30 Aug 2013, 16:09:52 UTC - in response to Message 1409787.

It's inevitable with the current app design.
Blanking uses the Mersenne Twister random number generator in double precision on the CPU.
If someone wants to implement it on the GPU, they're welcome to.

Does that mean that an AP will use the CPU for some of its work even if the host is set to use GPU only?

Surprised? The CPU is always used, to one degree or another. Only the CPU can handle interrupts from other PC devices.

____________

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11601
Credit: 14,350,917
RAC: 13,477
United States
Message 1409831 - Posted: 30 Aug 2013, 16:14:44 UTC - in response to Message 1409828.

It's inevitable with the current app design.
Blanking uses the Mersenne Twister random number generator in double precision on the CPU.
If someone wants to implement it on the GPU, they're welcome to.

Does that mean that an AP will use the CPU for some of its work even if the host is set to use GPU only?

Surprised? The CPU is always used, to one degree or another. Only the CPU can handle interrupts from other PC devices.

Well I know the CPU is used a little (.04 for Seti and .2 for Einstein (or vice versa)), but I didn't know it got into it more than that. I think I'll have to turn off AP for my GPU-only machine.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3410
Credit: 20,261,955
RAC: 24,204
Sweden
Message 1409866 - Posted: 30 Aug 2013, 17:43:53 UTC - in response to Message 1409784.

It appears we are being set up for another multiweek AP outage. There is now a large number of 'tapes' and a large disparity in 'total channels to do:'. This usually means a long wait for the channel number to equalize. We could go another couple of weeks without APs since most semi-fast machines run out in a few days. Is there some reason this is being done? I would much rather see the AP outage last for just a few days versus weeks.

Is there some reason so many files are being loaded at once?
Server status page


Yeah, it's not good to throw in so many AP files to be split at the same time. They'll last two days at best, and then it'll take up to 14 days before the MB files catch up and some new AP files can be added to the splitters.

Not good, not good at all. I do not understand why they do it in this way.
____________

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8420
Credit: 57,439,772
RAC: 74,808
United Kingdom
Message 1409875 - Posted: 30 Aug 2013, 17:59:03 UTC

It might be something to do with the size of the discs the "tapes" have been loaded onto. There appear to be about 50 tapes in a batch, which would make sense as they are using 3 and 3 terabyte drives for transfer.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4252
Credit: 1,050,227
RAC: 244
United States
Message 1409879 - Posted: 30 Aug 2013, 18:19:12 UTC - in response to Message 1409826.

How about creating an eight megabyte random file just once and storing to disk and blanking using that data on all subsequent AP tasks?

Could work if the seed is always the same...

The seed is the same at the beginning of each AP WU, but the generator is not restarted for each of the 14080 passes through the input data. It continues, with its current state saved in the state file so that starting from a checkpoint will work. So rather than 8 megabytes you'd need to have ~118 Gigabytes of saved data.
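
A quick check of that figure, as a sketch: it assumes "eight megabyte" means 8 MiB of random data consumed per pass, which the thread does not state explicitly. The constant names are made up for the example.

```python
PASSES = 14080                 # passes through the input data, per the post
BYTES_PER_PASS = 8 * 1024**2   # assumption: 8 MiB of random data per pass

total_bytes = PASSES * BYTES_PER_PASS
print(f"{total_bytes:,} bytes")
print(f"{total_bytes / 1e9:.1f} GB (decimal), {total_bytes / 1024**3:.1f} GiB (binary)")
```

Under that assumption the total comes to about 118 GB in decimal units, which matches the estimate above.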

The Mersenne Twister actually produces 32-bit unsigned values, which may then be converted to other numerical forms. The original version, which will run on any CPU, is what is used; the later SIMD version does not produce the same sequence of pseudo-random numbers, although it is equally good.

Conversion of the uniformly distributed Twister output to normally distributed pseudo-random numbers using the Box–Muller transform probably accounts for much of the time required for blanking.
Joe
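
For readers unfamiliar with the Box–Muller transform mentioned above, here is a minimal sketch of the basic form, using Python's standard `random` module (MT19937-based) as the uniform source. It is not taken from the AstroPulse code; the function name `box_muller` and the seed are assumptions made for the example, and the real application may use a different variant.

```python
import math
import random

def box_muller(rng):
    """Turn two uniform samples into two independent standard-normal samples."""
    u1 = 1.0 - rng.random()   # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(2013)     # arbitrary seed, chosen only for this example
samples = [z for _ in range(50_000) for z in box_muller(rng)]

mean = sum(samples) / len(samples)
variance = sum((z - mean) ** 2 for z in samples) / len(samples)
print(f"mean={mean:.3f} variance={variance:.3f}")
```

Each pair of normal values costs a logarithm, a square root, a sine and a cosine on top of the two uniform draws, which is consistent with the guess that the transform, rather than the Twister itself, dominates blanking time.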

Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 612
Credit: 139,938,981
RAC: 154,535
United Kingdom
Message 1409979 - Posted: 30 Aug 2013, 23:00:24 UTC - in response to Message 1409879.
Last modified: 30 Aug 2013, 23:01:08 UTC

How about creating an eight megabyte random file just once and storing to disk and blanking using that data on all subsequent AP tasks?

Could work if the seed is always the same...

The seed is the same at the beginning of each AP WU, but the generator is not restarted for each of the 14080 passes through the input data. It continues, with its current state saved in the state file so that starting from a checkpoint will work. So rather than 8 megabytes you'd need to have ~118 Gigabytes of saved data.

The Mersenne Twister actually produces 32-bit unsigned values, which may then be converted to other numerical forms. The original version, which will run on any CPU, is what is used; the later SIMD version does not produce the same sequence of pseudo-random numbers, although it is equally good.

Conversion of the uniformly distributed Twister output to normally distributed pseudo-random numbers using the Box–Muller transform probably accounts for much of the time required for blanking.
Joe

That's starting to sound to me like the curand library is usable after all, especially if it's the *quality* of the PRNG that's the issue, rather than the reproducibility of the sequence.
As a matter of interest (since I hope to have my Xeon Phi compiling OpenCL code in the next month...) where are the OpenCL AstroPulse sources available?
____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4101
Credit: 33,140,318
RAC: 8,738
United Kingdom
Message 1409980 - Posted: 30 Aug 2013, 23:17:39 UTC - in response to Message 1409979.

As a matter of interest (since I hope to have my Xeon Phi compiling OpenCL code in the next month...) where are the OpenCL AstroPulse sources available?

https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AP

Claggy

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4252
Credit: 1,050,227
RAC: 244
United States
Message 1409996 - Posted: 31 Aug 2013, 0:53:28 UTC - in response to Message 1409979.

How about creating an eight megabyte random file just once and storing to disk and blanking using that data on all subsequent AP tasks?

Could work if the seed is always the same...
...
The Mersenne Twister actually produces 32-bit unsigned values, which may then be converted to other numerical forms. The original version, which will run on any CPU, is what is used; the later SIMD version does not produce the same sequence of pseudo-random numbers, although it is equally good.

Conversion of the uniformly distributed Twister output to normally distributed pseudo-random numbers using the Box–Muller transform probably accounts for much of the time required for blanking.
Joe

That's starting to sound to me like the curand library is usable after all, especially if it's the *quality* of the PRNG that's the issue, rather than the reproducibility of the sequence.

Unfortunately the sequence must be exactly the same.
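
To make the "exactly the same sequence" requirement concrete, here is a small sketch using Python's standard `random` module as a stand-in for the application's generator. The seed value and variable names are hypothetical, and the saved state object only imitates the state file described earlier in the thread.

```python
import random

SEED = 42  # hypothetical; the thread does not give the real seed

# Uninterrupted run: ten values from a freshly seeded generator.
reference = random.Random(SEED)
full_run = [reference.getrandbits(32) for _ in range(10)]

# Interrupted run: draw four values, save the state, resume in a new object.
first_half = random.Random(SEED)
resumed_run = [first_half.getrandbits(32) for _ in range(4)]
checkpoint = first_half.getstate()       # stands in for the state file

second_half = random.Random()            # fresh generator, arbitrary seed
second_half.setstate(checkpoint)         # restore the saved state
resumed_run += [second_half.getrandbits(32) for _ in range(6)]

# A different seed stands in for any generator that yields another sequence.
other_gen = random.Random(SEED + 1)
other_run = [other_gen.getrandbits(32) for _ in range(10)]

print("resumed run matches:", resumed_run == full_run)
print("other generator matches:", other_run == full_run)
```

A run resumed from saved state reproduces the uninterrupted sequence, while a generator that starts from different state does not, however good its statistical quality. That is the sense in which a replacement generator would need to be bit-compatible and not merely of similar quality.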

As a matter of interest (since I hope to have my Xeon Phi compiling OpenCL code in the next month...) where are the OpenCL AstroPulse sources available?

https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt is our repository for AP as well as MB sources.
Joe


Copyright © 2014 University of California