SETI@home v8.12 Windows GPU applications support thread

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812140 - Posted: 24 Aug 2016, 11:29:55 UTC
Last modified: 24 Aug 2016, 11:32:45 UTC

Please keep overflowed results out of this thread until there is solid evidence of a false positive or of a real signal being omitted.
I firmly refuse to spend time discussing reported partial subsets.

FYI, for those enthusiasts who still don't know what an "overflow" is:
the line "SETI@Home Informational message -9 result_overflow" in stderr means the task overflowed.
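
A minimal sketch of checking a saved stderr text for that marker. The function name and the sample strings are illustrative assumptions; only the `result_overflow` message itself comes from the post above.

```python
# Marker text taken from the stderr line quoted in the post.
OVERFLOW_MARKER = "result_overflow"


def is_overflow(stderr_text: str) -> bool:
    """Return True if the stderr text contains the overflow marker."""
    return OVERFLOW_MARKER in stderr_text


# Illustrative inputs, not real task output.
overflowed = "SETI@Home Informational message -9 result_overflow\n"
normal = "Restarted at 41.25 percent.\n"

print(is_overflow(overflowed), is_overflow(normal))
```

A plain substring test is enough here because the marker is a fixed token; nothing else about the stderr layout is assumed.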
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812140

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812142 - Posted: 24 Aug 2016, 11:35:33 UTC - in response to Message 1812102.  


The interesting thing about this one is that NV SoG r3500 was invalid, while ATi SoG r3430 was valid.

What exactly is interesting here? Evidence that a GPU is indeed a multiprocessor device without strong ordering? That fact is covered in any GPGPU review.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812142

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812144 - Posted: 24 Aug 2016, 11:48:12 UTC
Last modified: 24 Aug 2016, 11:51:08 UTC

How to acquire meaningful data from a suspicious overflow:
1) Download the corresponding task for offline testing.
2) Edit the task to remove or raise (by a lot!) the limits on the number of reported signals.
3) Re-run the edited task with the reference CPU app.
4) Compare all results from that run with the subset reported by the app under investigation.
5) Report any false positives with considerable excess power over threshold in that subset (that is, results in the subset that have no match in the full list of signals reported for the task).
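
Steps 4 and 5 can be sketched roughly as follows. This is only an illustration: the `Signal` record, its four fields, and the relative tolerance are assumptions made for the example, not the validator's actual comparison rules or the real result file layout.

```python
import math
from typing import NamedTuple


class Signal(NamedTuple):
    # Hypothetical, simplified record; real result files carry more fields.
    kind: str     # e.g. "spike" or "pulse"
    power: float  # peak power
    freq: float   # detection frequency
    chirp: float  # chirp rate


def matches(a: Signal, b: Signal, rel_tol: float = 1e-3) -> bool:
    """True if two signals agree in type and, within tolerance, in value."""
    return (
        a.kind == b.kind
        and math.isclose(a.power, b.power, rel_tol=rel_tol)
        and math.isclose(a.freq, b.freq, rel_tol=rel_tol)
        and math.isclose(a.chirp, b.chirp, rel_tol=rel_tol, abs_tol=1e-9)
    )


def unmatched(subset, full, rel_tol: float = 1e-3):
    """Signals from the reported subset with no counterpart in the full run."""
    return [s for s in subset if not any(matches(s, f, rel_tol) for f in full)]


# Made-up data: the full list from the reference CPU run (step 3) ...
full_run = [
    Signal("spike", 24.5, 1420.10, 0.0),
    Signal("spike", 30.2, 1420.30, 1.5),
    Signal("pulse", 21.0, 1420.20, -0.7),
]
# ... and the overflowed subset reported by the app under investigation.
reported_subset = [
    Signal("spike", 30.2001, 1420.30, 1.5),  # same signal, tiny numeric drift
    Signal("spike", 99.0, 1419.00, -2.0),    # no counterpart in the full run
]

suspects = unmatched(reported_subset, full_run)
print(suspects)
```

Only entries left in `suspects` are candidates for a report, and per step 5 only those whose power clearly exceeds the relevant threshold.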

Now details:
3*, 5* :
<analysis_cfg>
  <spike_thresh>24</spike_thresh>
  <spikes_per_spectrum>1</spikes_per_spectrum>
  <autocorr_thresh>17.8</autocorr_thresh>
  <autocorr_per_spectrum>1</autocorr_per_spectrum>
  <autocorr_fftlen>131072</autocorr_fftlen>
  <gauss_null_chi_sq_thresh>2.43685937</gauss_null_chi_sq_thresh>
  <gauss_chi_sq_thresh>1.41999996</gauss_chi_sq_thresh>
  <gauss_power_thresh>3</gauss_power_thresh>
  <gauss_peak_power_thresh>3.20000005</gauss_peak_power_thresh>
  <gauss_pot_length>64</gauss_pot_length>
  <pulse_thresh>19.7340908</pulse_thresh>
  <pulse_display_thresh>0.5</pulse_display_thresh>
  <pulse_max>40960</pulse_max>
  <pulse_min>16</pulse_min>
  <pulse_fft_max>8192</pulse_fft_max>
  <pulse_pot_length>256</pulse_pot_length>
  <triplet_thresh>9.73841</triplet_thresh>
  <triplet_max>131072</triplet_max>
  <triplet_min>16</triplet_min>
  <triplet_pot_length>256</triplet_pot_length>
  <pot_overlap_factor>0.5</pot_overlap_factor>
  <pot_t_offset>1</pot_t_offset>
  <pot_min_slew>0.00209999993</pot_min_slew>
  <pot_max_slew>0.0104999999</pot_max_slew>
  <chirp_resolution>0.333</chirp_resolution>
  <analysis_fft_lengths>262136</analysis_fft_lengths>
  <bsmooth_boxcar_length>8192</bsmooth_boxcar_length>
  <bsmooth_chunk_size>32768</bsmooth_chunk_size>
  <chirps>
    <chirp_parameter_t>
      <chirp_limit>3</chirp_limit>
      <fft_len_flags>262136</fft_len_flags>
    </chirp_parameter_t>
    <chirp_parameter_t>
      <chirp_limit>10</chirp_limit>
      <fft_len_flags>65528</fft_len_flags>
    </chirp_parameter_t>
  </chirps>
  <pulse_beams>1</pulse_beams>
  <max_signals>30</max_signals>
  <max_spikes>8</max_spikes>
  <max_gaussians>0</max_gaussians>
  <max_pulses>0</max_pulses>
  <max_triplets>0</max_triplets>
  <keyuniq>-7344129</keyuniq>
  <credit_rate>2.8499999</credit_rate>
</analysis_cfg>

Points of interest in bold.
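
A rough sketch of how such an edit could be scripted. It assumes the `<analysis_cfg>` block has already been cut out of the task file as a string (a real task file contains more than this block), and the choice of `max_signals` and the value 3000 are assumptions for illustration; the post does not say which elements to change or to what. `apply_overrides` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def apply_overrides(cfg_xml: str, overrides: dict) -> str:
    """Return the analysis_cfg fragment with the given element values replaced."""
    root = ET.fromstring(cfg_xml)
    for tag, value in overrides.items():
        node = root.find(tag)
        if node is None:
            # Refuse to silently add elements the original config lacks.
            raise KeyError(f"no <{tag}> element in analysis_cfg")
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Shortened fragment with values copied from the config above.
fragment = (
    "<analysis_cfg>\n"
    "<spike_thresh>24</spike_thresh>\n"
    "<max_signals>30</max_signals>\n"
    "<max_spikes>8</max_spikes>\n"
    "</analysis_cfg>"
)

# Assumed edit: a much higher overall signal limit for the offline run.
edited = apply_overrides(fragment, {"max_signals": 3000})
print(edited)
```

Passing the overrides as a dictionary keeps the decision about which limits to raise with whoever runs the test; thresholds such as `spike_thresh` are left untouched.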

Also, my builds have an extended ability to report signal info in stderr.
Per the ReadMe:
Levels from 2 to 5 reserved for increasing verbosity, higher levels reserved for specific usage.
-v 2 enables all signals output.

So -v 2 makes it possible to follow how the "bests" are formed through the whole task processing.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812144

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1812148 - Posted: 24 Aug 2016, 12:03:01 UTC

As I understand it, the interesting part in this case (an overflow) is -tt 60, not the revision number.
With the longer kernel time, signal logging differs, because a kernel now runs for 60 ms instead of 15 ms.
That is one of the reasons we think the validator needs adjusting for overflows.


With each crime and every kindness we birth our future.
ID: 1812148

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812153 - Posted: 24 Aug 2016, 12:06:27 UTC - in response to Message 1812148.  

As I understand it, the interesting part in this case (an overflow) is -tt 60, not the revision number.
With the longer kernel time, signal logging differs, because a kernel now runs for 60 ms instead of 15 ms.
That is one of the reasons we think the validator needs adjusting for overflows.


And, of course, "ms" is a real-world unit of time. The amount of work that different GPU models can do in the same time interval differs.
As I said, there is no sense in comparing subsets!
Acquire the full set first, then do the comparison. Otherwise it is just a waste of time.
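
A toy illustration of why two truncated reports need not agree even when both devices find exactly the same signals. The signal names, the two discovery orders, and the limit of 3 are made up for the example; the real apps and limits are more involved.

```python
def report(found_in_order, max_signals):
    """Keep only the first max_signals detections, as an overflowed task would."""
    return found_in_order[:max_signals]


# The same six signals, discovered in a different order on two devices
# (for example because of different kernel lengths or scheduling).
full_set = {"s1", "s2", "s3", "s4", "s5", "s6"}
device_a_order = ["s1", "s2", "s3", "s4", "s5", "s6"]
device_b_order = ["s4", "s1", "s6", "s2", "s5", "s3"]

subset_a = set(report(device_a_order, 3))
subset_b = set(report(device_b_order, 3))

print(subset_a == subset_b)                            # the subsets differ
print(subset_a <= full_set and subset_b <= full_set)   # yet both are correct
```

Neither subset contains a false positive, which is why the comparison has to be made against the full list rather than between the subsets.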
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812153

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1812169 - Posted: 24 Aug 2016, 12:22:55 UTC - in response to Message 1812153.  

Sure - we tried to obtain the data file for that example, but it had already been deleted from the server before the invalid result for NV_SoG was drawn to our attention. That's why I'm placing the emphasis on identifying the WUs at the earlier inconclusive stage, when the data may be held locally, and can certainly still be retrieved from the server.

I'm probably processing over 500 guppies a day currently - possibly well over. I can't possibly check every one, which is why I'm trying to filter out the potentially interesting anomalies, in the hope that they can contribute to even better first-time validation rates in the future.
ID: 1812169

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1812766 - Posted: 26 Aug 2016, 13:42:28 UTC

Happened to visit a computer I don't normally watch closely, and found that this had just happened:

26/08/2016 14:04:06 | SETI@home | Task postponed: Suspicious pulse results, host needs reboot or maintenance

- only symptom was that the oldest r3500 task was 'waiting to run', and the second-oldest was running instead.

Anyway, the task restarted normally, and validated at the first attempt - even against a stock apple-darwin. WU 2246185168

The only clue in my stderr is the repeated lines starting with

Priority of worker thread raised successfully

(again) and later

Restarted at 41.25 percent.

Would it be a good idea to log these interruptions, with the data found to be 'suspicious'? And I second the suggestion a few days ago of pushing the task name into the Event Log - I actually had seven guppies running at the time, so it wouldn't have been easy to track down the offender, if I hadn't happened to notice the task waiting. Anyway, I've got both data and result, if they're worth looking at.
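
Until the task name appears in the Event Log, a saved copy of the log can at least be filtered for these messages. A minimal sketch, assuming lines in the `date time | project | message` shape shown above and a day/month/year timestamp as in that line (other locales may format the date differently); the function names are hypothetical.

```python
from datetime import datetime


def parse_event(line: str):
    """Split one Event Log line into (timestamp, project, message), or None."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None  # not in the assumed timestamp format
    return when, project, message


def postponed_events(lines):
    """Return the parsed events whose message reports a postponed task."""
    events = (parse_event(line) for line in lines)
    return [e for e in events if e and e[2].startswith("Task postponed:")]


log_lines = [
    "26/08/2016 14:04:06 | SETI@home | Task postponed: Suspicious pulse "
    "results, host needs reboot or maintenance",
    "26/08/2016 14:05:00 | SETI@home | Some unrelated message",  # made-up line
]

hits = postponed_events(log_lines)
print(len(hits), hits[0][0].isoformat())
```

The timestamps found this way can then be matched by hand against the restart lines in each candidate task's stderr.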
ID: 1812766

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812802 - Posted: 26 Aug 2016, 17:27:00 UTC - in response to Message 1812766.  

I would say the "offender" is not the task but the device that processes it.
Of course, on a multi-device host it would be good to add which device triggered the task restart, though that info can already be derived from stderr if needed.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812802

robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1812871 - Posted: 27 Aug 2016, 0:55:53 UTC - in response to Message 1812802.  

I would say the "offender" is not the task but the device that processes it.
Of course, on a multi-device host it would be good to add which device triggered the task restart, though that info can already be derived from stderr if needed.


If so, more information is needed about what the device did wrong. I've seen that warning a few times, with no other indication of problems with the device. Should I eliminate the warning by setting SETI@home to No new tasks, since I currently have no other indication that anything is wrong?

Currently, there is no indication of which task had this warning, which makes it hard to determine which stderr to inspect.
ID: 1812871

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812933 - Posted: 27 Aug 2016, 7:11:26 UTC - in response to Message 1812871.  

I would say the "offender" is not the task but the device that processes it.
Of course, on a multi-device host it would be good to add which device triggered the task restart, though that info can already be derived from stderr if needed.


If so, more information is needed about what the device did wrong. I've seen that warning a few times, with no other indication of problems with the device. Should I eliminate the warning by setting SETI@home to No new tasks, since I currently have no other indication that anything is wrong?

Currently, there is no indication of which task had this warning, which makes it hard to determine which stderr to inspect.

The device produced a wrong computation. That is all the app knows. It is up to the human owner to find the reason.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812933

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1812934 - Posted: 27 Aug 2016, 7:21:55 UTC - in response to Message 1812871.  
Last modified: 27 Aug 2016, 7:24:41 UTC

Should I eliminate the warning by setting SETI@home to No new tasks, since I currently have no other indication that anything is wrong?

You also had an excess number of invalids. What other indication of a broken setup does one need to start troubleshooting?

http://setiathome.berkeley.edu/forum_thread.php?id=79760&postid=1810663
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1812934

robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1813066 - Posted: 27 Aug 2016, 22:08:03 UTC - in response to Message 1812934.  

Should I eliminate the warning by setting SETI@home to No new tasks, since I currently have no other indication that anything is wrong?

You also had an excess number of invalids. What other indication of a broken setup does one need to start troubleshooting?

http://setiathome.berkeley.edu/forum_thread.php?id=79760&postid=1810663


What excess number of invalids? My task list shows one task marked as invalid, eight days ago, and none before or since. Why should I assume that you are looking at a task list for the correct user, and why should I assume that the application can tell whether a problem is due to a specific graphics board instead of due to a problem with the way the application handles older graphics boards?
ID: 1813066

Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1813067 - Posted: 27 Aug 2016, 22:11:27 UTC - in response to Message 1813066.  

Should I eliminate the warning by setting SETI@home to No new tasks, since I currently have no other indication that anything is wrong?

You also had an excess number of invalids. What other indication of a broken setup does one need to start troubleshooting?

http://setiathome.berkeley.edu/forum_thread.php?id=79760&postid=1810663


What excess number of invalids? My task list shows one task marked as invalid, eight days ago, and none before or since. Why should I assume that you are looking at a task list for the correct user, and why should I assume that the application can tell whether a problem is due to a specific graphics board instead of due to a problem with the way the application handles older graphics boards?


I have found 3 invalids on your card.
And I am his alpha tester.


With each crime and every kindness we birth our future.
ID: 1813067

robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1813122 - Posted: 28 Aug 2016, 3:01:53 UTC - in response to Message 1813067.  

Should I eliminate the warning by setting SETI@home to No new tasks, since I currently have no other indication that anything is wrong?

You also had an excess number of invalids. What other indication of a broken setup does one need to start troubleshooting?

http://setiathome.berkeley.edu/forum_thread.php?id=79760&postid=1810663


What excess number of invalids? My task list shows one task marked as invalid, eight days ago, and none before or since. Why should I assume that you are looking at a task list for the correct user, and why should I assume that the application can tell whether a problem is due to a specific graphics board instead of due to a problem with the way the application handles older graphics boards?


I have found 3 invalids on your card.
And I am his alpha tester.


Then why aren't the two others still listed as invalid? Were they with the stock SoG application and therefore not what I'm currently testing?

And how can you tell that they're a problem with the card instead of a problem with the way r3500 handles a GTX 560?
ID: 1813122

Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1813131 - Posted: 28 Aug 2016, 4:16:43 UTC

Then why aren't the two others still listed as invalid?

Because once the tasks validate they're deleted 24hrs later. ;-)

Cheers.
ID: 1813131

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1813133 - Posted: 28 Aug 2016, 5:00:55 UTC - in response to Message 1813122.  

And how can you tell that they're a problem with the card instead of a problem with the way r3500 handles a GTX 560?

If your card is the only one that's having the issue, then the card (or the system it's in) is most likely the cause.
Grant
Darwin NT
ID: 1813133

robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1813137 - Posted: 28 Aug 2016, 6:16:07 UTC - in response to Message 1813131.  

Then why aren't the two others still listed as invalid?

Because once the tasks validate they're deleted 24hrs later. ;-)

Cheers.


That's not what I am seeing. The only one of my tasks marked invalid has been marked that way for several days.

Or are you implying that some of the tasks the validator marks as valid are in fact invalid, but still deleted 24 hours later since they were marked valid?
ID: 1813137

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1813138 - Posted: 28 Aug 2016, 6:17:02 UTC - in response to Message 1813122.  


And how can you tell that they're a problem with the card instead of a problem with the way r3500 handles a GTX 560?

Try to find hosts with other GTX 560 cards running OpenCL NV MB, preferably on beta (because results are kept longer there). How do they behave?
Did you use any tuning other than what the ReadMe proposes? If so, what tuning line?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1813138

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1813139 - Posted: 28 Aug 2016, 6:19:02 UTC - in response to Message 1813137.  
Last modified: 28 Aug 2016, 6:19:25 UTC


Or are you implying that some of the tasks the validator marks as valid are in fact invalid, but still deleted 24 hours later since they were marked valid?

When a workunit validates, all its results (including invalid ones and those with computation errors) are purged from the BOINC database. Usually this happens 24 hours after validation. Sometimes a task can hang around for much longer, but that is an issue with the BOINC backend in Berkeley, not the rule.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1813139

robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1813140 - Posted: 28 Aug 2016, 6:19:24 UTC - in response to Message 1813133.  

And how can you tell that they're a problem with the card instead of a problem with the way r3500 handles a GTX 560?

If your card is the only one that's having the issue, then the card (or the system it's in) is most likely the cause.


Where have you found tasks run on a different GTX 560 running on another host running Windows Vista so you can pin the cause to something in this host?
ID: 1813140
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.