OpenCL NV MultiBeam v8 SoG edition for Windows

Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1794998 - Posted: 10 Jun 2016, 8:24:16 UTC - in response to Message 1794995.  

Actually, the error message is on line 232 of the current file. Are we using an outdated version of seti_header.cpp?

Apart from Jason's provision for Android in 2014 (r2181), most of the file size growth was Eric's r3113 and r3212 for GBT - and the file does "Write a SETI work unit header to a file".

Could the written file be missing some information, if we use an old version?
ID: 1794998
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1794999 - Posted: 10 Jun 2016, 8:27:01 UTC - in response to Message 1794998.  
Last modified: 10 Jun 2016, 8:27:30 UTC

Actually, the error message is on line 232 of the current file. Are we using an outdated version of seti_header.cpp?

Apart from Jason's provision for Android in 2014 (r2181), most of the file size growth was Eric's r3113 and r3212 for GBT - and the file does "Write a SETI work unit header to a file".


Not sure what missing a parameter addition would do. Is that a possibility for this build?
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1794999
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1795001 - Posted: 10 Jun 2016, 8:42:14 UTC
Last modified: 10 Jun 2016, 8:44:43 UTC

If that is Raistmer's file, he'll have built from https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8/client/seti_header.cpp

at revision 3377 (changes from Urs). You'd have to diff that against vanilla SETI and Jason's.

Edit: i.e. https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/seti_header.cpp

and https://setisvn.ssl.berkeley.edu/trac/browser/seti_boinc/client/seti_header.cpp
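
For anyone wanting to do that comparison locally, here is a minimal Python sketch using the standard difflib module. The two source strings are hypothetical stand-ins, not the real contents of either seti_header.cpp; in practice you would read the checked-out files from disk instead.

```python
import difflib


def header_diff(old_src: str, new_src: str, old_name: str, new_name: str) -> list[str]:
    """Return a unified diff between two versions of a source file."""
    return list(
        difflib.unified_diff(
            old_src.splitlines(),
            new_src.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Hypothetical stand-ins for two revisions of the same file.
xbranch_version = "int a;\nint b;\n"
vanilla_version = "int a;\nint gbt_field;\nint b;\n"

for line in header_diff(
    xbranch_version,
    vanilla_version,
    "Xbranch/client/seti_header.cpp",
    "seti_boinc/client/seti_header.cpp",
):
    print(line)
```

An empty result means the two versions are identical, which is the quick answer to "are we using an outdated file?".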
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1795001
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1795002 - Posted: 10 Jun 2016, 8:43:02 UTC - in response to Message 1794999.  

Searching this board for previous occurrences of "Bad workunit header" - which date back to 2005 - reminds me that the -6 error code was triggered by Raistmer's old "VLAR Kill" workaround in 2009.

Now, that has no relationship to the current occurrence, except to suggest that interrupting any SETI app at that point in startup has the capability to trigger that exit route. We shouldn't be too literal-minded and start peering at the workunit itself.
ID: 1795002
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1795003 - Posted: 10 Jun 2016, 8:49:46 UTC

I think I wrote that better last time round, in message 1711911:

SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\seti_header.cpp

We see this occasionally, but only from the CUDA applications running on NVidia cards (as this task was - that host has both types of card).

"SETI@home error -nn" comes from some very old SETI code, probably dating back to Classic days: my personal suspicion is that the modern applications sometimes wander down this error-handling path for reasons quite different from the ostensible fault reported - but they happen so rarely that the true cause is very hard to track down. There's a SETI@home error -1 "can't create file --disk full?" which similarly crops up sometimes in contexts which imply something completely different to do with autocorrelations.

I think the best thing to do is to ignore the specific error message, but take it as a general indication that the machine is feeling poorly.
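
To make the quoted condition concrete, here is a minimal Python sketch that mirrors the boolean test `!swi.data_type || !found || !swi.nsamples`. Only the three names come from the error text; the HeaderInfo class, the sample values and the exact meaning of `found` are assumptions for illustration, not the real seti_header.cpp code. The point is that a header structure which was simply never filled in trips the same check as a genuinely malformed workunit.

```python
from dataclasses import dataclass


@dataclass
class HeaderInfo:
    # Field names follow the quoted condition; the class itself is hypothetical.
    data_type: int = 0
    nsamples: int = 0


def bad_workunit_header(swi: HeaderInfo, found: bool) -> bool:
    """Python mirror of: !swi.data_type || !found || !swi.nsamples"""
    return (not swi.data_type) or (not found) or (not swi.nsamples)


# A header that was never populated (e.g. startup interrupted, file unreadable).
print(bad_workunit_header(HeaderInfo(), found=False))

# A populated header (values are made up) passes the check.
print(bad_workunit_header(HeaderInfo(data_type=1, nsamples=1048576), found=True))
```

Because any one of the three terms is enough, the message by itself cannot tell you which part went missing, or why.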
ID: 1795003
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1795004 - Posted: 10 Jun 2016, 8:54:20 UTC - in response to Message 1795002.  

It looks like we have a <no_opencl/> "KILL" in the latest client via a cc_config.xml flag. That ought to make a lot of people happy. Looks like a LOT of changes since the last stable release.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1795004
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1795005 - Posted: 10 Jun 2016, 9:00:58 UTC

Well, that was a dead end. Raistmer's last changes (before the extra lines for Linux) exactly match Eric's, and I've not had enough coffee to certify that Jason's internal switching matches that [@Jason rev. 3315 line 125: I assume you moved that into the temp variable at lines 227 ff.?]

So, unless it's reproducible - see Richard's comments on odd errors [or is that edd errors? ]
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1795005
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1795006 - Posted: 10 Jun 2016, 9:03:16 UTC - in response to Message 1795004.  

It looks like we have a <no_opencl/> "KILL" in the latest client via a cc_config.xml flag. That ought to make a lot of people happy. Looks like a LOT of changes since the last stable release.

ROFL - I'd forgotten that. That was Jacob Klein's request for a way of preventing OpenCL applications from running, during a temporary period when an NVidia driver bug caused errors at PrimeGrid and POEM (only, so far as we know, and the bug is fixed at source now). I did suggest it was overkill, or ought to be matched with a <no_cuda/> tag as well - but since it exists now, there's nothing to stop people testing it.
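
For anyone who does want to test it, here is a rough Python sketch that generates a cc_config.xml fragment with the flag set. It assumes the flag sits under <options> like other client flags and takes a 0/1 value; the thread writes it as <no_opencl/>, so check the client configuration documentation for your BOINC version before relying on the exact form.

```python
import xml.etree.ElementTree as ET


def build_cc_config(no_opencl: bool) -> str:
    """Build a minimal cc_config.xml body with the no_opencl flag.

    Placement under <options> and the 0/1 value are assumptions.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "no_opencl")
    flag.text = "1" if no_opencl else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(True))
```

The client has to re-read its config files (or be restarted) before a change like this takes effect.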
ID: 1795006
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1795012 - Posted: 10 Jun 2016, 10:08:52 UTC - in response to Message 1794984.  

Just got two bad work units with missing header information it looks like.
4294967290 (0xfffffffa) Unknown exit code

These work units:
Task 4975612198
Task 4975612087

It's worth running scandisk on the affected partition.
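
As a side note on the "Unknown exit code": 4294967290 (0xfffffffa) is the -6 error discussed earlier in the thread, read as an unsigned 32-bit number. A minimal Python sketch of the conversion (the function name is made up for illustration):

```python
def as_signed_32(code: int) -> int:
    """Reinterpret an unsigned 32-bit exit code as a signed 32-bit value."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code


print(as_signed_32(4294967290))  # -6
print(hex(-6 & 0xFFFFFFFF))      # 0xfffffffa
```

So the task pages and the "SETI@home error -6" message are reporting the same exit status in two notations.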
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1795012
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1795013 - Posted: 10 Jun 2016, 10:28:34 UTC - in response to Message 1795012.  

Just got two bad work units with missing header information it looks like.
4294967290 (0xfffffffa) Unknown exit code

These work units:
Task 4975612198
Task 4975612087

It's worth running scandisk on the affected partition.

Well I assumed that anybody who runs into an error that may be due to file damage does first check the integrity of the hard disc before taking it any further.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1795013
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1795014 - Posted: 10 Jun 2016, 10:31:05 UTC - in response to Message 1795013.  

Just got two bad work units with missing header information it looks like.
4294967290 (0xfffffffa) Unknown exit code

These work units:
Task 4975612198
Task 4975612087

It's worth running scandisk on the affected partition.

Well I assumed that anybody who runs into an error that may be due to file damage does first check the integrity of the hard disc before taking it any further.

Especially after a BSOD.
ID: 1795014
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1795151 - Posted: 10 Jun 2016, 17:33:02 UTC - in response to Message 1795014.  
Last modified: 10 Jun 2016, 17:36:16 UTC

Correct assumption. I had run a full AV scan earlier in the day since I had moved to different software recently and needed to find out how it handled it. I had also run integrity checks on the partition and done a /scannow. Nothing showed up. I just think I caught the manager and client in a bad spot at just the right time during program exit. I haven't had a BSOD on either of my crunchers in over a year if I remember correctly. Very solid machines. Or just chalk the BSOD up to a case of cosmic ray impacts as Joe or Jason would suggest.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1795151
Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1795183 - Posted: 10 Jun 2016, 18:41:52 UTC

What exactly makes a SOG a SOG?

Or what is the difference between:

MB8_win_x86_SSE3_OpenCL_NV_r3430.exe

and

MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe?
ID: 1795183
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1795187 - Posted: 10 Jun 2016, 18:45:41 UTC - in response to Message 1795183.  

What exactly makes a SOG a SOG?

Or what is the difference between:

MB8_win_x86_SSE3_OpenCL_NV_r3430.exe

and

MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe?

That's exactly what I would like to know. Both operate within the <plan_class>opencl_nvidia_SoG</plan_class>. That is the only plan_class listed in their respective aistub files in their application 7z files.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1795187
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1795207 - Posted: 10 Jun 2016, 21:03:38 UTC - in response to Message 1795187.  
Last modified: 10 Jun 2016, 21:05:21 UTC

What exactly makes a SOG a SOG?

Or what is the difference between:

MB8_win_x86_SSE3_OpenCL_NV_r3430.exe

and

MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe?

That's exactly what I would like to know. Both operate within the <plan_class>opencl_nvidia_SoG</plan_class>. That is the only plan_class listed in their respective aistub files in their application 7z files.


From Raistmer:
The SoG build is a rather unique case, because it's the first time that real parallel execution of different searches has been implemented completely on the GPU device.
Progress marks are embedded in the CPU code, and at high AR that CPU code does only non-blocking kernel enqueuing. This is done very fast, and then the GPU processes the data without interaction with the CPU code. So no more progress marks go to BOINC, hence the highly non-linear progress behavior on VHARs.

The non-SoG build processes some signals on the CPU instead of the GPU.


With each crime and every kindness we birth our future.
ID: 1795207
Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1795211 - Posted: 10 Jun 2016, 21:18:50 UTC

Thanks, Mike.

If they are so different, they probably should have given them different revision numbers.

Not doing so indicates a strong similarity, as indicated by Keith Myers:

Both operate within the <plan_class>opencl_nvidia_SoG</plan_class>
ID: 1795211
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1795227 - Posted: 10 Jun 2016, 22:23:28 UTC - in response to Message 1795211.  

Thanks, Mike.

If they are so different, they probably should have given them different revision numbers.

Not doing so indicates a strong similarity, as indicated by Keith Myers:

Both operate within the <plan_class>opencl_nvidia_SoG</plan_class>


The revision number just makes sure all app versions have the same bug fixes and improved switches included.


With each crime and every kindness we birth our future.
ID: 1795227
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1795236 - Posted: 10 Jun 2016, 23:03:00 UTC - in response to Message 1795227.  

To be sure which flavor was used, one can look into the result's stderr. It has a lot of interesting info, including this line:
Build features: SETI8	Non-graphics	OpenCL	USE_OPENCL_NV	OCL_ZERO_COPY	SIGNALS_ON_GPU	OCL_CHIRP3	FFTW	USE_SSE3	x86	


SIGNALS_ON_GPU in this line means the app was built with the SoG path enabled. So, it's SoG.
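
If you want to check many results at once, a small Python sketch along these lines should do. The function name is made up, and it assumes the stderr text contains a line starting with "Build features:" exactly as shown above.

```python
def build_features(stderr_text: str) -> list[str]:
    """Return the feature tokens from the 'Build features:' line, if present."""
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith("Build features:"):
            # Tokens are separated by tabs/spaces; split() handles both.
            return line.split(":", 1)[1].split()
    return []


sample_stderr = (
    "Build features: SETI8\tNon-graphics\tOpenCL\tUSE_OPENCL_NV\t"
    "OCL_ZERO_COPY\tSIGNALS_ON_GPU\tOCL_CHIRP3\tFFTW\tUSE_SSE3\tx86\t\n"
)

features = build_features(sample_stderr)
print("SIGNALS_ON_GPU" in features)
```

An empty list means the stderr had no such line at all, which is different from the line being present without SIGNALS_ON_GPU.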
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1795236
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1795298 - Posted: 11 Jun 2016, 4:32:39 UTC - in response to Message 1795236.  

OK, still really confused. I've gone through and looked at my completed tasks with both of the apps I've listed in this thread. Both of them have SIGNALS_ON_GPU listed in their respective stderr.txt files. So how is it you're still saying that MB8_win_x86_SSE3_OpenCL_NV_r3430.exe is not performing analysis with signals on GPU??
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1795298
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1795303 - Posted: 11 Jun 2016, 6:30:57 UTC

Just wondering if any of the SoG stuff has hit a Lunatics build yet? Assuming no ...
ID: 1795303


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.