OpenCL NV MultiBeam v8 SoG edition for Windows

Author	Message
Harri Liljeroos Joined: 29 May 99 Posts: 4892 Credit: 85,281,665 RAC: 126	Message 1794507 - Posted: 8 Jun 2016, 18:00:27 UTC - in response to Message 1794471.
Quote (Message 1794471): If "1" never shown what about app's device capabilities listing? Does it show sometime other GPU selected?
Yes, sometimes it has used device 0 and shown it correctly on both lines of stderr. Unfortunately I had to revert to the CUDA applications: there were too many driver and computer crashes while running SoG. I may try again once tuning becomes easier. Maybe these cards (GTX970 and GTX650 Ti) are too different to run SoG smoothly in the same computer.
ID: 1794507

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1794540 - Posted: 8 Jun 2016, 20:38:52 UTC - in response to Message 1794507. Last modified: 8 Jun 2016, 20:39:59 UTC
Quote (Harri Liljeroos): Maybe these cards (GTX970 and GTX650 Ti) are too different to run SoG smoothly in the same computer.
There is a special treatment available that was developed specifically for the case of very different GPUs from the same vendor in a single host. See this part of the ReadMe:
For device-specific settings in multi-GPU systems it's possible to override some of the command-line options via an application config file.
Name of this config file: MultiBeam_<vendor>_config.xml, where vendor can be ATi, NV or iGPU.
File structure:
    <deviceN>
        <period_iterations_num>N</period_iterations_num>
        <spike_fft_thresh>N</spike_fft_thresh>
        <sbs>N</sbs>
        <oclfft_plan>
            <size>N</size>
            <global_radix>N</global_radix>
            <local_radix>N</local_radix>
            <workgroup_size>N</workgroup_size>
            <max_local_size>N</max_local_size>
            <localmem_banks>N</localmem_banks>
            <localmem_coalesce_width>N</localmem_coalesce_width>
        </oclfft_plan>
        <no_caching>
    </deviceN>
ID: 1794540

Harri Liljeroos Joined: 29 May 99 Posts: 4892 Credit: 85,281,665 RAC: 126	Message 1794635 - Posted: 9 Jun 2016, 5:29:05 UTC - in response to Message 1794540.
Thank you for the information. I'll keep it in mind for the next time. For now I don't have time to experiment more.
ID: 1794635

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1794862 - Posted: 9 Jun 2016, 23:56:41 UTC
Hi Raistmer,
This is probably nothing; it is the only error so far in probably over 500 WUs, but here it is.
http://setiathome.berkeley.edu/result.php?resultid=4974924416
Very little information in the output, but I thought it might be better to add it to the database :)
ID: 1794862

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1794874 - Posted: 10 Jun 2016, 0:19:15 UTC - in response to Message 1794862.
http://www.ghacks.net/2015/10/16/fixing-the-application-was-unable-to-start-correctly-0xc0000018-in-windows/
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1794874

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874	Message 1794970 - Posted: 10 Jun 2016, 6:58:51 UTC
0xC0000018 STATUS_CONFLICTING_ADDRESSES
{Conflicting Address Range} The specified address range conflicts with the address space.
ID: 1794970
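For readers unfamiliar with NTSTATUS values: the 32-bit code packs a severity, a customer flag, a facility and a status code into fixed bit fields (severity in bits 31-30, customer flag in bit 29, facility in bits 27-16, code in bits 15-0). The following is a minimal C++ sketch of splitting 0xC0000018 into those fields. The names NtStatusFields and split_ntstatus are made up for illustration and are not part of any Windows or BOINC API.

```cpp
#include <cstdint>

// Illustrative only: these names are hypothetical, not a Windows API.
struct NtStatusFields {
    std::uint32_t severity;  // bits 31-30: 0 success, 1 informational, 2 warning, 3 error
    bool customer;           // bit 29: set for customer-defined codes
    std::uint32_t facility;  // bits 27-16
    std::uint32_t code;      // bits 15-0
};

// Split a raw NTSTATUS value into its documented bit fields.
NtStatusFields split_ntstatus(std::uint32_t status) {
    NtStatusFields f;
    f.severity = (status >> 30) & 0x3u;
    f.customer = ((status >> 29) & 0x1u) != 0;
    f.facility = (status >> 16) & 0xFFFu;
    f.code = status & 0xFFFFu;
    return f;
}
```

Under this layout, 0xC0000018 splits into severity 3 (error), facility 0 and code 0x18, which matches it being reported as an error status.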

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794981 - Posted: 10 Jun 2016, 7:28:14 UTC - in response to Message 1794970. Last modified: 10 Jun 2016, 7:28:28 UTC
Quote (Richard Haselgrove): 0xC0000018 STATUS_CONFLICTING_ADDRESSES {Conflicting Address Range} The specified address range conflicts with the address space.
Probably, if not repeatable, then a genuine bitflip (e.g. from cosmic rays or radioactive carbon in the processor/RAM). Workstation-grade components with ECC memory reduce the probability of that. We've been referring to that as "Eddies in the spacetime continuum", "Eddie? Who's Eddie?", "No, not who's Eddie, what's Eddie?", "What? What's Eddie doing in the spacetime continuum?"
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1794981

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1794984 - Posted: 10 Jun 2016, 7:35:27 UTC
Just got two bad work units with missing header information it looks like.
4294967290 (0xfffffffa) Unknown exit code
These work units:
Task 4975612198
Task 4975612087
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
ID: 1794984
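The two forms of the exit code in the post above are the same 32-bit value: 4294967290 is 0xfffffffa, and read as a signed 32-bit integer that bit pattern is -6. A minimal C++ sketch of the conversion follows. The function name as_signed32 is hypothetical, and the sketch only shows the arithmetic; it says nothing about what the application meant by the value.

```cpp
#include <cstdint>

// Reinterpret an unsigned 32-bit exit code as a signed 32-bit value
// using plain arithmetic (two's complement: subtract 2^32 when the top bit is set).
std::int32_t as_signed32(std::uint32_t value) {
    if (value <= 0x7FFFFFFFu) {
        return static_cast<std::int32_t>(value);
    }
    // The difference always lies in [-2^31, -1], so it fits in int32_t.
    return static_cast<std::int32_t>(static_cast<std::int64_t>(value) - 4294967296LL);
}
```

Here as_signed32(4294967290u) evaluates to -6, so a log that shows the unsigned form and source code that uses a small negative constant can be referring to the same exit status.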

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874	Message 1794985 - Posted: 10 Jun 2016, 7:39:15 UTC - in response to Message 1794984.
Quote (Keith Myers): Just got two bad work units with missing header information it looks like. 4294967290 (0xfffffffa) Unknown exit code These work units: Task 4975612198 Task 4975612087
Both wingmates completed successfully, which suggests that the raw datafile had headers intact.
ID: 1794985

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1794986 - Posted: 10 Jun 2016, 7:42:56 UTC - in response to Message 1794985. Last modified: 10 Jun 2016, 7:43:28 UTC
So does that mean that computer mangled the work units just when it grabbed them for processing?
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
ID: 1794986

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794987 - Posted: 10 Jun 2016, 7:45:49 UTC - in response to Message 1794986.
Quote (Keith Myers): So does that mean that computer mangled the work units just when it grabbed them for processing?
Many possible layers between the server and client CPU, from download through reading from disk.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1794987

William Volunteer tester Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0	Message 1794989 - Posted: 10 Jun 2016, 7:52:57 UTC - in response to Message 1794987.
Quote (Keith Myers): So does that mean that computer mangled the work units just when it grabbed them for processing?
Quote (jason_gee): Many possible layers between the server and client CPU, from download through reading from disk.
There's an MD5 check after download, isn't there? And/or a size check?
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1794989

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874	Message 1794990 - Posted: 10 Jun 2016, 7:53:26 UTC - in response to Message 1794987.
Quote (Keith Myers): So does that mean that computer mangled the work units just when it grabbed them for processing?
Quote (jason_gee): Many possible layers between the server and client CPU, from download through reading from disk.
And we have seen that error message before, in other applications including CUDA, with no conclusive evidence that the data file has suffered any corruption at all. It seemed (IIRC) to be more prevalent on task restarts than initial runs. I think that the code generating that error message dates from the original Berkeley CPU code: checking that for trigger points might give us a better handle on what's really happening under the hood.
ID: 1794990

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794991 - Posted: 10 Jun 2016, 7:57:36 UTC - in response to Message 1794990. Last modified: 10 Jun 2016, 7:58:35 UTC
Quote (Keith Myers): So does that mean that computer mangled the work units just when it grabbed them for processing?
Quote (jason_gee): Many possible layers between the server and client CPU, from download through reading from disk.
Quote (Richard Haselgrove): And we have seen that error message before, in other applications including CUDA, with no conclusive evidence that the data file has suffered any corruption at all. It seemed (IIRC) to be more prevalent on task restarts than initial runs. I think that the code generating that error message dates from the original Berkeley CPU code: checking that for trigger points might give us a better handle on what's really happening under the hood.
Anything more frequent than about once every 3 months on a given host would indicate a configuration, system, client or application issue. Anything less frequent than that on sub-workstation-grade components indicates noise (radiation).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1794991

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1794992 - Posted: 10 Jun 2016, 7:58:08 UTC - in response to Message 1794987.
I actually think it was the BOINC shutdown that froze on exit and then blue-screened the computer that did it. The strange thing is that I always wait for a quiescent period in BOINC activity before I initiate a shutdown. That means no work units are close to finishing, all recently completed work units have successfully uploaded and BOINC is not close to asking for network communication. Only when all those conditions are met do I shut down the Manager and close the client. I can only conclude that BOINC was reading those tasks when the computer blue-screened.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
ID: 1794992

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794993 - Posted: 10 Jun 2016, 7:59:30 UTC - in response to Message 1794992. Last modified: 10 Jun 2016, 8:00:22 UTC
Quote (Keith Myers): I actually think it was the BOINC shutdown that froze on exit and then blue-screened the computer that did it. The strange thing is that I always wait for a quiescent period in BOINC activity before I initiate a shutdown. That means no work units are close to finishing, all recently completed work units have successfully uploaded and BOINC is not close to asking for network communication. Only when all those conditions are met do I shut down the Manager and close the client. I can only conclude that BOINC was reading those tasks when the computer blue-screened.
I'd class that as possibly reproducible [rather than Eddy/Eddie]. Can you try that? (Could take substantial hammering :) )
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1794993

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874	Message 1794994 - Posted: 10 Jun 2016, 8:08:20 UTC - in response to Message 1794993. Last modified: 10 Jun 2016, 8:08:52 UTC
At least the error message gives us a file name and line number:
    !swi.data_type || !found || !swi.nsamples
    File: ..\seti_header.cpp
    Line: 216
    // Allow old style headers to be parsed correctly.
    // jeffc - need this?
    //swi.fft_len=2048;
    //swi.ifft_len=8;
    do {
        fgets(buf, 256, f);
    } while (!feof(f) && !xml_match_tag(buf,"<workunit_header")) ;
Looks like we've dropped through to some legacy code - perhaps we should have branched to a more modern path higher up?
ID: 1794994
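To make the shape of that quoted loop easier to reason about, here is a minimal, self-contained C++ sketch of the same pattern: read lines until one contains the opening tag, or give up at end of input. It is not the code from seti_header.cpp. The helper line_has_tag is a hypothetical stand-in for xml_match_tag (a plain substring test), and skip_to_tag is a name invented for this example. The point it illustrates is that a file which is truncated or unreadable at the moment it is opened reaches end of input without ever seeing the tag, and a "found"-style flag then stays false.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the real xml_match_tag(): a plain substring test.
bool line_has_tag(const std::string& line, const std::string& tag) {
    return line.find(tag) != std::string::npos;
}

// Read lines until one contains `tag` or the stream is exhausted.
// Returns true only if the tag was seen before end of input.
bool skip_to_tag(std::istream& in, const std::string& tag) {
    std::string line;
    while (std::getline(in, line)) {
        if (line_has_tag(line, tag)) {
            return true;
        }
    }
    return false;  // hit end of input (or a read error) without finding the tag
}
```

Called on a stream holding an intact header, skip_to_tag(stream, "<workunit_header") returns true; on an empty or truncated stream it returns false. Unlike the quoted loop, this version tests the read itself rather than feof(), so a failed read ends the scan instead of re-checking a stale buffer.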

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874	Message 1794995 - Posted: 10 Jun 2016, 8:12:30 UTC
Actually, the error message is on line 232 of the current file. Are we using an outdated version of seti_header.cpp?
ID: 1794995

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1794996 - Posted: 10 Jun 2016, 8:13:14 UTC - in response to Message 1794993. Last modified: 10 Jun 2016, 8:16:44 UTC
I believe that type of failure and exit status is the first I've experienced. I am running the latest beta BOINC Manager 7.6.29 (x64), which I believe has had some code changed recently to fix Manager exits compared to the last stable release 7.6.22 (x64). Richard probably could say just what the code jockeys played with in the latest beta.
[Edit] Looks like my copy of the beta is not the latest now. We're up to 7.6.33 (x64).
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
ID: 1794996

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794997 - Posted: 10 Jun 2016, 8:20:22 UTC - in response to Message 1794996.
Quote (Keith Myers): I believe that type of failure and exit status is the first I've experienced. I am running the latest beta BOINC Manager 7.6.29 (x64), which I believe has had some code changed recently to fix Manager exits compared to the last stable release 7.6.22 (x64). Richard probably could say just what the code jockeys played with in the latest beta. [Edit] Looks like my copy of the beta is not the latest now. We're up to 7.6.33 (x64).
Yeah, not a 'normal' situation IMO. It would need to be reproducible on demand to localise it better.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1794997
