Changing Monitor/Card Arrangement in Linux Results in AP Computation Error

TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51202 - Posted: 17 Jun 2014, 13:22:26 UTC
Last modified: 17 Jun 2014, 13:45:11 UTC

I have two Linux (Ubuntu 12.04.4) hosts that have suffered AP Computation Errors after changing cards or choosing a different main monitor. I'm working on a new install on one host; the second host developed a similar problem after I changed to a different card and then selected a different main monitor. The error is:
SIGSEGV: segmentation violation
Stack trace (23 frames):
../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64(boinc_catch_signal+0x4d)[0x4b1a2d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f73941e1cb0]
/usr/lib/libamdocl64.so(+0x55d421)[0x7f7391438421]
/usr/lib/libamdocl64.so(+0x4e0753)[0x7f73913bb753]
/usr/lib/libamdocl64.so(+0x4e0afd)[0x7f73913bbafd]
/usr/lib/libamdocl64.so(+0x4a3b0c)[0x7f739137eb0c]
/usr/lib/libamdocl64.so(+0x4a3c6d)[0x7f739137ec6d]
/usr/lib/libamdocl64.so(+0x4a540b)[0x7f739138040b]
/usr/lib/libamdocl64.so(+0x497710)[0x7f7391372710]
/usr/lib/libamdocl64.so(+0x46ee64)[0x7f7391349e64]
/usr/lib/libamdocl64.so(+0x46f058)[0x7f739134a058]
/usr/lib/libamdocl64.so(+0x470722)[0x7f739134b722]
/usr/lib/libamdocl64.so(+0x470d39)[0x7f739134bd39]
/usr/lib/libamdocl64.so(+0x434610)[0x7f739130f610]
/usr/lib/libamdocl64.so(+0x4347c6)[0x7f739130f7c6]
/usr/lib/libamdocl64.so(+0x42ca5b)[0x7f7391307a5b]
/usr/lib/libamdocl64.so(clEnqueueNDRangeKernel+0x3b3)[0x7f73912e34f3]
../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64[0x47852f]
../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64[0x47f5be]
../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64[0x467817]
../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64[0x468f4e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f739393176d]
../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64[0x40baa9]

Exiting...
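
A quick way to read a trace like this is to tally the frames per module. The sketch below assumes the glibc backtrace line shape seen above (`path(symbol+offset)[address]`); `frames_by_module` and `named_symbols` are hypothetical helper names, not part of BOINC or the AP app. In the trace above, most frames sit inside libamdocl64.so beneath clEnqueueNDRangeKernel.

```python
import os
import re
from collections import Counter

# Matches glibc backtrace lines such as
#   /usr/lib/libamdocl64.so(clEnqueueNDRangeKernel+0x3b3)[0x7f73912e34f3]
#   ../../projects/.../ap_6.07r1844_sse2_clATI_linux64[0x47852f]
FRAME_RE = re.compile(
    r"^(?P<path>[^\s(\[]+)(?:\((?P<sym>[^)]*)\))?\[0x[0-9a-fA-F]+\]$"
)


def frames_by_module(trace_lines):
    """Count stack frames per module (file name only) in a backtrace."""
    counts = Counter()
    for line in trace_lines:
        m = FRAME_RE.match(line.strip())
        if m:
            counts[os.path.basename(m.group("path"))] += 1
    return counts


def named_symbols(trace_lines):
    """Return the symbol names (without offsets) that the trace resolves."""
    names = []
    for line in trace_lines:
        m = FRAME_RE.match(line.strip())
        if m and m.group("sym"):
            sym = m.group("sym").split("+")[0]
            if sym:  # "(+0x55d421)" has an offset but no symbol name
                names.append(sym)
    return names


# Three lines copied from the trace above.
sample = [
    "/usr/lib/libamdocl64.so(+0x42ca5b)[0x7f7391307a5b]",
    "/usr/lib/libamdocl64.so(clEnqueueNDRangeKernel+0x3b3)[0x7f73912e34f3]",
    "../../projects/setiathome.berkeley.edu/ap_6.07r1844_sse2_clATI_linux64[0x47852f]",
]
print(frames_by_module(sample))
print(named_symbols(sample))
```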

The first host received the error immediately when the second task launched; the second host appears to receive it upon restarting BOINC, and also on the second card. I have tried reinstalling the drivers (13.12), etc. I tried deleting the /etc/X11/xorg.conf file and then matching the new file with /etc/X11/xorg.conf.fglrx-0. I spent too much time in aticonfig as well. The problem still exists. The current xorg.conf.fglrx is:
Section "ServerLayout"
	Identifier     "aticonfig Layout"
	Screen      0  "amdcccle-Screen[1]-0" 0 0
	Screen         "aticonfig-Screen[0]-0" 1280 222
EndSection

Section "Module"
EndSection

Section "Monitor"
	Identifier   "aticonfig-Monitor[0]-0"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
EndSection

Section "Monitor"
	Identifier   "0-CRT1"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
	Option	    "PreferredMode" "1280x1024"
	Option	    "TargetRefresh" "60"
	Option	    "Position" "0 0"
	Option	    "Rotate" "normal"
	Option	    "Disable" "false"
EndSection

Section "Monitor"
	Identifier   "1-CRT1"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
	Option	    "PreferredMode" "800x600"
	Option	    "TargetRefresh" "60"
	Option	    "Position" "0 0"
	Option	    "Rotate" "normal"
	Option	    "Disable" "false"
EndSection

Section "Monitor"
	Identifier   "1-CRT1"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
	Option	    "PreferredMode" "800x600"
	Option	    "TargetRefresh" "60"
	Option	    "Position" "0 0"
	Option	    "Rotate" "normal"
	Option	    "Disable" "false"
EndSection

Section "Device"
	Identifier  "amdcccle-Device[1]-0"
	Driver      "fglrx"
	Option	    "Monitor-CRT1" "0-CRT1"
	BusID       "PCI:1:0:0"
EndSection

Section "Device"
	Identifier  "aticonfig-Device[8]-0"
	Driver      "fglrx"
	Option	    "Monitor-CRT1" "1-CRT1"
	BusID       "PCI:8:0:0"
EndSection

Section "Screen"
	Identifier "amdcccle-Screen[1]-0"
	Device     "amdcccle-Device[1]-0"
	DefaultDepth     24
	SubSection "Display"
		Viewport   0 0
		Depth     24
	EndSubSection
EndSection

Section "Screen"
	Identifier "amdcccle-Screen[8]-0"
	Device     "amdcccle-Device[8]-0"
	DefaultDepth     24
	SubSection "Display"
		Viewport   0 0
		Depth     24
	EndSubSection
EndSection
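
Since the file above mixes "aticonfig-" and "amdcccle-" names, one thing a small script can check is whether every Screen, Device, and Monitor name that is referenced is also defined. This is only a rough sketch: `parse_sections` and `dangling_references` are hypothetical helpers, the parsing is deliberately naive, and it says nothing about whether a dangling name is what causes the errors.

```python
import re

QUOTED = re.compile(r'"([^"]*)"')


def parse_sections(text):
    """Split xorg.conf text into (section_type, lines) pairs."""
    sections, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("Section"):
            m = QUOTED.search(line)
            current = (m.group(1) if m else "", [])
        elif line == "EndSection":
            if current is not None:
                sections.append(current)
            current = None
        elif current is not None:
            current[1].append(line)
    return sections


def dangling_references(text):
    """Return (kind, name) pairs that are referenced but never defined."""
    sections = parse_sections(text)
    defined = {"Screen": set(), "Device": set(), "Monitor": set()}
    for kind, lines in sections:
        if kind not in defined:
            continue
        for line in lines:
            if line.split()[0] == "Identifier":
                defined[kind].update(QUOTED.findall(line)[:1])
    problems = []
    for kind, lines in sections:
        for line in lines:
            keyword = line.split()[0]
            names = QUOTED.findall(line)
            if kind == "ServerLayout" and keyword == "Screen":
                # Every quoted name on a layout Screen line is a screen name.
                problems += [("Screen", n) for n in names
                             if n not in defined["Screen"]]
            elif kind == "Screen" and keyword in ("Device", "Monitor"):
                problems += [(keyword, n) for n in names[:1]
                             if n not in defined[keyword]]
    return problems


# Usage: print(dangling_references(open("/etc/X11/xorg.conf").read()))
```

Run against the listing above, a check like this should flag "aticonfig-Screen[0]-0" and "amdcccle-Device[8]-0", neither of which has a matching Identifier.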

The current startup log is:
Tue 17 Jun 2014 08:46:06 AM EDT |  | Starting BOINC client version 7.2.39 for x86_64-pc-linux-gnu
Tue 17 Jun 2014 08:46:06 AM EDT |  | log flags: file_xfer, sched_ops, task
Tue 17 Jun 2014 08:46:06 AM EDT |  | Libraries: libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Tue 17 Jun 2014 08:46:06 AM EDT |  | Data directory: /home/tbar/BOINC
Tue 17 Jun 2014 08:46:06 AM EDT |  | OpenCL: AMD/ATI GPU 0: Capeverde (driver version 1348.5 (VM), device version OpenCL 1.2 AMD-APP (1348.5), 1806MB, 1806MB available, 531 GFLOPS peak)
Tue 17 Jun 2014 08:46:06 AM EDT |  | OpenCL: AMD/ATI GPU 1: Juniper (driver version 1348.5, device version OpenCL 1.2 AMD-APP (1348.5), 512MB, 512MB available, 720 GFLOPS peak)
Tue 17 Jun 2014 08:46:06 AM EDT |  | OpenCL CPU: Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1348.5 (sse2), device version OpenCL 1.2 AMD-APP (1348.5))
Tue 17 Jun 2014 08:46:06 AM EDT | SETI@home | Found app_info.xml; using anonymous platform
Tue 17 Jun 2014 08:46:06 AM EDT |  | Host name: TBar-iSETI
Tue 17 Jun 2014 08:46:06 AM EDT |  | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz [Family 6 Model 23 Stepping 10]
Tue 17 Jun 2014 08:46:06 AM EDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
Tue 17 Jun 2014 08:46:06 AM EDT |  | OS: Linux: 3.2.0-64-generic
Tue 17 Jun 2014 08:46:06 AM EDT |  | Memory: 1.96 GB physical, 3.01 GB virtual
Tue 17 Jun 2014 08:46:06 AM EDT |  | Disk: 68.82 GB total, 58.81 GB free
Tue 17 Jun 2014 08:46:06 AM EDT |  | Local time is UTC -4 hours
Tue 17 Jun 2014 08:46:06 AM EDT | SETI@home | Found app_config.xml
Tue 17 Jun 2014 08:46:06 AM EDT |  | Config: run apps at regular priority
Tue 17 Jun 2014 08:46:06 AM EDT |  | Config: use all coprocessors

Note that BOINC sees only half the RAM on the 6770, just as it sees only half on my 6850s and 6870.
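
To compare what BOINC reports across hosts, the OpenCL lines of the startup log can be pulled apart with a regular expression. A minimal sketch, assuming the line format shown above; `opencl_gpus` is a hypothetical helper and the pattern has only been matched against lines like these.

```python
import re

# Captures the GPU index, its name, and the two "...MB" figures from a
# BOINC startup line of the form shown in the log above.
GPU_RE = re.compile(
    r"OpenCL: AMD/ATI GPU (\d+): (.+?) \(driver version .*?"
    r"(\d+)MB, (\d+)MB available"
)


def opencl_gpus(log_text):
    """Return one dict per OpenCL GPU line found in a BOINC startup log."""
    gpus = []
    for line in log_text.splitlines():
        m = GPU_RE.search(line)
        if m:
            gpus.append({
                "index": int(m.group(1)),
                "name": m.group(2),
                "total_mb": int(m.group(3)),
                "available_mb": int(m.group(4)),
            })
    return gpus


# Two lines copied from the startup log above (timestamps trimmed).
log = (
    "OpenCL: AMD/ATI GPU 0: Capeverde (driver version 1348.5 (VM), device "
    "version OpenCL 1.2 AMD-APP (1348.5), 1806MB, 1806MB available, 531 GFLOPS peak)\n"
    "OpenCL: AMD/ATI GPU 1: Juniper (driver version 1348.5, device version "
    "OpenCL 1.2 AMD-APP (1348.5), 512MB, 512MB available, 720 GFLOPS peak)\n"
)
for gpu in opencl_gpus(log):
    print(gpu["index"], gpu["name"], gpu["total_mb"], "MB")
```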

Any ideas? I might have the first host running later today on a new Ubuntu install, with the same BOINC folder in my home folder.
ID: 51202
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 51207 - Posted: 17 Jun 2014, 20:23:01 UTC - in response to Message 51202.  
Last modified: 17 Jun 2014, 20:39:33 UTC

Try this in your "xorg.conf" if a single Monitor is connected to "aticonfig-Device[1]-0" :
Section "ServerLayout"
Identifier "aticonfig Layout"
Screen 0 "aticonfig-Screen[1]-0" 0 0
Screen "aticonfig-Screen[0]-0" RightOf "aticonfig-Screen[1]-0"
EndSection

Section "Monitor"
Identifier "aticonfig-Monitor[0]-0"
Option "VendorName" "ATI Proprietary Driver"
Option "ModelName" "Generic Autodetecting Monitor"
Option "DPMS" "true"
EndSection

Section "Device"
Identifier "aticonfig-Device[1]-0"
Driver "fglrx"
# Option "aticonfig-Monitor[0]-0"
BusID "PCI:1:0:0"
EndSection

Section "Device"
Identifier "aticonfig-Device[0]-0"
Driver "fglrx"
# Option "aticonfig-Monitor[0]-0"
BusID "PCI:8:0:0"
EndSection

Section "Screen"
Identifier "aticonfig-Screen[1]-0"
Device "aticonfig-Device[1]-0"
Monitor "aticonfig-Monitor[0]-0"
DefaultDepth 24
SubSection "Display"
Viewport 0 0
Depth 24
EndSubSection
EndSection

Section "Screen"
Identifier "aticonfig-Screen[0]-0"
Device "aticonfig-Device[0]-0"
Monitor "aticonfig-Monitor[0]-0"
EndSection


You'll need to reboot after changing the xorg.conf to see the effect.

Run "clinfo" in a terminal to check that your GPUs are detected correctly.
Example: in the clinfo output, 'Device Topology: PCI[ B#1, D#0, F#0 ]' corresponds to 'BusID "PCI:1:0:0"' in xorg.conf.
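
The mapping can be written down as a tiny helper. This is a sketch that assumes clinfo prints the bus, device, and function numbers in decimal, as xorg.conf expects; `topology_to_busid` is a hypothetical name.

```python
import re

# Matches the topology field as clinfo prints it, e.g. "PCI[ B#1, D#0, F#0 ]".
TOPOLOGY_RE = re.compile(r"PCI\[\s*B#(\d+),\s*D#(\d+),\s*F#(\d+)\s*\]")


def topology_to_busid(clinfo_line):
    """Turn a clinfo 'Device Topology' line into an xorg.conf BusID value."""
    m = TOPOLOGY_RE.search(clinfo_line)
    if m is None:
        return None  # line does not contain a PCI topology field
    bus, dev, func = (int(g) for g in m.groups())
    return f"PCI:{bus}:{dev}:{func}"


print(topology_to_busid("Device Topology: PCI[ B#1, D#0, F#0 ]"))
```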

Once your xorg.conf is working you do not need to change or recreate it if you switch to a newer driver. (I remember Cat 13.12 being very troublesome; I recommend 14.4 now!)
_\|/_
U r s
ID: 51207
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51208 - Posted: 18 Jun 2014, 4:31:57 UTC - in response to Message 51207.  
Last modified: 18 Jun 2014, 4:52:15 UTC

Looks like the host suffered another couple of Errors while I was away. I suppose I'm lucky it didn't Error out all the remaining GPU APs...

I changed xorg.conf as per your example and deleted the extra monitor from xorg.conf.fglrx-0. After a restart the xorg.conf file reads:
Section "ServerLayout"
	Identifier     "aticonfig Layout"
	Screen      0  "aticonfig-Screen[1]-0" 0 0
	Screen         "aticonfig-Screen[0]-0" 1280 206
EndSection

Section "Monitor"
	Identifier   "aticonfig-Monitor[0]-0"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
EndSection

Section "Monitor"
	Identifier   "0-CRT1"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
	Option	    "PreferredMode" "1280x1024"
	Option	    "TargetRefresh" "60"
	Option	    "Position" "0 0"
	Option	    "Rotate" "normal"
	Option	    "Disable" "false"
EndSection

Section "Monitor"
	Identifier   "1-CRT1"
	Option	    "VendorName" "ATI Proprietary Driver"
	Option	    "ModelName" "Generic Autodetecting Monitor"
	Option	    "DPMS" "true"
	Option	    "PreferredMode" "800x600"
	Option	    "TargetRefresh" "60"
	Option	    "Position" "0 0"
	Option	    "Rotate" "normal"
	Option	    "Disable" "false"
EndSection

Section "Device"

# Option            "aticonfig-Monitor[0]-0"
	Identifier  "aticonfig-Device[1]-0"
	Driver      "fglrx"
	Option	    "Monitor-CRT1" "0-CRT1"
	BusID       "PCI:1:0:0"
EndSection

Section "Device"

# Option            "aticonfig-Monitor[0]-0"
	Identifier  "aticonfig-Device[0]-0"
	Driver      "fglrx"
	Option	    "Monitor-CRT1" "1-CRT1"
	BusID       "PCI:8:0:0"
EndSection

Section "Screen"
	Identifier "aticonfig-Screen[1]-0"
	Device     "aticonfig-Device[1]-0"
	DefaultDepth     24
	SubSection "Display"
		Viewport   0 0
		Depth     24
	EndSubSection
EndSection

Section "Screen"
	Identifier "aticonfig-Screen[0]-0"
	Device     "aticonfig-Device[0]-0"
EndSection

It started OK. I let it run for about 30 minutes, then restarted BOINC, and it restarted fine. I'm going to let it run for a while and see how it does. I think I've tried Catalyst 14.4 on one of the hosts and it failed while compiling the kernel, so I think I'll wait on that. The clinfo output also seems to show only half the RAM:
Device Type:					 CL_DEVICE_TYPE_GPU
  Device ID:					 4098
  Board name:					 AMD Radeon HD 6700 Series  
  Device Topology:				 PCI[ B#8, D#0, F#0 ]
  Max compute units:				 10
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 0
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 4
  Native vector width double:			 0
  Max clock frequency:				 900Mhz
  Address bits:					 32
  Max memory allocation:			 134217728
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 0x00007f306ee03380
  Name:						 Juniper
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 1348.5
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (1348.5)
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only
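
The memory figures in clinfo are in bytes, which makes the halving easy to miss. A small sketch that converts the two memory fields to MiB; `memory_fields_mib` is a hypothetical helper and it handles one device's block at a time.

```python
import re

# The two clinfo fields of interest, each followed by a byte count.
FIELD_RE = re.compile(
    r"^\s*(Global memory size|Max memory allocation):\s*(\d+)\s*$"
)


def memory_fields_mib(clinfo_text):
    """Map each memory field found in a clinfo device block to whole MiB."""
    sizes = {}
    for line in clinfo_text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            sizes[m.group(1)] = int(m.group(2)) // (1024 * 1024)
    return sizes


# Values copied from the clinfo output above.
block = (
    "  Max memory allocation:\t\t\t 134217728\n"
    "  Global memory size:\t\t\t\t 536870912\n"
)
print(memory_fields_mib(block))
```

For the values above this gives 512 MiB of global memory and a 128 MiB maximum single allocation.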

I'll try the new Driver in a day or so...
ID: 51208
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51214 - Posted: 18 Jun 2014, 10:39:25 UTC
Last modified: 18 Jun 2014, 10:57:42 UTC

It's still giving the errors on Device/Card 1, the 6770. It finished a task, gave 2 errors in a row, then successfully started the 3rd task;

rev 1844
06:03:31 (2767): called boinc_finish

</stderr_txt>
]]>
</stderr_out>
    <ready_to_report/>
    <completed_time>1403085820.441825</completed_time>
    <wu_name>ap_21au08ad_B2_P1_00368_20140612_30828.wu</wu_name>
....

<result>
    <name>ap_22au08af_B0_P0_00344_20140612_29248.wu_2</name>
    <final_cpu_time>1.372085</final_cpu_time>
    <final_elapsed_time>5.077867</final_elapsed_time>
    <exit_status>193</exit_status>
    <state>3</state>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>607</version_num>
    <plan_class>opencl_ati_linux</plan_class>
<stderr_out>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Not using ap_cmdline.txt-file, using commandline options.
DATA_CHUNK_UNROLL set to:8
FFA thread block override value:8192
FFA thread fetchblock override value:4096
Maximum single buffer size set to:256MB
Running on device number: 1
....

<result>
    <name>ap_10dc08ac_B4_P1_00393_20140611_17349.wu_3</name>
    <final_cpu_time>1.620101</final_cpu_time>
    <final_elapsed_time>5.131171</final_elapsed_time>
    <exit_status>193</exit_status>
    <state>3</state>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>607</version_num>
    <plan_class>opencl_ati_linux</plan_class>
<stderr_out>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Not using ap_cmdline.txt-file, using commandline options.
DATA_CHUNK_UNROLL set to:8
FFA thread block override value:8192
FFA thread fetchblock override value:4096
Maximum single buffer size set to:256MB
Running on device number: 1
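
For reference, the three numbers in "process exited with code 193 (0xc1, -63)" are the same byte shown three ways: decimal, hex, and as a signed 8-bit value. A sketch of the arithmetic follows; `exit_code_views` is a hypothetical helper, and that 193 corresponds to BOINC's EXIT_SIGNAL is an assumption worth checking against BOINC's error_numbers.h.

```python
def exit_code_views(code):
    """Return the unsigned, hex, and signed 8-bit views of an exit status."""
    byte = code & 0xFF                      # exit statuses are one byte wide
    signed = byte - 256 if byte >= 128 else byte
    return byte, hex(byte), signed


print(exit_code_views(193))
```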

Same as with the Mac Ubuntu host. I removed the Gigabyte 7750 and replaced it with a 6 series card. Now I get Errors...
????
ID: 51214
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51225 - Posted: 19 Jun 2014, 16:43:43 UTC
Last modified: 19 Jun 2014, 17:05:54 UTC

Still no luck. The machine works for a couple tasks then gives an Error or two on the 6770. Now it's getting Invalids as well. I switched back to Vista on that machine.

I did manage to install Catalyst 14.4 on the host. Didn't seem to help any. I guess it was the other host that couldn't install the standard 14.4, probably why I decided to reinstall Ubuntu.

After installing Cat 14.4 I tried deleting both xorg files and having it create new ones; that didn't help. Then I tried swapping the cards between PCIe slots, and all I got was download errors, for some reason. Restarting the machine cured the download errors. Oh, and BOINC still sees only half the video RAM on the 6770, even with Catalyst 14.4 and BOINC 7.2.42. Strange that it will work for a couple of hours, then give a couple of errors, then work again.

So, now I have two dead Ubuntu systems...
ID: 51225
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 51228 - Posted: 19 Jun 2014, 22:03:15 UTC - in response to Message 51225.  

Sorry to hear about your problems.

Just a few more ideas/hints :
- Read my last posting again, do NOT alter xorg.conf after a driver change if things worked before. Start with the one that i have posted.
- How are your monitor(s) connected to your computer ? VGA/DVI/HDMI/DP/KVM ?
- Check /var/log/Xorg.0.log for errors (EE).
- Check with clinfo that everything on the OpenCL driver side is working normally.
- Check that fglrxinfo shows info about both GPUs as you expect.
- Could GPU temps be an issue ? See lunatics download section for a little tool to watch temps on AMD GPUs (soon).
- After the first few wus on GPU have finished check that the .bin files in the projects directory are present for both GPUs (there should be a set for Juniper and a set for CapeVerde)
- After a new driver has been installed the .bin files have to be recreated, so delete the old ones as the r1844 apps did not respect driver versioning.
- Did you uninstall the old Catalyst drivers completely before installing new ones ? Especially on Ubuntu/Debian that could be sometimes difficult.
- About the amount of GPU RAM being halved: BOINC only reports what the driver tells it, so that issue should be reported to AMD, not BOINC or SETI@home.
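
The log check in the list above can be scripted. A minimal sketch, assuming the usual Xorg log layout where the marker follows an optional "[ seconds ]" timestamp at the start of the line; `xorg_errors` is a hypothetical helper, the sample lines are made up for illustration, and the log path in the comment is the common default rather than something confirmed for this host.

```python
import re

# "(EE)" at the start of a line, optionally after a "[    12.345]" timestamp.
# Anchoring at the start skips the legend line that merely mentions "(EE)".
EE_RE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?\(EE\)")


def xorg_errors(log_lines):
    """Return the log lines that carry the (EE) error marker."""
    return [line.rstrip("\n") for line in log_lines if EE_RE.match(line)]


# Hypothetical sample lines, not taken from a real log.
sample = [
    "\t(WW) warning, (EE) error, (NI) not implemented, (??) unknown.",
    "[    20.113] (II) fglrx(0): example informational line",
    "[    21.004] (EE) fglrx(0): example error line",
]
for line in xorg_errors(sample):
    print(line)

# On a real system, something like:
# with open("/var/log/Xorg.0.log") as log:
#     print("\n".join(xorg_errors(log)))
```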

Do only one step at a time and check its effect. My KUbuntu 12.04.4 host shows no such issues, maybe i should try to simulate your problem and pair a 6000 series card with one of my CapeVerde's ?
_\|/_
U r s
ID: 51228
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51230 - Posted: 19 Jun 2014, 23:12:03 UTC - in response to Message 51228.  
Last modified: 19 Jun 2014, 23:36:09 UTC

....maybe i should try to simulate your problem and pair a 6000 series card with one of my CapeVerde's ?
If it's not too much trouble, I'd be interested in the results. My remaining Ubuntu 12.04.4 host was running a Gigabyte 7750 and an MSI 7750. The MSI card is a low-powered, slow card. The Gigabyte card is over twice as fast and, due to CreditFew, receives low credit when matched with the slower 7750. I thought it would be better to match the MSI card with the slower 6770 rather than the Gigabyte card. So, start with two 7750s and swap one for a ~6770ish card. The other Ubuntu host was on my Mac; the last attempt at a reinstall failed, and I haven't tried it again...yet.

- Read my last posting again, do NOT alter xorg.conf after a driver change if things worked before.
I saved both files, changing between them has no effect.

- How are your monitor(s) connected to your computer ? VGA/DVI/HDMI/DP/KVM ?
I've tried various DVI/VGA connections/monitors, doesn't make any difference.

- Check /var/log/Xorg.0.log for errors (EE).
I'll have to boot into Linux to check that, next time I'll check.

- Check with clinfo that everything on the OpenCL driver side is working normally.
clinfo says all is well, there's a copy of the output in an earlier post.

- Check that fglrxinfo shows info about both GPUs as you expect.
CCC & aticonfig says all is well, I've never tried fglrxinfo.

- Could GPU temps be an issue ? See lunatics download section for a little tool to watch temps on AMD GPUs (soon).
aticonfig says they run about the same as they do in Vista. At 97% load the 7750 runs around 62C, the 6770 around 67C.

- After the first few wus on GPU have finished check that the .bin files in the projects directory are present for both GPUs (there should be a set for Juniper and a set for CapeVerde)
No problem with missing files.

- After a new driver has been installed the .bin files have to be recreated, so delete the old ones as the r1844 apps did not respect driver versioning.
New .bin files have been tried numerous times.

- Did you uninstall the old Catalyst drivers completely before installing new ones ? Especially on Ubuntu/Debian that could be sometimes difficult.
I've run the uninstall script and --purge command a few times.

- About the amount of GPU RAM being halved: BOINC only reports what the driver tells it, so that issue should be reported to AMD, not BOINC or SETI@home.
I had the same problem on my Mac Ubuntu host, seems most series 6 cards are only showing Half Ram in Ubuntu 12.04.4 with Cat 13.12/14.4. I also had the exact same Ubuntu BOINC Error problem on my Mac Ubuntu host. I changed from Two series 6 and one 7750 card to Three series 6 cards. The BOINC Errors were so bad it was unusable. Coincidence?
ID: 51230
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 51231 - Posted: 20 Jun 2014, 0:19:50 UTC
Last modified: 20 Jun 2014, 1:11:38 UTC

For a test I could only use an HD6450 or HD6670, each 1GB; I guess I'll try the latter.
ADD:
Fr 20 Jun 2014 02:39:38 CEST | | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Cape Verde) (CAL version 1.4.1848, 1024MB, 909MB available, 2048 GFLOPS peak)
Fr 20 Jun 2014 02:39:38 CEST | | CAL: ATI GPU 1: AMD Radeon HD 6570/6670/7570/7670 series (Turks) (CAL version 1.4.1848, 1024MB, 1001MB available, 1536 GFLOPS peak)
Fr 20 Jun 2014 02:39:38 CEST | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Cape Verde) (driver version 1411.4 (VM), device version OpenCL 1.2 AMD-APP (1411.4), 1024MB, 909MB available, 2048 GFLOPS peak)
Fr 20 Jun 2014 02:39:38 CEST | | OpenCL: AMD/ATI GPU 1: AMD Radeon HD 6570/6670/7570/7670 series (Turks) (driver version 1411.4, device version OpenCL 1.2 AMD-APP (1411.4), 1024MB, 1001MB available, 1536 GFLOPS peak)
I popped in the Turks card and it just picked up where the second CapeVerde left off and built fresh binaries for Turks, but I'm currently running some newer MB apps.
After kernel updates (to your level), and once all wus have run down, I can start testing whether there is a problem with the released AP app when a 7000 series AMD GPU is used together with a 6000 series one. I will also have to update BOINC to 7.2.42; currently 7.0.65 is running.

One issue seems to be specific to your 6770: on my cards the amount of GPU RAM is reported correctly in BOINC (see above) and in clinfo.
_\|/_
U r s
ID: 51231
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 51236 - Posted: 20 Jun 2014, 16:10:07 UTC
Last modified: 20 Jun 2014, 16:10:36 UTC

OK, I've nearly completed my updates and am downloading some AP wus now. I have postponed the driver update until later. BOINC 7.2.42 shows the same amounts of GPU RAM for my cards as in my last post.


So, which Cat 14.4 driver did you try, "april 17" or "may 6"?
Also, how exactly did you do your monitor change/card arrangement?
Only with a change in xorg.conf?
Or also by physically connecting the cable to another card?
Or by switching the GPUs' PCIe slots?
Or did you try all of these? Then I'll simply go ahead and try every option as well.


Reviewing the earlier posts in this thread, I see that you're still missing the "Monitor" lines in the "Screen" sections.
See my example...
_\|/_
U r s
ID: 51236
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51237 - Posted: 20 Jun 2014, 16:41:15 UTC - in response to Message 51236.  
Last modified: 20 Jun 2014, 16:52:45 UTC

I decided to remove the drivers from AMD and go back to the ones from Ubuntu. I followed the instructions from the 'Revert back to the open source drivers' section here, http://askubuntu.com/questions/74171/is-my-ati-graphics-card-supported-in-ubuntu, then I used the 'Additional Drivers' panel to install the only choice there.
You can go back on this host and see back when I had the XFX 6850 & XFX 6770 installed. BOTH cards were listed as having Half the Ram;
Task 3559929015
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices:				 2
  Max compute units:				 12
  Max work group size:				 256
  Max clock frequency:				 775Mhz
  Max memory allocation:			 134217728
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Barts
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1348.5
  Version:					 OpenCL 1.2 AMD-APP (1348.5)

Max compute units:				 10
  Max work group size:				 256
  Max clock frequency:				 900Mhz
  Max memory allocation:			 134217728
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Juniper
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1348.5
  Version:					 OpenCL 1.2 AMD-APP (1348.5)

It's the same Now with the Ubuntu supplied driver;
Device Type:					 CL_DEVICE_TYPE_GPU
  Device ID:					 4098
  Board name:					 AMD Radeon HD 6700 Series 
  Device Topology:				 PCI[ B#8, D#0, F#0 ]
  Max compute units:				 10
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 0
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 4
  Native vector width double:			 0
  Max clock frequency:				 900Mhz
  Address bits:					 32
  Max memory allocation:			 134217728
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 0x00007f47d595c460
  Name:						 Juniper
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 1359.4
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (1359.4)

I have no idea where they came up with 1359.4. It's running now, we'll see how it goes.

Here are all Three 6800s showing Half the Ram on a different machine, Task 3557494955
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices:				 3
  Max compute units:				 12
  Max work group size:				 256
  Max clock frequency:				 775Mhz
  Max memory allocation:			 134217728
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Barts
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1348.5
  Version:					 OpenCL 1.2 AMD-APP (1348.5)

Max compute units:				 14
  Max work group size:				 256
  Max clock frequency:				 900Mhz
  Max memory allocation:			 134217728
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Barts
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1348.5
  Version:					 OpenCL 1.2 AMD-APP (1348.5)

Max compute units:				 12
  Max work group size:				 256
  Max clock frequency:				 775Mhz
  Max memory allocation:			 134217728
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Barts
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1348.5
  Version:					 OpenCL 1.2 AMD-APP (1348.5)
ID: 51237
Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 51238 - Posted: 20 Jun 2014, 17:12:21 UTC

I don't know whether this guide is more current or more "precise", but at least it works for me:
http://wiki.cchtml.com/index.php/Ubuntu_Precise_Installation_Guide

The first AP wus on both GPUs have finished, so far no errors.
_\|/_
U r s
ID: 51238
TBar
Volunteer tester

Joined: 2 Jul 13
Posts: 505
Credit: 4,685,331
RAC: 27,647
United States
Message 51241 - Posted: 20 Jun 2014, 20:20:18 UTC

Well, it's a little better. I had 3 consecutive Valids before only 1 Computation Error. The next task started successfully. The same machine runs Perfectly Fine in Vista. This Ubuntu host worked Perfectly Fine up until I swapped out the Gigabyte 7750 for the Same 6770 that worked Perfectly Fine in the Same Ubuntu host a couple weeks ago.

What could possibly be different between the last swap and all the others that didn't bother a thing?

Urs Echternacht
Message 51242 - Posted: 20 Jun 2014, 20:42:24 UTC - in response to Message 51241.  
Last modified: 20 Jun 2014, 20:43:03 UTC

> Well, it's a little better. I had 3 consecutive Valids before only 1 Computation Error. The next task started successfully. The same machine runs Perfectly Fine in Vista. This Ubuntu host worked Perfectly Fine up until I swapped out the Gigabyte 7750 for the Same 6770 that worked Perfectly Fine in the Same Ubuntu host a couple weeks ago.
>
> What could possibly be different between the last swap and all the others that didn't bother a thing?

Maybe the GPU is no longer "Perfectly Fine". Did you try running a stress test on it under Windows?

In the meantime I've finished my first test, in which I plugged the monitor into the second GPU, the HD 6670. The result was as expected: the monitor stayed black, with no boot screens or GRUB menu. Judging by the HDD LED, Kubuntu booted up normally. Now I have plugged the monitor back in and will run some more tasks before the next experiment with "xorg.conf".
_\|/_
U r s

TBar
Message 51243 - Posted: 20 Jun 2014, 21:01:40 UTC - in response to Message 51242.  
Last modified: 20 Jun 2014, 21:14:58 UTC

>> Well, it's a little better. I had 3 consecutive Valids before only 1 Computation Error. The next task started successfully. The same machine runs Perfectly Fine in Vista. This Ubuntu host worked Perfectly Fine up until I swapped out the Gigabyte 7750 for the Same 6770 that worked Perfectly Fine in the Same Ubuntu host a couple weeks ago.
>>
>> What could possibly be different between the last swap and all the others that didn't bother a thing?
>
> Maybe the GPU is no longer "Perfectly Fine". Did you try to run some stress testing on windows with it ?
>
> I've meanwhile finished my first test and had plugged the monitor into the second GPU HD6670. The result was like expected : monitor stayed blank black, no bootscreens or GRUB menus. KUbuntu booted up normally from watching the hdd-led. Now i have plugged the monitor back and will run some more tasks before the next experiment with "xorg.conf".

The Link to Vista is in the Quote. It was working in Vista up until a couple hours ago. It's the Same machine, and the Monitors work just fine. Up until a few days ago it was working Perfectly Fine in Windows 8.1, http://setiathome.berkeley.edu/results.php?hostid=6796475&offset=260&show_names=0&state=0&appid=12. There is nothing wrong with the card or the machine. Obviously, the problem is somewhere within Ubuntu or BOINC.

TBar
Message 51248 - Posted: 21 Jun 2014, 11:50:11 UTC - in response to Message 51242.  

Look at that. After the one Error on the 4th task, it hasn't had any more problems since going to driver 1359.4, except that it still thinks my 6770 has half the RAM. So, what is different between 1348.5, 1445.5, and 1359.4, other than that the first two don't like my current two cards? You can go back and see where this host has used 1348.5 with a 6850 & 6770, 6850 & 7750, 7750 & 7750, and finally the 7750 & 6770. There wasn't a problem with the other sets of cards. I see you aren't using 1348.5 or 1445.5. Where did you find 1411.4? I might try 1348.5 again in a day or so just to see what happens. I wonder what would happen if you tried 1348.5 or 1445.5 from AMD on those cards. Of course, it still wouldn't be the same as my XFX 6770 and MSI 7750, but it would be closer to what I was using.

It all started when I decided to put a third card in my Mac...

Urs Echternacht
Message 51252 - Posted: 22 Jun 2014, 20:47:11 UTC - in response to Message 51248.  

I've just swapped the GPUs physically, with the same driver and xorg.conf:
So 22 Jun 2014 22:35:14 CEST | | CAL: ATI GPU 0: AMD Radeon HD 6570/6670/7570/7670 series (Turks) (CAL version 1.4.1848, 1024MB, 992MB available, 1536 GFLOPS peak)
So 22 Jun 2014 22:35:14 CEST | | CAL: ATI GPU 1: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 1024MB, 918MB available, 2048 GFLOPS peak)
So 22 Jun 2014 22:35:14 CEST | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 6570/6670/7570/7670 series (Turks) (driver version 1411.4, device version OpenCL 1.2 AMD-APP (1411.4), 1024MB, 992MB available, 1536 GFLOPS peak)
So 22 Jun 2014 22:35:14 CEST | | OpenCL: AMD/ATI GPU 1: AMD Radeon HD 7700 series (Capeverde) (driver version 1411.4 (VM), device version OpenCL 1.2 AMD-APP (1411.4), 1024MB, 918MB available, 2048 GFLOPS peak)
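
When comparing drivers, it can be handy to pull the driver version and reported memory out of these startup lines automatically. The following Python sketch is one way to do that; it assumes the OpenCL lines always have the shape shown above, and the regular expression and the names `OPENCL_LINE` and `parse_opencl_gpus` are made up for this illustration, not part of BOINC.

```python
import re

# Pattern written against the two OpenCL lines quoted above (an assumption:
# other BOINC versions or vendors may format these lines differently).
OPENCL_LINE = re.compile(
    r"OpenCL: AMD/ATI GPU (\d+): (.+?) \(driver version ([\d.]+)[^,]*, "
    r"device version [^,]+, (\d+)MB, (\d+)MB available"
)

def parse_opencl_gpus(log_text):
    """Return one dict per OpenCL GPU line found in the log text."""
    gpus = []
    for line in log_text.splitlines():
        match = OPENCL_LINE.search(line)
        if match:
            index, name, driver, total_mb, free_mb = match.groups()
            gpus.append({
                "index": int(index),
                "name": name,
                "driver": driver,
                "total_mb": int(total_mb),
                "available_mb": int(free_mb),
            })
    return gpus

sample_log = """\
So 22 Jun 2014 22:35:14 CEST | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 6570/6670/7570/7670 series (Turks) (driver version 1411.4, device version OpenCL 1.2 AMD-APP (1411.4), 1024MB, 992MB available, 1536 GFLOPS peak)
So 22 Jun 2014 22:35:14 CEST | | OpenCL: AMD/ATI GPU 1: AMD Radeon HD 7700 series (Capeverde) (driver version 1411.4 (VM), device version OpenCL 1.2 AMD-APP (1411.4), 1024MB, 918MB available, 2048 GFLOPS peak)
"""

for gpu in parse_opencl_gpus(sample_log):
    print(gpu["index"], gpu["driver"], gpu["total_mb"], gpu["available_mb"])
```

A card that shows up with half its RAM would stand out in the `total_mb` column.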

1411.4 was the 14.2 beta from February.
I'll try 1348.5, or any other version, if I can get my hands on the download.

It seems to be some kind of driver issue (from what you just reported).
I can't remember having seen the "half amount of RAM reported" problem on any of my hosts.
_\|/_
U r s

TBar
Message 51257 - Posted: 23 Jun 2014, 10:00:57 UTC - in response to Message 51252.  
Last modified: 23 Jun 2014, 10:30:48 UTC

Everything is working well at present. I even managed to reduce the DDR3 7750 times down to the mid 7k range from the low 8k range. It also shaved a couple minutes off the 6770 times. The most noticeable difference seems to be changing to -ffa_block 4096 -ffa_block_fetch 2048 from 8192 & 4096. It helped my other machines as well.

I went through pages of Google links trying to find the Beta 14.1/14.2 download, but all of them led back to AMD, and all they offered was 14.6. If 14.6 works as well as 14.4 on this machine I don't think I'll bother. The only remaining problem is the 6 series cards showing half the RAM in clinfo and BOINC; otherwise 1359.4 works acceptably. CCC shows All the Ram. You can find the previous release drivers but not the previous Beta drivers. The 14.4 I installed on the Q9400 machine is dated 6 May; I don't know about the one I tried on the Mac Ubuntu.

Urs Echternacht
Message 51258 - Posted: 23 Jun 2014, 10:26:38 UTC - in response to Message 51257.  

It's good to know that you have a working driver now (with a little ugliness about RAM).
No errors to report on my side, so switching the GPUs in their slots did not reproduce your problem. I will change to another driver later today.

> The most noticeable difference seems to be changing to -ffa_block 4096 -ffa_block_fetch 2048 from 8192 & 4096. It helped my other machines as well.

That is one peculiarity where Linux seems to differ from Windows. You could even try running with settings like -unroll 16 -ffa_block 2280 -ffa_block_fetch 1140, or even higher unroll parameters. Linux does not like higher ffa_block/ffa_block_fetch settings, but it does like higher unroll settings (tested up to 30). The best unroll setting differs from GPU to GPU, though.
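
For trying several combinations, a small helper that assembles the option string can save typos. This is only a sketch: `build_ap_cmdline` is a made-up name, and the check that -ffa_block is a multiple of -ffa_block_fetch is an assumption taken from the pairs mentioned in this thread (8192/4096, 4096/2048, 2280/1140), not from the application's documentation.

```python
def build_ap_cmdline(unroll=None, ffa_block=None, ffa_block_fetch=None):
    """Build an option string such as '-unroll 16 -ffa_block 2280 ...'.

    Assumption: ffa_block should be a multiple of ffa_block_fetch, as in
    every pair quoted in this thread.
    """
    if ffa_block is not None and ffa_block_fetch is not None:
        if ffa_block_fetch <= 0 or ffa_block % ffa_block_fetch != 0:
            raise ValueError("ffa_block should be a multiple of ffa_block_fetch")
    options = (
        ("-unroll", unroll),
        ("-ffa_block", ffa_block),
        ("-ffa_block_fetch", ffa_block_fetch),
    )
    # Only emit the options that were actually given.
    return " ".join(f"{flag} {value}" for flag, value in options if value is not None)

# The two settings discussed above.
print(build_ap_cmdline(ffa_block=4096, ffa_block_fetch=2048))
print(build_ap_cmdline(unroll=16, ffa_block=2280, ffa_block_fetch=1140))
```

The resulting string goes wherever you already pass options to the AP application; which unroll value is best still has to be found per GPU.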
_\|/_
U r s

Urs Echternacht
Message 51263 - Posted: 24 Jun 2014, 11:37:43 UTC

Using GPU driver 1124.4 (Catalyst 13.4) also caused no problems; not even the "half amount of GPU RAM" problem appeared.
For the moment I'm out of AP WUs, so it's time to switch the GPU driver to Cat 13.12 and wait for more WUs.

Maybe we need to consider more factors than just the driver. The mainboard and how Linux detects its components (PCI bus driver) could also cause such problems. The log files ("messages" needs root rights!) would show whether there is trouble, in the form of a warning or an error.
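
Filtering a log for lines that mention both a component and a warning or error could be sketched in Python as below. Everything here is an assumption for illustration: the function name `find_suspect_lines`, the keyword lists, and the demo lines, which are placeholders and not real kernel output. Which log file to read, and whether it needs root, depends on the distribution.

```python
def find_suspect_lines(lines, components, severities=("warn", "error", "fail")):
    """Return lines that mention a component keyword and a severity keyword.

    Matching is a plain case-insensitive substring test, so expect some
    false positives.
    """
    hits = []
    for line in lines:
        lowered = line.lower()
        mentions_component = any(word in lowered for word in components)
        mentions_severity = any(word in lowered for word in severities)
        if mentions_component and mentions_severity:
            hits.append(line.rstrip("\n"))
    return hits

# Placeholder lines only, NOT real log output.
demo = [
    "placeholder: pci device detected\n",
    "placeholder: fglrx warning: something went wrong\n",
    "placeholder: unrelated error in another subsystem\n",
]

for hit in find_suspect_lines(demo, components=("fglrx", "pci")):
    print(hit)
```

For a real log, pass an open file object instead of `demo`; the function only needs an iterable of lines.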
_\|/_
U r s

TBar
Message 51265 - Posted: 24 Jun 2014, 14:40:46 UTC - in response to Message 51263.  
Last modified: 24 Jun 2014, 15:37:04 UTC

From what I've experienced, it depends on the driver and the OS. An excellent example was when I first installed Vista. I just used the Auto Updater to install about 200 updates. It worked with my two 6 series cards but not the 7750. In fact, Vista was showing half the RAM, Task 3559218761. Then I realized the Service Packs were missing. After installing the 1st Service Pack, my 7750 worked. Since installing the Service Packs, not only does the 7750 work, but it also shows All the Ram on the 6 series cards, Task 3559564523, Task 3593047731.

This would lead you to think that something is Missing from Ubuntu. But it happens on Two Different Machines that are using different Kernels, which would point to some problem with Ubuntu itself. I've already run the package check, and even used the package manager to check AMD's package requirements, System Recommendations. According to Ubuntu, my system is Up to Date...oh well.

Don't forget that clinfo/BOINC recognizes All the Ram on the 7750s;
  Global memory size:				 1894776832
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Queue properties:				 
    Out-of-Order:				 No
  Name:						 Capeverde
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 1359.4 (VM)
  Version:					 OpenCL 1.2 AMD-APP (1359.4)

BTW, have you noticed the 7750 says;
Driver version: 1359.4 (VM)
Where the 6770 doesn't;
Driver version: 1359.4 ????


 