What causes this ERROR and how to prevent?



Message boards : Number crunching : What causes this ERROR and how to prevent?

hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1005260 - Posted: 17 Jun 2010, 13:43:23 UTC

Name 18no09ai.19557.19290.9.10.193_1
Workunit 623323800
Created 17 Jun 2010 6:16:47 UTC
Sent 17 Jun 2010 6:43:37 UTC
Received 17 Jun 2010 12:25:26 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
Computer ID 5185924
Report deadline 2 Aug 2010 18:50:17 UTC
Run time 5,095.06
CPU time 53.04
Validate state Invalid
Credit 0.00
Application version Anonymous platform - NVIDIA GPU

Stderr output
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
Device 1 : GeForce GTX 285
totalGlobalMem = 1073741824
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1651388
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 2 : GeForce GTX 275
totalGlobalMem = 939524096
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1597891
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 30
Device 3 : GeForce GTX 260
totalGlobalMem = 939524096
sharedMemPerBlock = 16384
regsPerBlock = 16384
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1360593
totalConstMem = 65536
major = 1
minor = 3
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 27
setiathome_CUDA: CUDA Device 3 specified, checking...
Device 3: GeForce GTX 260 is okay
SETI@home using CUDA accelerated device GeForce GTX 260
V12 modification by Raistmer
Priority of worker thread rised successfully
Priority of process adjusted successfully
Total GPU memory 939524096 free GPU memory 876343296
setiathome_enhanced 6.02 Visual Studio/Microsoft C++

Build features: Non-graphics CUDA FFTW USE_SSE x86
CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 0.011781
After app init: total GPU memory 939524096 free GPU memory 876343296


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x765322A1

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.3.22


Dump Timestamp : 06/17/10 05:24:18
Install Directory :
Data Directory : C:\ProgramData\BOINC
Project Symstore :
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\6;C:\ProgramData\BOINC\projects\setiathome.berkeley.edu;srv*C:\Users\Pete\AppData\Local\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\Users\Pete\AppData\Local\Temp\symbols*http://boinc.berkeley.edu/symstore


ModLoad: 00400000 00448000 C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\MB_6.08_CUDA_V12_noKill_FPLim2048.exe (6.2.0.0) (-nosymbols- Symbols Loaded)
Linked PDB Filename :
File Version : 6.02
Company Name : Space Sciences Laboratory
Product Name : setiathome_enhanced
Product Version : 6.02

ModLoad: 77040000 00180000 C:\Windows\SysWOW64\ntdll.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wntdll.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76570000 00100000 C:\Windows\syswow64\kernel32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wkernel32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76520000 00046000 C:\Windows\syswow64\KERNELBASE.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wkernelbase.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 10000000 0004a000 C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\cudart.dll (6.14.11.2030) (-exported- Symbols Loaded)
Linked PDB Filename :
File Version : 6,14,11,2030
Company Name : NVIDIA Corporation
Product Name : NVIDIA CUDA 2.3 Runtime
Product Version : 6,14,11,2030

ModLoad: 00a60000 00845000 C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\cufft.dll (6.14.11.2030) (-exported- Symbols Loaded)
Linked PDB Filename :
File Version : 6,14,11,2030
Company Name : NVIDIA Corporation
Product Name : NVIDIA Windows XP CUDA 2.3 FFT Library
Product Version : 6,14,11,2030

ModLoad: 012b0000 00214000 C:\Windows\system32\nvcuda.dll (8.16.11.9107) (-exported- Symbols Loaded)
Linked PDB Filename :
File Version : 8.16.11.9107
Company Name : NVIDIA Corporation
Product Name : NVIDIA CUDA 2.3 driver
Product Version : 8.16.11.9107

ModLoad: 759d0000 00100000 C:\Windows\syswow64\USER32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wuser32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 762d0000 00090000 C:\Windows\syswow64\GDI32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wgdi32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 77010000 0000a000 C:\Windows\syswow64\LPK.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wlpk.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76920000 0009d000 C:\Windows\syswow64\USP10.dll (1.626.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : usp10.pdb
File Version : 1.0626.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Uniscribe Unicode script processor
Product Version : 1.0626.7600.16385

ModLoad: 74c10000 000ac000 C:\Windows\syswow64\msvcrt.dll (7.0.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : msvcrt.pdb
File Version : 7.0.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.7600.16385

ModLoad: 767e0000 000a0000 C:\Windows\syswow64\ADVAPI32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : advapi32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76670000 00019000 C:\Windows\SysWOW64\sechost.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : sechost.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76430000 000f0000 C:\Windows\syswow64\RPCRT4.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wrpcrt4.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 74bb0000 00060000 C:\Windows\syswow64\SspiCli.dll (6.1.7600.16484) (-exported- Symbols Loaded)
Linked PDB Filename : wsspicli.pdb
File Version : 6.1.7600.16484 (win7_gdr.091210-1534)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16484

ModLoad: 74ba0000 0000c000 C:\Windows\syswow64\CRYPTBASE.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : cryptbase.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 769e0000 00060000 C:\Windows\system32\IMM32.DLL (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : wimm32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 76360000 000cc000 C:\Windows\syswow64\MSCTF.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : msctf.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 045d0000 0010f000 C:\Windows\system32\nvapi.dll (8.16.11.9107) (-exported- Symbols Loaded)
Linked PDB Filename : d:\bld\r190_93\drivers\nvapi\_out\win7_wow64_release\nvapi.pdb
File Version : 8.16.11.9107
Company Name : NVIDIA Corporation
Product Name : NVIDIA Windows drivers
Product Version : 8.16.11.9107

ModLoad: 76170000 0015c000 C:\Windows\syswow64\ole32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : ole32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

Get Product Name Failed.
ModLoad: 76690000 0008f000 C:\Windows\syswow64\OLEAUT32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : oleaut32.pdb
File Version : 6.1.7600.16385
Company Name : Microsoft Corporation
Product Name :
Product Version : 6.1.7600.16385

ModLoad: 76720000 00057000 C:\Windows\syswow64\SHLWAPI.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : shlwapi.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 74cf0000 00c49000 C:\Windows\syswow64\SHELL32.dll (6.1.7600.16532) (-exported- Symbols Loaded)
Linked PDB Filename : shell32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75ad0000 0019d000 C:\Windows\syswow64\SETUPAPI.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : setupapi.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 75dd0000 00027000 C:\Windows\syswow64\CFGMGR32.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : cfgmgr32.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 769c0000 00012000 C:\Windows\syswow64\DEVOBJ.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : devobj.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 735f0000 00009000 C:\Windows\system32\VERSION.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : version.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385

ModLoad: 73d70000 000eb000 C:\Windows\system32\dbghelp.dll (6.1.7600.16385) (-exported- Symbols Loaded)
Linked PDB Filename : dbghelp.pdb
File Version : 6.1.7600.16385 (win7_rtm.090713-1255)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.1.7600.16385



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 0, Write: 0, Other 0

- I/O Transfers Counters -
Read: 0, Write: 0, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 0, QuotaPeakPagedPoolUsage: 0
QuotaNonPagedPoolUsage: 0, QuotaPeakNonPagedPoolUsage: 0

- Virtual Memory Usage -
VirtualSize: 0, PeakVirtualSize: 0

- Pagefile Usage -
PagefileUsage: 0, PeakPagefileUsage: 0

- Working Set Size -
WorkingSetSize: 0, PeakWorkingSetSize: 0, PageFaultCount: 0

*** Dump of thread ID 2368 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x765322A1

- Registers -
eax=00000000 ebx=00000000 ecx=00811712 edx=033a613c esi=00000001 edi=00000000
eip=765322a1 esp=033afb74 ebp=033aff94
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
033aff94 77079d72 00000000 7686c201 00000000 00000000 KERNELBASE!DebugBreak+0x0
033affd4 77079d45 0045c9b0 00000000 00000000 00000000 ntdll!RtlInitializeExceptionChain+0x0
033affec 00000000 0045c9b0 00000000 00000000 03bc0000 ntdll!RtlInitializeExceptionChain+0x0

*** Dump of thread ID 3196 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
eax=00000001 ebx=008963b0 ecx=00ae6294 edx=00ae6293 esi=00896330 edi=00875ad0
eip=10020b23 esp=0018f4e8 ebp=0018f514
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
0018f514 1000a371 00896330 008932c0 00874b6c 00875ad0 cudart!cudaD3D10MapResources+0x0
0018f52c 100137c2 5c00adb7 01600b88 01600b60 0018f5c8 cudart!cudaCreateChannelDesc+0x0
0018f544 10020b28 00004000 00000001 00000001 00000040 cudart!cudaCreateChannelDesc+0x0
00000000 00000000 00000000 00000000 00000000 00000000 cudart!cudaD3D10MapResources+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>

____________
Official Abuser of Boinc Buttons...
And no good credit hound!

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1005295 - Posted: 17 Jun 2010, 15:15:51 UTC - in response to Message 1005260.

Did you run reschedule prior to this error?
____________
Morten Ross

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,855,898
RAC: 12,050
Australia
Message 1005296 - Posted: 17 Jun 2010, 15:20:04 UTC
Last modified: 17 Jun 2010, 15:24:14 UTC

...<message>
Maximum elapsed time exceeded
</message>


Please remove the flops estimates from your app info & restart Boinc. "someone" seems to be 'fiddling' with the server side fpop estimates ::S
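For anyone unsure what "removing the flops estimates" means in practice, here is a minimal Python sketch. It assumes the estimates are `<flops>` elements inside the `<app_version>` blocks of app_info.xml; the sample XML, the flops value, and the function name `strip_flops` are illustrative and not taken from any real installation. Stop Boinc and back up the file before trying anything like this.

```python
import re

def strip_flops(app_info_text: str) -> str:
    """Return the app_info.xml text with every <flops>...</flops> element removed."""
    # Also consume the indentation and the trailing newline so no blank line is left behind.
    return re.sub(r"[ \t]*<flops>[^<]*</flops>[ \t]*\r?\n?", "", app_info_text)

# Illustrative fragment only; a real app_info.xml has more fields per app_version.
sample = """<app_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>608</version_num>
        <flops>1.0e11</flops>
        <plan_class>cuda</plan_class>
    </app_version>
</app_info>
"""

cleaned = strip_flops(sample)
print(cleaned)
```

A regular expression is used instead of an XML parser so that the rest of the file keeps its exact layout; only the flops lines disappear.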
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Morten Ross
Volunteer tester
Joined: 30 Apr 01
Posts: 183
Credit: 378,289,433
RAC: 0
Norway
Message 1005303 - Posted: 17 Jun 2010, 15:41:17 UTC - in response to Message 1005296.
Last modified: 17 Jun 2010, 15:41:39 UTC

I thought this had to do with running reschedule on cuda 609s, but fortunately you seem to be correct!

Hilarious really, as just a few days ago we had to make sure all app_infos contained a correct number to even get work, as the task estimates were off the chart.

Now some of us must take it out. Wonder how long before I must put it back in.....

I only get this on one host, though. Do you have any ideas why I don't get it on any of my GTX 260s and 285s?

Morten
____________
Morten Ross

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,855,898
RAC: 12,050
Australia
Message 1005307 - Posted: 17 Jun 2010, 15:46:00 UTC

It is extremely funny indeed :). Since the flops estimates were 'our' workaround for a situation in view of an underdesigned benchmark system (for about one year), and now they're 'trying' something to fix it...

Well then the bandaid must come off ... would you like it fast or slow ?
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005337 - Posted: 17 Jun 2010, 17:37:34 UTC

Virtually all of my MB CPU tasks being run by AK_v8b_win_SSSE3x.exe are failing with this error. Funny thing is, it is only on one of my computers. I did have an overclock but backed it off, and I am still getting this same -177 error.

This computer was working fine before the massive outage.
____________
Boinc....Boinc....Boinc....Boinc....

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005346 - Posted: 17 Jun 2010, 18:22:20 UTC

Shut down. Cleaned 4 case air filters (dirty). Blew out dust bunnies from the CPU heatsink etc. CPU temp is now down by 20 degrees F. So I'll let it crunch and see what happens.

____________
Boinc....Boinc....Boinc....Boinc....

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3557
Credit: 20,756,713
RAC: 23,446
Sweden
Message 1005351 - Posted: 17 Jun 2010, 18:39:01 UTC - in response to Message 1005296.

...<message>
Maximum elapsed time exceeded
</message>


Please remove the flops estimates from your app info & restart Boinc. "someone" seems to be 'fiddling' with the server side fpop estimates ::S


Flops removed, and some WUs went into high priority immediately.

My low-end ION is my only computer with a CUDA-capable GPU and the only one with the flops estimates put in. It runs MB, AP and CUDA, and removing the flops estimates there resulted in the following:

With flops, CUDA WUs normally showed and ran around 4.5 hours. Without flops, the initial estimate is now 24.5 hours. (holy smoke)

With flops, AP WUs normally showed and ran around 105 hours. Without flops, the initial estimate is now 288.5 hours. (geeze)

With flops, MB WUs normally showed and ran around 16 hours. Without flops, the initial estimate is now 26.5 hours. (meow)


So, I sure won't get any new WUs for some time :-)

I'm going to wait and see how it looks when things settle down.
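
To put those numbers side by side, here is a small Python sketch that computes how much each initial estimate grew once the flops entries were removed. The hours are the ones quoted in this post; the dictionary layout and the function name `inflation` are just illustrative.

```python
# (hours with flops entries, initial estimate in hours without them), as quoted above
estimates = {
    "CUDA": (4.5, 24.5),
    "AP": (105.0, 288.5),
    "MB": (16.0, 26.5),
}

def inflation(with_flops: float, without_flops: float) -> float:
    """Ratio of the new initial estimate to the old, realistic run time."""
    return without_flops / with_flops

for app, (old, new) in estimates.items():
    print(f"{app}: estimate grew by a factor of {inflation(old, new):.2f}")
```

The CUDA estimate grows by more than a factor of five, which is why the client thinks the cache is full and stops asking for work.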



____________

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005398 - Posted: 17 Jun 2010, 21:11:42 UTC
Last modified: 17 Jun 2010, 21:13:01 UTC

Due to ongoing errors I have switched from AK_v8b_win_SSSE3x.exe to AK_v8b_win_SSE3_INTEL.exe on my main computer. My other 3 computers are running fine with the former exe.

Earlier today I cleaned air filters and heat sinks, clocked the computer back to stock settings but still it is returning errors only for MB and this app.

The new setup at Seti penalizes host computers that are returning errors, and I didn't know this computer was a problem with this app. Hope it starts turning in good work!
____________
Boinc....Boinc....Boinc....Boinc....

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005426 - Posted: 17 Jun 2010, 22:53:57 UTC

Raistmer reports to me that the -177 error is "too much time used". Mine are always failing at about 90% complete, with just 10% more to do. So who determines the maximum allowed crunch time?

My Boinc is estimating about 8 minutes to crunch these, in reality they take 20 to 30 minutes. So now what to do?

By the way, still failures with the AK_v8b_win_SSE3_INTEL.exe and I have reseated the memory. Still every one is failing with this error.
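
On the question of who sets the maximum crunch time: as far as I understand the Boinc client, the limit is not a fixed number of minutes. It is derived by dividing the workunit's floating-point operations bound (the `rsc_fpops_bound` value sent with the task) by the speed in flops that the client assumes for the application, and a task that runs longer gets "Maximum elapsed time exceeded". Treat that as an assumption rather than a statement of the exact client logic. The Python sketch below shows only the arithmetic; every number in it is hypothetical.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, assumed_flops: float) -> float:
    """Time limit implied by the operations bound and the assumed speed."""
    return rsc_fpops_bound / assumed_flops

def would_exceed(elapsed_seconds: float, rsc_fpops_bound: float, assumed_flops: float) -> bool:
    """True if a task running this long would be aborted under the assumed limit."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, assumed_flops)

# Hypothetical values: a bound of 3e14 operations and a task that really needs 30 minutes.
bound = 3e14
real_run_time = 30 * 60

# An inflated speed estimate gives a 25 minute limit, so the task is aborted near the end.
print(max_elapsed_seconds(bound, 2e11), would_exceed(real_run_time, bound, 2e11))

# A more modest speed estimate gives a 100 minute limit, so the same task finishes.
print(max_elapsed_seconds(bound, 5e10), would_exceed(real_run_time, bound, 5e10))
```

If this picture is right, a speed estimate that is too high would explain tasks dying at around 90% done: the limit shrinks while the real run time stays the same.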
____________
Boinc....Boinc....Boinc....Boinc....

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005434 - Posted: 17 Jun 2010, 23:26:21 UTC
Last modified: 17 Jun 2010, 23:31:53 UTC

I am at a loss as to what to do. I just watched one get to 27 minutes elapsed, at 91% complete and with 2 minutes to go, then fail with the -177 error. They are all failing at the same point, around 90% done. It's not CPU temp, and it's not the app software.

At this time it is crunching 3 AstroPulse wu and 1 MB wu on the 4 cores. ALL the MB work done on this machine since the servers came back online has failed. This computer was fine before the outage at Berkeley.

Would someone point the developers to my postings here? Maybe they have an answer.

This is the computer in question.

[edit] These may be VLAR wu that were moved from the GPU to the CPU last night.
____________
Boinc....Boinc....Boinc....Boinc....

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005437 - Posted: 17 Jun 2010, 23:48:47 UTC

Currently working on wu 04no09af.2587.22562.14.10.20_2 which is <true_angle_range>0.44818403675756 so I don't think this would have been moved from the GPU to CPU.

Here is the workunit
but at the time of this writing my computer is still chewing on it.
____________
Boinc....Boinc....Boinc....Boinc....

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8686
Credit: 25,032,737
RAC: 30,075
United Kingdom
Message 1005445 - Posted: 18 Jun 2010, 0:40:10 UTC

Seems to me you are too busy posting and not reading.

In "Message from server: (reached daily quota of 100..." I put in a link to a Jason G message in "176 hours to completion" saying that you need to remove the flops entries.

The info is all out there; go do some reading before your next post.

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005448 - Posted: 18 Jun 2010, 0:48:45 UTC
Last modified: 18 Jun 2010, 1:10:02 UTC

This computer never did have any flops entries in it. It was a new install after the servers came back on from the outage.

That work unit errored at 1:29:43 and 93% complete, with just a few minutes remaining.

Hmm.......maybe I should put some flops in for 603 only???

[edit] I have set NNW on this computer. I will crunch it down then reset, detach and re-attach. Unless I get a good suggestion.

[edit again] Actually, when the AP and CUDA work is done I will abort all the 603 MB work and report it all. Then do the above.
____________
Boinc....Boinc....Boinc....Boinc....

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,855,898
RAC: 12,050
Australia
Message 1005495 - Posted: 18 Jun 2010, 3:09:24 UTC - in response to Message 1005448.

Rerun CPU Benchmarks?
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005496 - Posted: 18 Jun 2010, 3:11:18 UTC

Thanks Jason, I tried that earlier today.
____________
Boinc....Boinc....Boinc....Boinc....

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,855,898
RAC: 12,050
Australia
Message 1005497 - Posted: 18 Jun 2010, 3:13:02 UTC - in response to Message 1005496.

Has your task duration correction factor gone weird on that host then ?
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005506 - Posted: 18 Jun 2010, 3:55:09 UTC

Earlier today it was 2.XXXX. Don't remember the exact number. Now it is 1.0012.

I will be resetting, detaching etc., including removing and reinstalling Boinc, in a few minutes.

I had 3 AP work units complete in 10:43:09, 10:48:37 and 11:07:14, all without computation errors, so I am having a hard time believing that my CPU is at fault.

I have the error files that Windows XP reported (to Microsoft???), if anyone is interested. The zipped file is 20.8 MB and the 7z file is 17.8 MB. PM me with a personal email address if you would like them.
____________
Boinc....Boinc....Boinc....Boinc....

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5051
Credit: 73,855,898
RAC: 12,050
Australia
Message 1005510 - Posted: 18 Jun 2010, 4:08:56 UTC
Last modified: 18 Jun 2010, 4:11:07 UTC

I highly doubt it's anything wrong with the CPU or the applications running on it. More likely it is some quirk of how the fpops estimates are being done server side, in combination with the local time estimates on the host.

A reset might clear that, but seeing as the installation is behaving strangely, I would use the opportunity first to 'play around' with the duration correction factor just to see what happens (By stopping boinc & editing the client state file manually... look for the field like <duration_correction_factor>0.372434</duration_correction_factor> )

That's the current value of mine, and seems to be munching away happily. I would see what a slightly more conservative value of 0.5 does, then larger values, then perhaps smaller values.

That's of course the kind of experimentation I wouldn't recommend if the machine is happily crunching away, but since a reset is imminent, might as well turn it into a positive.
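
For that kind of experiment, a minimal Python sketch of reading and changing the field might look like the following. It works on the file's text rather than on the file itself. The file name client_state.xml, the sample fragment, and the helper names `read_dcf` and `set_dcf` are assumptions for illustration. Stop boinc and keep a backup copy before writing anything back.

```python
import re

# Matches the field quoted above and captures its opening tag, value, and closing tag.
DCF_PATTERN = re.compile(
    r"(<duration_correction_factor>)([^<]*)(</duration_correction_factor>)"
)

def read_dcf(state_text: str) -> list[float]:
    """Return every duration correction factor found (one per attached project)."""
    return [float(m.group(2)) for m in DCF_PATTERN.finditer(state_text)]

def set_dcf(state_text: str, new_value: float) -> str:
    """Return the text with every duration correction factor set to new_value."""
    # Note: this changes the value for all projects present in the text.
    return DCF_PATTERN.sub(
        lambda m: f"{m.group(1)}{new_value:.6f}{m.group(3)}", state_text
    )

# Illustrative fragment only; the real file is much larger.
sample = """<project>
    <master_url>http://setiathome.berkeley.edu/</master_url>
    <duration_correction_factor>0.372434</duration_correction_factor>
</project>
"""

print(read_dcf(sample))
updated = set_dcf(sample, 0.5)
print(read_dcf(updated))
```

Working on the text keeps the rest of the file byte-for-byte unchanged, which matters for a file the client rewrites itself.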

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 826
United States
Message 1005513 - Posted: 18 Jun 2010, 4:16:54 UTC

As always Jason, thanks so much for your interest and help.

Removed Boinc and all its directories, including the work directory. Reinstalled everything and am now waiting for work. It might be 3 hours before that happens.

____________
Boinc....Boinc....Boinc....Boinc....


Copyright © 2014 University of California