Message boards : Number crunching : Warning when using 7.3.14
Jord (Joined: 9 Jun 99, Posts: 15184, Credit: 4,362,181, RAC: 3)
BOINC 7.3.14, the latest alpha, will trash all work here at SETI, with messages similar to the one at the end of this post. This is because this newest client checks the workunit.rsc_memory_bound value, which the project sets as the maximum amount of memory a task may use. Clients up to 7.3.14 did not check this value correctly; 7.3.14 does. And since all projects set this value far too low, your work will be aborted after a bit of a run.

If this client is going to be released to the public, all projects will have to redo their workunit.rsc_memory_bound values to more correct ones, e.g. 1024MB on x64, 512MB on x86. The project would have to set rsc_memory_bound to 1 GB, but then these tasks wouldn't be sent to x86 hosts with 512MB RAM.

```
<core_client_version>7.3.14</core_client_version>
<![CDATA[
<message>
working set size > workunit.rsc_memory_bound: 36.82MB > 32.00MB
</message>
<stderr_txt>
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Info: BOINC provided device ID used
Build features: SETI7 Non-graphics OpenCL USE_OPENCL_HD5xxx OCL_ZERO_COPY OCL_CHIRP3 FFTW AMD specific USE_SSE x86
CPUID: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Cache: L1=64K L2=256K
CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX
OpenCL-kernels filename : MultiBeam_Kernels_r1843.cl
ar=0.443860 NumCfft=192145 NumGauss=1064875014 NumPulse=114535999999 NumTriplet=15768471666688
Currently allocated 145 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768
Windows optimized S@H Enhanced application by Alex Kan
Version info: SSEx (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSEx Win32 Build 1843, Ported by: Raistmer, JDWhale
SETI7 update by Raistmer
OpenCL version by Raistmer, r1843
AMD HD5 version by Raistmer
Number of OpenCL platforms: 2
OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 0
OpenCL Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Max compute units: 20
Max work group size: 256
Max clock frequency: 1000Mhz
Max memory allocation: 536870912
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 2147483648
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Queue properties:
Out-of-Order: No
Name: Pitcairn
Vendor: Advanced Micro Devices, Inc.
Driver version: 1268.1 (VM)
Version: OpenCL 1.2 AMD-APP (1268.1)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.443860
Used GPU device parameters are:
Number of compute units: 20
Single buffer allocation size: 64MB
max WG size: 256
period_iterations_num=20
GPU device synched
Termination request detected or computations are finished.
GPU device synched, exiting...
</stderr_txt>
]]>
```
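The abort described in the post above can be sketched as follows. This is a minimal, hypothetical model of the 7.3.14 behavior, not the actual BOINC client source; the function and variable names are illustrative.

```python
# Hypothetical sketch of the memory-bound check added in BOINC 7.3.14
# alpha (and later reverted). Names are illustrative, not real client code.

EXIT_MEM_LIMIT_EXCEEDED = 198  # exit status reported for these aborts

def check_memory_bound(working_set_mb: float, rsc_memory_bound_mb: float):
    """Return (exit_status, message) if the task's working set exceeds
    the project-supplied rsc_memory_bound, else None (task keeps running)."""
    if working_set_mb > rsc_memory_bound_mb:
        return (EXIT_MEM_LIMIT_EXCEEDED,
                "working set size > workunit.rsc_memory_bound: "
                f"{working_set_mb:.2f}MB > {rsc_memory_bound_mb:.2f}MB")
    return None

# The aborted SETI task above: 36.82 MB used against a 32 MB bound.
status = check_memory_bound(36.82, 32.00)
print(status)
```

With the project's 32 MB bound, any task whose working set grows past that (as both logs in this thread show) would be aborted mid-run, which is exactly why projects would have needed to raise the value.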
Jacob Klein (Joined: 15 Apr 11, Posts: 149, Credit: 9,783,406, RAC: 9)
Must set rsc_memory_bound correctly

SETI Team:

You need to change your work unit parameters to set <rsc_memory_bound> correctly. BOINC 7.3.14 alpha (and potentially future versions as well) reads that value, compares it to the task's working set size, and auto-aborts the work unit if it exceeds the bound. As of right now, I am getting errors due to your incorrect settings. For example:

http://setiathome.berkeley.edu/result.php?resultid=3467797216

Exit status 198 (0xc6) EXIT_MEM_LIMIT_EXCEEDED

```
<core_client_version>7.3.14</core_client_version>
<![CDATA[
<message>
working set size > workunit.rsc_memory_bound: 97.08MB > 32.00MB
</message>
```

Could you please promptly fix this?

Regards,
Jacob Klein
Jacob Klein (Joined: 15 Apr 11, Posts: 149, Credit: 9,783,406, RAC: 9)
It looks like this change is being reverted for now, per David's email below. So there is no longer an immediate need to correct the value... but please consider setting it correctly at some point, in case it gets used by the client in the future.

> Date: Mon, 31 Mar 2014 18:53:33 -0700
> From: d..a@ssl.berkeley.edu
> To: b..c_alpha@ssl.berkeley.edu
> Subject: Re: [boinc_alpha] 7.3.14 - Heads up - Memory bound enforcement
>
> On further thought, I'm going to change things back to the way they were, namely:
>
> 1) workunit.rsc_memory_bound is used only by the server;
>    it won't send a job if rsc_memory_bound > host's available RAM
> 2) the client aborts a job if working set size > host's available RAM
> 3) the client will run a set of jobs only if the sum of their WSSs
>    fits in available RAM
>    (i.e. if a job's WSS is close to all available RAM,
>    it would run that job and nothing else)
>
> The reason for not aborting jobs when WSS > rsc_memory_bound is that
> it requires projects to come up with very accurate estimates of RAM usage,
> which I don't think is feasible in general.
> Also, it will lead to lots of aborted jobs, which is bad for volunteer morale.
>
> -- David
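The third rule in David's email, running a set of jobs only if the sum of their working set sizes fits in available RAM, can be sketched like this. The greedy selection here is an assumption for illustration; the real client's scheduler is more involved.

```python
# Hypothetical sketch of the restored scheduling rule: run a set of jobs
# only if the sum of their working set sizes (WSSs) fits in the RAM
# available to BOINC. The greedy strategy is illustrative only.

def schedulable_jobs(jobs_wss_mb, available_ram_mb):
    """Greedily pick jobs whose combined working sets fit in RAM."""
    chosen, used = [], 0.0
    for wss in jobs_wss_mb:
        if used + wss <= available_ram_mb:
            chosen.append(wss)
            used += wss
    return chosen

# A job whose WSS is close to all available RAM runs alone, as the
# email notes: the 300 MB and 200 MB jobs are deferred, not aborted.
print(schedulable_jobs([900.0, 300.0, 200.0], 1024.0))  # [900.0]
```

The key difference from the 7.3.14 behavior is that an oversized job is simply scheduled around rather than aborted, which avoids the volunteer-morale problem David mentions.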
Wiggo (Joined: 24 Jan 00, Posts: 34744, Credit: 261,360,520, RAC: 489)
I keep telling people that newer versions don't necessarily make them better.
Now that's strange to hear from Dr. D.A.; are you sure he sent it? Cheers.
jason_gee (Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0)
> I keep telling people that newer versions don't necessarily make them better.

The impression I've been given from activity on the boincapi and CreditNew fronts is that Dr. A is quite aware there are problems, and needs development help fixing them due to low resources. I'm not at all surprised that a change tightening a questionable failsafe might create more work than it alleviates, and so be retracted for a rethink.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
Does this imply that a host with, let's say, 256 MB will never receive this app? I think a host with 256MB of system memory can do an OpenCL ATI MB task, provided the GPU allows it. Why should the project exclude such hosts just to please a new BOINC version? Or am I missing something?

SETI apps news
We're not gonna fight them. We're gonna transcend them.
Raistmer (Joined: 16 Jun 01, Posts: 6325, Credit: 106,370,077, RAC: 121)
And another question: why on earth should the client abort ANYTHING already received??? It always just causes frustration and wasted resources. What a maniacal will to abort! It was the same with "GPU missing", until objections grew too strong and now such tasks are suspended instead of aborted. It was the same with "too long running" tasks, aborted without real need just because BOINC thought they were running too long due to distorted estimates... And now because a task's working set exceeds available memory? So what; we have lived in a world with paging for many years already. If such a task somehow happened to land on a PC, let it continue or die by itself.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
Jacob Klein (Joined: 15 Apr 11, Posts: 149, Credit: 9,783,406, RAC: 9)
If the task's working set size exceeds the amount of RAM that the user has configured BOINC to use, then it does make sense to abort the task, because it cannot possibly be completed. That logic is not being changed.

The proposal was to additionally check the working set size against a work unit parameter, rsc_memory_bound, and abort if the bound was exceeded. That proposed change, which was put into the 7.3.14 alpha, will be reverted.

Thus, even if the work unit exceeds rsc_memory_bound, it will be allowed to continue, unless it happens to exceed the amount of RAM that the user has configured BOINC to use.

Take a breath . . .
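The check that survives the revert, aborting only when the working set exceeds the RAM the user allows BOINC to use, can be sketched as below. The preference here is modeled as a simple fraction of physical RAM; the parameter names are illustrative, not the real BOINC preference keys.

```python
# Hypothetical sketch of the check that remains after the revert: a task
# is aborted only when its working set exceeds the RAM the user allows
# BOINC to use (a fraction of physical RAM set in preferences).
# Names are illustrative, not real preference keys.

def allowed_ram_mb(total_ram_mb: float, max_used_frac: float) -> float:
    """RAM available to BOINC under the user's preference fraction."""
    return total_ram_mb * max_used_frac

def should_abort(working_set_mb: float, total_ram_mb: float,
                 max_used_frac: float) -> bool:
    """True only if the task cannot fit in the RAM BOINC may use."""
    return working_set_mb > allowed_ram_mb(total_ram_mb, max_used_frac)

# 8 GB host, user allows 50% while in use: tasks up to 4096 MB may run,
# so the 97.08 MB task from earlier in the thread would NOT be aborted.
print(should_abort(4200.0, 8192.0, 0.5))  # True
print(should_abort(97.08, 8192.0, 0.5))   # False
```

Note that rsc_memory_bound does not appear here at all; after the revert it matters only server-side, when deciding whether to send a job to a host.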
jason_gee (Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0)
Part of the problem here is that under modern operating systems, limits on and with respect to physical RAM make no sense, as all the memory is virtualised. That makes 'working set size' and page-commit behaviour technically more relevant, though pretty far removed from typical user perception (except perhaps for the few who completely disable paging, and understand that up to 50% of Windows memory is non-paged kernel memory).

As the implementation appears to have been intended, the limit setting, along with the subsequent aborts, will appear as nonsense, simply because applications do not use ANY physical memory directly. If BOINC wants 'safeties', it needs to use ones that make sense, like the obvious: "received an out-of-memory exception with XYZ application"... don't send that any more, and somehow provide info as to why (with a possible reset)... the application details page, perhaps?

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
Juha (Joined: 7 Mar 04, Posts: 388, Credit: 1,857,738, RAC: 0)
> And now because task workset exceeds available memory? so what, we live in world with paging many years already. If such task happened to land somehow on PC let it continue or die by itself.

Next time, please consider what paging does to a computer's usability. If BOINC is allowed to compute while the computer is in use, and the science apps use so much memory that the computer starts paging, it will severely affect the user's experience with whatever programs he wants to use. And even if the paging occurs when the computer is not otherwise in use, it will slow the science apps down very badly. You can't do any computations on your data if the data is not in RAM, can you?

And if you combine this with Linux's rather optimistic memory allocation strategy and the typical fixed-size swap (or even no swap at all), you'll get, in extreme cases, a computer that is completely unusable. Once you have too much data in hand, Linux starts throwing program code out of RAM. When all of the virtual memory is used, the computer spends all of its time loading one page of code in and throwing another out. I've seen that happen with programs leaking memory, and with starting one program too many when the computer is already swapping. The only way out of that situation is the reset button.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.