Mac Client Bug
Author | Message |
---|---|
Havoc Send message Joined: 18 May 99 Posts: 38 Credit: 1,454,156 RAC: 0 |
Does this WU suggest a problem/bug in the Mac client? http://setiathome.berkeley.edu/workunit.php?wuid=2723540432 |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
> Does this WU suggest a problem/bug in the Mac client?

Yes - or possibly a recent update to the Mac operating system that the old client can't cope with. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quite unlikely....was the last I heard. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
The applications page says:

Mac OS X/64-bit Intel 8.20 (opencl_ati5_mac) 17 Oct 2017, 23:49:50 UTC 17,835 GigaFLOPS

so quite a lot of people have been completing quite a lot of valid work over the last two weeks. Doesn't sound like a deployment error to me. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Not a single Mac on Beta got that Error in almost a Year, then when it is moved to Main EVERY Mac gets that error. What does it sound like then? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
If every mac gets the error (on every task?), how does the apps page show a FLOPs count higher than any Mac app apart from the original v8.03 from January 2016? What's your evidence for that "every"? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Simply look at the list of machines; you can start with this one and work down: https://setiathome.berkeley.edu/results.php?hostid=8144040&state=6 (here: https://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=200 ). The next one would be https://setiathome.berkeley.edu/results.php?hostid=2991797&state=6 (here: https://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=300 ), then https://setiathome.berkeley.edu/results.php?hostid=7269192&state=6 (here: https://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=340 ). You will find that in just about every active AMD Mac...on Main. Then try it on Beta: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=74242 (here: https://setiweb.ssl.berkeley.edu/beta/top_hosts.php?sort_by=expavg_credit&offset=20 ). |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
OK, let's talk about evidence. Havoc's post opening this thread linked to a workunit with two failed Mac OS X tasks, but successful completions for two Windows computers. From which we can see that it was an Arecibo task with "WU true angle range is : 0.561684". The two failures were on computers 8312767 and 8022187. Both machines have vastly more successful, valid results with application 'SETI@home v8 v8.20 (opencl_ati5_mac)' than they have errors. And the oldest valid tasks were issued before the newest errors - so it isn't simply Eric fixing the deployment. No, there's something else at play to cause those (rare) 'ERR_TOO_MANY_EXITS' outcomes. I don't own a Mac, so I'll have to leave you to track it down. For the record, it won't be the first time that an application has tested out fine at Beta, but has failed when exposed to the far wider range of task types distributed through Main - we had one a few years ago which suddenly used a huge amount of memory when, IIRC, it found a large number of pulses during the first main loop of the search. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
> Simply look at the list of machines, you can start with this one and work down, https://setiathome.berkeley.edu/results.php?hostid=8144040&state=6

Sure, but look at https://setiathome.berkeley.edu/results.php?hostid=8144040&state=4 - same URL, but with the state number flipped to look at the valid results. Lots of them with the same app. |
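The state-number swap described above can be done for any host's results link. A minimal sketch, assuming only what the thread shows (state=6 is the view TBar linked for the failures, state=4 is the view Richard used for valid results); the helper name `with_state` is made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_state(url: str, state: int) -> str:
    """Return the same results URL with its 'state' query parameter replaced."""
    parts = urlsplit(url)
    # Keep every query parameter except any existing 'state'.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "state"]
    query.append(("state", str(state)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The link from the thread, switched from the error view to the valid view.
errors_url = "https://setiathome.berkeley.edu/results.php?hostid=8144040&state=6"
valid_url = with_state(errors_url, 4)
print(valid_url)
```

Comparing the two pages for the same host gives a rough errors-versus-valid picture, which is the comparison being argued about here.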
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I've already tracked it down. The evidence is indisputable. The App was run on Beta for almost a Year without a single 'ERR_TOO_MANY_EXITS'. Once moved to Main Every Mac gets 'ERR_TOO_MANY_EXITS'. Look at the Slightly newer App on Main running Anonymous platform, not a single 'ERR_TOO_MANY_EXITS'. https://setiathome.berkeley.edu/results.php?hostid=6105482 https://setiathome.berkeley.edu/results.php?hostid=8243589 https://setiathome.berkeley.edu/results.php?hostid=8248108&state=2 Anonymous platform = Good, SETI Server = Bad. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Or perhaps 'old app bad, slightly newer app better'? (r3552 vs r3610) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The first thing I would check is the API. It should be version 7.5. But, I said that days ago... |
rob smith Send message Joined: 7 Mar 03 Posts: 22202 Credit: 416,307,556 RAC: 380 |
Passing thought.... Did someone re-build the application between the "successful" Beta operations and "problematic" main operations? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
> Passing thought....

It would be unusual. Eric is usually extremely careful not to introduce any variation at that stage - I believe he only keeps one copy of the actual binary executable online, and soft-links to it from the different mount points used for Main and Beta deployments. The acid test would be to attach the same computer to both Main and Beta, and download both instances. Then, perform an exhaustive comparison of both the downloaded files and the deployment metadata contained in client_state.xml. |
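For anyone with a Mac attached to both projects, the file half of that comparison could be sketched roughly as below. This is an illustration under assumptions, not a tested procedure: the two directories are whatever folders the BOINC client downloaded the Main and Beta application files into on your machine, and the function names are invented here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(main_dir: Path, beta_dir: Path) -> dict[str, str]:
    """Classify every file name found in either directory.

    Values are 'same', 'different', 'main only' or 'beta only'.
    Only top-level files are compared, matched by file name.
    """
    main_files = {p.name: p for p in main_dir.iterdir() if p.is_file()}
    beta_files = {p.name: p for p in beta_dir.iterdir() if p.is_file()}
    report = {}
    for name in sorted(main_files.keys() | beta_files.keys()):
        if name not in beta_files:
            report[name] = "main only"
        elif name not in main_files:
            report[name] = "beta only"
        elif sha256_of(main_files[name]) == sha256_of(beta_files[name]):
            report[name] = "same"
        else:
            report[name] = "different"
    return report
```

Calling `compare_dirs(Path(main_folder), Path(beta_folder))` and printing the result would show at a glance whether the executable and its support files are byte-identical. The metadata in client_state.xml still has to be compared separately.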
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
> The first thing I would check is the API. It should be version 7.5.

The key requirement is that the API version string - embedded in the application binary at the linker stage, from the compiled API library - matches the actual behaviour of the API codebase used. The embedded string is picked up by the deployment script and transferred to the appropriate place in the <app_version> declaration, so that the BOINC client uses the correct protocols when controlling the app behaviour. Actually, I would expect a version number of at least 7.7.0, or perhaps even 7.9.0, to pick up the changes made in the Mac OS X API by Charlie Fenton over the last month, to be compatible with both old and new versions of the Mac screensaver code. (You are aware that Apple released a new version of OS X last month, I'm sure?) If this app has been 'running at Beta for a year', then it can't have that update yet. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Simply have someone check the client_state and see if it shows api version 7.5 in the apps section. This user has a Mac in both Main & Beta but hasn't run many tasks. It appears you need to run many tasks to see the error, http://setiathome.berkeley.edu/show_user.php?userid=7781668 |
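A rough way to do that check without reading the file by eye is sketched below. It assumes client_state.xml parses as XML and that each `<app_version>` block carries `<app_name>`, `<version_num>`, `<plan_class>` and `<api_version>` children; that layout is an assumption, so adjust the tag names to what your file actually contains. The sample text, including its app name and version string, is a made-up placeholder and not data from any host in this thread.

```python
import xml.etree.ElementTree as ET

# Placeholder sample only: the values below are invented for illustration.
SAMPLE = """<client_state>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>820</version_num>
    <plan_class>opencl_ati5_mac</plan_class>
    <api_version>7.5.0</api_version>
  </app_version>
</client_state>"""


def api_versions(xml_text: str) -> list[tuple[str, str, str, str]]:
    """List (app_name, version_num, plan_class, api_version) per <app_version>."""
    root = ET.fromstring(xml_text)
    rows = []
    for av in root.iter("app_version"):
        rows.append((
            av.findtext("app_name", default="?"),
            av.findtext("version_num", default="?"),
            av.findtext("plan_class", default=""),
            av.findtext("api_version", default="(none)"),
        ))
    return rows


for row in api_versions(SAMPLE):
    print(row)
```

To run it against a real file, read the file's text and pass that in place of `SAMPLE`. Doing so for both the Main and Beta attachments on the same machine would show whether the two deployments declare the same API version.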
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
> It appears you need to run many tasks to see the error.

Then it CANNOT be a 'missing CL file' deployment error, as you tried to imply by referring me to the Beta thread this morning. I've posted in that thread myself now. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
When a task fails with ERR_TOO_MANY_EXITS, then it will have attempted to run 100 times before being killed. There will be 100 'boinc_temporary_exit()' reports in the Event Log (with reasons), and there will be 100 copies of stderr_txt in the task report. Unfortunately, Raistmer's stderr_txt files are too big for 100 copies to fit into the 64 KB of data reported by the failed task. But if you start reading at the bottom, the failure point seems to occur after

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.447592
Used GPU device parameters are:
Number of compute units: 32
Single buffer allocation size: 128MB
Total device global memory: 6144MB
max WG size: 256
local mem type: Real
LotOfMem path: no
LowPerformanceGPU path: no
HighPerformanceGPU path: no
period_iterations_num=50

and, by implication, the app started by writing

OpenCL platform detected: Apple
Number of OpenCL devices found : 2
BOINC assigns slot on device #1 of 2 devices.
Info: BOINC provided OpenCL device ID used

That's from one of the examples we looked at this morning, and it appears that the 100 attempts alternated between device #1 and device #2. That particular machine appeared to have two identical "AMD Radeon HD - FirePro D700 Compute Engine" - it's host 8144040 - but I could have picked any of them.

TBar asserts that *every* Mac user running this app will encounter errors. By the law of averages, at least one of you must read this thread, and you will be able to continue the hunt for evidence from here. As I said this morning, I don't possess a (current) Mac (I do have an LC475), and I don't propose to go out and buy one just for this. I'll have to leave this one to the Mac community, but I hope I've left you enough clues about how to proceed. |
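To check the device alternation on other failed tasks, the repeated stderr copies can be scanned for the "BOINC assigns slot on device" line. A small sketch, assuming the stderr text has been copied out of the task report into a string. Because the report is cut off at 64 KB, the count will normally be lower than 100. The sample below is a shortened stand-in built from the lines quoted above, with a second restart added for illustration.

```python
import re
from collections import Counter

# Matches the line the app writes at each start, as quoted in the post above.
DEVICE_LINE = re.compile(r"BOINC assigns slot on device #(\d+) of (\d+) devices")


def device_sequence(stderr_text: str) -> list[int]:
    """Return the device number chosen at each (re)start, in order."""
    return [int(m.group(1)) for m in DEVICE_LINE.finditer(stderr_text)]


def alternates(seq: list[int]) -> bool:
    """True if no two consecutive starts used the same device."""
    return all(a != b for a, b in zip(seq, seq[1:]))


# Stand-in text: two restarts, the second one invented for the example.
sample = (
    "OpenCL platform detected: Apple\n"
    "Number of OpenCL devices found : 2\n"
    "BOINC assigns slot on device #1 of 2 devices.\n"
    "Info: BOINC provided OpenCL device ID used\n"
    "OpenCL platform detected: Apple\n"
    "Number of OpenCL devices found : 2\n"
    "BOINC assigns slot on device #2 of 2 devices.\n"
    "Info: BOINC provided OpenCL device ID used\n"
)

seq = device_sequence(sample)
print(len(seq), dict(Counter(seq)), alternates(seq))
```

If one device number dominates the surviving restarts on a dual-GPU machine, or the pattern differs between hosts, that would be worth reporting alongside the task links.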
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Actually, I NEVER implied it was a missing CL file. What I DID imply was that Raistmer also thinks it's not likely an App can run on Beta, and Main, for almost a year without displaying a single error of the type that suddenly appeared on Main. People ran r3552 on Main last year under Anonymous platform and never saw that error. Seems it only appears when distributed by the Server on Main. What I did suggest was to check the API version. At this point I couldn't care less. Since it's all trash talk you can fix it yourself. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
No, it should be evidence-based analysis and diagnosis. But as I said, I don't possess the necessary equipment. |