SETI@home 7.25 for ARM Android released

Eric J Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Mar 05
Posts: 1547
Credit: 26,876,750
RAC: 920
United States
Message 49318 - Posted: 13 Feb 2014, 3:34:10 UTC - in response to Message 49317.  

I'm instrumenting boinc functions in hopes of figuring out where that segfault is coming from.

Don't worry about the missing functions. That's just because you don't have libcorkscrew.so.
ID: 49318 · Report as offensive
Linux? You're kidding me!!
Volunteer tester
Joined: 10 Mar 12
Posts: 1645
Credit: 12,221,415
RAC: 15,637
Sweden
Message 49320 - Posted: 13 Feb 2014, 8:34:13 UTC
Last modified: 13 Feb 2014, 8:34:27 UTC

And my third v7.25 finished and validated:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=16035874

All restarts went OK; the reason for them is that I use the phone as a phone (crazy me) and don't crunch while on battery.
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 49320 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49322 - Posted: 13 Feb 2014, 11:45:15 UTC - in response to Message 49318.  
Last modified: 13 Feb 2014, 12:03:25 UTC

I'm instrumenting boinc functions in hopes of figuring out where that segfault is coming from.


While you're doing that, I'll go look back at what I did for the Cuda patch to stop boinc_finish()/boinc_exit() killing threads without warning on exit. Android'll be using multithreaded C-runtimes now too.

[Edit:] The logic was:
controlling thread side:
- on any kind of exit, set a (volatile) exit request flag for the worker(s) to see & respond to.
- Short timeout sleep/spin, checking for the acknowledge flag.
- Exit 'nicely' if it was acknowledged within a few seconds.
- Use traditional boincapi extreme prejudice iff not acknowledged.

worker thread side:
- pepper main processing loop with inline checks of the (volatile) exit request flag. Preferably these occur often enough for subsecond shutdown response.
- if set, sync & clean up everything needed as quickly as possible, then set a (volatile) acknowledge flag.
- pre-acknowledge any worker-induced exits (after sync and cleanup).

Additionally: some OS/C-Runtime cases require further measures to ensure all IO streams are committed (as opposed to only flushed).
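
The handshake described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Cuda patch or any boincapi code: threading.Event stands in for the (volatile) flags, and the names ExitHandshake, worker and run_demo are hypothetical.

```python
import threading
import time


class ExitHandshake:
    """Pair of flags shared by the controlling thread and a worker (hypothetical)."""

    def __init__(self):
        self.exit_requested = threading.Event()     # stands in for the exit request flag
        self.exit_acknowledged = threading.Event()  # stands in for the acknowledge flag

    def request_exit(self, timeout_s=3.0):
        """Controller side: ask for exit, then wait a short time for the acknowledgement.

        Returns True for a 'nice' exit and False if the caller has to fall
        back to a forced exit.
        """
        self.exit_requested.set()
        return self.exit_acknowledged.wait(timeout_s)

    def acknowledge(self):
        """Worker side: call only after sync and cleanup are complete."""
        self.exit_acknowledged.set()


def worker(handshake, cleanup_log, slice_s=0.01):
    """Worker side: check the exit flag between short slices of work."""
    while not handshake.exit_requested.is_set():
        time.sleep(slice_s)  # placeholder for one short slice of real processing
    cleanup_log.append("synced and cleaned up")
    handshake.acknowledge()


def run_demo(responsive=True):
    """Run one handshake; with responsive=False no worker ever acknowledges."""
    handshake = ExitHandshake()
    log = []
    if responsive:
        threading.Thread(target=worker, args=(handshake, log), daemon=True).start()
    nice = handshake.request_exit(timeout_s=0.5)
    return nice, log
```

run_demo(True) models the 'nice' path, and run_demo(False) models the case where the acknowledgement never arrives and the controller would have to force the exit. A worker that decides to exit on its own would call acknowledge() itself after cleanup, which corresponds to the pre-acknowledge step.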
ID: 49322 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 13
United Kingdom
Message 49323 - Posted: 13 Feb 2014, 11:59:14 UTC - in response to Message 49322.  

I'm instrumenting boinc functions in hopes of figuring out where that segfault is coming from.

While you're doing that, I'll go look back at what I did for the Cuda patch to stop boinc_finish()/boinc_exit() killing threads without warning on exit. Android'll be using multithreaded C-runtimes now too.

If you can persuade Joachim Fritzsch to build it into the BOINC Android sources (he seems to be building the Berkeley code single-handed), that might put some more leverage on David.
ID: 49323 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49324 - Posted: 13 Feb 2014, 12:08:39 UTC - in response to Message 49323.  

I'm instrumenting boinc functions in hopes of figuring out where that segfault is coming from.

While you're doing that, I'll go look back at what I did for the Cuda patch to stop boinc_finish()/boinc_exit() killing threads without warning on exit. Android'll be using multithreaded C-runtimes now too.

If you can persuade Joachim Fritzsch to build it into the BOINC Android sources (he seems to be building the Berkeley code single-handed), that might put some more leverage on David.


I noticed the Android Boinc client sources are quite clean, so that looks viable. Looking at the Einstein code last night, I noticed they distribute a boincapi diff patch with their sources. That looks to be more for build system issues than any thread safety additions, but if the scope demands it then that's another viable route.
ID: 49324 · Report as offensive
David S
Volunteer tester
Joined: 10 Sep 13
Posts: 1187
Credit: 2,569,690
RAC: 1,536
United States
Message 49325 - Posted: 13 Feb 2014, 14:22:52 UTC

Interesting developments. Since I wrote yesterday, my phone has had 3 more SIGILLs, all <2 minutes of CPU time. It has also had 2 -9 overflows, now pending, also <2 minutes of CPU time. But the big whoopee is that it's actually working on a task now. It's coming up on 5 hours of elapsed time (is that total wall clock time since it started, or just CPU time? It's only been on battery for about 30-40 minutes of that time) and says it's 8.2% complete.
David
signature sent back to alpha testing
ID: 49325 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 13
United Kingdom
Message 49328 - Posted: 14 Feb 2014, 0:10:38 UTC

Jason, your host 67466 has thrown a lot of dodgy overflows with v7.25 (armv6-vfp) this afternoon.
ID: 49328 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49340 - Posted: 14 Feb 2014, 15:25:25 UTC - in response to Message 49328.  
Last modified: 14 Feb 2014, 15:37:56 UTC

Jason, your host 67466 has thrown a lot of dodgy overflows with v7.25 (armv6-vfp) this afternoon.



No big surprise there :). I'm letting it run despite that nothing's been working on it better than marginally since 7.23, on the off chance it gives Eric some useful data.

It has no problems with any of the installed apps, nor any test pieces I've built here with SDK or the native dev kit (including some NEON and vfp regression tests). So the failures are seti-specific [Or Boinc of course], and apparently not device configuration or operating conditions dependent. (It runs cold, set to one core of four just in case and stays fully awake)
ID: 49340 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 13
United Kingdom
Message 49343 - Posted: 14 Feb 2014, 15:52:39 UTC - in response to Message 49340.  

Jason, your host 67466 has thrown a lot of dodgy overflows with v7.25 (armv6-vfp) this afternoon.

No big surprise there :). I'm letting it run despite that nothing's been working on it better than marginally since 7.23, on the off chance it gives Eric some useful data.

It has no problems with any of the installed apps, nor any test pieces I've built here with SDK or the native dev kit (including some NEON and vfp regression tests). So the failures are seti-specific [Or Boinc of course], and apparently not device configuration or operating conditions dependent. (It runs cold, set to one core of four just in case and stays fully awake)

I noticed, because I saw the same host number in several inconclusive WUs. Those were all VLARs (the type that get sent to Keplers here) - could there be a similar VLAR problem with the android apps?

(it's hard to tell at the moment, with the tasks being sorted differently when names are shown. I've asked David to remove that, here at least - he rather overshot the mark when he accepted my suggestion for the search box)
ID: 49343 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49344 - Posted: 14 Feb 2014, 16:03:57 UTC - in response to Message 49343.  
Last modified: 14 Feb 2014, 16:11:57 UTC

Jason, your host 67466 has thrown a lot of dodgy overflows with v7.25 (armv6-vfp) this afternoon.

No big surprise there :). I'm letting it run despite that nothing's been working on it better than marginally since 7.23, on the off chance it gives Eric some useful data.

It has no problems with any of the installed apps, nor any test pieces I've built here with SDK or the native dev kit (including some NEON and vfp regression tests). So the failures are seti-specific [Or Boinc of course], and apparently not device configuration or operating conditions dependent. (It runs cold, set to one core of four just in case and stays fully awake)

I noticed, because I saw the same host number in several inconclusive WUs. Those were all VLARs (the type that get sent to Keplers here) - could there be a similar VLAR problem with the android apps?

(it's hard to tell at the moment, with the tasks being sorted differently when names are shown. I've asked David to remove that, here at least - he rather overshot the mark when he accepted my suggestion for the search box)


Absolutely there can be 'similar' issues. The long pulsefinds (in VLAR) are memory/cache intensive, so thrash and take a long time, especially using debug code that's relatively unoptimised.

Boinc client and api code is full of hard wired 'magic numbers' guaranteed to break under certain situations, just like the Windows Driver timeout detection and recovery with Cuda and OpenCL are safeties. [Except in the Boinc case, limited to no recovery, and questionable detection...]

Overtight, hardwired safeties, especially hidden, non-configurable, or poorly chosen ones that don't allow for every conceivable situation, can cause more problems than they solve.

e.g. old fashioned non-inertial vehicle seatbelts break a lot of necks.
ID: 49344 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49360 - Posted: 15 Feb 2014, 15:27:32 UTC
Last modified: 15 Feb 2014, 16:22:22 UTC

19 hours later, my device is still (seemingly happily) munching away on the task (also a VLAR, now at 60% after ~19 hours) allocated after the exchange with Richard. I had rebooted the tablet at that time to see if any gremlins would clear.

What I'll be interested to see is whether, if and when the task errors or gets the SIGSEGV after completion, the vm is then somehow polluted/damaged again and freaks out like before. i.e. does the Boinc client perhaps need some sort of functionality to detect a segfault in its vm(s) and restart, to prevent subsequent trashing?

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=16064574

[A bit later:] And voila, without intervention (I must have been watching it in a way it didn't like), it appeared to overflow, exit, then attempt to restart at 100%, and failed with a computation error:
02:41:14 (19186): called boinc_finish
Restarted at 100.00 percent.


looks like Boinc's doing some funky stuff alright :)
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT

<message>
finish file present too long
</message>


No SIGSEGV (this time) though (probably that tight 10 second timeout that shows up with overflows, I guess). Unfortunately I knee-jerk restarted Boinc to see if the task cleared, so I will have to wait for another failure before I see if Boinc's vm(s) are trashed.
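
As a rough illustration of the kind of hard-wired check being guessed at here, the sketch below models a client-side watchdog that aborts a task when its finish file has existed for longer than a fixed limit while the process is still alive. It is an assumption, not BOINC's actual code: the function name, the constant name and the 10 second value (taken from the guess above) are all hypothetical.

```python
FINISH_FILE_TIMEOUT_S = 10.0  # hypothetical constant; value taken from the guess in the post


def finish_file_verdict(finish_file_age_s, process_still_running,
                        timeout_s=FINISH_FILE_TIMEOUT_S):
    """Decide what a watchdog would do with a task that has written its finish file.

    finish_file_age_s: seconds since the finish file appeared.
    process_still_running: True if the science app has not exited yet.
    """
    if not process_still_running:
        return "finished"
    if finish_file_age_s > timeout_s:
        return "aborted: finish file present too long"
    return "waiting"
```

Passing a larger timeout_s shows how a configurable limit would change the outcome for a slow device that needs longer to shut down after writing the finish file.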
ID: 49360 · Report as offensive
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,439,427
RAC: 0
United Kingdom
Message 49361 - Posted: 15 Feb 2014, 15:43:00 UTC - in response to Message 49360.  

My 2012 Nexus 7 has had a few v7.25 tasks that crunched O.K. (it's doing one now), while lots of others errored within ~90 secs. All v7.26 tasks have errored so far too.

Claggy
ID: 49361 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49362 - Posted: 15 Feb 2014, 16:21:34 UTC - in response to Message 49361.  
Last modified: 15 Feb 2014, 16:44:17 UTC

My 2012 Nexus 7 has had a few v7.25 tasks that crunched O.K. (it's doing one now), while lots of others errored within ~90 secs. All v7.26 tasks have errored so far too.

Claggy



There's a .26 now ? I guess I'll keep the tablet going then.

[Edit:] Since no tasks were any fair way in, I've hit reset project to see if I get the two allocated ones reissued as 7.26 (well, got one anyway).
ID: 49362 · Report as offensive
Linux? You're kidding me!!
Volunteer tester
Joined: 10 Mar 12
Posts: 1645
Credit: 12,221,415
RAC: 15,637
Sweden
Message 49363 - Posted: 15 Feb 2014, 18:51:09 UTC

Suspending my fifth v7.25 to run my first v7.26. This version also seems to work just fine on my Catphone.
WARNING!! "THIS IS A SIGNATURE", of the "IT MAY CHANGE AT ANY MOMENT" type. It may, or may not be considered insulting, all depending upon HOW SENSITIVE THE VIEWER IS, to certain inputs to/from the nervous system.
ID: 49363 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49364 - Posted: 15 Feb 2014, 19:41:54 UTC - in response to Message 49361.  
Last modified: 15 Feb 2014, 19:43:13 UTC

My 2012 Nexus 7 has had a few v7.25 tasks that crunched O.K. (it's doing one now), while lots of others errored within ~90 secs. All v7.26 tasks have errored so far too.

Claggy



Hmmm, the plot thickens. My 2013 Nexus 7 seems to be liking 7.26, going through very quickly (relatively, ~33% in 2 hours so far, a shortie perhaps).

If this keeps up, I'm only expecting the possible segfault after boinc_finish(), which is boincapi and the client's domain.
ID: 49364 · Report as offensive
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 13
United Kingdom
Message 49367 - Posted: 15 Feb 2014, 21:42:09 UTC - in response to Message 49364.  

(a shortie perhaps)

Yes, a shortie - you can tell by the 8 Mar 2014 deadline.
ID: 49367 · Report as offensive
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 25 Apr 07
Posts: 44
Credit: 30,031,014
RAC: 7,530
United States
Message 49368 - Posted: 15 Feb 2014, 22:22:39 UTC
Last modified: 15 Feb 2014, 22:27:34 UTC

I haven't gotten an error-free task since 7.25 came out, but the same tablet (a Samsung Tab 3 8) did 5 error-free 7.24's (I only started that tablet on beta on the 31st of Jan...)

Mostly SIGILLs, but I've also gotten a SIGSEGV (segmentation violation...)

task 16069718 for the SIGSEGV...
ID: 49368 · Report as offensive
jason_gee
Volunteer tester

Joined: 11 Dec 08
Posts: 198
Credit: 658,573
RAC: 0
Australia
Message 49369 - Posted: 15 Feb 2014, 23:53:41 UTC - in response to Message 49367.  

(a shortie perhaps)

Yes, a shortie - you can tell by the 8 Mar 2014 deadline.


And that went through fine, just over 6 hours :D, not even a sign of the dreaded BoincApi crash or segmentation violation after boinc_finish() !

It downloaded a VLAR next, which promptly failed with a segmentation violation. Let's see whether it consistently fails with that kind while finishing shorties successfully... or whether some other random thing is going on. I tailed apparently with A segmentation violation.
ID: 49369 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 49371 - Posted: 16 Feb 2014, 6:26:36 UTC

7.26 downloaded and started to make progress OK on one of my G-tabs.
Will download it on other devices too now.
ID: 49371 · Report as offensive
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 49372 - Posted: 16 Feb 2014, 7:32:52 UTC

Happiness was quite short:

<core_client_version>7.0.36</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Unable to resolve function unwind_backtrace_signal_arch
Unable to resolve function acquire_my_map_info_list
Unable to resolve function release_my_map_info_list
Unable to resolve function get_backtrace_symbols
Unable to resolve function free_backtrace_symbols
Unable to resolve function format_backtrace_line
Unable to resolve function load_symbol_table
Unable to resolve function free_symbol_table
Unable to resolve function find_symbol
one or more symbols not found. stackdumps unavailable
setiathome_v7 7.26 Revision: 2137 arm-linux-androideabi-g++ (GCC) 4.6 20120106 (prerelease)
libboinc: BOINC 7.3.1

Work Unit Info:
...............
WU true angle range is : 0.008687
features: swp half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
vfp_GetPowerSpectrum 0.001426 0.00000
vfp_ChirpData 0.071499 0.00000
v_pfTranspose4 0.047597 0.00000
opt VFP folding 0.024578 0.00000
SIGSEGV: segmentation violation
Manual call stack printout

Exiting...

</stderr_txt>
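
The three numbers in the "process exited with code 193 (0xc1, -63)" message appear to be the same status written in decimal, in hexadecimal, and reinterpreted as a signed 8-bit value. A small sketch of that arithmetic follows; the helper name describe_exit_code is made up for illustration and is not part of BOINC.

```python
def describe_exit_code(code):
    """Format an 8-bit exit status as decimal, hex and signed 8-bit."""
    if not 0 <= code <= 255:
        raise ValueError("expected an 8-bit exit status")
    # Values of 128 and above wrap to negative when read as a signed byte.
    signed = code - 256 if code >= 128 else code
    return f"{code} (0x{code:x}, {signed})"


print(describe_exit_code(193))
```

The same helper applied to the exit status 194 reported earlier in the thread gives the 0xc2 form shown there.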
ID: 49372 · Report as offensive
 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.