No output again (for just one WU)?

Message boards : Number crunching : No output again (for just one WU)?

Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1634800 - Posted: 30 Jan 2015, 1:53:21 UTC

Last summer I was panicking here after noticing that, after updating to the latest BOINC version, the second S@h process running was completing instantly and outputting nothing; I was afraid it was some CPU issue. Uninstalling, clearing everything and reinstalling an old version fixed it then, and after I finally updated again during the WU downtime here it didn't happen again. But now, among the occasional invalids caused by faulty WUs that keep popping up lately, I also noticed this one: 1684383739. It ran all that time and the output was nothing (the wingman has an overflow). What happened?
ID: 1634800
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1634812 - Posted: 30 Jan 2015, 2:13:53 UTC - in response to Message 1634800.  

It looks to me like the first 2 results were Inconclusive, so a 3rd unit was sent out, and that result matched the first person's better than yours did. I think, anyway...

So the first and 3rd person got the credits.
ID: 1634812
Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1634816 - Posted: 30 Jan 2015, 2:17:01 UTC - in response to Message 1634812.  

That 3rd one just came in now; it wasn't there a few minutes ago when I posted the message. An invalid result would be a reason for concern in itself, but the worse issue is that mine is empty, not that it's a bit off.

All 6 WUs sent since, as well as one other sent at the same time as that one, validated, so at the moment it's not a recurring issue, but that one had nothing.
ID: 1634816
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1634830 - Posted: 30 Jan 2015, 3:01:25 UTC - in response to Message 1634816.  

In all likelihood, you've just become another victim of: Strange Invalid MB Overflow tasks with truncated Stderr outputs....

I've had four more of those just in the last 3 weeks. Unfortunately, even though a fix to the validator was proposed over a year ago, Korpela apparently chose not to implement it, so we have to continue to live with the problem.
ID: 1634830
Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1634831 - Posted: 30 Jan 2015, 3:08:09 UTC - in response to Message 1634830.  

Maybe... I never noticed that thread before, but it does seem to match your initial assumption there: overflows where the wingmen's results show fewer than 30 spikes. They don't quit right away in that case, though (at least not on CPU); that one ran for 3.5 h. I've definitely had plenty of overflows past spikes before, and they're easy enough to spot in my case, since they finish sooner than they should but after more than 40 s or so (which is when the spike overflows finish), and I never saw any of them having this issue. Odd...
ID: 1634831
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1634835 - Posted: 30 Jan 2015, 3:36:21 UTC - in response to Message 1634831.  

For these tasks to get marked Invalid, they essentially require 3 conditions. First, they must be overflow tasks with a truncated Stderr. Second, the Spike count must be less than 30 and, third, they must have an Autocorr count of 0. Your task met the first condition and you can see from your wingmen's Stderr output that the 2nd and 3rd conditions were met, as well.
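The three conditions above can be restated as a simple checklist for reading task pages. This is a minimal sketch, not the SETI@home validator's code; the function name and parameters are hypothetical, and the spike and autocorr counts are taken from a wingman's Stderr because a truncated Stderr can't supply them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checklist for the invalid-overflow pattern described above.
   NOT the project's validator logic; it only restates the three conditions. */
static bool matches_truncated_overflow_pattern(bool own_is_overflow,
                                               bool own_stderr_truncated,
                                               int wingman_spike_count,
                                               int wingman_autocorr_count)
{
    assert(wingman_spike_count >= 0 && wingman_autocorr_count >= 0);

    /* 1. Our result is an overflow and its Stderr was cut short. */
    bool cond1 = own_is_overflow && own_stderr_truncated;
    /* 2. Fewer than 30 spikes, so spikes alone did not trigger the overflow. */
    bool cond2 = wingman_spike_count < 30;
    /* 3. No autocorrelations reported. */
    bool cond3 = wingman_autocorr_count == 0;

    return cond1 && cond2 && cond3;
}
```

For example, an overflow with a truncated Stderr whose wingman reports 12 spikes and 0 autocorrs matches; the same task with 30 spikes, or with 1 autocorr, does not.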

It's brutal on a CPU when all that processing time is wasted. The 4 that I've had this month were all on GPU tasks, none of which ran longer than 11 seconds. It all depends on how long it takes to find the 30th signal (other than a spike) to cause the overflow.
ID: 1634835
Uli
Volunteer tester
Joined: 6 Feb 00
Posts: 10923
Credit: 5,996,015
RAC: 1
Germany
Message 1634883 - Posted: 30 Jan 2015, 5:41:37 UTC

Eric is a very busy man. If it is a SETI issue and only affects very few results, his time is better spent trying to get grants to keep this project going.
If it is BOINC related, then you need to contact David Anderson via boinc_dev.
Pluto will always be a planet to me.

Seti Ambassador
Not too late to order an Anni Shirt
ID: 1634883
Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1634973 - Posted: 30 Jan 2015, 11:02:25 UTC - in response to Message 1634883.  

Speaking of which, couldn't SETI look into collaborating with teams searching for exoplanets and including a module to help that endeavor as well? It can be seen as related, so it would make some sense to have both on the same project; that search would benefit from S@h's user base, and it should funnel some additional funds this way as well.
ID: 1634973
Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1650525 - Posted: 8 Mar 2015, 2:47:51 UTC

Seems like I got another one of these? The wingman has an overflow past spikes with 0 autocorr, just as described here, and mine has no result displayed.
Admittedly, that was an odd one in terms of processing. It went through several reboots, uninstalling one antivirus and installing another, and scanning, so I was wondering whether any of that messed something up.
Then again, another one done around the same time was fine.
ID: 1650525
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1650608 - Posted: 8 Mar 2015, 12:24:10 UTC - in response to Message 1650525.  

That's got a truncated stderr.txt

Claggy
ID: 1650608
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1650617 - Posted: 8 Mar 2015, 12:42:52 UTC - in response to Message 1650608.  

This seems to suggest the DevC++/MinGW stock build is using multithreaded runtimes these days and/or OS-buffered I/O as well. The BOINC API's treatment of raw threads is pretty poor. I'm still trying to figure out the best way to handle that, with boinc_dev in complete denial.
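To illustrate why buffered output would turn into a truncated stderr, here is a small POSIX sketch (Linux/glibc assumed, not the Windows/MSVCRT build under discussion). Text written to a fully buffered stream is not in the file until it is flushed, so a process killed before that point, as TerminateProcess() would do, loses it. Whether the stock build's stderr is actually buffered this way is the hypothesis above, not something this demonstrates; `bytes_in_file` and `demo_unflushed_output` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Bytes already handed to the OS for the file behind `f`,
   i.e. what would survive if the process were killed right now. */
static long bytes_in_file(FILE *f)
{
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Writes `msg` through a fully buffered stream (as a buffered stderr.txt
   would be) and records the file size before and after an explicit flush. */
static int demo_unflushed_output(const char *msg, long *before, long *after)
{
    static char buf[BUFSIZ];
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    /* setvbuf must be called before any other I/O on the stream. */
    if (setvbuf(f, buf, _IOFBF, sizeof buf) != 0) {
        fclose(f);
        return -1;
    }
    fputs(msg, f);
    *before = bytes_in_file(f);  /* still sitting in the user-space buffer */
    fflush(f);
    *after = bytes_in_file(f);   /* now written through to the file */
    fclose(f);
    return 0;
}
```

With a short message such as `"hello, stderr\n"`, `before` comes out as 0 and `after` as 14: everything written so far existed only in the process's memory until the flush.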
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1650617
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1650728 - Posted: 8 Mar 2015, 17:50:53 UTC - in response to Message 1650617.  

Matt Arsenault from Milkyway@home posted "stderr returned unreliable" to boinc_dev in January 2012 and got a similar reaction. The patch he attached to his last post in that thread remains interesting.

setiathome_7.00_windows_intelx86.exe shows in task details:

setiathome_v7 7.00 DevC++/MinGW/g++ 4.5.2
libboinc: 7.1.0

And Dependency Walker indicates it is linked to MSVCRT.DLL. I suppose it may be possible to use static runtime from the Windows SDK or such, but that might get into issues with the versions of Windows which Microsoft is trying to kill.
                                                                  Joe
ID: 1650728
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1650734 - Posted: 8 Mar 2015, 18:09:06 UTC - in response to Message 1650728.  
Last modified: 8 Mar 2015, 18:09:49 UTC

Interesting. I'll consider trying to contact the Milkyway devs then, once I have some things out of the way, even if what they do is fairly specific to them. For CUDA, CPU, and Android purposes I've been looking at an overhaul (might as well, since I've settled on switching to Gradle to bring 5 separate build systems down to one, including work embedded/MCU stuff), so switching chunks of code from raw threads to active objects wouldn't be difficult, and would be far more portable.
Not sure if these articles are behind registration walls:
http://www.drdobbs.com/parallel/use-threads-correctly-isolation-asynch/215900465
http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095

At least, with no choice but to stay custom, we might as well do it properly.
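For readers unfamiliar with the active-object idea from the second article: instead of raw threads that touch shared state directly, each object owns one private thread and a message queue, and callers only post messages to it. Below is a minimal sketch in C with POSIX threads; the threading library is an assumption (the post doesn't say what the overhaul would use), and all names (`struct active`, `active_send`, `add_one`, etc.) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* A message is a function to run on the object's private thread. */
typedef void (*msg_fn)(void *arg);

struct msg { msg_fn fn; void *arg; struct msg *next; };

/* Minimal active object: one private thread draining a FIFO of messages. */
struct active {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct msg *head, *tail;
    bool done;  /* only touched by the private thread once it is running */
};

static void *active_run(void *p)
{
    struct active *a = p;
    while (!a->done) {
        pthread_mutex_lock(&a->lock);
        while (a->head == NULL)
            pthread_cond_wait(&a->nonempty, &a->lock);
        struct msg *m = a->head;
        a->head = m->next;
        if (a->head == NULL)
            a->tail = NULL;
        pthread_mutex_unlock(&a->lock);

        m->fn(m->arg);  /* run the message outside the lock */
        free(m);
    }
    return NULL;
}

/* Queue a message for the private thread; returns 0 on success. */
static int active_send(struct active *a, msg_fn fn, void *arg)
{
    struct msg *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    m->fn = fn;
    m->arg = arg;
    m->next = NULL;
    pthread_mutex_lock(&a->lock);
    if (a->tail != NULL)
        a->tail->next = m;
    else
        a->head = m;
    a->tail = m;
    pthread_cond_signal(&a->nonempty);
    pthread_mutex_unlock(&a->lock);
    return 0;
}

static int active_start(struct active *a)
{
    a->head = a->tail = NULL;
    a->done = false;
    pthread_mutex_init(&a->lock, NULL);
    pthread_cond_init(&a->nonempty, NULL);
    return pthread_create(&a->thread, NULL, active_run, a) == 0 ? 0 : -1;
}

/* Shutdown is itself a message: it runs after everything queued before it,
   and the private thread then returns on its own (no forced termination). */
static void set_done(void *p) { ((struct active *)p)->done = true; }

static int active_stop(struct active *a)
{
    if (active_send(a, set_done, a) != 0)
        return -1;
    pthread_join(a->thread, NULL);
    pthread_cond_destroy(&a->nonempty);
    pthread_mutex_destroy(&a->lock);
    return 0;
}

/* Example message: increments the int it is given. */
static void add_one(void *p) { ++*(int *)p; }
```

Usage: `active_start(&a)`, then any number of `active_send(&a, add_one, &n)`, then `active_stop(&a)`. All queued messages run in order on the private thread before the stop message, so `n` is safe to read after `active_stop` returns.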
ID: 1650734
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1650740 - Posted: 8 Mar 2015, 18:32:51 UTC - in response to Message 1650734.  

Matt himself may be worth contacting, but in my experience some of the currently-active devs there (some of whom are undergraduates) understand BOINC even less than we do.
ID: 1650740
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1650927 - Posted: 9 Mar 2015, 6:35:39 UTC - in response to Message 1650740.  

Or Travis.

ID: 1650927
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1650928 - Posted: 9 Mar 2015, 6:56:53 UTC - in response to Message 1650728.  
Last modified: 9 Mar 2015, 7:28:02 UTC

Having slept on it, the minimal-customisation approach there would likely be swapping TerminateProcess() out for ExitProcess(), then changing the timer thread's outer logic from an infinite while (1) loop to a sentinel done flag (which is already there for different purposes), so the thread can shut itself down gracefully. Adding an appropriate ExitThread() might be nice, but isn't required. The dynamic linkage with MSVCRT itself *should* mean no leakage issues with CreateThread() and calling CRT functions in the timer thread, though the current use of TerminateProcess() with active threads and DLLs will always be capable of causing strange kinds of carnage.
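The shutdown sequence proposed above (a timer loop that polls a done flag instead of spinning in while (1), then a normal exit instead of TerminateProcess()) can be sketched portably. This uses POSIX threads and C11 atomics as stand-ins for CreateThread() and the Win32 calls, so it is an analogue rather than the actual app's code; `timer_thread`, `timer_done` and `shut_down_cleanly` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Sentinel flag: set by the main thread, polled by the timer thread. */
static atomic_bool timer_done = false;
static atomic_int timer_ticks = 0;

/* Timer loop that checks the flag instead of looping in while (1),
   so it can return on its own. */
static void *timer_thread(void *arg)
{
    (void)arg;
    const struct timespec period = {0, 10L * 1000 * 1000}; /* 10 ms */
    while (!atomic_load(&timer_done)) {
        atomic_fetch_add(&timer_ticks, 1);  /* stand-in for the periodic work */
        nanosleep(&period, NULL);
    }
    return NULL;
}

/* Graceful shutdown: raise the flag, wait for the timer thread to finish,
   and flush stdio. The caller then leaves through the normal exit path
   (return from main / exit()), the analogue of ExitProcess() rather than
   TerminateProcess(). */
static int shut_down_cleanly(pthread_t timer)
{
    atomic_store(&timer_done, true);
    if (pthread_join(timer, NULL) != 0)
        return -1;
    fflush(stdout);
    fflush(stderr);
    return 0;
}
```

The key design point is that no thread is killed from outside: the timer thread observes the flag and returns, so nothing is interrupted mid-write when the process ends.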

Allowing dynamic or static linkage (to the MS CRT) would mean using _beginthreadex and _endthreadex instead of the Windows API function calls, though obviously if DevC++/MinGW is happily using the dynamically linked CRT, then that's one customisation not needed there.

To tell the difference between static and dynamic linkage to the C runtimes:
#ifdef _WIN32
#if ( defined(_MT) && defined(_DLL) )
   // Dynamic CRT (MSVC defines both _MT and _DLL under /MD or /MDd)
#else
   // static library (/MT, /MTd) or old single-threaded CRT
#endif // dynamic or static?
#endif // _WIN32


As for the buffered I/O that gets discarded, forcing commit mode through either mechanism (the open mode or commode.obj) works, though the above should make that unnecessary, unless the BOINC client reaches its own TerminateProcess() call and kills the process too early, or the user kills the process from Task Manager. In those last two cases a truncated or missing stderr might be deemed acceptable.
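As a rough POSIX counterpart of that commit mode: MSVC's "c" fopen flag (or linking commode.obj) makes fflush() also commit the buffered data to disk, which on POSIX corresponds to fflush() followed by fsync(). A minimal sketch, with `write_committed` as a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Writes one line and pushes it all the way out: fflush() moves it from
   the stdio buffer to the OS, fsync() asks the OS to commit it to storage.
   Roughly what commit mode gives on Windows for every fflush(). */
static int write_committed(FILE *f, const char *line)
{
    if (fputs(line, f) == EOF)
        return -1;
    if (fflush(f) != 0)
        return -1;
    if (fsync(fileno(f)) != 0)
        return -1;
    return 0;
}
```

Calling this for every stderr line trades throughput for durability; for the handful of status lines a task writes, that cost should be small.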
ID: 1650928

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.