Stderr Truncations

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6323
Credit: 106,370,077
RAC: 121
Russia
Message 1700547 - Posted: 11 Jul 2015, 16:59:07 UTC - in response to Message 1700503.  
Last modified: 11 Jul 2015, 17:03:47 UTC


The code is on github: https://github.com/Milkyway-at-home/milkywayathome_client

Same approach: wait for an event and sleep between polls, plus clFinish after some (not time-critical?) kernels.

#ifdef _WIN32
  #define mwMilliSleep(x) Sleep((DWORD) (x))
  /* The usleep() in MinGW tries to round up to avoid sleeping for 0 */
  #define mwMicroSleep(x) Sleep(((DWORD) (x) + 999) / 1000)
#else
  #define mwMilliSleep(x) usleep((useconds_t) 1000 * (x))
  #define mwMicroSleep(x) usleep((useconds_t)(x))
#endif /* _WIN32 */


Still nothing better than plain old Sleep() for Win32. And Sleep() requires a big enough kernel (due to its own granularity).
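To make the rounding in the Win32 mwMicroSleep macro concrete: Sleep() takes whole milliseconds, so a microsecond request is rounded up to the next millisecond to avoid sleeping for 0. The sketch below repeats that arithmetic in portable C. The helper name usec_to_msec_round_up is made up for illustration and is not part of the Milkyway client.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the Milkyway client): the same arithmetic as
 * the Win32 mwMicroSleep macro, i.e. microseconds rounded up to whole
 * milliseconds so that a short request never turns into Sleep(0). */
static uint32_t usec_to_msec_round_up(uint32_t usec)
{
    /* usec + 999 would wrap around for values close to UINT32_MAX. */
    assert(usec <= UINT32_MAX - 999u);
    return (usec + 999u) / 1000u;
}
```

With this, requests of 1 and 1000 microseconds both become 1 millisecond, and 1001 becomes 2. The real pause is still bounded below by the granularity Sleep() works with, which is why the kernel has to be big enough.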

"Not interesting" case.

What is interesting (but unrelated to CPU usage) is their direct use of IL. Some of their kernels are written in IL, something like macro-assembler for CPUs but for the GPU. This could account for the great speed and GPU utilization of MW.
ID: 1700547
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1700550 - Posted: 11 Jul 2015, 17:10:52 UTC - in response to Message 1700544.  

Richard, you also produced an invalid via the truncation mechanism Task 1184514362

So I have - that happened while I was working on the previous one.

11-Jul-2015 17:26:40 [---] [slot] cleaning out slots/1: handle_exited_app()
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/astronomy_parameters.txt
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/boinc_finish_called
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/boinc_task_state.xml
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/init_data.xml
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/separation_checkpoint
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/stars.txt
11-Jul-2015 17:26:40 [---] [slot] removed file slots/1/stderr.txt
11-Jul-2015 17:26:40 [Milkyway@Home] Computation for task ps_fast_15_3s_136_sim1Jun1_1_1434554402_8831369_0 finished
11-Jul-2015 17:26:40 [---] [slot] cleaning out slots/1: get_free_slot()
11-Jul-2015 17:26:40 [Milkyway@Home] [slot] assigning slot 1 to de_modfit_fast_15_3s_136_sim1Jun1_2_1434554402_8831651_0

Nothing much to go on there, either.

Edit - the first one (totally blank stderr) happened while I was out of the house for a walk. The second one happened while I was at the console, opening and searching a 38 MB stdoutdae.txt file, and loading about 40 source code files into Notepad++. I'm not very good at cleaning out my working environment after use...

Um...I know I'm not a Milkyway guy, but are those log entries actually from that particular task? I see ps_fast_15_3s_136_sim1Jun1_1_1434554402_8833447_0 in the task detail.
ID: 1700550
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6323
Credit: 106,370,077
RAC: 121
Russia
Message 1700552 - Posted: 11 Jul 2015, 17:11:30 UTC - in response to Message 1700536.  

If the question is kernel size, then the answer becomes choosing a development platform that suits the granularity of the problem under consideration.

It's the granularity of the algorithm and of the device used to run it, not the "granularity of platform". I'm not sure that a suggestion, especially a wrong one, can become an "answer".

I accept the semantic correction. Granularity of algorithm, device, implementation, and runtime support libraries.

Only the first two really matter (the granularity of runtime support in this particular case is the Windows OS quantum and the Sleep() call granularity). The algorithm is bound to the task one is trying to solve. If it's large-area integration, for example, it has nothing to do with SETI. And if it's FFT-based signal modification, it has nothing to do with the N-body problem. And the device... well, the imperative here is to use any available computational device, not the one with the "right granularity" for a particular problem. So I'm still missing the point of your solution to the CPU usage issue for the OpenCL NV runtime.
ID: 1700552
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700560 - Posted: 11 Jul 2015, 17:24:08 UTC - in response to Message 1700550.  

Um...I know I'm not a Milkyway guy, but are those log entries actually from that particular task? I see ps_fast_15_3s_136_sim1Jun1_1_1434554402_8833447_0 in the task detail.

Beg pardon, I never was good at multitasking. Try

11-Jul-2015 17:37:04 [---] [slot] cleaning out slots/0: handle_exited_app()
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/astronomy_parameters.txt
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/boinc_finish_called
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/boinc_task_state.xml
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/init_data.xml
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/separation_checkpoint
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/stars.txt
11-Jul-2015 17:37:04 [---] [slot] removed file slots/0/stderr.txt
11-Jul-2015 17:37:04 [Milkyway@Home] Computation for task ps_fast_15_3s_136_sim1Jun1_1_1434554402_8833447_0 finished
11-Jul-2015 17:37:04 [---] [slot] cleaning out slots/0: get_free_slot()
11-Jul-2015 17:37:04 [Milkyway@Home] [slot] assigning slot 0 to de_modfit_sum_fast_15_3s_136_sim1Jun1_4_1434554402_8812893_2

I did think the times looked a bit odd, too.
ID: 1700560
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700566 - Posted: 11 Jul 2015, 17:36:21 UTC - in response to Message 1700552.  

... well, the imperative here is to use any available computational device, not the one with the "right granularity" for a particular problem. So I'm still missing the point of your solution to the CPU usage issue for the OpenCL NV runtime.

That takes us back to the nub of my question. Is it an imperative, and if so, for whom? I suspect it's less an imperative for the project than for a user who shelled out personal bucks for hardware and doesn't like to be told "it isn't the best device for this project, go try use it somewhere else" - as I believe has happened with single-precision hardware at Milkyway. I don't think you could say it was an imperative for Milkyway to re-write their algorithm in a form suitable for single-precision devices.

But what's a "device", anyway? Is it pure hardware, hence the requirement for double-precision purchases for MW? Or is it the combination of the hardware and the software chosen to run on it? I believe it's mathematically possible to achieve double-precision accuracy on single-precision hardware, by software emulation. But it's horribly slow - so one doesn't choose to use that method, even if it would enable a wider range of (hardware) devices to be utilised.
ID: 1700566
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1700567 - Posted: 11 Jul 2015, 17:39:50 UTC - in response to Message 1700560.  
Last modified: 11 Jul 2015, 18:08:14 UTC

Chuckle...no material difference anyway.

Here's a new twist for you, though. I just went and looked at my xw9400 to see if Slot 16 had been created in the latest run and found that it had, although it was now empty. A check of the log found an AP task, #4260258349, with an apparently complete Stderr but which failed the slot clean out, causing the subsequent task to get assigned to Slot 16. Here are the relevant log entries:

11-Jul-2015 02:58:58 [---] [slot] cleaning out slots/3: get_free_slot()
11-Jul-2015 02:58:58 [SETI@home] [slot] assigning slot 3 to ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2
11-Jul-2015 02:58:58 [SETI@home] [cpu_sched] Preempting 12fe15aa.25759.18071.438086664205.12.13.vlar_1 (removed from memory)
11-Jul-2015 02:58:58 [---] [slot] removed file slots/3/init_data.xml
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe to slots/3/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/libfftw3f-3-3-4_x86.dll to slots/3/libfftw3f-3-3-4_x86.dll
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2887.cl to slots/3/AstroPulse_Kernels_r2887.cl
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_cmdline_win_x86_SSE2_OpenCL_NV.txt to slots/3/ap_cmdline.txt
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_NV_config.xml to slots/3/AstroPulse_NV_config.xml
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu to slots/3/in.dat
11-Jul-2015 02:58:58 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0 to slots/3/pulse.out
11-Jul-2015 02:58:58 [---] [slot] removed file slots/3/boinc_temporary_exit
11-Jul-2015 02:58:58 [SETI@home] Starting task ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2
11-Jul-2015 02:58:58 [SETI@home] [cpu_sched] Starting task ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2 using astropulse_v7 version 705 (cuda_opencl_100) in slot 3
...
11-Jul-2015 03:39:56 [SETI@home] Message from task: 0
11-Jul-2015 03:39:56 [---] [slot] cleaning out slots/3: handle_exited_app()
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/ap_cmdline.txt
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/ap_state.dat0
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/ap_state.dat1: unlink() failed
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/AstroPulse_Kernels_r2887.cl
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/AstroPulse_NV_config.xml
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/boinc_finish_called
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/boinc_task_state.xml
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/in.dat
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/indices.txt
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/init_data.xml
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/libfftw3f-3-3-4_x86.dll
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/pulse.out
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/pulse.out0
11-Jul-2015 03:39:56 [---] [slot] removed file slots/3/pulse.out1
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/stderr.txt: unlink() failed
11-Jul-2015 03:39:56 [SETI@home] Computation for task ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2 finished
11-Jul-2015 03:39:56 [---] [slot] cleaning out slots/3: get_free_slot()
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/ap_state.dat1: unlink() failed
11-Jul-2015 03:39:56 [---] [slot] failed to remove file slots/3/stderr.txt: unlink() failed
11-Jul-2015 03:39:56 [SETI@home] [slot] failed to clean out dir: unlink() failed
11-Jul-2015 03:39:56 [SETI@home] [slot] assigning slot 16 to ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0
11-Jul-2015 03:39:56 [---] [slot] removed file slots/16/init_data.xml
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe to slots/16/astropulse_7.10_windows_intelx86__opencl_nvidia_100.exe
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/libfftw3f-3-3-4_x86.dll to slots/16/libfftw3f-3-3-4_x86.dll
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_Kernels_r2887.cl to slots/16/AstroPulse_Kernels_r2887.cl
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_cmdline_win_x86_SSE2_OpenCL_NV.txt to slots/16/ap_cmdline.txt
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/AstroPulse_NV_config.xml to slots/16/AstroPulse_NV_config.xml
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_04jn15aa_B2_P1_00172_20150710_03651.wu to slots/16/in.dat
11-Jul-2015 03:39:56 [SETI@home] [slot] linked ../../projects/setiathome.berkeley.edu/ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0_0 to slots/16/pulse.out
11-Jul-2015 03:39:56 [---] [slot] removed file slots/16/boinc_temporary_exit
11-Jul-2015 03:39:56 [SETI@home] Starting task ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0
11-Jul-2015 03:39:56 [SETI@home] [cpu_sched] Starting task ap_04jn15aa_B2_P1_00172_20150710_03651.wu_0 using astropulse_v7 version 705 (cuda_opencl_100) in slot 16
11-Jul-2015 03:39:59 [SETI@home] Started upload of ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0
11-Jul-2015 03:40:02 [SETI@home] Finished upload of ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0
11-Jul-2015 03:40:02 [---] [slot] removed file projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0
11-Jul-2015 03:40:02 [---] [slot] removed file projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0.gz
11-Jul-2015 03:40:02 [---] [slot] removed file projects/setiathome.berkeley.edu/ap_02jn15ac_B4_P1_00319_20150709_22193.wu_2_0.gzt

More grist for somebody's mill, perhaps?

EDIT: Went back and highlighted the first "ap_state.dat1: unlink() failed", which I originally overlooked.
ID: 1700567
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6323
Credit: 106,370,077
RAC: 121
Russia
Message 1700569 - Posted: 11 Jul 2015, 17:49:19 UTC - in response to Message 1700566.  
Last modified: 11 Jul 2015, 17:50:30 UTC

... well, the imperative here is to use any available computational device, not the one with the "right granularity" for a particular problem. So I'm still missing the point of your solution to the CPU usage issue for the OpenCL NV runtime.

That takes us back to the nub of my question. Is it an imperative, and if so, for whom? I suspect it's less an imperative for the project than for a user who shelled out personal bucks for hardware and doesn't like to be told "it isn't the best device for this project, go try use it somewhere else" - as I believe has happened with single-precision hardware at Milkyway. I don't think you could say it was an imperative for Milkyway to re-write their algorithm in a form suitable for single-precision devices.

But what's a "device", anyway? Is it pure hardware, hence the requirement for double-precision purchases for MW? Or is it the combination of the hardware and the software chosen to run on it? I believe it's mathematically possible to achieve double-precision accuracy on single-precision hardware, by software emulation. But it's horribly slow - so one doesn't choose to use that method, even if it would enable a wider range of (hardware) devices to be utilised.

1) The imperative of BOINC's very existence: to use free (initially, CPU) cycles for doing science.
And to select the right science by the available hardware? What nonsense. If one doesn't see the need for some research or doesn't want to support it personally, why should one think about what hardware suits it best?

2) You are plain wrong here regarding double-precision emulation. That's exactly what was used for CUDA and later for all GPU SETI builds. Both AP and MB require double-precision calculations to keep precision in trigonometric functions, and emulation is used. Moreover, at least for some GPUs with double-precision hardware support, the emulation happens to be faster. So, yep, we did it. And did it successfully. MW is another case because there DP is required everywhere along the path, AFAIK. Well, just another reason not to directly compare different algorithms and their possible implementation issues.
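For readers unfamiliar with the idea, one common way to get extra precision out of single-precision hardware is to carry each value as an unevaluated sum of two floats (often called float-float or double-single arithmetic). The sketch below shows only addition, built on Knuth's two-sum. It is a generic illustration of the technique, not the code used in the SETI GPU builds, and it gives roughly 48 significand bits rather than a true IEEE double. The names ff_t, two_sum, ff_add and ff_to_double are invented for the example.

```c
/* A value stored as hi + lo, where lo carries what hi could not represent.
 * Assumes float expressions are evaluated in plain float precision and that
 * the compiler is not run with unsafe options such as -ffast-math. */
typedef struct { float hi, lo; } ff_t;

/* Knuth's two-sum: hi = fl(a + b), lo = the rounding error of that addition. */
static ff_t two_sum(float a, float b)
{
    ff_t r;
    float bv;
    r.hi = a + b;
    bv   = r.hi - a;                      /* the part of b that reached hi */
    r.lo = (a - (r.hi - bv)) + (b - bv);  /* what was rounded away */
    return r;
}

/* Add two float-float values and renormalise the result. */
static ff_t ff_add(ff_t x, ff_t y)
{
    ff_t s = two_sum(x.hi, y.hi);
    float lo = s.lo + (x.lo + y.lo);
    return two_sum(s.hi, lo);
}

/* Convert to a real double, only for inspecting the result on the CPU. */
static double ff_to_double(ff_t x)
{
    return (double) x.hi + (double) x.lo;
}
```

Adding 1e-8f to 1.0f in plain float gives exactly 1.0f, because the small term is below half a unit in the last place. With ff_add the small term survives in the lo part, which is the kind of precision the trigonometric argument reduction needs.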
ID: 1700569
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700572 - Posted: 11 Jul 2015, 18:09:45 UTC

A new one:

11-Jul-2015 18:57:59 [---] [slot] cleaning out slots/0: handle_exited_app()
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/astronomy_parameters.txt
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/boinc_finish_called
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/boinc_task_state.xml
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/init_data.xml
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_nvidia_101.exe
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/separation_checkpoint
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/stars.txt
11-Jul-2015 18:57:59 [---] [slot] failed to remove file slots/0/stderr.txt: Error 32
11-Jul-2015 18:57:59 [Milkyway@Home] Computation for task ps_fast_15_3s_136_sim1Jun1_1_1434554402_8853161_0 finished
11-Jul-2015 18:57:59 [---] [slot] cleaning out slots/0: get_free_slot()
11-Jul-2015 18:57:59 [---] [slot] removed file slots/0/stderr.txt
11-Jul-2015 18:57:59 [Milkyway@Home] [slot] assigning slot 0 to de_modfit_sum_fast_15_3s_136_sim1Jun1_4_1434554402_8844314_2

(stderr truncated)
ID: 1700572
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700576 - Posted: 11 Jul 2015, 19:10:43 UTC - in response to Message 1700572.  

Error 32

From https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx:

ERROR_SHARING_VIOLATION
32 (0x20)
The process cannot access the file because it is being used by another process.

sandbox.cpp, lines 237-248

static int delete_project_owned_file_aux(const char* path) {
#ifdef _WIN32
    if (DeleteFile(path)) return 0;
    int error = GetLastError();
    if (error == ERROR_FILE_NOT_FOUND) {
        return 0;
    }
    if (error == ERROR_ACCESS_DENIED) {
        SetFileAttributes(path, FILE_ATTRIBUTE_NORMAL);
        if (DeleteFile(path)) return 0;
    }
    return error;
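In the log from message 1700572, the delete that failed with Error 32 in handle_exited_app() succeeded moments later in get_free_slot(), which suggests the other process let go of the file shortly afterwards. One generic way to cope with a transient lock like that is to retry after a short pause. The following is a portable sketch of that idea only: delete_with_retry and sleep_ms are hypothetical helpers, not BOINC functions, and the sketch uses the standard remove() rather than DeleteFile().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() on POSIX systems */
#endif
#include <errno.h>
#include <stdio.h>
#ifdef _WIN32
  #include <windows.h>
#else
  #include <time.h>
#endif

/* Hypothetical helper: pause for the given number of milliseconds. */
static void sleep_ms(unsigned ms)
{
#ifdef _WIN32
    Sleep((DWORD) ms);
#else
    struct timespec ts;
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (long) (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
#endif
}

/* Hypothetical helper, not BOINC code: try to delete a file up to `attempts`
 * times, pausing between tries. A file that is already gone counts as
 * success, as in the snippet above. Returns 0 on success, -1 otherwise. */
static int delete_with_retry(const char *path, int attempts, unsigned delay_ms)
{
    int i;
    for (i = 0; i < attempts; i++) {
        if (remove(path) == 0) return 0;
        if (errno == ENOENT) return 0;        /* POSIX and MS CRT: no such file */
        if (i + 1 < attempts) sleep_ms(delay_ms);
    }
    return -1;
}
```

A retry only hides the symptom. It does not explain who is still holding stderr.txt open, which is the actual question in this thread.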
ID: 1700576
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1700579 - Posted: 11 Jul 2015, 19:22:26 UTC - in response to Message 1700576.  
Last modified: 11 Jul 2015, 19:41:22 UTC

Yeah, bounced a suggestion in response to the global 'Anyone?' dev list cry for ideas. Will see whether waiting for the 'I have no idea what's going on' flag was a suitable cue or not, I guess.

[Edit:] for posterity
... Anyone have any ideas?...


Not sure how much detail you'd like on the situation. (Can provide much more) It's a result of buffered IO implemented in multithreaded C Runtimes, in some situations using deferred procedure calls. Internal helper threads are being killed before commits are completed.

Least desirable partial workaround (but helps):
- disable buffered IO by linking the application with the MS-supplied COMMODE.OBJ

Probably better, but not tested:
- initiate a low-level _commit() and add the missing WaitForSingleObject() after the TerminateProcess() call

Best:
- do a low-level _commit() and check that the file modification time was updated, then preferably use a friendly means of exit that allows DLL/thread cleanup, closing threads/processes using sentinel flags, like while(!done) instead of while(1) with kills.
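A rough sketch of the last option, under the assumption that the application controls its own worker loop: the loop runs until a sentinel flag is set instead of being killed, and on the way out the buffered stream is flushed and then committed to disk. _commit() is the Microsoft CRT call named above; on other systems the sketch substitutes fsync(). The names done, commit_stream and run_worker are invented for this example and do not come from BOINC or the science applications. The modification-time check is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() on POSIX systems */
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#ifdef _WIN32
  #include <io.h>
  #define os_commit(fd) _commit(fd)
  #define os_fileno(f)  _fileno(f)
#else
  #include <unistd.h>
  #define os_commit(fd) fsync(fd)
  #define os_fileno(f)  fileno(f)
#endif

/* Sentinel flag: set by whoever wants the worker to stop (hypothetical). */
static volatile sig_atomic_t done = 0;

/* Flush the C runtime's buffer, then ask the OS to write the file to disk.
 * Returns 0 on success, nonzero on failure. */
static int commit_stream(FILE *f)
{
    assert(f != NULL);
    if (fflush(f) != 0) return -1;
    return os_commit(os_fileno(f));
}

/* Worker loop that exits cooperatively: while(!done) instead of while(1)
 * plus a kill. Returns the commit result so the caller can check it. */
static int run_worker(FILE *log, int max_steps)
{
    int step = 0;
    while (!done) {
        fprintf(log, "step %d\n", step);
        if (++step >= max_steps) done = 1;   /* stand-in for a real stop request */
    }
    return commit_stream(log);
}
```

The point of the design is ordering: the data leaves the runtime's buffer and reaches the disk before the process goes away, so nothing depends on helper threads surviving a TerminateProcess().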

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1700579
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700592 - Posted: 11 Jul 2015, 20:23:03 UTC - in response to Message 1700579.  

David also asked that anyone planning to submit further evidence on this problem "set <task_debug> as well [as <slot_debug>]". I've done that, but Murphy's law dictates that I haven't seen a failure to delete stderr.txt since then.
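For anyone wanting to do the same, these flags go in the log_flags section of cc_config.xml in the BOINC data directory. A minimal fragment, assuming no other options are already set in that file:

```xml
<cc_config>
  <log_flags>
    <slot_debug>1</slot_debug>
    <task_debug>1</task_debug>
  </log_flags>
</cc_config>
```

The client picks the flags up when it is restarted or told to re-read its config files.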
ID: 1700592
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1700594 - Posted: 11 Jul 2015, 20:29:04 UTC - in response to Message 1700592.  

David also asked that anyone planning to submit further evidence on this problem "set <task_debug> as well [as <slot_debug>]". I've done that, but Murphy's law dictates that I haven't seen a failure to delete stderr.txt since then.

Okay, will add that when I start the xw9400 back up this evening.
ID: 1700594
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1700595 - Posted: 11 Jul 2015, 20:36:53 UTC - in response to Message 1700592.  

Ugh, did the boinc_dev list really strip the line endings from my emailed trial patch? Or did my Hotmail do it?
ID: 1700595
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13135
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1700596 - Posted: 11 Jul 2015, 20:37:26 UTC

I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1700596
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700597 - Posted: 11 Jul 2015, 20:40:46 UTC - in response to Message 1700596.  

I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for.

We're looking for either/both, but most of all we're looking for evidence like those 'unlink failed' or 'error 32' which might hint at the underlying cause.
ID: 1700597
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1700598 - Posted: 11 Jul 2015, 20:43:55 UTC - in response to Message 1700595.  

Ugh, did the boinc_dev list really strip the line endings from my emailed trial patch? Or did my Hotmail do it?

I think it's the webmail interfaces we're both sending with. I'm using BT Internet, but it's piggybacking on a Yahoo service.

Since we're copying direct to David, and he's using a text only mail client, he'll see the clean version with line breaks and without the font variations.
ID: 1700598
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1700599 - Posted: 11 Jul 2015, 20:46:36 UTC - in response to Message 1700597.  
Last modified: 11 Jul 2015, 20:49:10 UTC

I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for.

We're looking for either/both, but most of all we're looking for evidence like those 'unlink failed' or 'error 32' which might hint at the underlying cause.


That being the one-second sleep followed by a debugbreak, which seems to be there because they didn't put the required wait in after TerminateProcess(). That debugbreak() will attempt to download symbols. You could probably watch internet connections spark up in some scenarios at least. If you already had the requisite symbol PDB file in place, you could probably get an intact stderr with a BOINC debugger dump in it.
ID: 1700599
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1700600 - Posted: 11 Jul 2015, 20:49:10 UTC - in response to Message 1700596.  

I've done so also. Unlikely though that I will produce any helpful data since I seem to produce only 'empty' stderr.txt results and not the truncated ones you're looking for.

When we were looking at these early last year, I found that there was a tendency for a given machine or OS to produce either one type or the other, though it wasn't entirely consistent. My xw9400 (running Win XP) generally produces the truncated ones and I think my other XP machines did, also. My daily driver (Win Vista) and a T7400 (Win 8.1) generally put out the entirely empty variety.

Which reminds me. I could reactivate that T7400 and install BOINC v7.6.2 on it if samples of empty Stderrs on S@h would be useful. (I'm not going to put v7.6.2 on my daily driver. The terms "MAY BE UNSTABLE" and daily driver don't go together in my world!)
ID: 1700600
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13135
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1700615 - Posted: 11 Jul 2015, 22:07:03 UTC - in response to Message 1700592.  

David also asked that anyone planning to submit further evidence on this problem "set <task_debug> as well [as <slot_debug>]". I've done that, but Murphy's law dictates that I haven't seen a failure to delete stderr.txt since then.


Richard, curious and curiouser. I don't know if it's Murphy or not, but since I set <task_debug> along with the previous <slot_debug> and <cpu_sched> flags, I haven't produced an invalid at MW yet. I haven't had a really good run of 1.36 app tasks on either machine since I changed the flags, though. Will continue to monitor their progress.
ID: 1700615
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1700621 - Posted: 11 Jul 2015, 22:31:40 UTC - in response to Message 1700615.  
Last modified: 11 Jul 2015, 22:46:24 UTC

If the stable situation continues only with the flag set, then aside from a bug in the actual flag code, there can be effects because you're dealing with a statistical, time-related phenomenon. You're 'looking' at the process and affecting the outcome by altering the client behaviour. IOW, reaching for your binoculars to check on Schrödinger's box gave just enough time for the cat to decide it wasn't dead, and run away. (The BOINC client and the app are sharing the system, and take turns in timeslices [quanta].)
ID: 1700621