Log/Tracking Tool


rebest (Project donor)
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 33,231,677
RAC: 11,561
United States
Message 1364469 - Posted: 4 May 2013, 17:39:59 UTC

Greetings, all.

Quick question. Have any of our talented Setizens with way too much time on their hands created a logging or tracking tool that does real-time, local monitoring and reporting of work units by machine? BoincStats and the other online systems do a great job of reporting credits. But frankly, after doing this for 13 years, I really couldn't care less about credits. By next week, I'll have 25 million of them. If I could sell them on eBay, I would.

I am more interested in a report that neatly details how many work units my rig received and processed by type (MB/AP). When was each WU received? How was it processed (CPU/GPU)? How long did it take to complete (CPU/GPU time and clock time)? When was it reported? What was the turnaround time? In short, I'm more interested in knowing how to get the most bang for my crunching buck than how I compare to anyone else.

All of these variables are tracked by BOINC in one way or another. I could find them by checking each result manually. I'm hoping to find a nice utility that does all that. Some trend lines and graphics would be nice, too. :)

Thanks.
____________

Join the PACK!

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2325
Credit: 8,867,528
RAC: 1,422
United States
Message 1364689 - Posted: 5 May 2013, 7:07:30 UTC

I don't know of any automated method, but if you pull up the application details page here on the website for each rig, it should give you the "number of tasks completed" for each type of work on that machine. Note that the average processing rate and average turnaround time are recent averages, not all-time averages.

Other than that, you can probably load up 'job_log_setiathome.berkeley.edu.txt' (found in the data directory where client_state and cc_config are) into Excel or something similar and have it separate into fields, using space as a delimiter, and then from there, you can gather some information with formulas about it, or make charts.
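A minimal sketch of the same idea without a spreadsheet, in Python. It assumes each job-log line is a Unix timestamp followed by space-separated key/value pairs (ue, ct, fe, nm for the task name, et for elapsed seconds), and that AstroPulse task names start with "ap_". Both are assumptions to check against your own file, and the function names are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def parse_job_log_line(line):
    """Parse one job-log line into a dict; return None if it does not fit
    the assumed 'timestamp key value key value ...' layout."""
    tokens = line.split()
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        return None
    try:
        # Assumption: the first field is the Unix time the entry was written.
        record = {"logged": datetime.fromtimestamp(int(tokens[0]), tz=timezone.utc)}
        for key, value in zip(tokens[1::2], tokens[2::2]):
            # 'nm' is the task name; every other field is treated as a number.
            record[key] = value if key == "nm" else float(value)
    except ValueError:
        return None
    return record

def summarize(lines):
    """Count tasks and average elapsed time ('et') per task type."""
    totals = defaultdict(lambda: [0, 0.0])  # type -> [task count, total seconds]
    for line in lines:
        rec = parse_job_log_line(line)
        if rec is None or "nm" not in rec or "et" not in rec:
            continue
        # Assumption: AstroPulse names start with 'ap_'; anything else is MB.
        kind = "AP" if rec["nm"].startswith("ap_") else "MB"
        totals[kind][0] += 1
        totals[kind][1] += rec["et"]
    return {kind: {"tasks": n, "mean_elapsed_s": total / n}
            for kind, (n, total) in totals.items()}

# Example use (the file sits in the BOINC data directory):
#   with open("job_log_setiathome.berkeley.edu.txt") as f:
#       print(summarize(f))
```

The job log does not say whether a task ran on CPU or GPU, so this only covers the per-type counts and run times, not the CPU/GPU split.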
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

rebest (Project donor)
Volunteer tester
Joined: 16 Apr 00
Posts: 1296
Credit: 33,231,677
RAC: 11,561
United States
Message 1364867 - Posted: 5 May 2013, 17:55:20 UTC - in response to Message 1364689.

I don't know of any automated method, but if you pull up the application details page here on the website for each rig, it should give you the "number of tasks completed" for each type of work on that machine. Note that the average processing rate and average turnaround time are recent averages, not all-time averages.

Other than that, you can probably load up 'job_log_setiathome.berkeley.edu.txt' (found in the data directory where client_state and cc_config are) into Excel or something similar and have it separate into fields, using space as a delimiter, and then from there, you can gather some information with formulas about it, or make charts.


Many thanks for the reply and identifying more data sources. Unfortunately, I'm not particularly talented or creative with such things and right now I have very little time to undertake such a project.

Tracking a machine's crunching performance by credits is OK, but measures such as RAC are (to borrow a term from economics) a "lagging indicator", meaning that other factors must occur (wingmen and validation) before the credit is actually counted. Also, we have no way of knowing when a tweak behind the scenes in Berkeley has occurred altering the formula for calculating credit. It's happened before.

Thanks again.


____________

Join the PACK!

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8732
Credit: 61,603,854
RAC: 61,251
United Kingdom
Message 1364885 - Posted: 5 May 2013, 18:27:59 UTC

All credits are awarded after the appropriate quorum for the work unit has been reached, so even "credits" is a lagging indicator.
RAC is a long tail rolling average, so is influenced by what has happened in the past.
For any meaningful comparison you have to make your observation at the same time each day, or make allowances for the different observational day lengths. The former can be done with a fairly simple cron job to trigger the grabbing of the "your computers" page, parsing it to get the data out and dumping it into a spreadsheet. Or, do it the way I do, while chewing breakfast (which is normally "at the same time" each weekday) I quickly drop onto the page, scribble the numbers on the pad and do the sums; I suppose if I were feeling enthusiastic I could put the raw numbers into a spreadsheet and do all sorts of wizzy things with them, but I'm too busy chomping, reading emails, working out what's going to happen at work, where I'm meant to be going...
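A minimal sketch of the "make allowances for the different observational day lengths" option, in Python. It assumes you already have a list of (time, total credit) readings from somewhere, whether scribbled on a pad or scraped by a cron job; the function name and sample numbers are hypothetical. Each pair of consecutive readings is scaled to credit per 24 hours, so a reading taken 36 hours after the previous one is comparable to one taken 24 hours after.

```python
from datetime import datetime

def credit_per_day(snapshots):
    """snapshots: list of (datetime, total_credit) in chronological order.
    Returns a list of (end_time, credit gained per 24 hours) for each
    pair of consecutive readings."""
    rates = []
    for (t0, c0), (t1, c1) in zip(snapshots, snapshots[1:]):
        days = (t1 - t0).total_seconds() / 86400.0
        if days <= 0:
            continue  # skip duplicate or out-of-order readings
        rates.append((t1, (c1 - c0) / days))
    return rates

# Hypothetical readings: two at breakfast, one late the following evening.
readings = [
    (datetime(2013, 5, 4, 7, 0), 1000.0),
    (datetime(2013, 5, 5, 7, 0), 1240.0),
    (datetime(2013, 5, 6, 19, 0), 1600.0),
]
for when, rate in credit_per_day(readings):
    print(when.isoformat(), round(rate, 1))
```

Because granted credit itself lags behind the crunching, the resulting rate is still a lagging figure; the normalization only removes the effect of uneven observation times.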
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2325
Credit: 8,867,528
RAC: 1,422
United States
Message 1364993 - Posted: 5 May 2013, 23:31:25 UTC

Well, my machines are slow and I only have two of them, so I actually keep track of all of my APs with Excel. I pull up the tasks page probably 2-3 times a day and add new wuIDs and taskIDs, and the beam/channel for each task assigned, and then add the data for the ones that have been reported, and then print the taskID page to PDF. I have all of them since we switched over to _v6 for both machines.

It only takes a few minutes each day, unless there are some days like when the splitters ran out of tapes to work on for several days and then I get 30 new tasks, then I have a lot of numbers to type. Or when the servers go down for several days to be relocated or to be repaired and reporting tasks can't happen and I accumulate 10-20 completed APs that get reported all at once, then I have a lot of data to enter in and print to PDF.

But for MBs, and especially if there's a GPU involved, I wouldn't even TRY to use my manual method.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4584
Credit: 121,512,654
RAC: 59,181
United States
Message 1365115 - Posted: 6 May 2013, 13:09:57 UTC

For some time I have been wanting to do something like this as well. Sadly I have not got around to it. It would be nice to have something to look at when I see one of my 5 i7-860 machines suddenly climbing in RAC when the other 4 are not, or when an obviously slower machine overtakes a faster one.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3587
Credit: 48,709,135
RAC: 27,923
Russia
Message 1365134 - Posted: 6 May 2013, 15:09:12 UTC
Last modified: 6 May 2013, 15:16:40 UTC

There is a Perl script I wrote some time ago that parses client_state.xml for MB and AP tasks and extracts some useful statistics. For OpenCL MB it even extracts some counters, so one can plot, for example, how execution time depends on the number of signals found. But the more usual use is to plot execution time vs AR for MB and execution time vs % blanking for AP.
Using this script requires a Perl interpreter (very small ones are not hard to find) and the network switched off (in BOINC) for some time so that results collect locally.
Usually I fill the cache, compute for a day or two without network, then copy client_state.xml to another location and re-enable network in BOINC to refill the cache.

EDIT:
current version I use: ExtractTimes_v3.pl

$path="client_state.xml";
$results="Times.txt";
open (IN, $path);
open (RES, ">".$results);
print RES "Task name"."\t"."Result Type"."\t"."Revision"."\t"."Parameter"."\t"."ElapsedTime"."\t"."CPUTime"."\t"."PoT_transfer_needed"."\t"."PoT_transfer_not_needed"."\t"."Gaussian_transfer_needed"."\t"."Gaussian_transfer_not_needed"."\t"."PC_pulse_find_early_miss"."\n";
while (<IN>) {
    if( /<result>/ ){ #R: we need only result sections in file
        $Gaussian_transfer_not_needed=0;
        $Gaussian_transfer_needed=0;
        $PoT_transfer_not_needed=0;
        $PoT_transfer_needed=0;
        $PC_pulse_find_early_miss=0;
        $trueAR=-1;#R: error condition, will not store such blocks
        $WUname="";
        $ElapsedTime=0;
        $CPUTime=0;
        $GPU=0;#R: 2-NV
        $ResultType=0;#R: 1-AKv8,2-ATI MB,3-ATI AP,4-CPU AP, 5-CUDA MB, 6-NV AP
        $Revision=0;#R: to distinguish between different builds
        while(<IN>){
            if( /<\/result>/ ){ #R: ready to analyse collected info
                if( ($trueAR==-1) || ($ElapsedTime==0) ||($CPUTime==0) || ($Revision==0) || ($ResultType==0) ){
                    last;#R: record unready
                }
                print RES $WUname."\t".$ResultType."\t".$Revision."\t".$trueAR."\t".$ElapsedTime."\t".$CPUTime."\t".$PoT_transfer_needed."\t".$PoT_transfer_not_needed."\t".$Gaussian_transfer_needed."\t".$Gaussian_transfer_not_needed."\t".$PC_pulse_find_early_miss."\n";
                last;#R: finished with this result record
            }
            if(/<name>(.*)<\/name>/){
                $WUname=$1;
                next;
            }
            if( /<final_cpu_time>(.*)<\/final_cpu_time>/ ){
                $CPUTime=$1;
                next;
            }
            if( /<final_elapsed_time>(.*)<\/final_elapsed_time>/ ){
                $ElapsedTime=$1;
                next;
            }
            if( /<exit_status>(.*)<\/exit_status>/ ){
                if($1 !=0){
                    last;#R:invalid result, no need to look into it further
                }
                next;
            }
            if(/USE_OPENCL_NV/){$GPU=2;}
            if(/OpenCL/ && $GPU==0){$GPU=1;}
            if( /AstroPulse v./ ){#R: AP detection block
                $ResultType=4;
                while(<IN>){
                    if(/Windows x86 rev (.*), 6/){
                        $Revision=$1;
                        next;
                    }
                    if(/Windows x86 rev (.*), V/){
                        $Revision=$1;
                        next;
                    }
                    if($GPU==1){ $ResultType=3;}
                    if($GPU==2){$ResultType=6;}
                    if( /<\/stderr_txt>/){ last;}
                    if( /<\/result>/ ){#R: broken stderr
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;
                    }
                    if( /percent blanked: (\S*)/){
                        $trueAR=$1;#R: for AP this parameter means blanking instead of AR
                        next;
                    }
                    if( /Found 30 single pulses and 30 repeating pulses, exiting./){
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;#R: don't count overflows
                    }
                    if( /repetitive pulses: 30/){#R: no need task with FFA disabled
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;#R: don't count overflows
                    }
                }
            }
            if( /Windows x86 rev (.*), Don't Panic!/ || /Linux 64 bit, rel. Rev (.*)/ || /Linux 32 bit, rel. Rev (.*)/){#uje: CPU opt AP detection
                $ResultType=4;
                $Revision=$1;
                next;
            }
            if( /Multibeam x32f Preview/){
                $Revision="32";
                $ResultType=5;
                while (<IN>) {
                    if( /Informational message -9 result_overflow/){
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;#R: don't count overflows
                    }
                    if( /<\/result>/ ){#R: broken stderr
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;
                    }
                    if( /WU true angle range is : (\S*)/ ){
                        $trueAR=$1;
                        next;
                    }
                    if( /<\/stderr_txt>/){ last;}
                }
            }
            if( /Build (.*) , Ported by/ ){#R: ATI MB/AKv8 detection block
                $Revision=$1;
                $ResultType=1;#R: suppose AKv8 by default, change if it's ATI MB
                while (<IN>) {
                    if( /Informational message -9 result_overflow/){
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;#R: don't count overflows
                    }
                    if( /<\/result>/ ){#R: broken stderr
                        $Revision=$CPUTime=$ElapsedTime=0;$trueAR=-1;last;
                    }
                    if( /WU true angle range is : (\S*)/ ){
                        $trueAR=$1;next;
                    }
                    if(/OpenCL version by Raistmer/){
                        $ResultType=2;#R: do result type correction
                        next;
                    }
                    if(/class Gaussian_transfer_needed: total=(\S*),/){
                        $Gaussian_transfer_needed=$1;
                        next;
                    }
                    if(/class Gaussian_transfer_not_needed: total=(\S*),/){
                        $Gaussian_transfer_not_needed=$1;next;
                    }
                    if(/class PoT_transfer_not_needed: total=(\S*),/){
                        $PoT_transfer_not_needed=$1;
                        next;
                    }
                    if(/class PoT_transfer_needed: total=(\S*),/){
                        $PoT_transfer_needed=$1;
                        next;
                    }
                    if(/class PC_pulse_find_early_miss: total=(\S*),/){
                        $PC_pulse_find_early_miss=$1;
                        next;
                    }
                    if( /<\/stderr_txt>/){
                        last;}
                }#R:finish with MB task
            }
        }
    }
}
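For anyone who would rather get numbers than a plot out of Times.txt, here is a rough Python sketch (not part of the script above) that averages ElapsedTime per result type and per slice of the Parameter column, which holds AR for MB and % blanking for AP. The column names and result-type codes are taken from the script; the function name and the bin width are arbitrary choices for illustration.

```python
import csv
from collections import defaultdict

# Labels copied from the $ResultType comment in the Perl script.
RESULT_TYPES = {"1": "AKv8", "2": "ATI MB", "3": "ATI AP",
                "4": "CPU AP", "5": "CUDA MB", "6": "NV AP"}

def mean_elapsed_by_bin(tsv_file, bin_width=0.1):
    """Average ElapsedTime per (result type, Parameter bin).

    tsv_file is an open file (or any iterable of lines) in the
    tab-separated layout written by the script, header row included.
    Values that sit exactly on a bin edge may fall into the lower bin
    because of floating-point rounding."""
    sums = defaultdict(lambda: [0, 0.0])  # key -> [task count, total seconds]
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        label = RESULT_TYPES.get(row["Result Type"], row["Result Type"])
        param = float(row["Parameter"])
        bin_start = round(param // bin_width * bin_width, 6)
        entry = sums[(label, bin_start)]
        entry[0] += 1
        entry[1] += float(row["ElapsedTime"])
    return {key: total / count for key, (count, total) in sorted(sums.items())}

# Example use:
#   with open("Times.txt", newline="") as f:
#       for (label, bin_start), mean_s in mean_elapsed_by_bin(f).items():
#           print(label, bin_start, round(mean_s, 1))
```

A bin width of 0.1 suits typical MB angle ranges; for AP blanking percentages a wider bin (say 5) is probably more useful.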

____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8755
Credit: 52,702,464
RAC: 31,558
United Kingdom
Message 1365218 - Posted: 6 May 2013, 18:18:19 UTC - in response to Message 1365151.

There used to be a couple of guys that would log results and make graphs of some of my rigs when we were testing apps or the old HT on/off question.
I don't know how automated they were or if they took a lot of manual manipulation.

I think archae86 and maybe Richard were the folks, but don't quote me on that.

Archae86 has certainly worked on HT - in more ways than one - most recently at Einstein: http://einstein.phys.uwm.edu/forum_thread.php?id=10000.

I worked with Joe Segur and others (add Fred W's name to the list) on 'estimates and deadlines'.

MarkJ (Project donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 944
Credit: 25,163,734
RAC: 2,955
Australia
Message 1365380 - Posted: 7 May 2013, 10:10:59 UTC

BOINC logx might help. I haven't used it myself, but it's supposed to be a logging tool for BOINC clients. Maybe those who have experience with it could chime in here.
____________
BOINC blog


Copyright © 2014 University of California