Calibrating Client v5.3.12.tx37 - fair credits in all projects

Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 233290 - Posted: 19 Jan 2006, 0:54:28 UTC
Last modified: 19 Jan 2006, 0:57:06 UTC

I released a new version (5.3.12.tx34) - it is using the newest official revision, and there is a new useful feature.

Our team member Vejpuste aka Uftoun - Zemedelec, who crunches a huge number of units on many machines daily, has several times run into defective WUs that blocked some computers for many hours without triggering an error and without completing. To avoid this, he wrote a shell script that watches his machines and alerts him by email when such a problem occurs. He can then look at the WU details on the web and kill it when appropriate.

I did not add any emailing (though it may be a future extension, useful for other cases too), but you can now assign a time threshold at which you want to be notified. The number is a multiple of the estimated WU time, so in the following example the client will pop up an alert message when a WU keeps crunching for longer than two and a half times the estimated time:

<check_max_time>2.5</check_max_time>

The popup dialogue box lets you decide whether you want to abort the WU or not. The event is recorded to the messages log file, of course, too.
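
As a rough illustration of the threshold rule described above, the following Python sketch compares a work unit's elapsed time against its estimate multiplied by the configured factor. The function name and its arguments are hypothetical; only the check_max_time setting and the 2.5 example come from the post, and the client's actual implementation may well differ.

```python
def exceeds_max_time(elapsed_s: float, estimated_s: float, check_max_time: float) -> bool:
    """Return True when a WU has run longer than check_max_time times its estimate.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return elapsed_s > check_max_time * estimated_s


# With an estimate of 1 hour and check_max_time = 2.5, the limit is 2.5 hours,
# so a WU that has been crunching for 3 hours is over the threshold.
limit_reached = exceeds_max_time(elapsed_s=3 * 3600, estimated_s=3600, check_max_time=2.5)
print(limit_reached)
```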

PS: I just tested it with BOINC running as service. I think the message box will work also when running in application mode, but possibly I may need to use another message handler. Should you experience problems, please let me know.

Binary and source code are available on my webserver as usual.

BTW: Vejpuste also successfully compiled the previous revision of my client for Linux, so I'll add it to my server. I plan to add a FreeBSD version too, but am not sure when I'll find time to look at it.
trux
BOINC software
Freediving Team
Czech Republic
ID: 233290 · Report as offensive
Profile StokeyBob
Avatar

Send message
Joined: 31 Aug 03
Posts: 848
Credit: 2,218,691
RAC: 0
United States
Message 233293 - Posted: 19 Jan 2006, 1:03:46 UTC - in response to Message 232992.  
Last modified: 19 Jan 2006, 1:13:42 UTC


I think the reason my machine crashed is it is having trouble connecting to report and download. None of my machines have tolerated prolonged connection failures for about the last six months.


Was having the same problem for a few months. About 3 weeks ago I finally found the culprit. My nic card was faulty and since I replaced it not one crash. This was a tough one to narrow down because I never thought that the nic was having any problems. Hope this helps you StokeyBob :)

Edit: sorry I'm OT


I'm not sure if it could be a network card. I've had trouble on four different machines. They all do go through the same router and cable modem though.

The Linux machine has had the least trouble. It also has the least trouble with making connections during the outages.

I think they just get overworked doing both the work and the downloads, and then if something interrupts the process it ends with a "Pool_Corruption".

I have a web-cam with a motion sensor that I was leaving on during the day. It may have been messing one of the machines up. I turned it off and have had no trouble since with that machine.

The other day I noticed that Windows automatically reset the clock at the same time as a crash on a different machine. It was during a time when the uploads and downloads were backed up.

I switched the Windows machines to save a kernel memory dump and see if I can learn more.

P.S. Another thing that seems to have trouble muscling its way in is my Trend Micro virus scanner. I've found my computers locked up with the scan stuck in the middle.
ID: 233293 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 233297 - Posted: 19 Jan 2006, 1:22:24 UTC - in response to Message 233293.  

I'm not sure if it could be a network card.
I apologize for the lecturing, but this thread is getting rather long, and it is becoming a big pain for modem users, who have to download it all. I'd prefer it if you could discuss unrelated problems in a separate thread. Otherwise this thread could easily turn into a never-ending discussion about hardware or other problems in no way related to the calibration client we discuss here.

Sorry and thanks for your understanding,

trux
BOINC software
Freediving Team
Czech Republic
ID: 233297 · Report as offensive
Marky-UK
Volunteer tester

Send message
Joined: 1 Nov 05
Posts: 10
Credit: 356
RAC: 0
United Kingdom
Message 233496 - Posted: 19 Jan 2006, 15:00:43 UTC

This is purely a cosmetic thing, but is the optimising client changing project names to lowercase now? The PC I installed it on has suddenly started reporting all lowercase project names to BOINCView (seti@home, einstein@home, etc, instead of SETI@home, Einstein@home).

I know it's not a big thing.
ID: 233496 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 233561 - Posted: 19 Jan 2006, 17:12:59 UTC - in response to Message 233496.  
Last modified: 19 Jan 2006, 17:17:05 UTC

Lowercase project names should be fixed in v5.3.12.tx36

There is also a change in the monitoring of the max crunching time of each WU: it now suspends the WU immediately (unattended) and starts another WU. You can then decide whether you want to continue crunching the WU or abort it completely. The popup alert dialogue that lets you abort the WU can be completely suppressed by another configuration setting (the suspending will still work if check_max_time is defined).
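
The behaviour described above can be sketched as a small decision function. This is only an assumption-laden illustration in Python: the Action names, the function, and the show_popup flag are hypothetical (the post does not name the setting that suppresses the dialogue); only check_max_time is taken from the post.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP_RUNNING = "keep running"
    SUSPEND = "suspend and start another WU"
    SUSPEND_AND_ASK = "suspend, start another WU, and ask whether to abort"


def on_time_check(elapsed_s: float, estimated_s: float,
                  check_max_time: Optional[float] = None,
                  show_popup: bool = True) -> Action:
    """Decide what to do with a running WU (hypothetical sketch)."""
    # Without check_max_time there is no monitoring at all.
    if check_max_time is None:
        return Action.KEEP_RUNNING
    # Still within the allowed multiple of the estimated time.
    if elapsed_s <= check_max_time * estimated_s:
        return Action.KEEP_RUNNING
    # Over the limit: always suspend; the popup is optional.
    return Action.SUSPEND_AND_ASK if show_popup else Action.SUSPEND


print(on_time_check(10000, 3600, check_max_time=2.5, show_popup=False).value)
```

Note that in this sketch suppressing the popup only changes whether the user is asked; the suspension itself depends solely on check_max_time being set, as the post describes.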

The source code was slightly modified and is now compilable with GCC (Linux, FreeBSD, Mac?,...) and possibly other compilers too.
trux
BOINC software
Freediving Team
Czech Republic
ID: 233561 · Report as offensive
Profile Dorsai
Avatar

Send message
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 234155 - Posted: 20 Jan 2006, 13:14:53 UTC

Trux, I think I have discovered a minor glitch.

When your version uses the Emergency project because the priority project is out of work, it requests one second of work. This is enough to get one, and only one, work unit. On dual (or more, I assume) CPU machines this results in only one CPU being used.

Somehow or other you need to check the number of available CPUs and, once the first WU is downloaded, repeat the request for 1 second of work to get another WU, and so on until the number of available WUs matches the number of CPUs. Somehow or other the system needs to remember that there are X CPUs and replace WUs as they are finished until work is available again from the priority project.
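
The loop proposed above could look roughly like the following Python sketch. Everything here is hypothetical: request_one_second stands in for a scheduler request and is assumed to return the number of WUs received, and max_requests is an added safety limit that the post does not mention. It is not how the client is actually implemented.

```python
from typing import Callable


def fill_cpus_from_backup(ncpus: int, queued: int,
                          request_one_second: Callable[[], int],
                          max_requests: int = 10) -> int:
    """Repeat the 1-second work request until every CPU has a WU.

    Returns the number of queued WUs after the requests (hypothetical sketch).
    """
    requests = 0
    while queued < ncpus and requests < max_requests:
        received = request_one_second()
        requests += 1
        if received == 0:
            # The backup project has no work either; stop asking for now.
            break
        queued += received
    return queued


# A dual-CPU machine with an empty queue, where each request yields one WU.
print(fill_cpus_from_backup(ncpus=2, queued=0, request_one_second=lambda: 1))
```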



Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 234155 · Report as offensive
Marky-UK
Volunteer tester

Send message
Joined: 1 Nov 05
Posts: 10
Credit: 356
RAC: 0
United Kingdom
Message 234158 - Posted: 20 Jan 2006, 13:22:20 UTC

If the priority project has run out of work and the client has dropped back to standard scheduling to allow work on other projects, what happens when work becomes available for the priority project again?
ID: 234158 · Report as offensive
Profile Dorsai
Avatar

Send message
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 234163 - Posted: 20 Jan 2006, 13:32:37 UTC - in response to Message 234158.  

If the priority project has run out of work and the client has dropped back to standard scheduling to allow work on other projects, what happens when work becomes available for the priority project again?


I guess it stops asking for work from the backup project, and the resource share/EDF takes care of things in the normal manner.


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 234163 · Report as offensive
kevint
Volunteer tester

Send message
Joined: 17 May 99
Posts: 414
Credit: 11,680,240
RAC: 0
United States
Message 234222 - Posted: 20 Jan 2006, 15:45:15 UTC

Trux,
I have reviewed several threads but I cannot find anywhere where this has been talked about.

On a multiple-CPU machine, is it possible to have 1 or 2 CPUs dedicated to one application and the other CPUs dedicated to other applications?
For example:

CPU0 - Seti
CPU1 - Seti
CPU2 - Rosetta / SIMAP
CPU3 - Einstein

Reason: I have noticed that Boinc will switch all four processors to one application and crunch, completely ignoring the other projects for a time, unless there are only 1 or 2 WUs to be processed for that application.

Thank you, and I am sorry if this has been talked about before.
ID: 234222 · Report as offensive
kevint
Volunteer tester

Send message
Joined: 17 May 99
Posts: 414
Credit: 11,680,240
RAC: 0
United States
Message 234228 - Posted: 20 Jan 2006, 15:54:00 UTC - in response to Message 233561.  

Lowercase project names should be fixed in v5.3.12.tx36

There is also a change in the monitoring of the max crunching time of each WU: it now suspends the WU immediately (unattended) and starts another WU. You can then decide whether you want to continue crunching the WU or abort it completely. The popup alert dialogue that lets you abort the WU can be completely suppressed by another configuration setting (the suspending will still work if check_max_time is defined).

The source code was slightly modified and is now compilable with GCC (Linux, FreeBSD, Mac?,...) and possibly other compilers too.



What about WUs that exceed the return deadline? Can these be automatically skipped or aborted so as not to waste CPU crunch time?
ID: 234228 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 234239 - Posted: 20 Jan 2006, 16:16:28 UTC - in response to Message 234155.  

Trux, I think I have discovered a minor glitch.
Yes, this was already reported earlier. I thought it was already fixed, but it is well possible I've simply forgotten it. I'll look at it.

trux
BOINC software
Freediving Team
Czech Republic
ID: 234239 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 234241 - Posted: 20 Jan 2006, 16:19:18 UTC - in response to Message 234228.  

What about WUs that exceed the return deadline? Can these be automatically skipped or aborted so as not to waste CPU crunch time?
Yup, if you allow it in the conf file (delete_overdue), then certainly.
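
To make the idea concrete, here is a minimal Python sketch of picking out overdue WUs. It is an illustration under stated assumptions, not the client's code: the function name, the dictionary of deadlines, and the WU names are hypothetical, and the post only tells us that a delete_overdue setting exists in the conf file, not how it is written or applied.

```python
from datetime import datetime, timezone
from typing import Dict, List


def overdue_tasks(deadlines: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the names of WUs whose report deadline has already passed."""
    return [name for name, deadline in deadlines.items() if deadline < now]


# Hypothetical WU names and deadlines, for illustration only.
deadlines = {
    "wu_a": datetime(2006, 1, 18, tzinfo=timezone.utc),
    "wu_b": datetime(2006, 2, 1, tzinfo=timezone.utc),
}
now = datetime(2006, 1, 20, 16, 19, tzinfo=timezone.utc)
print(overdue_tasks(deadlines, now))
```

With such a setting enabled, the WUs returned by a check like this would be the candidates for skipping or aborting.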

trux
BOINC software
Freediving Team
Czech Republic
ID: 234241 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 234245 - Posted: 20 Jan 2006, 16:22:24 UTC - in response to Message 234222.  
Last modified: 20 Jan 2006, 16:22:46 UTC

I have reviewed several threads but I cannot find anywhere where this has been talked about.
http://setiathome.berkeley.edu/forum_thread.php?id=26637

trux
BOINC software
Freediving Team
Czech Republic
ID: 234245 · Report as offensive
Profile Dorsai
Avatar

Send message
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 234278 - Posted: 20 Jan 2006, 17:14:35 UTC - in response to Message 234239.  

Trux, I think I have discovered a minor glitch.
Yes, this was already reported earlier. I thought it was already fixed, but it is well possible I've simply forgotten it. I'll look at it.


Thanks. :-)


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 234278 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 234689 - Posted: 21 Jan 2006, 3:05:50 UTC
Last modified: 21 Jan 2006, 3:11:33 UTC

OK, there is a new release (5.3.12.tx37) under the link Download >> beta versions at http://boinc.truxoft.com/core-cal.htm.

I made some modifications in the scheduler request, and hope it helps with getting WU of backup project(s) (when priority projects fail) for each CPU, on multiprocessor (or HT) machines. I did not test it myself yet, and it is late here in Europe, so I'll not do it today.

The scheduling mechanism is rather complicated and depends on far too many factors that are not all known at the same time. Hence, there will always be some situations where it loads some extra backup WUs, or needs more than one scheduler request to feed all CPUs. On multiprocessor machines, you may also consider adding more than a single backup project.
trux
BOINC software
Freediving Team
Czech Republic
ID: 234689 · Report as offensive
n7rfa
Volunteer tester
Avatar

Send message
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 234909 - Posted: 21 Jan 2006, 15:08:18 UTC

I loaded the v5.3.12.tx36 version on my D830 system and it looks like my Claimed Credits dropped:

<core_client_version>5.3.12.tx36</core_client_version>
<real_cpu_time>1863</real_cpu_time>
<corrected_cpu_time>1041</corrected_cpu_time>
<corrected_Mfpops>2106.5</corrected_Mfpops>
<stderr_txt>
Windows optimized S@H application by Crunch3r
Improvements by Tetsuji Maverick Rai, Hans Dorn, Harold Naparst, Ned Slider, Crunch3r, trux,...
$Using cache implementation by Hans Dorn $
$Build: Windows SSE3 Intel Pentium4 V2.10 by Crunch3r $
$Rev: 166.10 Windows SSE3 Intel Pentium4 V2.10 $
$Internal: +16SMA;+PA;+IA;+EFL-P4P $
Datapoints: 1048576
Windows optimized S@H application by Crunch3r
Improvements by Tetsuji Maverick Rai, Hans Dorn, Harold Naparst, Ned Slider, Crunch3r, trux,...
$Using cache implementation by Hans Dorn $
$Build: Windows SSE3 Intel Pentium4 V2.10 by Crunch3r $
$Rev: 166.10 Windows SSE3 Intel Pentium4 V2.10 $
$Internal: +16SMA;+PA;+IA;+EFL-P4P $
Datapoints: 1048576
Windows optimized S@H application by Crunch3r
Improvements by Tetsuji Maverick Rai, Hans Dorn, Harold Naparst, Ned Slider, Crunch3r, trux,...
$Using cache implementation by Hans Dorn $
$Build: Windows SSE3 Intel Pentium4 V2.10 by Crunch3r $
$Rev: 166.10 Windows SSE3 Intel Pentium4 V2.10 $
$Internal: +16SMA;+PA;+IA;+EFL-P4P $
Datapoints: 1048576
cache_miss: 20

</stderr_txt>


Is this normal? Will it go back up?
ID: 234909 · Report as offensive
kevint
Volunteer tester

Send message
Joined: 17 May 99
Posts: 414
Credit: 11,680,240
RAC: 0
United States
Message 234928 - Posted: 21 Jan 2006, 15:54:19 UTC - in response to Message 234689.  
Last modified: 21 Jan 2006, 16:06:41 UTC

OK, there is a new release (5.3.12.tx37) under the link Download >> beta versions at http://boinc.truxoft.com/core-cal.htm.

I made some modifications in the scheduler request, and hope it helps with getting WU of backup project(s) (when priority projects fail) for each CPU, on multiprocessor (or HT) machines. I did not test it myself yet, and it is late here in Europe, so I'll not do it today.

The scheduling mechanism is rather complicated and depends on far too many factors that are not all known at the same time. Hence, there will always be some situations where it loads some extra backup WUs, or needs more than one scheduler request to feed all CPUs. On multiprocessor machines, you may also consider adding more than a single backup project.



Thanks trux - you are the BEST - I just sent you a donation via pay pal.. keep up the good work!
ID: 234928 · Report as offensive
Profile trux
Volunteer tester
Avatar

Send message
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 234946 - Posted: 21 Jan 2006, 16:21:10 UTC - in response to Message 234909.  
Last modified: 21 Jan 2006, 17:20:03 UTC

n7rfa: I loaded the v5.3.12.tx36 version on my D830 system and it looks like my Claimed Credits dropped:

1) These messages tell nothing about the actual claimed credit
2) Did you read the note on the download page telling:

"the client may need to calculate a dozen or two of workunits before giving consistent results."

Thanks trux - you are the BEST - I just sent you a donation via pay pal.. keep up the good work!
It is not yet perfect. In some situations it requests work even when it should not.

As for the donation - I think with PayPal, you can pay by a CC even without creating an account, but if it should be a problem, you could use the payment gateway at http://www.mivacentral.com/payments.mv, selecting truXoft as the recipient :) Thanks!




trux
BOINC software
Freediving Team
Czech Republic
ID: 234946 · Report as offensive
n7rfa
Volunteer tester
Avatar

Send message
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 234969 - Posted: 21 Jan 2006, 16:48:53 UTC - in response to Message 234946.  

n7rfa: I loaded the v5.3.12.tx36 version on my D830 system and it looks like my Claimed Credits dropped:

1) These messages tell nothing about the actual claimed credit
2) Did you read the note on the download page telling:

"the client may need to calculate a dozen or two of workunits before giving consistent results."



Now that you mention it, I remember seeing that before. I just didn't see it today when I went to do the work.

I was seeing about a 50% drop in claimed credit and a like drop in the corrected CPU time over the tx12 version I was using before.

I see that the claimed credit is starting to creep back up now. I'll keep an eye on it.

Thanks for the work.

ID: 234969 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 235077 - Posted: 21 Jan 2006, 19:15:43 UTC

I can only say WOW!

Just checked how all my boxes are doing at Andyk's web site. My claimed credits and granted credits are all within .5 now. Not only that but now my work is sometimes being selected as the basis for the granted credit and even sometimes selected to be the canonical result. I no longer need to be the 4th unit reported and no longer is my work wasted as some would say. Never believed that anyway!

Thanks to Crunch3r and Trux for all your work!


Boinc....Boinc....Boinc....Boinc....
ID: 235077 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.