report the results immediately!

Message boards : Number crunching : report the results immediately!
SETI User
Joined: 29 Jun 02
Posts: 369
Credit: 0
RAC: 0
Germany
Message 307145 - Posted: 16 May 2006, 7:21:59 UTC

Hello,

is the Truxoft BOINC V5.3.12.txXX the only client that can report results immediately?

Or does the new BOINC V5.4.9 have this feature too?


Greetings!

ID: 307145
neo32843
Joined: 1 Dec 03
Posts: 33
Credit: 8,919
RAC: 0
United States
Message 307191 - Posted: 16 May 2006, 8:09:50 UTC

A little off topic, sorry, but is there a Truxoft that works with enhanced?
SETI@HOME - Crunching 24/7

SETI Classic to SETI Enhanced. I was there.


"I'm the Christian the Devil warned you about!"
ID: 307191
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 307193 - Posted: 16 May 2006, 8:15:52 UTC - in response to Message 307191.  

A little off topic, sorry, but is there a Truxoft that works with enhanced?
The existing Truxoft BOINC V5.3.12.tx36 works just fine with enhanced, but I recommend you turn off the calibration for SETI (not used on enhanced: DCF changes mess up the calibration for standard). But it's still worth it if you do optimised Einstein.

ID: 307193
Jack Gulley
Joined: 4 Mar 03
Posts: 423
Credit: 526,566
RAC: 0
United States
Message 307198 - Posted: 16 May 2006, 8:26:13 UTC - in response to Message 307191.  

A little off topic, sorry, but is there a Truxoft that works with enhanced?

His current client version works just fine with Enhanced. No changes required.

The calibration of course only attempts to adjust what is claimed, but has no effect on what is actually granted. So you might as well not enable the Calibration feature for SETI. But all of the other options, such as Report_Results_Immediately, still work just like they did.

Enhanced really only affects the science application, not the client manager.

No word from Trux yet on when he will put out a version of the new client with his remaining features that are still useful. I suspect he will wait until a few more problems are fixed in the recommended new client. And no, he does not have a version of the Enhanced science application; there is no need for it, as Crunch3r has his optimized version out.
ID: 307198
neo32843
Joined: 1 Dec 03
Posts: 33
Credit: 8,919
RAC: 0
United States
Message 307201 - Posted: 16 May 2006, 8:33:49 UTC

OK thanks, now answer his question.

lol

Sorry, didn't mean to post-jack.
ID: 307201
Keck_Komputers
Volunteer tester
Joined: 4 Jul 99
Posts: 1575
Credit: 4,152,111
RAC: 1
United States
Message 307286 - Posted: 16 May 2006, 11:46:06 UTC - in response to Message 307145.  

Hello,

is the Truxoft BOINC V5.3.12.txXX the only client that can report results immediately?

Or does the new BOINC V5.4.9 have this feature too?


Greetings!

This is a bug, not a feature, and it is the sole reason I do not endorse the Trux clients.
BOINC WIKI

BOINCing since 2002/12/8
ID: 307286
Zap de Ridder
Volunteer tester
Joined: 9 Jan 00
Posts: 227
Credit: 1,468,844
RAC: 1
Netherlands
Message 307359 - Posted: 16 May 2006, 13:39:18 UTC - in response to Message 307286.  

Hello,

is the Truxoft BOINC V5.3.12.txXX the only client that can report results immediately?

Or does the new BOINC V5.4.9 have this feature too?


Greetings!

This is a bug, not a feature, and it is the sole reason I do not endorse the Trux clients.

It is a feature, and it can be switched off by removing this line:

<return_results_immediately/>

in truxoft_prefs.xml.
ID: 307359
Lee Carre
Volunteer tester
Joined: 21 Apr 00
Posts: 1459
Credit: 58,485
RAC: 0
Channel Islands
Message 307363 - Posted: 16 May 2006, 14:05:08 UTC - in response to Message 307359.  
Last modified: 16 May 2006, 14:06:00 UTC

Hello,

is the Truxoft BOINC V5.3.12.txXX the only client that can report results immediately?

Or does the new BOINC V5.4.9 have this feature too?


Greetings!
This is a bug, not a feature, and it is the sole reason I do not endorse the Trux clients.
It is a feature, and it can be switched off by removing this line:

<return_results_immediately/>

in the truxoft_prefs.xml
It is a bug because no client is meant to have that kind of behaviour. There are reasons the projects don't want clients reporting results immediately; yes, I know it's nice for users, but that's not the only factor here.

People complain about how slow things are and how SETI is always having problems and outages. Well, this is one of the things causing it, so stop reporting immediately.
Also, a feature like that should be off by default, but if a user has to "remove" a string from a config file, I'm guessing it's on by default, which again is bad.
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Engines
ID: 307363
WendyR
Volunteer tester
Joined: 1 Aug 05
Posts: 44
Credit: 1,962,140
RAC: 0
United States
Message 307364 - Posted: 16 May 2006, 14:11:10 UTC

I like the idea of returning results as soon as they are complete.

Jack Gulley posted a well thought-out reason for wanting this feature in this thread.

Basically, he uses his returned SETI results to monitor that his home network/computers are functioning correctly. He can do this from any internet connection, and does not require any unusual software or hardware to accomplish it. While it is not Berkeley's responsibility to provide a monitoring system into people's home networks, it is a nice side benefit they provide for their volunteers.

It gives me some comfort knowing that my results are "safely home" at Berkeley quickly. That means a lower probability of me or my machines screwing something up and losing a completed result. In addition, returning my results faster gives me a greater probability of being part of the quorum, and/or of my unit being chosen as the canonical result. (Sometimes it seems that algorithm includes "if the machine is Wendy's, don't pick it", though... :) )

My "trivial but still annoying" reason -- my "active" results don't fit into the real estate on my screen with those "extra" results there.

The argument against this "feature" has always been server load. I want to question that. Really, how much of a load is it? And just where is that load?

According to the Wiki the process happens in two phases, and the "troublesome" part is the upload of the data file, which has always happened as soon as possible. The feature we are talking about is the contact with the scheduler.

My experience is that the problems are in contacting the upload/download server, rather than the scheduler. In addition, the latest round of problems has been in full disks on the upload/download server. Wouldn't letting the scheduler know that those results are there, ready for assimilation and deletion, help this situation? How many results get trashed when they are in that "Ready to report" status? Are those files still sitting out on the upload/download server? How long? They seem to have lots of capacity on the assimilator and deleter pieces of the process, and need some help on the upload/download part of the process. Wouldn't allowing "return results immediately" help this?

Just a few thoughts....

ID: 307364
Lord_Vader
Joined: 7 May 05
Posts: 217
Credit: 10,386,105
RAC: 12
United States
Message 307370 - Posted: 16 May 2006, 14:28:14 UTC

Isn't it better to maintain a constant stream of a small amount of data rather than everyone surging at times with dozens of updates? I would rather send one file 10 to 12 times a day than to wait and report 10 days worth of data.

Just because you don't like the "bug", it doesn't mean it is one.




Fear will keep the local systems in line. Fear of this battle station. - Grand Moff Tarkin
ID: 307370
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 307373 - Posted: 16 May 2006, 14:33:41 UTC
Last modified: 16 May 2006, 14:34:17 UTC

The result uploads go directly to the hard drive. They are 6-23 KB in size. The "reporting" isn't really much more than a few bytes, but it forces communication between the different servers inside their network. Each connection takes time, and the operations required after receiving the report take time. It's this which is slowing down the system. Reporting 1 WU takes about as much as reporting a dozen, so waiting and reporting multiple results at the same time saves on these open connections between servers.
ID: 307373
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 307378 - Posted: 16 May 2006, 14:38:13 UTC

For example, if they send out a million results a day, they get back roughly the same number. 1,000,000 divided by 86,400 (seconds in a day) = 11.57 operations per second. If users report 2 at a time, the server only needs to handle about 6 per second; if 3, then about 4 per second; if 4, then about 3 per second. You get the picture.
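That arithmetic can be sketched directly. A minimal check, assuming the post's illustrative figure of one million results returned per day (not a measured value):

```python
# Scheduler-contact rate as a function of how many results each host
# reports per connection. The 1,000,000/day figure is the post's
# illustrative number, not a measured one.
RESULTS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 86_400

def contacts_per_second(batch_size: int) -> float:
    """Scheduler contacts per second if every host returns
    `batch_size` results per connection."""
    return RESULTS_PER_DAY / (batch_size * SECONDS_PER_DAY)

for n in (1, 2, 3, 4):
    print(f"batch of {n}: {contacts_per_second(n):.2f} contacts/sec")
```

Batching four results per contact cuts the connection rate to a quarter of the one-at-a-time rate, which is the whole point of waiting to report.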
ID: 307378
Lee Carre
Volunteer tester
Joined: 21 Apr 00
Posts: 1459
Credit: 58,485
RAC: 0
Channel Islands
Message 307417 - Posted: 16 May 2006, 15:30:24 UTC - in response to Message 307364.  
Last modified: 16 May 2006, 15:35:35 UTC

I like the idea of returning results as soon as they are complete.
I do as well, and have been tempted sometimes; however, I'd rather respect the wishes of the project admins, as they know what's best for their project.

Jack Gulley posted a well thought-out reason for wanting this feature in this thread.
I see his point, but there are much easier and better ways of monitoring. He could set up a dynamic DNS account so he can always access his site remotely, and run something like Nagios on a web server; that would give him a lot more detail. He could also set up secure remote access, so he can log into machines over the internet to reboot them or whatever.

It gives me some comfort knowing that my results are "safely home" at Berkeley quickly. That means a lower probability of me or my machines screwing something up and losing a completed result. In addition, returning my results faster gives me a greater probability of being part of the quorum, and/or of my unit being chosen as the canonical result. (Sometimes it seems that algorithm includes "if the machine is Wendy's, don't pick it", though... :) )
I see your point, and I've had trouble with crashes and glitches causing me to lose workunits in the past, so I see the temptation. However, it's not very good if, by doing that, we're overloading the servers more than we need to; SETI is big enough, with enough performance issues as it is, without us adding new ones.

My "trivial but still annoying" reason -- my "active" results don't fit into the real estate on my screen with those "extra" results there.
A more appropriate solution is to reduce your cache, increase your screen resolution, or use something like BoincView, which can filter results based on status (so you only see active and paused if you want); there's a setup guide in the wiki.

The argument against this "feature" has always been server load. I want to question that. Really, how much of a load is it? And just where is that load?
Quite a bit. In the context of reporting, the load is on the DB server; there are some more detailed values below, with examples and explanations...

According to the Wiki the process happens in two phases, and the "troublesome" part is the upload of the data file, which has always happened as soon as possible. The feature we are talking about is the contact with the scheduler.
Overall, yes, the data server takes quite a hammering.
However, as you say, reporting has nothing to do with uploading, so uploading is irrelevant here.
It actually makes no difference how or when you upload results; you've still got to send the same amount of data. The problem is the sheer amount of it, and the processing power needed to handle all those uploads, especially if lots are happening at the same time (like after an outage). There's not much that can be done to improve the efficiency of that; the only fix is buying more bandwidth (which would be unused normally) and a higher-capacity server, both of which cost a fair bit of money, which SETI doesn't have.

Anyway, with regard to reporting: as most of you know, when you report results it's part of an "update" to that project (in this case SETI).
When you do any kind of "update", the request is sent to the scheduler, which in turn accesses the DB (database).
And therein lies the problem: the load put on the DB by lots of updating/reporting.
(The DB is the slowest part of the whole system, and the reason for the weekly outage.)

I know what you're thinking: how is reporting any different from uploading?
Well, that's the subtle difference: there's a lot more overhead involved with reporting (or any kind of "update" to the scheduler).

What happens is that when you do a "blank" update, one that doesn't report any results and doesn't request new work, it takes about 4 DB queries to complete that request (to get your user data and a few other things).

Now, for each result reported, it takes about 3 queries (my memory is foggy on these numbers, so please correct me if I'm wrong, but I think I'm close at least).
So to report 1 result takes 7 queries: 4 for user data, and 3 for the result.
Each additional result adds another 3 queries on top of the 7.

So for 5 results, it would be the 4 for the user data, plus 3*5=15, so that's 19 in total.

However, if we were to report each of those results individually, the outcome would be quite different.
For each report, there'd be the 4 for user data and the 3 for the result, which is 7.
So 7 * the 5 results is 35, for the same 5 results that could be reported all together using only 19.

35:19
That's quite a difference in my mind.
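The query counts above (hedged in the post itself as approximate: ~4 fixed queries per update plus ~3 per reported result) can be written out as a quick check:

```python
# Approximate DB-query cost model from the post above. Both constants
# are the post's rough, from-memory figures, not measured values.
FIXED_QUERIES = 4        # per scheduler update (user data etc.)
PER_RESULT_QUERIES = 3   # per result reported in that update

def queries_one_update(results: int) -> int:
    """Queries for a single scheduler update reporting `results` results."""
    return FIXED_QUERIES + PER_RESULT_QUERIES * results

def queries_individual(results: int) -> int:
    """Queries if each result is reported in its own separate update."""
    return results * queries_one_update(1)

print(queries_one_update(5))   # batched: 4 + 3*5
print(queries_individual(5))   # one at a time: 5 * 7
```

This reproduces the 19-versus-35 comparison in the post: the fixed per-update cost dominates when results are reported one at a time.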


My experience is that the problems are in contacting the upload/download server, rather than the scheduler. In addition, the latest round of problems has been in full disks on the upload/download server. Wouldn't letting the scheduler know that those results are there, ready for assimilation and deletion, help this situation? How many results get trashed when they are in that "Ready to report" status? Are those files still sitting out on the upload/download server? How long? They seem to have lots of capacity on the assimilator and deleter pieces of the process, and need some help on the upload/download part of the process. Wouldn't allowing "return results immediately" help this?
Many valid points; however, we have to weigh up the cost:benefit ratio and decide which is the best choice on that basis.
There are already measures in place to stop the disks getting full, and to manage really old uploads that aren't in the DB anymore (so that disks don't get clogged up).

But in short, the additional load placed on the DB isn't worth the "gain", and I argue that the gain is non-existent because "lost" results are just resent, and the other things aren't of much benefit. The load on the DB is the single biggest problem for this project.
ID: 307417
Lee Carre
Volunteer tester
Joined: 21 Apr 00
Posts: 1459
Credit: 58,485
RAC: 0
Channel Islands
Message 307419 - Posted: 16 May 2006, 15:34:20 UTC - in response to Message 307370.  

Isn't it better to maintain a constant stream of a small amount of data rather than everyone surging at times with dozens of updates? I would rather send one file 10 to 12 times a day than to wait and report 10 days worth of data.
For data uploads there's no difference, and it's actually better to have a more constant stream, so the data server is better able to handle things, rather than floods that come and go. But for reporting it's quite a different matter. Reporting doesn't need much bandwidth at all, so you can quite easily report 20 results without a problem.
Reporting lots of results in the same update is kinder to the DB.

Just because you don't like the "bug", it doesn't mean it is one.
Opinions on the matter are irrelevant; the fact remains that it's a bug, because the client is not supposed to behave like that, so it's a bug by definition.
ID: 307419
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 307426 - Posted: 16 May 2006, 15:49:21 UTC - in response to Message 307417.  

In addition to Lee's excellent explanation, also consider this:

When uploading, all you do is write your result to a directory on the hard drive. That doesn't take much overhead.

Yet for reporting each result, you use the server's CPU and memory.
Not so much of a problem if only you or Jack were doing it. But then extrapolate this to thousands of users doing it, many of them at the same time.
ID: 307426
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21675
Credit: 7,508,002
RAC: 20
United Kingdom
Message 307442 - Posted: 16 May 2006, 16:10:02 UTC - in response to Message 307426.  

In addition to Lee's excellent explanation, also consider this:

When uploading, all you do is write your result to a directory on the hard drive. That doesn't take much overhead.

Yet for reporting each result, you use the server's CPU and memory.
Not so much of a problem if only you or Jack were doing it. But then extrapolate this to thousands of users doing it, many of them at the same time.

(My bold added to the quote.)

The bottleneck for the entire system is the Berkeley database server.

Sending back clusters of results all at once reduces the number of database server transactions required and so gives a useful server-side performance boost.

Multiply that (small) boost by a few hundred thousand and you save yourself the need for a very expensive machine in your server closet.

Jack, have you got a spare super-server that you can donate? There are much better ways to monitor the health of the machines in your farm.

Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 307442
jedimstr
Joined: 23 Oct 00
Posts: 33
Credit: 16,828,887
RAC: 0
United States
Message 307563 - Posted: 16 May 2006, 17:17:55 UTC - in response to Message 307363.  
Last modified: 16 May 2006, 17:18:10 UTC


also a feature like that should be off by default, but if a user has to "remove" a string from a config file, I'm guessing it's on by default, which again is bad



Actually, it IS off by default (at least it was when I used Trux's client, before 5.4.9 came out). Only Calibration was on by default. You had to change the config file to actually turn return_results_immediately ON. Trux's client was much more useful than just the calibration and result-reporting settings: it allowed automatic CPU affinity (great for SMP and dual cores), affinity by project, and assigning project priorities specific to that computer.

So I wouldn't disparage Truxoft for what is an optional, default-off "feature"/bug.
ID: 307563
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 307689 - Posted: 16 May 2006, 18:04:35 UTC - in response to Message 307373.  

The result uploads go directly to the hard drive. They are 6-23 KB in size. The "reporting" isn't really much more than a few bytes, but it forces communication between the different servers inside their network. Each connection takes time, and the operations required after receiving the report take time. It's this which is slowing down the system. Reporting 1 WU takes about as much as reporting a dozen, so waiting and reporting multiple results at the same time saves on these open connections between servers.

Reporting means invoking a program on the web server (which may spawn a thread and load an executable), connecting to the database server, finding the appropriate row in the table and updating it.

If you are reporting more than one work unit, all but the last two steps can be shared across the update.
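As a toy illustration of that sharing (using SQLite and an invented one-table schema, not the real BOINC database), the fixed cost of a contact is paid once, while only the per-row update repeats for each result:

```python
import sqlite3

# Invented stand-in schema; the real BOINC result table is different.
conn = sqlite3.connect(":memory:")  # "connecting to the database server"
conn.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO result VALUES (?, 'in_progress')",
                 [(i,) for i in range(5)])

def report(conn, result_ids):
    """One scheduler contact: the handler and connection already exist,
    so only the find-row-and-update step repeats per result."""
    for rid in result_ids:
        conn.execute("UPDATE result SET state = 'over' WHERE id = ?",
                     (rid,))
    conn.commit()

# Reporting all five in one contact shares the setup cost once.
report(conn, [0, 1, 2, 3, 4])
done = conn.execute(
    "SELECT COUNT(*) FROM result WHERE state = 'over'").fetchone()[0]
print(done)  # 5
```

Five individual calls to `report` would each pay the connection and process-startup overhead again; that is the difference the post describes.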
ID: 307689
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 307695 - Posted: 16 May 2006, 18:08:57 UTC - in response to Message 307442.  



Jack, you got a spare super-server that you can donate? There's much better ways to monitor the health of the machines in your farm.

Regards,
Martin

Jack could get a hosting account at some cheesy ISP like 1dollarhosting.com and then put a little script on each machine that uploads a file periodically (every 10 minutes?).

It could be a pretty small file.

... and at least a half-dozen ways to get those all combined into one HTML file.
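A hypothetical version of that heartbeat script, with all names and paths invented for illustration (the upload step to the hosting account is left out):

```python
import socket
import tempfile
import time
from pathlib import Path

def write_heartbeat(directory: Path) -> Path:
    """Write a tiny per-host status file. Run from cron every ten
    minutes and sync the directory to any cheap web host; a stale
    timestamp then flags a machine that has gone down."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    out = directory / f"{socket.gethostname()}.status"
    out.write_text(f"alive {stamp}\n")
    return out

# Demo against a temporary directory instead of a real upload target.
status_file = write_heartbeat(Path(tempfile.mkdtemp()))
print(status_file.read_text())
```

Combining the per-host files into one HTML page is then a trivial loop on the server side, which is the "half-dozen ways" point above.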
ID: 307695
WendyR
Volunteer tester
Joined: 1 Aug 05
Posts: 44
Credit: 1,962,140
RAC: 0
United States
Message 307710 - Posted: 16 May 2006, 18:26:09 UTC - in response to Message 307417.  

Lee,

Thank you for your thoughtful, polite and useful response to my questions. Please understand that I am not trying to be obnoxious here. I, too, am a software developer, and fairly good at what might best be called "process optimization", and might put a little different "spin" on things.

I do as well, and have been tempted sometimes; however, I'd rather respect the wishes of the project admins, as they know what's best for their project.
Yes, it is their project, and they get to make the final decision. I just hope that rational input from the user community is considered.

I see his point, but there are much easier and better ways of monitoring.
I agree, but it is nice to see some "side benefits" of this too.

A more appropriate solution is to reduce your cache, increase your screen resolution, or use something like BoincView, which can filter results based on status (so you only see active and paused if you want); there's a setup guide in the wiki.
Already at .1 days, using BoincView, and my monitor is set at maximum resolution. (These 42-year-old eyes are starting to have issues with those little letters on the screen, too.)

...[useful explanation of the report-results process and database operations deleted for brevity]...

35:19
that's quite a difference in my mind


Many valid points; however, we have to weigh up the cost:benefit ratio and decide which is the best choice on that basis.
There are already measures in place to stop the disks getting full, and to manage really old uploads that aren't in the DB anymore (so that disks don't get clogged up).

But in short, the additional load placed on the DB isn't worth the "gain", and I argue that the gain is non-existent because "lost" results are just resent, and the other things aren't of much benefit. The load on the DB is the single biggest problem for this project.


Until the bold statement, certainly all valid points that really can't be argued with. I, too, would have to go with the decision to optimize the cost/benefit ratio. But when you optimize a process, you have to look at the whole picture. Sure, you can spend lots of time optimizing a small piece of the process, but even a factor-of-10 speedup on 1% of your total only speeds up the whole by about 0.9%. You are better off looking at a more modest improvement on a larger portion of the whole. You also want to look for "low hanging fruit" -- things that you can do quickly, easily, and with as little disruption to the existing process as possible.
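The whole-picture argument is just Amdahl's law. A two-line check of the 1%-at-10x example:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the total
    work is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# A 10x speedup on 1% of the process improves the whole by under 1%.
print(f"{(overall_speedup(0.01, 10) - 1) * 100:.2f}%")
```

The same formula shows why a modest improvement on a large fraction of the process beats a dramatic one on a tiny fraction.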

So, just where are the log jams and problems in the current process? Have they "moved" or changed in character with recent hardware and software upgrades? Are "we" operating under old assumptions? Looking back over posts from the last few months, there were several outages related to specific hardware failures, disks, memory upgrades, UPS, power failures -- talk about luck!

I seldom see "large" buildups in one area on the status page anymore, and when I do, there is usually some process down, and it clears in a few hours. Clearly, the database server is able to handle "day to day" tasks without falling behind. It also appears to have additional capacity to be able to catch up fairly quickly after an outage.

I found this little tidbit here. It is Tony (mmciastro) quoting Matt: "Right now we are generating and sending out 250,000 results a day without taxing our database server."

Recent problems -- looking back, there was a whole series of 403 and validation error reports starting around March 27, 2006, finally ending about a week later with "cranking up" the time to deletion. There are 403 and validation errors happening right now (May 15, 2006). My (fallible) memory seems to recall a couple of other spats with this type of problem in the last few months. I also seem to recall a couple of times last year when a full disk resulted in database sluggishness, malformed workunits being created, and upload/download issues. Each time, the solution seemed to be to clear up those directories on the upload/download server.

Right now, this seems to be the most common reason for an unplanned outage. So the obvious questions are: what can be done to help this? What simple things are available? Faster clearing of completed results [return_results_immediately] would seem to be one way of improving the situation. It seems like it should be simple, since it was already in the software at one point in time.

Now, the whole "BOINC covers more than one project" argument should still be made, and, yes, my specific examples apply only to the current situation for the SETI project, and perhaps other projects are database bound. That does need to enter into the cost benefit analysis too.

Am I making any sense? It sounds like I am blathering now. I will shut up for a while....
ID: 307710
©2025 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.