5.2.6 + return_results_immediately

Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186403 - Posted: 6 Nov 2005, 18:43:52 UTC - in response to Message 186371.  


Phrases like "not to be rude, but you're completely wrong" or "keep taking those pills" do not belong in a technical discussion.


It was keep taking the "blue pills", and I even provided a shot of
Morpheus and Neo so you would know what I was talking about, unless,
that is, you haven't seen "The Matrix"... I didn't like doing it, but I
figured it was the only way to get your attention, unfortunately. See,
when someone in "The Matrix" (particularly "Matrix Online") is a "blue
pill", they are still part of the system: they believe that what "the
machines" (in this case, the documentation of the project) feed into
their brain is the truth, and there is little you can do to shake that
belief.

As for the other statement, you misquoted me. I intentionally avoided the
use of the word "wrong". I said that you were "incorrect", which is
supposedly "warmer and fuzzier"...

Now I'm just dropping this subject with you, because it is clear that we
disagree. So we need to agree to disagree and move on...

Brian

Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186409 - Posted: 6 Nov 2005, 18:53:31 UTC - in response to Message 186342.  


In multi-project the behaviour is more random...


This could perhaps be what I'm seeing. I'm attached to SETI with what
the manager says is a 90% allocation, and to Einstein with the
remaining 10%. I've currently got Einstein suspended, but have 4 units
waiting (3 that haven't been started, 1 that is in the middle). I'm
working on clearing out the SETI units so I can switch down to a 2-day
cache and try the v7 version of TMR's optimized client, since people
are saying it is faster than 8.1. I'm also going to empty out the
Einstein queue and then delete any leftover files (a problem where I
crashed and got new host IDs for both projects left a mess of files in
the project directories).

From there, I'll do a "set it and forget it" (thanks RONCO!) on both projects
and see what happens...

Brian
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 186410 - Posted: 6 Nov 2005, 18:54:25 UTC

You have the option to "return results dang near immediately": just upload them manually, then report them by doing a project update.

there, that's settled.
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186414 - Posted: 6 Nov 2005, 19:07:04 UTC - in response to Message 186410.  

there, that's settled.


Oh go fix your computer! :P
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 186416 - Posted: 6 Nov 2005, 19:14:23 UTC - in response to Message 186414.  

there, that's settled.


Oh go fix your computer! :P

Ha ha ha, wait till I talk to level 2 at Kingston on Monday. Strangely, I've had BOINC freeze with that board with my two OCZ sticks in it. It's frozen twice and displayed "computation error" on every WU, but after a reboot they all go back to normal.

I only wish my problem was as simple as report now or report later.

"Many a night I lay asleep dreaming of being spat on in the face" , Monty Python.
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186426 - Posted: 6 Nov 2005, 19:51:46 UTC - in response to Message 186416.  

Ha ha ha, wait till I talk to level 2 at Kingston on Monday. Strangely, I've had BOINC freeze with that board with my two OCZ sticks in it. It's frozen twice and displayed "computation error" on every WU, but after a reboot they all go back to normal.


Hmmm... Sounds like timing issues again... Not sure...
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 186427 - Posted: 6 Nov 2005, 19:54:29 UTC - in response to Message 186426.  

Hmmm... Sounds like timing issues again... Not sure...

This isn't the thread for this, and I don't intend to hijack it. Kingston guarantees it to work, so Level 2 will need to either fix it or find a cure.
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 186452 - Posted: 6 Nov 2005, 20:58:51 UTC
Last modified: 6 Nov 2005, 20:59:30 UTC

OK, so I've modified the source code to restore the missing functionality. I included the change in my optimized Windows client 5.3.1, available at http://boinc.truxoft.com. Besides turning it on when started from the command line, you can also add the following switch to the remote_hosts.cfg file to make it work:

# return_results_immediately

This is especially useful if you run BOINC as a service, since afaik it does not let you pass more than one command-line switch in that mode.

For those who prefer to add the functionality themselves and compile a client for their platform, the change is indeed trivial, so you can do it very easily:

In the file cs_scheduler.c, go to the function CLIENT_STATE::find_project_with_overdue_results (around line 440 in the current developer source code 5.3.1) and replace the following line:

if (have_sporadic_connection) {

with this one:

if (have_sporadic_connection || return_results_immediately) {
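
For the curious, here is a minimal sketch of one way a client could detect that switch in remote_hosts.cfg (an illustration of the approach, not necessarily the exact code in the modified client). Lines starting with '#' are comments to the stock parser, so adding the switch doesn't break the normal host list:

#include <cstdio>
#include <cstring>

// Sketch: true if remote_hosts.cfg carries the switch on a comment line.
bool config_has_rri(const char* path) {
    FILE* f = fopen(path, "r");
    if (!f) return false;
    char buf[256];
    bool found = false;
    while (fgets(buf, sizeof(buf), f)) {
        // only comment lines are inspected; host entries are untouched
        if (buf[0] == '#' && strstr(buf, "return_results_immediately")) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}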


trux
BOINC software
Freediving Team
Czech Republic
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 186456 - Posted: 6 Nov 2005, 21:10:19 UTC - in response to Message 186452.  


For those who prefer to add the functionality themselves and compile a client for their platform, the change is indeed trivial, so you can do it very easily:

In the file cs_scheduler.c, go to the function CLIENT_STATE::find_project_with_overdue_results (around line 440 in the current developer source code 5.3.1) and replace the following line:

if (have_sporadic_connection) {

with this one:

if (have_sporadic_connection || return_results_immediately) {



Thanks :o)


1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 186495 - Posted: 7 Nov 2005, 0:13:49 UTC - in response to Message 186409.  


In multi-project the behaviour is more random...


This could perhaps be what I'm seeing. I'm attached to SETI with what
the manager says is a 90% allocation, and to Einstein with the
remaining 10%.

(snipped)

Brian

When you have more than one project, BOINC does not appear to honor the resource share when requesting work -- Long Term Debt is used instead to decide which project can download. If SETI is "owed work", BOINC won't download Einstein.

... and you can explore all of this for yourself. It all shows up in the log files. It was easier to see when Einstein had a 7 day deadline, because you could set "connect every 'x' days" near the deadline and watch BOINC do nothing but Einstein so that it would report on time, then do nothing but SETI.

If you don't believe the documentation, you can see it work.
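
To make the debt idea concrete, here is a toy sketch of debt-driven work fetch (my own illustration of the behaviour described above, not the actual client code; the real scheduler weighs many more conditions):

#include <string>
#include <vector>

// A project accumulates debt when it receives less CPU time than its
// resource share entitles it to, and pays debt down when it gets more.
struct Project {
    std::string url;
    double resource_share;   // e.g. 90 for SETI, 10 for Einstein
    double long_term_debt;   // seconds of CPU time the project is owed
};

// Toy rule: only the most-owed project may request new work,
// regardless of the raw resource shares.
Project* pick_project_for_work(std::vector<Project>& projects) {
    Project* best = nullptr;
    for (auto& p : projects) {
        if (!best || p.long_term_debt > best->long_term_debt)
            best = &p;
    }
    return best;
}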


Shaof
Volunteer tester

Joined: 27 May 99
Posts: 4
Credit: 82,084
RAC: 0
Finland
Message 186596 - Posted: 7 Nov 2005, 8:03:58 UTC - in response to Message 186452.  
Last modified: 7 Nov 2005, 8:04:18 UTC

OK, so I've modified the source code to restore the missing functionality. I included the change in my optimized Windows client 5.3.1, available at http://boinc.truxoft.com. Besides turning it on when started from the command line, you can also add the following switch to the remote_hosts.cfg file to make it work:

# return_results_immediately

This is especially useful if you run BOINC as a service, since afaik it does not let you pass more than one command-line switch in that mode.


Your website suggests using "# report_results_immediately" instead of
"# return_results_immediately". Do both switches work?

trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 186624 - Posted: 7 Nov 2005, 11:38:20 UTC - in response to Message 186596.  

Your website suggests using "# report_results_immediately" instead of
"# return_results_immediately". Do both switches work?

Oops, that was a typo. "# return_results_immediately", as written here, is correct. I fixed the information on the website. Thanks for bringing it to my attention!

trux
BOINC software
Freediving Team
Czech Republic
Angus
Volunteer tester

Joined: 26 May 99
Posts: 459
Credit: 91,013
RAC: 0
Pitcairn Islands
Message 187452 - Posted: 10 Nov 2005, 3:40:35 UTC - in response to Message 186292.  
Last modified: 10 Nov 2005, 3:50:01 UTC


This flag was created for debugging; it was never intended for general use. If I remember correctly, it was deprecated in late version 2.xx or early version 3.xx, certainly before public release. It just took a little while for it to actually get deleted.


According to THE "Paul D. Buck" over on the Rosetta forum in this thread, there is a problem with RAC calculations when many results are returned at once.

Having results returned immediately would remove this issue, and besides, it makes absolutely no sense to delay reporting if the uploads are happening as soon as the WU finishes. If the client breaks the 'connect every xx' to upload, it can dang well break it to report as well.


Landroval

Joined: 7 Oct 01
Posts: 188
Credit: 2,098,881
RAC: 1
United States
Message 187459 - Posted: 10 Nov 2005, 4:44:55 UTC - in response to Message 187452.  

Having results returned immediately would remove this issue, and besides, it makes absolutely no sense to delay reporting if the uploads are happening as soon as the WU finishes. If the client breaks the 'connect every xx' to upload, it can dang well break it to report as well.


On one machine of mine, I'm running 2 projects (S@H & E@H) with 'connect every xx' set to 5 days... I had some Einstein units finish up and then sit there until the next 'regularly scheduled' 5-day contact.

SETI, meanwhile, is uploading & reporting as soon as the workunits are finished. So far the longest it's delayed anything is about 6 hours.

It's not hurting anything, and everything's still getting turned in by the deadline, but it's interesting to watch it happen & speculate as to causes.

If you think education is expensive, try ignorance.
Lee Carre
Volunteer tester

Joined: 21 Apr 00
Posts: 1459
Credit: 58,485
RAC: 0
Channel Islands
Message 187462 - Posted: 10 Nov 2005, 4:58:53 UTC

The purpose of trying to report results when the client needs to contact the scheduler anyway is to reduce the load on the servers and, as a side effect, network traffic, as can be seen here.

The graphs used to show a fairly constant rate (when everything was well), but now the average rate is dropping.
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 187548 - Posted: 10 Nov 2005, 14:31:59 UTC - in response to Message 187452.  
Last modified: 10 Nov 2005, 15:06:12 UTC

Having results returned immediately would remove this issue, and besides, it makes absolutely no sense to delay reporting if the uploads are happening as soon as the WU finishes. If the client breaks the 'connect every xx' to upload, it can dang well break it to report as well.



The upload server and the scheduling server are very often not the same machine, and don't even need to be in the same location: CPDN, for example, has multiple upload servers in the UK and Switzerland. Even servers in the same location don't need to use the same ISP: at SETI@home, the upload/download server uses Cogent while the scheduling server uses Berkeley's connection.

Also, projects can very easily add an extra scheduling/feeder server, upload server, download server, transitioner, validator, assimilator, file_deleter, or work generator if needed. But if the BOINC database can't keep up, you need to get a more powerful server and move the database to it...

Therefore, anything that accesses the database is an "expensive" operation, and minimizing traffic to the database is an advantage.


For someone with a permanent connection mainly crunching one project, the client will normally ask for more work at some point while crunching a result. If a result takes N hours to crunch, the next work request falls on average halfway through, so waiting to report until the client asks for work means a result is reported on average N/2 hours after it finishes. For why it doesn't always follow this pattern, see my example earlier in the thread of normal/VHAR work in SETI@home.


In the BOINC benchmark project, where a result was reported at the same time as more work was requested, the scheduling server was responsible for 36% of the database load and 51% of the total load. If adding an extra scheduler request to report each result doubles the scheduler load, the scheduling server is suddenly at 53% of the database load and 67% of the total load. This means a single-server setup can handle 34% fewer results/day, while a multi-server setup can handle 26% fewer results/day.
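
A quick back-of-envelope check of those figures (my own arithmetic, starting only from the 36% and 51% shares quoted above):

#include <cstdio>

int main() {
    double db = 0.36, total = 0.51;  // scheduler's share of db / total load
    // doubling the scheduler's contribution adds its share on top of 1.0:
    printf("db share:    %.1f%%\n", 100 * 2 * db / (1 + db));        // 52.9
    printf("total share: %.1f%%\n", 100 * 2 * total / (1 + total));  // 67.5
    // if the server was already saturated, capacity shrinks accordingly:
    printf("single-server loss: %.1f%%\n", 100 * (1 - 1 / (1 + total)));  // 33.8
    printf("multi-server loss:  %.1f%%\n", 100 * (1 - 1 / (1 + db)));     // 26.5
    return 0;
}

These round to the 53%, 67%, 34%, and 26% quoted above.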


If 95% of users have no problem waiting a little before results are reported, and the remaining 5% have a life outside BOINC and aren't staring at BOINC 24/7, it would be a "problem" roughly 1% of the time. Making the BOINC client report results immediately after upload, and in the process losing 34% or 26% of the capacity, to cater for that 1% would in my opinion make little sense.


Well, adding an extra connection just to report a result will likely not fully double the scheduling-server load, but it can still significantly decrease the available capacity.
Angus
Volunteer tester

Joined: 26 May 99
Posts: 459
Credit: 91,013
RAC: 0
Pitcairn Islands
Message 187556 - Posted: 10 Nov 2005, 15:26:21 UTC - in response to Message 187548.  
Last modified: 10 Nov 2005, 15:28:10 UTC


For someone with a permanent connection mainly crunching one project, the client will normally ask for more work at some point while crunching a result. If a result takes N hours to crunch, the next work request falls on average halfway through, so waiting to report until the client asks for work means a result is reported on average N/2 hours after it finishes. For why it doesn't always follow this pattern, see my example earlier in the thread of normal/VHAR work in SETI@home.
For someone running only one project with a cache setting of 3 days, he'll crunch result 1, upload it, start crunching result 2, and somewhere while crunching result 2 ask for more work. Asking for more work also reports any uploaded results...

I don't see how this behaviour, as you describe it, differs from reporting immediately in the number of database accesses. Your example has the client reporting uploaded results sometime during the crunching of the next WU. That's one database access for each WU reported; it's not "batching" the reporting function. Having the client report immediately only changes the time between uploading and reporting.

What I see happening in my single-project situation is the client storing up results to be reported for well past one day (many WUs). When the cache finally gets low enough, the client asks for more work and reports a bunch of results at once. This would seem to be one connection to the database, but with many rows being inserted. Is that better or worse than one row being inserted per connection? I don't know. I do know that Paul has reported RAC-calculation problems when a number of WUs are reported at the same time, since part of the RAC formula is the time between reports. This is only exacerbated in a project like Rosetta, where reporting and credit granting happen virtually at the same time since there is a quorum of only 1.

I have seen the RAC of individual boxes rise dramatically after the change is made to report immediately, with no other changes. This would at least be an anecdotal sign that the RAC calculation is broken if reporting of results is "batched".
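
For reference, RAC is an exponentially decaying average with a one-week half-life, so the gap between updates matters. A simplified sketch of that kind of averaging (modeled loosely on BOINC's update_average(); the real function has extra guards and edge cases):

#include <cmath>

const double LN2 = 0.6931471805599453;  // ln(2)
const double HALF_LIFE = 7 * 86400.0;   // one week, in seconds

// avg is credit per day; avg_time is when the average was last updated.
void update_rac(double now, double credit_per_day,
                double& avg, double& avg_time) {
    if (avg_time > 0) {
        double diff = now - avg_time;   // gap since the last report
        double weight = exp(-diff * LN2 / HALF_LIFE);
        avg = weight * avg + (1 - weight) * credit_per_day;
    } else {
        avg = credit_per_day;
    }
    avg_time = now;
}

Because the weight depends on the gap between reports, the same credit delivered as one big batch after a long gap is averaged differently from credit trickled in result by result, which is one way batching could skew RAC.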


In the BOINC benchmark project, where a result was reported at the same time as more work was requested, the scheduling server was responsible for 36% of the database load and 51% of the total load. If adding an extra scheduler request to report each result doubles the scheduler load, the scheduling server is suddenly at 53% of the database load and 67% of the total load. This means a single-server setup can handle 34% fewer results/day, while a multi-server setup can handle 26% fewer results/day.


If 95% of users have no problem waiting a little before results are reported, and the remaining 5% have a life outside BOINC and aren't staring at BOINC 24/7, it would be a "problem" roughly 1% of the time. Making the BOINC client report results immediately after upload, and in the process losing 34% or 26% of the capacity, to cater for that 1% would in my opinion make little sense.

Does having incorrect RAC scores make sense either? Accurate credit granting is at least as important, in terms of public perception, as anything else a project does. (The scientists are screaming now, but such is life in the DC world.)


Well, adding an extra connection just to report a result will likely not fully double the scheduling-server load, but it can still significantly decrease the available capacity.


Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 187580 - Posted: 10 Nov 2005, 17:19:53 UTC - in response to Message 187556.  

I don't see how this behaviour, as you describe it, differs from reporting immediately in the number of database accesses. Your example has the client reporting uploaded results sometime during the crunching of the next WU. That's one database access for each WU reported; it's not "batching" the reporting function. Having the client report immediately only changes the time between uploading and reporting.



Well, I don't know all the innards of the scheduling server, but from a quick look at things it appears there's one db read and one db write per result assigned or reported; how many fields is another matter.

Also, you'll need to look up the host, the user, and possibly the team. Unless the scheduling server is very inefficiently programmed, these should only be read once per connection, and if read more than once, chances are they're cached in memory.
Host info is always updated, and user info can be updated, depending on whether there are new preferences or not.

So, if I'm not mistaken, each connection to the scheduling server costs 3 db reads and 1 db write, and each result assigned or reported adds a db read and a db write. Both host info and user info have more fields than result info.

This also means that asking for 1 result and reporting 1 result at the same time costs 5 reads and 3 writes, while asking for 1 result and reporting 1 result at different times costs 8 reads and 4 writes.
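
The same accounting written out as a sketch (my own tally of the per-connection overhead and per-result costs estimated above):

struct DbCost { int reads, writes; };

// One scheduler connection: 3 reads + 1 write of overhead,
// plus 1 read + 1 write per result assigned or reported.
DbCost scheduler_request(int results_handled) {
    return { 3 + results_handled, 1 + results_handled };
}

// combined: scheduler_request(2)       -> 5 reads, 3 writes
// separate: scheduler_request(1) twice -> 8 reads, 4 writes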


Anyway, if the db time spent on the user/host/team tables is insignificant compared to the time spent on the result table, having an extra scheduling-server connection just to report results wouldn't really matter.
But if it's insignificant, why did the bug in v4.4x, where results were reported immediately, get fixed in later clients...
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 187582 - Posted: 10 Nov 2005, 17:34:44 UTC - in response to Message 187556.  


What I see happening in my single-project situation is the client storing up results to be reported for well past one day (many WUs). When the cache finally gets low enough, the client asks for more work and reports a bunch of results at once. This would seem to be one connection to the database, but with many rows being inserted. Is that better or worse than one row being inserted per connection? I don't know. I do know that Paul has reported RAC-calculation problems when a number of WUs are reported at the same time, since part of the RAC formula is the time between reports. This is only exacerbated in a project like Rosetta, where reporting and credit granting happen virtually at the same time since there is a quorum of only 1.

It isn't just database connections.

I'm going to try to explain this using "standard CGI" even though FastCGI is more efficient -- the same issues apply.

The scheduler is a CGI program running on a web server. A connection comes in and invokes the CGI: the web server starts a thread and launches the CGI program. The CGI program loads whatever libraries/interpreters/etc. it needs, opens database connections, does whatever transactions are needed, responds to the client, then closes the database connections, releases its resources, and terminates; the thread terminates, the web server closes the connection, and we're done.

There are a lot of steps leading up to the first database update, and a lot of steps after.

FastCGI helps because it keeps the CGI in RAM, but there is still overhead, and still a limited number of threads (more threads means each runs slower; fewer threads run faster). It's better, but not perfect.

At any rate, those are the concepts, even if the details are a little off.
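
A minimal sketch of the two lifecycles (assuming the standard FastCGI development kit's fcgi_stdio.h; the helpers are hypothetical stand-ins, not the actual scheduler code):

#include "fcgi_stdio.h"   // FastCGI's drop-in replacement for stdio

// Hypothetical stand-ins for the real per-request work.
static void open_db()  {}
static void close_db() {}
static void handle_scheduler_request() {
    printf("Content-type: text/plain\r\n\r\nOK\n");
}

int main(void) {
    open_db();   // under plain CGI, everything here runs once per request;
                 // under FastCGI the process (and its db connection)
                 // stays resident and only the loop body repeats
    while (FCGI_Accept() >= 0) {    // blocks until the next request
        handle_scheduler_request();
    }
    close_db();
    return 0;
}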

Angus
Volunteer tester

Joined: 26 May 99
Posts: 459
Credit: 91,013
RAC: 0
Pitcairn Islands
Message 187617 - Posted: 10 Nov 2005, 19:08:54 UTC

This seems to be at odds with the SETI/BOINC mantra about keeping the "connect every xx" setting as low as possible - usually recommended to be something like .1 or .0x days.

This is forcing the client to fetch more work and report after every result is done, creating a lot more server connections than necessary.

So - either way, the server gets the load it gets. If it's too slow to handle the traffic, that's a hardware issue.

