5.2.6 + return_results_immediately

Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186403 - Posted: 6 Nov 2005, 18:43:52 UTC - in response to Message 186371.  


Phrases like "not to be rude, but you're completely wrong" or "keep taking those pills" do not belong in a technical discussion.


It was keep taking the "blue pills", and I even provided a shot of
Morpheus and Neo so you would know what I was talking about, unless,
that is, you haven't seen "The Matrix"... I didn't like doing it, but I
figured it was the only way to get your attention, unfortunately. See,
when someone in "The Matrix" (particularly "Matrix Online") is a "blue
pill", they are still part of the system: they believe that what "the
machines" (in this case, the documentation of the project) feed into
their brain is the truth, and there is little you can do to shake that
belief.

As for the other statement, you misquoted me. I intentionally avoided the
use of the word "wrong". I said that you were "incorrect", which is
supposedly "warmer and fuzzier"...

Now I'm just dropping this subject with you, because it is clear that we
disagree. So we need to agree to disagree and move on...

Brian

Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186409 - Posted: 6 Nov 2005, 18:53:31 UTC - in response to Message 186342.  


In multi-project the behaviour is more random...


This could perhaps be what I'm seeing. I'm attached to SETI with what
the manager says is a 90% allocation, and to Einstein with the
remaining 10%. I've currently got Einstein suspended, but have 4 units
waiting (3 that haven't been started, 1 that is in the middle). I'm
working on clearing out the SETI units so I can switch down to a 2-day
cache and try the v7 version of TMR's optimized client, since people
are saying it is faster than 8.1. I'm also going to empty out the
Einstein queue and then delete any leftover files (a problem where I
crashed and got new host IDs for both projects left a mess of files in
the project directories).

From there, I'll do a "set it and forget it" (thanks RONCO!) on both projects
and see what happens...

Brian
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 186410 - Posted: 6 Nov 2005, 18:54:25 UTC

You have the option to "return results dang near immediately": just upload them manually, then report them by doing a project update.

there, that's settled.
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186414 - Posted: 6 Nov 2005, 19:07:04 UTC - in response to Message 186410.  

there, that's settled.


Oh go fix your computer! :P
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 186416 - Posted: 6 Nov 2005, 19:14:23 UTC - in response to Message 186414.  

there, that's settled.


Oh go fix your computer! :P

Ha ha ha, wait till I talk to level 2 at Kingston on Monday. Strangely, I've had BOINC freeze with that board with my two OCZ sticks in it. It's frozen twice and displayed "computation error" on every WU, but after a reboot they all go back to normal.

I only wish my problem was as simple as report now or report later.

"Many a night I lay asleep dreaming of being spat on in the face" , Monty Python.
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 186426 - Posted: 6 Nov 2005, 19:51:46 UTC - in response to Message 186416.  

Ha ha ha, wait till I talk to level 2 at Kingston on Monday. Strangely, I've had BOINC freeze with that board with my two OCZ sticks in it. It's frozen twice and displayed "computation error" on every WU, but after a reboot they all go back to normal.


Hmmm... Sounds like timing issues again... Not sure...
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 186427 - Posted: 6 Nov 2005, 19:54:29 UTC - in response to Message 186426.  

Hmmm... Sounds like timing issues again... Not sure...

This isn't the thread for this, and I don't intend to hijack it. Kingston guarantees it to work, so Level 2 will need to either fix it or find a cure.
trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 186452 - Posted: 6 Nov 2005, 20:58:51 UTC
Last modified: 6 Nov 2005, 20:59:30 UTC

OK, so I've modified the source code to restore the missing functionality. I included the change in my optimized Windows client 5.3.1, available at http://boinc.truxoft.com. Besides turning it on when started from the command line, you can also add the following switch to the remote_hosts.cfg file to make it work:

# return_results_immediately

This is especially useful if you run BOINC as a service, since afaik it does not let you pass more than one command-line switch in that mode.

For those who prefer to add the functionality themselves and compile a client for their platform, the change is indeed trivial, so you can do it very easily:

In the file cs_scheduler.c, go to the function CLIENT_STATE::find_project_with_overdue_results (around line 440 in the current developer source code 5.3.1) and replace the following line:

if (have_sporadic_connection) {

with this one:

if (have_sporadic_connection || return_results_immediately) {
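
For the curious, here is a minimal sketch of one way a client could detect that switch in remote_hosts.cfg (an illustration of the approach, not necessarily the exact code in the modified client). Lines starting with '#' are comments to the stock parser, so adding the switch doesn't break the normal host list:

#include <cstdio>
#include <cstring>

// Sketch: true if remote_hosts.cfg carries the switch on a comment line.
bool config_has_rri(const char* path) {
    FILE* f = fopen(path, "r");
    if (!f) return false;
    char buf[256];
    bool found = false;
    while (fgets(buf, sizeof(buf), f)) {
        // only comment lines are inspected; host entries are untouched
        if (buf[0] == '#' && strstr(buf, "return_results_immediately")) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}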


trux
BOINC software
Freediving Team
Czech Republic
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 186456 - Posted: 6 Nov 2005, 21:10:19 UTC - in response to Message 186452.  


For those who prefer to add the functionality themselves and compile a client for their platform, the change is indeed trivial, so you can do it very easily:

In the file cs_scheduler.c, go to the function CLIENT_STATE::find_project_with_overdue_results (around line 440 in the current developer source code 5.3.1) and replace the following line:

if (have_sporadic_connection) {

with this one:

if (have_sporadic_connection || return_results_immediately) {



Thanks :o)


1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 186495 - Posted: 7 Nov 2005, 0:13:49 UTC - in response to Message 186409.  


In multi-project the behaviour is more random...


This could perhaps be what I'm seeing. I'm attached to SETI with what
the manager says is a 90% allocation, and to Einstein with the
remaining 10%.

(snipped)

Brian

When you have more than one project, BOINC does not appear to honor the resource share when requesting work -- Long Term Debt is used instead to decide which project can download. If SETI is "owed work", BOINC won't download Einstein.

... and you can explore all of this for yourself. It all shows up in the log files. It was easier to see when Einstein had a 7 day deadline, because you could set "connect every 'x' days" near the deadline and watch BOINC do nothing but Einstein so that it would report on time, then do nothing but SETI.

If you don't believe the documentation, you can see it work.
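
To make the debt idea concrete, here is a toy sketch of debt-driven work fetch (my own illustration of the behaviour described above, not the actual client code; the real scheduler weighs many more conditions):

#include <string>
#include <vector>

// A project accumulates debt when it receives less CPU time than its
// resource share entitles it to, and pays debt down when it gets more.
struct Project {
    std::string url;
    double resource_share;   // e.g. 90 for SETI, 10 for Einstein
    double long_term_debt;   // seconds of CPU time the project is owed
};

// Toy rule: only the most-owed project may request new work,
// regardless of the raw resource shares.
Project* pick_project_for_work(std::vector<Project>& projects) {
    Project* best = nullptr;
    for (auto& p : projects) {
        if (!best || p.long_term_debt > best->long_term_debt)
            best = &p;
    }
    return best;
}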


Shaof
Volunteer tester

Joined: 27 May 99
Posts: 4
Credit: 82,084
RAC: 0
Finland
Message 186596 - Posted: 7 Nov 2005, 8:03:58 UTC - in response to Message 186452.  
Last modified: 7 Nov 2005, 8:04:18 UTC

OK, so I've modified the source code to restore the missing functionality. I included the change in my optimized Windows client 5.3.1, available at http://boinc.truxoft.com. Besides turning it on when started from the command line, you can also add the following switch to the remote_hosts.cfg file to make it work:

# return_results_immediately

This is especially useful if you run BOINC as a service, since afaik it does not let you pass more than one command-line switch in that mode.


Your website suggests using "# report_results_immediately" instead of
"# return_results_immediately". Do both switches work?

trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 186624 - Posted: 7 Nov 2005, 11:38:20 UTC - in response to Message 186596.  

Your website suggests using "# report_results_immediately" instead of
"# return_results_immediately". Do both switches work?

Oops, that was a typo. "# return_results_immediately", as written here, is correct. I fixed the information on the website. Thanks for bringing it to my attention!

trux
BOINC software
Freediving Team
Czech Republic
Angus
Volunteer tester

Joined: 26 May 99
Posts: 459
Credit: 91,013
RAC: 0
Pitcairn Islands
Message 187452 - Posted: 10 Nov 2005, 3:40:35 UTC - in response to Message 186292.  
Last modified: 10 Nov 2005, 3:50:01 UTC


This flag was created for debugging; it was never intended for general use. If I remember correctly, it was deprecated in late version 2.xx or early version 3.xx, certainly before public release. It just took a little while for it to actually get deleted.


According to THE "Paul D. Buck" over on the Rosetta forum in this thread, there is a problem with RAC calculations when many results are returned at once.

Having results returned immediately would remove this issue, and besides, it makes absolutely no sense to delay reporting if the uploads are happening as soon as the WU finishes. If the client breaks the 'connect every xx' to upload, it can dang well break it to report as well.


Landroval

Joined: 7 Oct 01
Posts: 188
Credit: 2,098,881
RAC: 1
United States
Message 187459 - Posted: 10 Nov 2005, 4:44:55 UTC - in response to Message 187452.  

Having results returned immediately would remove this issue, and besides, it makes absolutely no sense to delay reporting if the uploads are happening as soon as the WU finishes. If the client breaks the 'connect every xx' to upload, it can dang well break it to report as well.


On one machine of mine, I'm running 2 projects (S@H & E@H) with 'connect every xx' set to 5 days... I had some Einstein units finish up and then sit there until the next 'regularly scheduled' 5-day contact.

SETI, meanwhile, is uploading & reporting as soon as the workunits are finished. So far the longest it's delayed anything is about 6 hours.

It's not hurting anything, and everything's still getting turned in by the deadline, but it's interesting to watch it happen & speculate as to causes.

If you think education is expensive, try ignorance.
Lee Carre
Volunteer tester

Joined: 21 Apr 00
Posts: 1459
Credit: 58,485
RAC: 0
Channel Islands
Message 187462 - Posted: 10 Nov 2005, 4:58:53 UTC

The purpose of trying to report results when the client needs to contact the scheduler anyway is to reduce the load on the servers and, as a side effect, network traffic, as can be seen here.

The graphs used to show a fairly constant rate (when everything was well), but now the average rate is dropping.
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 187548 - Posted: 10 Nov 2005, 14:31:59 UTC - in response to Message 187452.  
Last modified: 10 Nov 2005, 15:06:12 UTC

Having results returned immediately would remove this issue, and besides, it makes absolutely no sense to delay reporting if the uploads are happening as soon as the WU finishes. If the client breaks the 'connect every xx' to upload, it can dang well break it to report as well.



The upload server and the scheduling server are very often not the same machine, and don't even need to be in the same location: CPDN, for example, has multiple upload servers in the UK and Switzerland. Even servers in the same location don't need to use the same ISP: at SETI@home, the upload/download server uses Cogent while the scheduling server uses Berkeley's connection.

Also, projects can very easily add an extra scheduling/feeder server, upload server, download server, transitioner, validator, assimilator, file_deleter, or work generator if needed. But if the BOINC database can't keep up, you need to get a more powerful server and move the database to it...

Therefore, anything that accesses the database is an "expensive" operation, and minimizing traffic to the database is an advantage.


For someone with a permanent connection mainly crunching one project, the client will normally ask for more work at some point while crunching a result. If a result takes N hours to crunch, the next work request falls on average halfway through, so waiting to report until the client asks for work means a result is reported on average N/2 hours after it finishes. For why it doesn't always follow this pattern, see my example earlier in the thread of normal/VHAR work in SETI@home.


In the BOINC benchmark project, where a result was reported at the same time as more work was requested, the scheduling server was responsible for 36% of the database load and 51% of the total load. If adding an extra scheduler request to report each result doubles the scheduler load, the scheduling server is suddenly at 53% of the database load and 67% of the total load. This means a single-server setup can handle 34% fewer results/day, while a multi-server setup can handle 26% fewer results/day.
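
A quick back-of-envelope check of those figures (my own arithmetic, starting only from the 36% and 51% shares quoted above):

#include <cstdio>

int main() {
    double db = 0.36, total = 0.51;  // scheduler's share of db / total load
    // doubling the scheduler's contribution adds its share on top of 1.0:
    printf("db share:    %.1f%%\n", 100 * 2 * db / (1 + db));        // 52.9
    printf("total share: %.1f%%\n", 100 * 2 * total / (1 + total));  // 67.5
    // if the server was already saturated, capacity shrinks accordingly:
    printf("single-server loss: %.1f%%\n", 100 * (1 - 1 / (1 + total)));  // 33.8
    printf("multi-server loss:  %.1f%%\n", 100 * (1 - 1 / (1 + db)));     // 26.5
    return 0;
}

These round to the 53%, 67%, 34%, and 26% quoted above.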


If 95% of users have no problem waiting a little before results are reported, and the remaining 5% have a life outside BOINC and aren't staring at BOINC 24/7, it would be a "problem" roughly 1% of the time. Making the BOINC client report results immediately after upload, and in the process losing 34% or 26% of the capacity, to cater for that 1% would in my opinion make little sense.


Well, adding an extra connection just to report a result will likely not fully double the scheduling-server load, but it can still significantly decrease the available capacity.
Angus
Volunteer tester

Joined: 26 May 99
Posts: 459
Credit: 91,013
RAC: 0
Pitcairn Islands
Message 187556 - Posted: 10 Nov 2005, 15:26:21 UTC - in response to Message 187548.  
Last modified: 10 Nov 2005, 15:28:10 UTC


For someone with a permanent connection mainly crunching one project, the client will normally ask for more work at some point while crunching a result. If a result takes N hours to crunch, the next work request falls on average halfway through, so waiting to report until the client asks for work means a result is reported on average N/2 hours after it finishes. For why it doesn't always follow this pattern, see my example earlier in the thread of normal/VHAR work in SETI@home.
For someone running only one project with a cache setting of 3 days, he'll crunch result 1, upload it, start crunching result 2, and somewhere while crunching result 2 ask for more work. Asking for more work also reports any uploaded results...

I don't see how this behaviour, as you describe it, differs from reporting immediately in the number of database accesses. Your example has the client reporting uploaded results sometime during the crunching of the next WU. That's one database access for each WU reported; it's not "batching" the reporting function. Having the client report immediately only changes the time between uploading and reporting.

What I see happening in my single-project situation is the client storing up results to be reported for well past one day (many WUs). When the cache finally gets low enough, the client asks for more work and reports a bunch of results at once. This would seem to be one connection to the database, but with many rows being inserted. Is that better or worse than one row being inserted per connection? I don't know. I do know that Paul has reported RAC-calculation problems when a number of WUs are reported at the same time, since part of the RAC formula is the time between reports. This is only exacerbated in a project like Rosetta, where reporting and credit granting happen virtually at the same time since there is a quorum of only 1.

I have seen the RAC of individual boxes rise dramatically after the change is made to report immediately, with no other changes. This would at least be an anecdotal sign that the RAC calculation is broken if reporting of results is "batched".
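
For reference, RAC is an exponentially decaying average with a one-week half-life, so the gap between updates matters. A simplified sketch of that kind of averaging (modeled loosely on BOINC's update_average(); the real function has extra guards and edge cases):

#include <cmath>

const double LN2 = 0.6931471805599453;  // ln(2)
const double HALF_LIFE = 7 * 86400.0;   // one week, in seconds

// avg is credit per day; avg_time is when the average was last updated.
void update_rac(double now, double credit_per_day,
                double& avg, double& avg_time) {
    if (avg_time > 0) {
        double diff = now - avg_time;   // gap since the last report
        double weight = exp(-diff * LN2 / HALF_LIFE);
        avg = weight * avg + (1 - weight) * credit_per_day;
    } else {
        avg = credit_per_day;
    }
    avg_time = now;
}

Because the weight depends on the gap between reports, the same credit delivered as one big batch after a long gap is averaged differently from credit trickled in result by result, which is one way batching could skew RAC.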


In the BOINC benchmark project, where a result was reported at the same time as more work was requested, the scheduling server was responsible for 36% of the database load and 51% of the total load. If adding an extra scheduler request to report each result doubles the scheduler load, the scheduling server is suddenly at 53% of the database load and 67% of the total load. This means a single-server setup can handle 34% fewer results/day, while a multi-server setup can handle 26% fewer results/day.


If 95% of users have no problem waiting a little before results are reported, and the remaining 5% have a life outside BOINC and aren't staring at BOINC 24/7, it would be a "problem" roughly 1% of the time. Making the BOINC client report results immediately after upload, and in the process losing 34% or 26% of the capacity, to cater for that 1% would in my opinion make little sense.

Does having incorrect RAC scores make sense either? Accurate credit granting is at least as important, in terms of public perception, as anything else a project does. (The scientists are screaming now, but such is life in the DC world.)


Well, adding an extra connection just to report a result will likely not fully double the scheduling-server load, but it can still significantly decrease the available capacity.


Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 187580 - Posted: 10 Nov 2005, 17:19:53 UTC - in response to Message 187556.  

I don't see how this behaviour, as you describe it, differs from reporting immediately in the number of database accesses. Your example has the client reporting uploaded results sometime during the crunching of the next WU. That's one database access for each WU reported; it's not "batching" the reporting function. Having the client report immediately only changes the time between uploading and reporting.



Well, I don't know all the innards of the scheduling server, but from a quick look at things it appears there's one db read and one db write per result assigned or reported; how many fields is another matter.

Also, you'll need to look up the host, the user, and possibly the team. Unless the scheduling server is very inefficiently programmed, these should only be read once per connection, and if read more than once, chances are they're cached in memory.
Host info is always updated, and user info can be updated, depending on whether there are new preferences or not.

So, if I'm not mistaken, each connection to the scheduling server costs 3 db reads and 1 db write, and each result assigned or reported adds a db read and a db write. Both host info and user info have more fields than result info.

This also means that asking for 1 result and reporting 1 result at the same time costs 5 reads and 3 writes, while asking for 1 result and reporting 1 result at different times costs 8 reads and 4 writes.
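
The same accounting written out as a sketch (my own tally of the per-connection overhead and per-result costs estimated above):

struct DbCost { int reads, writes; };

// One scheduler connection: 3 reads + 1 write of overhead,
// plus 1 read + 1 write per result assigned or reported.
DbCost scheduler_request(int results_handled) {
    return { 3 + results_handled, 1 + results_handled };
}

// combined: scheduler_request(2)       -> 5 reads, 3 writes
// separate: scheduler_request(1) twice -> 8 reads, 4 writes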


Anyway, if the db time spent on the user/host/team tables is insignificant compared to the time spent on the result table, having an extra scheduling-server connection just to report results wouldn't really matter.
But if it's insignificant, why did the bug in v4.4x, where results were reported immediately, get fixed in later clients...
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 187582 - Posted: 10 Nov 2005, 17:34:44 UTC - in response to Message 187556.  


What I see happening in my single-project situation is the client storing up results to be reported for well past one day (many WUs). When the cache finally gets low enough, the client asks for more work and reports a bunch of results at once. This would seem to be one connection to the database, but with many rows being inserted. Is that better or worse than one row being inserted per connection? I don't know. I do know that Paul has reported RAC-calculation problems when a number of WUs are reported at the same time, since part of the RAC formula is the time between reports. This is only exacerbated in a project like Rosetta, where reporting and credit granting happen virtually at the same time since there is a quorum of only 1.

It isn't just database connections.

I'm going to try to explain this using "standard CGI" even though FastCGI is more efficient -- the same issues apply.

The scheduler is a CGI program running on a web server. A connection comes in and invokes the CGI: the web server starts a thread and launches the CGI program. The CGI program loads whatever libraries/interpreters/etc. it needs, opens database connections, does whatever transactions are needed, responds to the client, then closes the database connections, releases its resources, and terminates; the thread terminates, the web server closes the connection, and we're done.

There are a lot of steps leading up to the first database update, and a lot of steps after.

FastCGI helps because it keeps the CGI in RAM, but there is still overhead, and still a limited number of threads (more threads means each runs slower; fewer threads run faster). It's better, but not perfect.

At any rate, those are the concepts, even if the details are a little off.
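
A minimal sketch of the two lifecycles (assuming the standard FastCGI development kit's fcgi_stdio.h; the helpers are hypothetical stand-ins, not the actual scheduler code):

#include "fcgi_stdio.h"   // FastCGI's drop-in replacement for stdio

// Hypothetical stand-ins for the real per-request work.
static void open_db()  {}
static void close_db() {}
static void handle_scheduler_request() {
    printf("Content-type: text/plain\r\n\r\nOK\n");
}

int main(void) {
    open_db();   // under plain CGI, everything here runs once per request;
                 // under FastCGI the process (and its db connection)
                 // stays resident and only the loop body repeats
    while (FCGI_Accept() >= 0) {    // blocks until the next request
        handle_scheduler_request();
    }
    close_db();
    return 0;
}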

Angus
Volunteer tester

Joined: 26 May 99
Posts: 459
Credit: 91,013
RAC: 0
Pitcairn Islands
Message 187617 - Posted: 10 Nov 2005, 19:08:54 UTC

This seems to be at odds with the SETI/BOINC mantra about keeping the "connect every xx" setting as low as possible - usually recommended to be something like .1 or .0x days.

This is forcing the client to fetch more work and report after every result is done, creating a lot more server connections than necessary.

So - either way, the server gets the load it gets. If it's too slow to handle the traffic, that's a hardware issue.

