Panic Mode On (22) Server problems
Author | Message |
---|---|
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
I was just about to post a follow up to a discussion in the other thread, and then i couldn't. So i might as well post it here. :-) Yet another SSD from Intel. I draw your attention to the Random Read & Random Write performance, in particular the comparison to the VelociRaptor. EDIT- BTW- whatever Eric did is still working. There have been a few times where it's taken a retry or 2 before a result has uploaded, but this is during traffic that previously nothing would have been able to upload in. Grant Darwin NT |
gizbar Send message Joined: 7 Jan 01 Posts: 586 Credit: 21,087,774 RAC: 0 |
This morning, I finally cleared my backlog completely. I hope that all the tasks were reported in time, and that I'll get all the credit that I worked for. I've noticed that I've got quite a few of the new "double precision workunits?" that will take approximately twice as long. Let's hope all the changes are for the good, and we can settle down to a bit of reliability for a while. The servers can breathe a sigh of relief once the pressure drops, and hopefully get back to normal. regards, Gizbar. A proud GPU User Server Donor! |
ML1 Send message Joined: 25 Nov 01 Posts: 21017 Credit: 7,508,002 RAC: 20 |
From the previous thread: "the Staff" having worked for a solid 3 days through the TCP settings on the Upload server the Log Jam should be broken. Well, things are certainly running more smoothly from the users' point of view. For my system here, uploads and downloads clear pretty much instantly. Curiously, on Cricket the downloads look to be maxed out for 1 hour periods at a time. More significantly: Has the uploads issue really been 'fixed' by only tweaking the upload server TCP settings? Or has the fix been greatly helped by also avoiding the download servers from saturating the downlink? Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
More significantly: Has the uploads issue really been 'fixed' by only tweaking the upload server TCP settings? Or has the fix been greatly helped by also avoiding the download servers from saturating the downlink? My vote is for the tweak. For a given level of download traffic i'm getting uploads going through when before they would have taken several attempts. I'm even getting some uploads going through after a couple of attempts where previously nothing would have gotten through. Grant Darwin NT |
Joseph Monk Send message Joined: 31 Mar 07 Posts: 150 Credit: 1,181,197 RAC: 0 |
Whatever they did is great; every one of my uploads has gone through right away and my downloads are coming cleanly. 6.6.11 seems to be the magic version for people like me (two CUDA cards on Linux x64), so the timing is perfect. Life is good now! |
Space Cowboy Send message Joined: 24 Apr 00 Posts: 43 Credit: 1,730,621 RAC: 0 |
Good to see everything back up and running again. Question though, anyone know why the option to view tasks is disabled and the one to view pending credit on your account page has vanished? |
Fred W Send message Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Good to see everything back up and running again. Question though, anyone know why the option to view tasks is disabled and the one to view pending credit on your account page has vanished? It has been documented in several threads. Both those functions put a heavy load on the replica database which is only just catching up with the master. They will be switched back on when things have settled down (probably next week when Berkeley is fully staffed again). F. |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
Good to see everything back up and running again. Question though, anyone know why the option to view tasks is disabled and the one to view pending credit on your account page has vanished? Why isn't it fully staffed now? Vacation? |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 66216 Credit: 55,293,173 RAC: 49 |
Good to see everything back up and running again. Question though, anyone know why the option to view tasks is disabled and the one to view pending credit on your account page has vanished? Yep, last I heard 1/3rd of the staff is on vacation (Matt). He earned it too. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Good to see everything back up and running again. Question though, anyone know why the option to view tasks is disabled and the one to view pending credit on your account page has vanished? There's another person on vacation too. |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
Well, a new internet link won't fix the tasks list. They would need a new server for that, right? |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Well, a new internet link won't fix the tasks list. They need a gigabit internet connection to ensure enough bandwidth for all the hungry crunchers. They need powerful enough servers to not drop all the connections asking for more work or returning results. |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
I don't know what is happening on other PCs, but my GPU cruncher is now starting to get UL and DL 'http errors'. EDIT: Also, as I already mentioned in the other panic thread, the DL speed has been cut by ~50%. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Well, a new internet link won't fix the tasks list. They can use many things. One issue "may" be the way the BOINC client handles uploads and downloads -- just being too aggressive when things are slow. That could be fixed in 6.6.38. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Vistro asked in the previous thread: I don't know why this is really making me scratch my head.. Here's an overall brief rundown of data handling. Data which was recorded at Arecibo is received on 750 GB hard disks, then broken down into files of 50.20 GB or smaller for more convenient handling. Those files are sometimes called 'tapes' because the size resembles the amount of data on one tape of an older recorder system. A splitter works with one of the 14 channels in one of those files. An ap_splitter divides the data into 13.42-second sequential chunks, each being a WU. An mb_splitter gets 107.37 seconds of data and breaks it down into 256 frequency subbands, each of which is a WU. Each WU is saved as a file, and a database entry is made in the WORKUNIT table identifying where the file is and much other information such as the basis for estimated crunch time, how far out the deadline should be, etc. The Transitioner checks the database for new WUs, and creates 2 records in the RESULT table because the project setting is 2 for initial replication. Those 2 are then added to the "Results ready to send" queue shown on the server status page. The Feeder notifies the Scheduler about a set of up to 100 of those Results. Those 100 slots are preassigned, some for MB work, some for AP_v5, and some for AP_v505, probably a 96:1:3 ratio now. The Feeder goes to sleep for a few seconds after it has updated those slots, either by filling them all or leaving some empty because there are none of the correct type in "Ready to send". As a Scheduler process handles a request for work from a host, it checks those slots for suitable work. Preferences, host capabilities, or an app_info.xml can all rule out some types. If no work is found, a "(Project has no work available)" message is sent; otherwise the name and URL of the workunit and the desired result are added to the reply message, and database fields are updated to show the work "In progress". 
Also, the executable and any other files needed to do the work are identified in the reply, including the URLs if the host isn't using an app_info.xml. If the estimated crunch time hasn't fulfilled the amount of work the host requested, the Scheduler looks for more suitable work; otherwise it's done and the reply is sent. The host receives the reply containing one or more tasks. I view a task as being a set of directions: 1. Download xxxxx workunit and any other needed files you don't already have. 2. Start yyyyy application to crunch the workunit and produce an output file with the right name. 3. When the application finishes, upload the result file. 4. Sometime after the result has successfully uploaded, report the task complete. When a host reports a result, a Scheduler process updates the RESULT and WORKUNIT tables as needed. Multiple reported results are done in a batch, which is less costly in terms of database operations. When successful results from 2 hosts (project setting) have been reported, the Validator loads the two result files and checks whether they match. If so, one is declared canonical and both are granted credit. Otherwise, the validation state causes the Transitioner to create another RESULT entry, and so on. A canonical result is entered into the master science database by the Assimilator. If there are no more "In progress" results for the WU, the workunit file and all associated result files can be deleted. The database workunit and result entries are kept for another day, usually visible to users as web pages. --------------------------- There's a lot of detail I've left out, but those are the high points. I don't know all the details, for that matter. Hope it helps. Joe |
Vistro Send message Joined: 6 Aug 08 Posts: 233 Credit: 316,549 RAC: 0 |
Oh. That makes a lot of sense. You precreate the result records because it would be easier to do it now when you have the free time than later. |
__W__ Send message Joined: 28 Mar 09 Posts: 116 Credit: 5,943,642 RAC: 0 |
Congratulations to the hard-working SETI staff. Since the last maintenance (except for the first 2-3 hours, when the wave runs through :-) ) ULs and DLs have been running smoothly - no errors or retries for me. They must have found some extra MHz/GBit somewhere, which are rarer than truffles. Maybe we should donate a truffle pig to them, and when they have found enough bandwidth they could have a nice barbecue - with the pig of course ;-). __W__ _______________________________________________________________________________ |
-= Vyper =- Send message Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
Yes, for the first time in nearly 2 months things are settling. My top cruncher is extremely sensitive to server errors, and I was getting used to getting some work to crunch around Saturday or Sunday after a dry cache occurred during the Tuesday backups. This time I was up and running merely a day later, with work filling up so it wouldn't drain. Many thanks, and extremely nicely done, S@H staff. This was a really good move to increase the sensitivity of regular MB work paired with the tweaking of Apache on the upload server.. Hope the "Panic thread" doesn't need to be flooded that much in the near future. Kind regards Vyper _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Well, a new internet link won't fix the tasks list. You know... I bet that once the gb line is up and running that usage will end up topping out at like 110mb lol. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
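
Joe's rundown of the workunit life cycle (Transitioner, Feeder, Scheduler, Validator) can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not BOINC's code: every class and function name below is hypothetical, the per-application slot ratio is ignored, nothing stops both copies of a workunit going to the same host, and exact string equality stands in for whatever comparison the real Validator performs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set

INITIAL_REPLICATION = 2  # project setting quoted in the post
FEEDER_SLOTS = 100       # the Feeder exposes up to 100 results at a time


@dataclass
class Result:
    wu_name: str
    app: str                      # e.g. "MB" or "AP_v505"
    state: str = "ready"          # ready -> in_progress -> reported
    output: Optional[str] = None  # hypothetical stand-in for the result file


@dataclass
class Workunit:
    name: str
    app: str
    results: List[Result] = field(default_factory=list)
    canonical: Optional[str] = None


def transition(wu: Workunit, ready: List[Result]) -> None:
    """Toy Transitioner: 2 results for a new WU, 1 more after a failed validation."""
    count = INITIAL_REPLICATION if not wu.results else 1
    for _ in range(count):
        result = Result(wu.name, wu.app)
        wu.results.append(result)
        ready.append(result)


def fill_slots(ready: List[Result], slots: List[Result]) -> None:
    """Toy Feeder: move queued results into free slots (per-app ratio ignored)."""
    while ready and len(slots) < FEEDER_SLOTS:
        slots.append(ready.pop(0))


def schedule(slots: List[Result], accepted_apps: Set[str]) -> Optional[Result]:
    """Toy Scheduler: hand out the first suitable result, or None if there is none."""
    for i, result in enumerate(slots):
        if result.app in accepted_apps:
            result.state = "in_progress"
            return slots.pop(i)
    return None  # corresponds to "(Project has no work available)"


def report(result: Result, output: str) -> None:
    """Host side: the upload has finished and the task is reported."""
    result.output = output
    result.state = "reported"


def validate(wu: Workunit, ready: List[Result]) -> bool:
    """Toy Validator: declare a canonical result once two outputs agree."""
    if len(wu.results) < INITIAL_REPLICATION:
        return False  # the Transitioner has not created the replicas yet
    if any(r.state != "reported" for r in wu.results):
        return False  # still waiting on a host
    output, votes = Counter(r.output for r in wu.results).most_common(1)[0]
    if votes >= 2:
        wu.canonical = output
        return True
    transition(wu, ready)  # no agreement: ask for one more result
    return False


# Walk one workunit through the pipeline with two agreeing hosts.
ready, slots = [], []
wu = Workunit("wu_demo", "MB")
transition(wu, ready)
fill_slots(ready, slots)
first = schedule(slots, {"MB"})
second = schedule(slots, {"MB"})
report(first, "same")
report(second, "same")
print(validate(wu, ready))  # True: both outputs match
```

If the two reported outputs differ, `validate` returns `False` and queues one extra result, which mirrors the "create another RESULT entry" step in the post.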
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.