I Just Don't Believe It
Author | Message |
---|---|
MikeSW17 Send message Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
This copying of the upload/download folders just doesn't add up. OK, so I don't know the exact figures for the data sizes being moved, but the upper limit is 1Tb because that is what they have. Unless I've got my math screwed, a Tb is ~1,000,000,000 bytes and 48 hours is 172,800 seconds. So, moving 1Tb in 48 hours means that the data is transferring at a stupidly pathetic 6Kb per second - less than that, as it's not even a whole Tb. Perhaps UCB should try connecting the systems with a direct serial cable rather than via a dial-up modem! It's not even transferring data at DSL speed... that would be 10 times faster. UCB really need to be asking just why a simple copy/move operation cannot function on their system. IMO, either the network, the raid, or the file system is badly screwed-up. |
Rom Walton (BOINC) Send message Joined: 28 Apr 00 Posts: 579 Credit: 130,733 RAC: 0 |
Wouldn't the slowdown of file lookups, which slows down the validation process, also slow down a copy operation? ----- Rom BOINC Development Team, U.C. Berkeley My Blog |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
UCB really need to be asking just why a simple copy/move operation cannot function on their system. Maybe it's just me, but..... If the problem is "poor performance on the current RAID" and the proposed fix is to "get the files off of the current raid" then isn't the file copy operation likely to be subject to the same "poor performance" issue? |
Martin P. Send message Joined: 19 May 99 Posts: 294 Credit: 27,230,961 RAC: 2 |
This copying of the upload/download folders just doesn't add up. Not quite: 1.000 bytes = 1 kilobyte; 1.000.000 bytes = 1.000 kB = 1 MB; 1.000.000.000 bytes = 1.000.000 kB = 1.000 MB = 1 GB; 1.000.000.000.000 bytes = 1.000.000.000 kB = 1.000.000 MB = 1.000 GB = 1 TB. So, you are wrong by a factor of 1.000. Mind: I used European thousands separators! I also left out the 24 and its multiples in 1024. |
Francis Noel Send message Joined: 30 Aug 05 Posts: 452 Credit: 142,832,523 RAC: 94 |
I snipped a wiki: A terabyte (derived from the SI prefix tera-) is a unit of information or computer storage equal to one trillion (one long scale billion) bytes. It is commonly abbreviated TB. Because of irregularities in using the binary prefix in the definition and usage of the kilobyte, the exact number in common practice could be either one of the following: 1,000,000,000,000 bytes = 1000^4 or 10^12; 1,099,511,627,776 bytes = 1024^4 or 2^40. This capacity may be expressed unambiguously as a tebibyte. The prefix "tera" originates from the Greek word teras meaning 'monster'. I think a couple of zeros were missing from your original calculations, MikeS. I'll let someone else do the math, I positively suck at it. Edit: I got beaten to the post while reading that fascinating wiki, sorry for the redundancy. mambo |
MicroBeta Send message Joined: 22 Jun 04 Posts: 7 Credit: 224,853 RAC: 0 |
Actually I think that 1,000,000,000 is a Gb. If I have this correct then 6Kb/s would become 6000Kb/s or approx 6Mb/s. Assuming: Kb - Kilobyte = 2^10 = 1,024; Mb - Megabyte = 2^20 = 1,048,576; Gb - Gigabyte = 2^30 = 1,073,741,824; Tb - Terabyte = 2^40 = 1,099,511,627,776. Then 1,099,511,627,776 bytes / 172,800 sec ≈ 6,362,915 bytes/s. If I have this wrong, let me know. Mike |
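The arithmetic in the posts above can be checked with a short Python sketch. The function name `transfer_rate` and the use of binary (1024-based) units are assumptions made for illustration; they follow the definitions in the preceding post and are not figures from the project.

```python
# Sketch: average throughput needed to move a given number of bytes
# within a given time window. Units are binary (1024-based) by assumption.
SECONDS_PER_HOUR = 3600

def transfer_rate(total_bytes: int, hours: float) -> float:
    """Return the average rate in bytes per second."""
    return total_bytes / (hours * SECONDS_PER_HOUR)

GIB = 2**30  # 1,073,741,824 bytes
TIB = 2**40  # 1,099,511,627,776 bytes

rate_tib = transfer_rate(TIB, 48)  # a full terabyte in 48 hours
rate_gib = transfer_rate(GIB, 48)  # the original post's mistaken "terabyte"

print(f"1 TiB in 48 h: {rate_tib / 2**20:.2f} MiB/s")  # about 6.07 MiB/s
print(f"1 GiB in 48 h: {rate_gib / 2**10:.2f} KiB/s")  # about 6.07 KiB/s
```

The two results differ by exactly a factor of 1024, which is the gap between the "6Kb per second" of the first post and the roughly 6 Mb/s worked out here.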
hooded.figure Send message Joined: 15 Dec 02 Posts: 33 Credit: 670,271 RAC: 0 |
Actually I think that 1,000,000,000 is a Gb. You are right, I was actually working this out when you posted -matt "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, Historical Review of Pennsylvania, 1759. |
MikeSW17 Send message Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
Wouldn't the slowdown of file lookups, which slows down the validation process, also slow down a copy operation? Yes Rom, I expect it would. And that's what is worrying me. The slow file lookups are due to individual directory size, not the volume size. If so, the directories may well end up on different systems, but as I understand it, the directories' size/content won't change, so the look-ups will remain the same after this copy process - still slow. |
MikeSW17 Send message Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
Well there you go then, somehow I forgot Gb comes between Mb and Tb - Oops, sorry. Mind you, that still only gives a less-than 6Mb/sec transfer rate, which IMO is still very poor for the hardware involved. |
Francis Noel Send message Joined: 30 Aug 05 Posts: 452 Credit: 142,832,523 RAC: 94 |
What if you factor in the directory lookup "overhead" ? mambo |
Pappa Send message Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Rom Yes it would, so finding the real issue of whether it is a hardware or software failure is the hard part. Wouldn't the slowdown of file lookups, which slows down the validation process, also slow down a copy operation? If we look at just the Raid Array itself (hardware): * If the controller talking to the backplane/fabric has an issue, it will have trouble finding the correct stripe to look at. * If the controller talking to the backplane/fabric has an issue, it may read the data from the stripe, calculate incorrect parity and issue a retry. * If the controller talking to the backplane/fabric has a cache issue, then it will read "data" slower. * If the controller talking to the backplane/fabric has a cache issue, it may see incorrect parity and issue a retry. * If the backplane/fabric has an issue talking to a single drive in the stripe, you have a retry. * If a cable connecting the controller to the backplane/fabric has an issue, then data is read/written with incorrect parity and retried. * If the array is mirrored, an issue writing the data across "both" stripes normally results in a retry. Reboot, break the mirror and test... * If a drive in the array is failing (presuming no mirror), then when data is read or written and parity checked, if parity is not correct, there is a retry. In many of these cases it may only show up as a "Wait" (Controller BIOS reporting) and not be the true cause of the "reported" error. So any reasonable Raid Controller has diagnostics that can test the Raid Array and report failures in the "Controller" log (most keep a limited log for troubleshooting)... It would include Fibre Channel and cable connections. In the case of a soft failure of a single drive that does not reach the percentage/threshold indicating a true hardware failure, you end up with multiple retries until it succeeds. The OS logs should show failed ethernet connections and point to the issue... R/ Al Please consider a Donation to the Seti Project. |
Dorsai Send message Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0 |
"It used to be fast, but now is slow" is a basic summation of what Matt said re the Raid unit. I get the feeling that in a few weeks/months someone @ Berkeley will have a Eureka! moment when they notice that: A "gizmo" in the corner has accidentally been switched from "fast" to "slow" OR A plug has come out.... OR A cable has had a nail put through it. OR "Who the @~@~ turned this off!!!!" OR Something equally silly, that the basic system is able to "compensate" for, but at a major loss of performance. Foamy is "Lord and Master". (Oh, + some Classic WUs too.) |
Pappa Send message Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Ned UCB really need to be asking just why a simple copy/move operation cannot function on their system. Moving the UL/DL to a separate machine will improve performance of UL/DL. The "Validation" will get some relief... and may provide some additional time to determine the true cause. However, a portion of the potential problem has been removed in the troubleshooting process. * If it is an inode issue then it should be readily apparent... With the reduced number of files, "Things should just fly." * If it is a hardware issue, things (validation etc.) will still be slow... Back to troubleshooting. R/ Al Please consider a Donation to the Seti Project. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
I'll tell you exactly what happened. First off - we're just doing uploads. That's 11 million files on disks totalling about 160GB. Raw transfer rate would be, if you do the math correctly, just under 1MB/sec over 48 hours. This is about the I/O rate we are seeing. EDIT: It's this slow strictly because of the directory lookups that have to happen for each file being copied off the old file server. Reason why original estimates were off: We did some speed tests first by copying a select few directories from the old server to the new one. We deleted those test copies, and did some more tests on the same directories with three copy processes running - it went 3 times as fast! So we fired that off and guessed it would take about 12-16 hours to complete. However, we didn't take into account that a few test directories were still in cache from our tests. So the three copy processes weren't actually going that much faster than just one. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
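As a rough check of the figures in the post above, the following Python sketch derives the implied throughput and per-file cost. It assumes that "160GB" means 160 × 10^9 bytes of file data and that the copy takes the full 48 hours; the variable names are illustrative only.

```python
# Figures quoted in the post. Treating "160GB" as 160 * 10**9 bytes of
# file data, and the window as exactly 48 hours, are assumptions.
TOTAL_BYTES = 160 * 10**9
FILE_COUNT = 11_000_000
WINDOW_SECONDS = 48 * 3600  # 172,800 seconds

bytes_per_second = TOTAL_BYTES / WINDOW_SECONDS
files_per_second = FILE_COUNT / WINDOW_SECONDS
average_file_bytes = TOTAL_BYTES / FILE_COUNT
ms_per_file = 1000 / files_per_second

print(f"Throughput:   {bytes_per_second / 10**6:.2f} MB/s")
print(f"Files copied: {files_per_second:.1f} per second")
print(f"Average file: {average_file_bytes / 1000:.1f} kB")
print(f"Time budget:  {ms_per_file:.1f} ms per file")
```

Under these assumptions the files average only about 15 kB each, so the time per file is plausibly dominated by the directory lookup rather than by moving the data, which is consistent with the explanation given in the post.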
[B@H] Ray Send message Joined: 1 Sep 00 Posts: 485 Credit: 45,275 RAC: 0 |
Summarizing the above post: the large number of files (making it hard to read the directory), along with the large size, is what is causing it to take a long time. ------------------ Also remember that they have been having problems with the old disk array not working correctly for almost 2 months now. That is what caused the past long outages in the first place. So there is a chance that the new (but still old) disk array will be a lot faster and not have the large WTV queue after this, which is well worth not being able to turn in work for two days. So you have to wait a little longer to turn units in, but may get credit a lot faster after the outage; only time will tell. There are newer disk arrays out there for which this project would be small, but they cost a lot more than the project has for them. They would be good, but the project, like our families, has to work within a budget. The only way to get one of them would be if one of the users found out how much it would cost and took up a collection from the users to get one. But most users have very little extra and would probably still come up short of getting one. Pizza@Home Rays Place Rays place Forums |
Pappa Send message Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Matt Thank You I'll tell you exactly what happened. R/ Al Please consider a Donation to the Seti Project. |
Anigel Send message Joined: 5 Dec 99 Posts: 101 Credit: 643,544 RAC: 0 |
Remember when working out what a MB, GB, TB is that only disk manufacturers use the incorrect 1KB = 1000 bytes. This means they can sell you a drive labelled 200GB on the box whilst in usage it has a much lower capacity. 1 gigabyte = 8,589,934,592 bits, not the 8,000,000,000 bits these manufacturers use to artificially inflate the size of their drives. Over a 200GB drive, that makes a difference of 1,717,986,918,400 - 1,600,000,000,000 = 117,986,918,400 bits, or nearly 14GB. So when you buy a 200GB drive you are actually buying a 186GB drive. Scale that up to a 360GB drive and you are actually buying a 335GB one. In any other market this would be treated as false advertising; however, somehow hard drive manufacturers seem exempt from this even though they overstate their products' capacity by 7%. Part of Teamseti For SetiBoinc status graphs visit Teamseti status graphs |
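The drive-size figures in the post above can be reproduced with a small Python sketch. The helper name `advertised_to_binary_gb` is hypothetical; the sketch simply converts a capacity advertised in decimal gigabytes (10^9 bytes) into binary gigabytes (2^30 bytes).

```python
def advertised_to_binary_gb(advertised_gb: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to binary gigabytes (2**30 bytes)."""
    return advertised_gb * 10**9 / 2**30

# The two drive sizes mentioned in the post.
for size in (200, 360):
    usable = advertised_to_binary_gb(size)
    print(f"{size} GB advertised is about {usable:.0f} GB in binary units")

# Fraction by which the decimal figure exceeds the binary one.
shortfall = 1 - 10**9 / 2**30
print(f"Shortfall: {shortfall:.1%}")
```

The shortfall is the same for any drive size when measured in gigabytes, because it depends only on the ratio 10^9 / 2^30.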
Divide Overflow Send message Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
Is this a file move operation, or a file copy > verify > delete source operation? I keep hearing the term "copy" which indicates that these 11 million upload files will all need to be deleted when the copy operation is done. |
SunMicrosystemsLLG Send message Joined: 4 Jul 05 Posts: 102 Credit: 1,360,617 RAC: 0 |
Also, if the destination device contains a 'new' RAID volume this will probably have to be left quite a while to 'sync' on completion of the file copy. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Also, if the destination device contains a 'new' RAID volume this will probably have to be left quite a while to 'sync' on completion of the file copy. If it's an empty RAID, shouldn't it stay in sync as files are added? |