New tech news item...
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Check out the new item in tech news... Bottom line: All queues are moving in a positive direction, though some much more slowly than others, except for the validation queue, which currently is just barely unable to keep up. The only bottleneck slowing validation down is large directory sizes on the upload/download filesystem. This is being addressed in many ways, and we should see this queue start to drain as fixes are applied. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
Awesome tech article, Matt. Glad to see things moving forward, and starting to find some relief. Keep up the great work. My movie https://vimeo.com/manage/videos/502242 |
tekwyzrd Joined: 21 Nov 01 Posts: 767 Credit: 30,009 RAC: 0 |
Thanks for the detailed report. I may be wrong, but it seems to me that if the current directories are too large, it might help to organize the results with a directory for each tape. Each tape's directory could then be removed once its results have been validated and deleted. Nothing travels faster than the speed of light with the possible exception of bad news, which obeys its own special laws. Douglas Adams (1952 - 2001) |
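A minimal Python sketch of tekwyzrd's per-tape layout. The directory layout, the tape and result names (`tape_0001`, `wu_42_0`) and both helper functions are hypothetical; nothing in the thread says how SETI@home actually names tapes or result files.

```python
import os
import shutil
import tempfile


def result_path(root: str, tape: str, result_name: str) -> str:
    """Return where a result file would live under a per-tape layout (hypothetical)."""
    tape_dir = os.path.join(root, tape)
    os.makedirs(tape_dir, exist_ok=True)  # one directory per tape
    return os.path.join(tape_dir, result_name)


def retire_tape(root: str, tape: str) -> None:
    """Delete a tape's whole directory once every result from it is validated."""
    shutil.rmtree(os.path.join(root, tape))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    path = result_path(root, "tape_0001", "wu_42_0")
    with open(path, "w") as f:
        f.write("result data")
    print(os.listdir(root))  # ['tape_0001']
    retire_tape(root, "tape_0001")
    print(os.listdir(root))  # []
    os.rmdir(root)
```

Retiring a tape then becomes one recursive delete instead of many individual lookups in a shared directory. One caveat, echoing the tech-news point quoted in the next post: the parent directory holds one subdirectory per active tape, which only becomes a problem if many tapes are in flight at once.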
ML1 Joined: 25 Nov 01 Posts: 20372 Credit: 7,508,002 RAC: 20 |
Matt wrote: "Check out the new item in tech news..." Matt, thanks for the detailed tech news. Good to see the server side of what's happening! Re: "Adding more fan-out directories wouldn't help, as then we would have an equally large directory of subdirectories. Of course, we could make a fan-out of fan-outs, but this would require some significant code changes, as well as a long outage to implement, and frankly it's an ungraceful solution." So don't keep us in suspense! What is going to be the graceful solution to this? Fewer files? A database? ReiserFS?! Or another one or two levels of fan-out? Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
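For readers unfamiliar with the term: a fan-out spreads files across many subdirectories by hashing each filename, so no single directory gets too big, and a "fan-out of fan-outs" applies the same idea twice. The Python below is a generic sketch. The MD5-based scheme, the hex directory names and the default fanout of 1024 are assumptions for illustration, not a description of SETI@home's actual layout. With F files, one level leaves roughly F/fanout entries per leaf directory, and two levels leave roughly F divided by fanout squared.

```python
import hashlib
import os


def fanout_dir(root: str, filename: str, fanout: int = 1024, levels: int = 1) -> str:
    """Map a filename to its fan-out directory (hypothetical scheme).

    Each level takes a different 4-byte slice of the MD5 digest, so the
    levels are independent and two levels give fanout**2 leaf directories.
    """
    if not 1 <= levels <= 4:
        raise ValueError("a 16-byte MD5 digest supports at most 4 levels")
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    parts = []
    for level in range(levels):
        chunk = int.from_bytes(digest[4 * level:4 * level + 4], "big")
        parts.append(format(chunk % fanout, "x"))  # hex directory name
    return os.path.join(root, *parts)


if __name__ == "__main__":
    print(fanout_dir("/upload", "wu_42_0"))            # one level: /upload/<hex>
    print(fanout_dir("/upload", "wu_42_0", levels=2))  # two levels: /upload/<hex>/<hex>
```

Adding a level changes every file's path, which is why Matt says below that it would mean moving about half a terabyte of files.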
Sir Ulli Joined: 21 Oct 99 Posts: 2246 Credit: 6,136,250 RAC: 0 |
Thanks for the info, Matt. It is good to know that you are working on this. Greetings from Germany NRW Ulli |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Answer: fewer files. We have a lot to delete, and we're deleting as fast as we can without hurting normal operations. Not sure what you mean by database. Our database is just fine. Faster database won't help the current problem. ReiserFS: well, our current upload/download file server doesn't support it. We are pretty certain we can get by without it, as long as we optimize our code and wait for the queues to drain. Bear in mind the current condition is pathological, and we should have far fewer files on disk than we do now. Of course, we are working towards making sure this doesn't happen again (if at all possible)! More fan-out levels: I mention this in the tech note. It would require a major programming change (major in that all our working systems would have to be broken open, recompiled, tested, etc.), but it would only solve the current problem. It would then require a long outage to move half a terabyte of files around, which would aggravate the current problem. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
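"Deleting as fast as we can without hurting normal operations" amounts to rate-limited deletion. A minimal Python sketch, assuming the files to remove are already known as a list of paths; the batch size and pause are made-up tuning knobs, not SETI@home's values.

```python
import os
import time
from typing import Iterable


def delete_throttled(paths: Iterable[str], batch_size: int = 500, pause_s: float = 1.0) -> int:
    """Unlink files in batches, sleeping between batches so the file server
    keeps capacity for normal uploads, downloads and validation.

    Returns the number of files actually removed; paths that are already
    gone are skipped rather than treated as errors.
    """
    removed = 0
    in_batch = 0
    for path in paths:
        try:
            os.unlink(path)
            removed += 1
        except FileNotFoundError:
            pass
        in_batch += 1
        if in_batch >= batch_size:
            time.sleep(pause_s)  # yield the disk to normal operations
            in_batch = 0
    return removed
```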
ML1 Joined: 25 Nov 01 Posts: 20372 Credit: 7,508,002 RAC: 20 |
That works fine for now. Will the active upload/download file count stay manageable in the future, once s@h-classic is closed? Matt wrote: "Not sure what you mean by database. Our database is just fine. Faster database won't help the current problem." There are many small files and ReiserFS is not supported, so why not use a separate database just for handling the uploaded result files, and even the download WUs?... OK on the fan-out levels being a very big thing to change and to include in all the various bits of code. Perhaps modularise it into another server process dedicated to getting and putting files? (Or is this getting to be too much like another database?!) Aside: I've recently reshuffled 200 GB of files between four partitions across two physical disks. Yes, it does take a long time! Thanks for the heads-up. Keep up the good work! Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
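Martin's "server process dedicated to get and put files" is essentially an interface boundary. If every daemon reads and writes files only through one small API, the on-disk layout (flat, fanned out, or a database) can change in one place instead of in every program that would otherwise have to be broken open and recompiled. A hypothetical Python sketch; `FileStore` and `FlatDirStore` are illustrative names, not BOINC classes.

```python
import os
from abc import ABC, abstractmethod


class FileStore(ABC):
    """Hypothetical interface that all server daemons would use for file I/O."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...


class FlatDirStore(FileStore):
    """Simplest backend: every file directly under one directory.

    Switching to a fan-out or database backend means changing or replacing
    this class; callers that only see FileStore stay untouched.
    """

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name: str) -> str:
        return os.path.join(self.root, name)  # the only place the layout lives

    def put(self, name: str, data: bytes) -> None:
        with open(self._path(name), "wb") as f:
            f.write(data)

    def get(self, name: str) -> bytes:
        with open(self._path(name), "rb") as f:
            return f.read()

    def delete(self, name: str) -> None:
        os.unlink(self._path(name))
```

Whether this runs as a library linked into each daemon or as a separate network service, as Martin suggests, is a deployment choice; the sketch shows only the interface.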
Sir Ulli Joined: 21 Oct 99 Posts: 2246 Credit: 6,136,250 RAC: 0 |
Matt wrote: "Check out the new item in tech news..." I think there is also an Internet problem there:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Dokumente und Einstellungen\ulli>ping setiathome.ssl.berkeley.edu

Pinging setiathome.ssl.berkeley.edu [128.32.18.152] with 32 bytes of data:

Reply from 128.32.18.152: bytes=32 time=364ms TTL=235
Reply from 128.32.18.152: bytes=32 time=235ms TTL=235
Request timed out.
Reply from 128.32.18.152: bytes=32 time=249ms TTL=235

Ping statistics for 128.32.18.152:
    Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),
Approximate round trip times in milli-seconds:
    Minimum = 235ms, Maximum = 364ms, Average = 282ms

C:\Dokumente und Einstellungen\ulli>tracert setiathome.ssl.berkeley.edu

Tracing route to setiathome.ssl.berkeley.edu [128.32.18.152]
over a maximum of 30 hops:

 1 1 ms 1 ms 1 ms 192.168.0.22
 2 63 ms 76 ms 160 ms 212-62-80-254.teleos-web.de [212.62.80.254]
 3 179 ms 61 ms 185 ms m5-re.dts-online.net [212.62.64.3]
 4 163 ms 60 ms 195 ms m10-hf.dts-online.net [212.62.64.30]
 5 220 ms 65 ms 145 ms DTS.DO-2-pos130.de.lambdanet.net [217.71.111.29]
 6 155 ms 64 ms 71 ms DUS-2-pos210.de.lambdanet.net [217.71.105.57]
 7 69 ms 68 ms 201 ms AMS-2-pos100.nl.lambdanet.net [82.197.128.17]
 8 96 ms 207 ms 73 ms gsr12416.ams.he.net [195.69.145.150]
 9 81 ms 82 ms 201 ms pos0-0.gsr12416.lon.he.net [216.66.24.157]
10 164 ms 203 ms 225 ms pos8-0.gsr12416.nyc.he.net [216.218.200.101]
11 336 ms 344 ms 235 ms pos7-0.gsr12012.sjc.he.net [216.218.254.153]
12 233 ms 243 ms 259 ms pos1-2.gsr12416.fmt.he.net [64.71.128.182]
13 232 ms 245 ms 232 ms pos2-1.gsr12416.pao.he.net [64.62.249.122]
14 232 ms 291 ms 249 ms paix-px1--hurricane-ge.cenic.net [198.32.251.69]
15 368 ms 270 ms 275 ms dc-oak-dc2--oakk-dc1-p2p-1.cenic.net [137.164.22.193]
16 338 ms 391 ms 278 ms ucb--oak-dc2-ge.cenic.net [137.164.23.30]
17 313 ms 249 ms 258 ms g3-14.inr-202-reccev.Berkeley.EDU [128.32.0.39]
18 278 ms 244 ms 260 ms g6-2.inr-230-spr.Berkeley.EDU [128.32.255.114]
19 247 ms * 286 ms solen.SSL.Berkeley.EDU [128.32.18.209]
20 * * * Request timed out.
21 * * * Request timed out.
22 * * * Request timed out.
23 * * * Request timed out.
24 398 ms * * klaatu.ssl.berkeley.edu [128.32.18.152]
25 240 ms * 237 ms klaatu.ssl.berkeley.edu [128.32.18.152]

Trace complete.

C:\Dokumente und Einstellungen\ulli>

Both tracert and ping report problems... Greetings from Germany NRW Ulli |
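As a check on the ping output above, the summary lines follow directly from the three replies: 1 of 4 packets lost is 25%, and (364 + 235 + 249) / 3 = 282.67 ms, shown as 282ms. That is consistent with the average being truncated to a whole millisecond, although the exact rounding Windows uses is an assumption here. The same loss arithmetic gives the 30% in Toby's ping further down (70 of 100 received). A small Python sketch of that arithmetic; the function name is illustrative.

```python
from typing import List


def ping_summary(sent: int, reply_ms: List[int]) -> dict:
    """Recompute ping-style statistics from the per-packet reply times."""
    received = len(reply_ms)
    lost = sent - received
    return {
        "loss_pct": 100 * lost // sent,
        "min_ms": min(reply_ms),
        "max_ms": max(reply_ms),
        "avg_ms": sum(reply_ms) // received,  # truncated, matching the 282ms shown
    }


print(ping_summary(4, [364, 235, 249]))
# {'loss_pct': 25, 'min_ms': 235, 'max_ms': 364, 'avg_ms': 282}
```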
Don Erway Joined: 18 May 99 Posts: 305 Credit: 471,946 RAC: 0 |
All the queues are shrinking, except one... I suggest that rather than coming up with a way to "solve" this problem of too many files, which should really be a condition that is never allowed to happen anyway, the system simply be set up to throttle back WU output whenever the queues hold more than an hour or so of backlog. Stop pouring out new WUs. Cut them back until you see the validator queue actually start to drop, then keep them cut back until it drains completely. There are plenty of other worthy projects to take up any spare cycles, and keeping the queues small at all times guarantees good file access performance, so the whole system can run as designed. |
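Don's rule (stop splitting when the backlog passes a high mark, and don't resume until it has drained) is a throttle with hysteresis: using two thresholds keeps the splitter from flapping on and off around a single limit. A minimal Python sketch; the class name and the threshold values in the example are hypothetical, and converting "an hour or so" of backlog into a result count is left to the operator.

```python
from dataclasses import dataclass


@dataclass
class SplitterThrottle:
    """Hysteresis throttle for work-unit production (hypothetical)."""

    pause_above: int    # backlog at or above which splitting stops
    resume_below: int   # backlog at or below which splitting may restart
    paused: bool = False

    def should_split(self, validation_backlog: int) -> bool:
        if self.paused and validation_backlog <= self.resume_below:
            self.paused = False          # queue has drained: resume
        elif not self.paused and validation_backlog >= self.pause_above:
            self.paused = True           # queue too long: stop producing
        return not self.paused


throttle = SplitterThrottle(pause_above=100_000, resume_below=1_000)
print([throttle.should_split(b) for b in (50_000, 120_000, 50_000, 900, 50_000)])
# [True, False, False, True, True]
```

As 1mp0£173 points out below, clients holding days of cached work mean the backlog would lag well behind any change in the split rate.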
Ananas Joined: 14 Dec 01 Posts: 195 Credit: 2,503,252 RAC: 0 |
I can confirm the network (traceroute/ping) trouble :-( _____________________________ As for the file system slowdown: it sometimes helps to create a new directory, hard-link all the files into the new directory, delete the old directory, and rename the new one to the old one's name. |
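Ananas's trick works because on some filesystems (ext2/ext3 among them) a directory's on-disk table does not shrink when entries are deleted, while a freshly built directory is sized for what it actually holds; hard links mean no file data is copied. A Python sketch of the four steps, assuming a quiescent directory containing regular files only: hard links cannot point to directories or cross filesystems, and anything writing to the directory during the swap would fail or be lost. The `.compact` suffix is an arbitrary choice.

```python
import os


def compact_directory(path: str) -> None:
    """Rebuild `path` via hard links so its directory table fits its contents.

    Assumes nothing else touches the directory while this runs and that
    it contains only regular files.
    """
    path = os.path.abspath(path)
    new_dir = path + ".compact"
    os.mkdir(new_dir)                      # 1. create a new directory
    names = os.listdir(path)
    for name in names:                     # 2. hard-link every file into it
        os.link(os.path.join(path, name), os.path.join(new_dir, name))
    for name in names:                     # 3. delete the old directory
        os.unlink(os.path.join(path, name))
    os.rmdir(path)
    os.rename(new_dir, path)               # 4. give the new one the old name
```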
N/A Joined: 18 May 01 Posts: 3718 Credit: 93,649 RAC: 0 |
Don't just cut back on WU production, but re-task the splitter: make it a temporary validator. Isn't there cluster software that can do that (and how hard would it be to implement)? |
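One way to picture N/A's idea without any particular cluster package: the host that runs the splitter checks the queues and picks its next job. The policy below, its thresholds and the `Role` names are all hypothetical. Note that Matt's first post puts the validation bottleneck in large directories on the upload/download filesystem rather than in CPU, so a second validator might contend for the same disk instead of adding throughput.

```python
from enum import Enum


class Role(Enum):
    SPLIT = "split"
    VALIDATE = "validate"


def choose_role(validation_backlog: int, unsent_workunits: int,
                backlog_limit: int, min_unsent: int) -> Role:
    """Pick the next job for a host that can run either the splitter or a validator.

    Keep clients supplied first; otherwise help validation whenever its
    backlog exceeds backlog_limit; otherwise go back to splitting.
    """
    if unsent_workunits < min_unsent:
        return Role.SPLIT
    if validation_backlog > backlog_limit:
        return Role.VALIDATE
    return Role.SPLIT
```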
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Don Erway wrote: "All the queues are shrinking, except one..." Remember that some clients have 10 days of work cached. It may take 10 days before the throttling really "throttles", and starved clients may just connect more often. |
Toby Joined: 26 Oct 00 Posts: 1005 Credit: 6,366,949 RAC: 0 |
Yep. Definitely something wrong with the network:

--- klaatu.ssl.berkeley.edu ping statistics ---
100 packets transmitted, 70 received, 30% packet loss, time 107238ms
rtt min/avg/max/mdev = 76.058/83.147/103.322/5.213 ms

I confirmed this from 2 other locations. One of them is on Internet2, so the problem appears to be internal to Berkeley, unless ordinary Internet (I1) and Internet2 (I2) traffic go over the same wire coming into Berkeley. Maybe if we could block out the alien radio signal that is causing interference on the line... A member of The Knights Who Say NI! For rankings, history graphs and more, check out: My BOINC stats site |
Richard Smith Joined: 2 Feb 00 Posts: 19 Credit: 7,319,258 RAC: 0 |
Why not store these tiny files in a database rather than as separate files? |
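A minimal sketch of Richard's suggestion, using SQLite from Python's standard library purely as an illustration; nothing in the thread says which database, if any, SETI@home would use for this. Each small file becomes one row keyed by its name, so the filesystem never sees a huge directory. The cost is that every upload, download and validator read has to go through the database instead of the filesystem.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table holding small files as BLOBs keyed by name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put_file(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)", (name, data)
        )


def get_file(conn: sqlite3.Connection, name: str) -> Optional[bytes]:
    row = conn.execute("SELECT data FROM files WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]


def delete_file(conn: sqlite3.Connection, name: str) -> None:
    with conn:
        conn.execute("DELETE FROM files WHERE name = ?", (name,))
```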
Tigher Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
Toby wrote: "Yep. Definitely something wrong with the network." Maybe this? "August 11, 2005: There is a hardware problem with the building network here at SSL. This is affecting the scheduling and web servers. You may see intermittent connection problems. The SSL network folks are working on a fix." |
tekwyzrd Joined: 21 Nov 01 Posts: 767 Credit: 30,009 RAC: 0 |
Toby wrote: "Yep. Definitely something wrong with the network." I've been connecting to the forum with no problems since just after 4 a.m. EST. Nothing travels faster than the speed of light with the possible exception of bad news, which obeys its own special laws. Douglas Adams (1952 - 2001) |
ML1 Joined: 25 Nov 01 Posts: 20372 Credit: 7,508,002 RAC: 20 |
"...The only bottleneck slowing validation down is large directory sizes on the upload/download filesystem. This is being addressed in many ways, ..." - Matt. Matt, have you noted this good point from doublechaz? Deleting the files does not necessarily shrink the directory itself. The directory table can remain at the maximum size it reached when it listed the most files! Hence, deleting old files alone may give you no speedup. Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
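One way to check doublechaz's point on a live server is to compare a directory's own size (the `st_size` of the directory itself) with how many entries it currently holds. If the size stays near its peak while the count falls, deletions are not shrinking the table, and a rebuild such as the hard-link trick Ananas describes above would be needed. A Python sketch; what `st_size` means for a directory varies by filesystem (ext2/ext3 report allocated directory blocks, which do not shrink, while other filesystems may report it differently), so treat the numbers as filesystem-specific and the bytes-per-entry ratio as a rough heuristic.

```python
import os
from typing import Tuple


def directory_bloat(path: str) -> Tuple[int, int]:
    """Return (size of the directory itself in bytes, number of live entries)."""
    size = os.stat(path).st_size
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return size, count


if __name__ == "__main__":
    size, count = directory_bloat(".")
    per_entry = size / count if count else float("inf")
    print(f"{size} bytes for {count} entries ({per_entry:.1f} bytes/entry)")
```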
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.