Message boards :
Number crunching :
Panic Mode On (111) Server Problems?
rob smith Send message Joined: 7 Mar 03 Posts: 22535 Credit: 416,307,556 RAC: 380 |
Yup - both struggle with the vast size of the database they have to plough through. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Good morning everyone. I think Grant has a fair point there, and it would be better if barriers to receiving work only existed where they are strictly necessary. Having said that, the problem really only affects the rate work is completed for computers which can't cache enough work to last out the standard outage - and SETI didn't ask you to build those, either. Ever since SETI transferred to BOINC, the suggestion has been to transfer your resources to an alternative backup project when trouble strikes or available work is low. It will be interesting to see, starting tomorrow, how long the 'standard outage' lasts after the reorganisation last week.

Another thing, and you consistently refuse to accept this - SETI@Home has NEVER promised us a continuous flow of work, so do not expect to see your caches filled on every call. It's no good just splitting work if it's going to be withheld from those that want to process it.

Back to the problem at hand. As you might imagine, Eric and I have discussed scheduler issues sporadically over the years. I've looked back through the trail, and the most recent occasion was 26 October 2016, when Eric Korpela wrote:

Yes, the patch did get applied to SETI beta yesterday. Things seem to be running well, so will probably be promoted to SETI main in the next day or two.

That was to do with fixing a totally different problem - the XML transmission of preferences between computers and projects, #1676 and #1470 - which had been troubling Einstein, but it needed changes here too. My guess is that we picked up the latest central scheduler code at that point. The problems over the botched application deployment a month or two later might have drawn attention to scheduler issues, and caused people to assume that the new scheduler had arrived in December, too. I don't suppose anybody remembers stress-testing the Beta scheduler in October/November 2016, do they? Me neither. 
I'll carry on plodding through code and tests as opportunities and inclination allow, but the belated arrival of Spring is tempting me to spend more time outside. |
Sirius B Send message Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
Yup - both struggle with the vast size of the database they have to plough through.

Possible solution?

Database 1 - 9903
Database 2 - 0408
Database 3 - 0913
Database 4 - 1418
Database 5 - 1923

Nice manageable chunks which would also alleviate the current database issues as database 5 will come into effect on this year's anniversary & would make things fly :-) |
rob smith Send message Joined: 7 Mar 03 Posts: 22535 Credit: 416,307,556 RAC: 380 |
Date splitting is the simplest, but it does not make it any easier because the objective is to compare spatial locations over time, so a spatial split, possibly with some degree of overlap, would be more sensible - a sort of "super pixel" in Nebula terms? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Sirius B Send message Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
Understood, but like everything else one has to make the best of what is available. As it stands, carrying on as is, the time will come when the database will be too large to handle. Sometimes compromise is more than acceptable & don't forget ALL the data is still available (for when better solutions become available). At least with it split, it will only be a matter of creating the right tables :-) Edit 2: Also after an xx amount of time, the only database that will require maintenance/backup will be the current one, which would also simplify the weekly outage :-) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Nice manageable chunks which would also alleviate the current database issues as database 5 will come into effect on this year's anniversary & would make things fly :-)

You're mixing databases again, as many have done before you. It's the science database (Informix) which holds the 20-year archive: it's the BOINC database (MySQL) which holds only a couple of weeks of data which is time-critical and really has to fly - especially on Tuesday night / Wednesday morning when queues for kibble are forming outside the server closet. If you really want to compartmentalise the science DB, can I suggest instead workunits

1 - 500,000,000
500,000,001 - 1,000,000,000
1,000,000,001 - 1,500,000,000
1,500,000,001 - 2,000,000,000
2,000,000,001 - 2,500,000,000
2,500,000,001 -

We're still coming up to a nice transition point at 3 billion (work that out in binary if you prefer), but we didn't do the early sets at even speed. |
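One attraction of fixed 500-million-workunit bands like those above is that locating a record's partition needs no date lookup at all, just integer arithmetic. A quick illustrative sketch (the function and constant names are mine for illustration, not anything the project runs):

```python
# Hypothetical sketch of the ID-band partitioning suggested above:
# fixed 500-million-wide bands of workunit IDs.
BAND_WIDTH = 500_000_000

def partition_for(workunit_id: int) -> int:
    """Return the 1-based partition number for a workunit ID."""
    return (workunit_id - 1) // BAND_WIDTH + 1

# Workunit 500,000,000 is the last entry of partition 1;
# workunit 500,000,001 opens partition 2.
```

The open-ended final band (2,500,000,001 onwards) would simply be whatever the highest partition number currently is.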
Sirius B Send message Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
Ah thanks. My solution was for the Informix database, but yours is better :-) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Another thing, and you consistently refuse to accept this - SETI@Home has NEVER promised us a continuous flow of work, so do not expect to see your caches filled on every call.

+1 |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
OK, back to work fetch. My test machine has been running steadily overnight, so the values should have settled a bit. I'm seeing estimates for guppies at 1:38:00 on CPU, 09:49 on GPU - i7 and GTX 670 respectively. That's a nice factor of 10 - 98 minutes to 9.8 minutes - in the speed. I'm running 6 cores, so the CPU queue should last 1.6 times longer than the GPU queue. There are some Arecibo tasks coming through now, so figures will shift about a bit, but I'm saying roughly 0.7 days GPU, 1.2 days CPU.

For the first test, I set the cache to 0.8 days and no additional. The first fetch took me immediately to 100 GPU tasks and 83 CPU tasks, which by my current theory should allow both resources to fetch work as needed. This was the next scheduler contact:

09/04/2018 13:54:47 | SETI@home | Sending scheduler request: To fetch work.
09/04/2018 13:54:47 | SETI@home | Reporting 1 completed tasks
09/04/2018 13:54:47 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
09/04/2018 13:54:47 | SETI@home | [sched_op] CPU work request: 4412.68 seconds; 0.00 devices
09/04/2018 13:54:47 | SETI@home | [sched_op] NVIDIA GPU work request: 14460.88 seconds; 0.00 devices
09/04/2018 13:54:49 | SETI@home | Scheduler request completed: got 2 new tasks
09/04/2018 13:54:49 | SETI@home | Project requested delay of 303 seconds
09/04/2018 13:54:49 | SETI@home | [sched_op] estimated total CPU task duration: 5866 seconds
09/04/2018 13:54:49 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 590 seconds

- one for each, which is a promising start. The sequence seems to be continuing, with

09/04/2018 14:10:09 | SETI@home | Sending scheduler request: To fetch work.

(there were a couple in between too, but I won't bore you with every detail). The important point is that NVidia work continues to be allocated, and the NV cache is steady at 100 (CPU now 84). I'll let that run for a while (maybe go out for a walk again), and check the logs to make sure the cache has been kept topped up. 
Then, the acid test - bump the CPU cache up to 100, and see if it becomes harder to get GPU work. |
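For readers puzzling over those [sched_op] "work request: N seconds" figures, the rough shape of the arithmetic can be sketched like this. This is a guess at the logic for illustration, not the actual BOINC client code:

```python
# Illustrative sketch (not BOINC source): a per-resource work request,
# in seconds, is roughly the cache target minus the estimated runtime
# of work already queued for that resource, floored at zero.
def work_request_seconds(cache_days: float,
                         queued_estimates_s: list[float]) -> float:
    buffer_s = cache_days * 86400          # cache target in seconds
    queued_s = sum(queued_estimates_s)     # estimated runtime on hand
    return max(0.0, buffer_s - queued_s)

# A 0.8-day cache is 69,120 s; an NVIDIA request of ~14,461 s would
# then imply roughly 54,660 s of GPU work already queued.
```

Once the queue estimate exceeds the cache target, the request drops to zero and the client stops asking for that resource, which is why hitting the task limit stops fetches even with a large cache setting.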
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
OK, back to work fetch. My test machine has been running steadily overnight, so the values should have settled a bit. I'm seeing estimates for guppies at 1:38:00 on CPU, 09:49 on GPU - i7 and GTX 670 respectively. That's a nice factor of 10 - 98 minutes to 9.8 minutes - in the speed. I'm running 6 cores, so the CPU queue should last 1.6 times longer than the GPU queue. There are some Arecibo tasks coming through now, so figures will shift about a bit, but I'm saying roughly 0.7 days GPU, 1.2 days CPU. . . I would not anticipate any problems for this test as work seems to be flowing perfectly normally for everyone at the moment, but at least it can establish a baseline for behaviour when things are AOK. The interesting thing would be to re-run the test if/when the problem resurfaces and see if such a configuration tends to provoke or exacerbate the problem when it is happening. Stephen . . |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Fair comment. Prod me if I don't notice problems starting up again - they never seem to trouble my GPU-only SETI crunchers (they do have CPUs too, but I do other things with them). Edit - the Arecibo tasks I'm getting on that machine seem to be a mix of all types - VHAR, middling, and VLAR. That probably makes it a better time for testing. |
rob smith Send message Joined: 7 Mar 03 Posts: 22535 Credit: 416,307,556 RAC: 380 |
A few hours after the end of the outage should be a good time to see what's going on, as that appears to be when the issue gets reported most often. Thinking back - we all know that in the first couple of hours after the outage there is a big splurge, however I noticed the other week that one of my crunchers was in a 24 hour(ish) back-off, while the other three were of the order of 1 hour. On the basis of my admittedly very small sample of four crunchers, that would suggest that 25% of crunchers would be suffering a similar delay, and so it might be that there is a substantial second splurge of demand about a day later than the main one, and that may well cause some (not all) of the delays and slow work supply. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
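The two-wave reasoning above can be illustrated with a toy simulation. Everything here is a hypothetical illustration; only the roughly 1-in-4 long-back-off fraction comes from the post:

```python
import random

# Toy illustration: if some fraction of hosts lands in a ~24 h
# back-off after the outage while the rest retry within ~1 h,
# scheduler demand arrives in two distinct waves.
def demand_waves(n_hosts: int, long_backoff_frac: float,
                 seed: int = 1) -> tuple[int, int]:
    rng = random.Random(seed)
    early = late = 0
    for _ in range(n_hosts):
        if rng.random() < long_backoff_frac:
            late += 1    # retries roughly a day after the outage
        else:
            early += 1   # retries within the first hour or so
    return early, late

# With the 1-in-4 estimate, about a quarter of hosts would hit the
# servers in a second splurge roughly a day later.
early, late = demand_waves(10_000, 0.25)
```

A sample of four machines is obviously far too small to pin the fraction down, but even a much smaller long-back-off share would still produce a noticeable delayed wave at this project's host count.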
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It's still working normally at the moment, reported tasks are being replaced. I'll let you know the minute reported tasks are not replaced. I'm currently running a one-day cache, the CPU tasks are running about 40 tasks to the GPUs' 300. My 10-year-old CPU will finish the current BLCs in about 70 minutes using the Mac CPU App r3711, the GTX 1060 takes about 3 minutes, and the 1050 Ti just under 5 minutes. That equates to a completed task about every minute, meaning 5 tasks last 5 minutes, or 60 tasks an hour.

Mon Apr 9 09:56:42 2018 | SETI@home | [sched_op] Starting scheduler request
Mon Apr 9 09:56:47 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Mon Apr 9 09:56:47 2018 | SETI@home | Reporting 6 completed tasks
Mon Apr 9 09:56:48 2018 | SETI@home | Scheduler request completed: got 6 new tasks
Mon Apr 9 10:01:52 2018 | SETI@home | [sched_op] Starting scheduler request
Mon Apr 9 10:01:57 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Mon Apr 9 10:01:57 2018 | SETI@home | Reporting 8 completed tasks
Mon Apr 9 10:01:58 2018 | SETI@home | Scheduler request completed: got 7 new tasks
Mon Apr 9 10:07:01 2018 | SETI@home | [sched_op] Starting scheduler request
Mon Apr 9 10:07:06 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Mon Apr 9 10:07:06 2018 | SETI@home | Reporting 5 completed tasks
Mon Apr 9 10:07:07 2018 | SETI@home | Scheduler request completed: got 5 new tasks
Mon Apr 9 10:12:15 2018 | SETI@home | [sched_op] Starting scheduler request
Mon Apr 9 10:12:20 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Mon Apr 9 10:12:20 2018 | SETI@home | Reporting 5 completed tasks
Mon Apr 9 10:12:21 2018 | SETI@home | Scheduler request completed: got 5 new tasks
Mon Apr 9 10:17:29 2018 | SETI@home | [sched_op] Starting scheduler request
Mon Apr 9 10:17:34 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Mon Apr 9 10:17:34 2018 | SETI@home | Reporting 6 completed tasks
Mon Apr 9 10:17:35 2018 | SETI@home | Scheduler request completed: got 7 new tasks

Note the one-second turnaround time even when receiving new tasks. Later tonight my CPU cache will be quite full, so tomorrow will be meaningless... unless SETI stops sending tasks. |
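The back-of-the-envelope throughput figure in the post above can be checked in a couple of lines. This is an illustrative sketch; in particular, the per-GPU concurrency values are assumptions on my part, not figures from the post:

```python
# Sketch of the throughput arithmetic: each device contributes
# (concurrent tasks / runtime) completions per minute, and the
# combined rate is the sum across devices.
def tasks_per_hour(devices: list[tuple[float, int]]) -> float:
    """devices: (runtime_minutes, concurrent_tasks) per device."""
    return sum(n / t for t, n in devices) * 60

# CPU ~70 min/task, GTX 1060 ~3 min, 1050 Ti ~5 min. If each GPU is
# assumed to run two tasks at once, the combined rate lands close to
# the "completed task about every minute" quoted above.
rate = tasks_per_hour([(70, 1), (3, 2), (5, 2)])
```

With single tasks per GPU the sum comes to only about 33 an hour, so the quoted one-a-minute rate makes most sense if the GPUs are crunching more than one task concurrently.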
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Yes, working 'by the book' here too. It worked for ~3 hours with the cache bouncing off the GPU top limit - getting replacements when there was work to report, limit report when there weren't. So I upped the cache to 1.25 days, and immediately reached both limits at once, 200 in total. The next four RPCs were

(nothing to report)
09/04/2018 17:05:02 | SETI@home | Sending scheduler request: To fetch work.
(NVidia to report)
09/04/2018 17:10:09 | SETI@home | Sending scheduler request: To fetch work.
(CPU to report)
09/04/2018 17:15:18 | SETI@home | Sending scheduler request: To fetch work.
(tada - both to report)
09/04/2018 17:20:28 | SETI@home | Sending scheduler request: To fetch work.

And it all worked fine. OK, let the bad times roll, and we'll see what we can pluck out of them. |
kittyman Send message Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
It should be interesting to see what happens when the hungry hordes descend upon the servers after tomorrow's outage. Meow!! "Time is simply the mechanism that keeps everything from happening all at once." |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So it begins;

Machine 1
Mon Apr 9 18:04:05 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Machine 2
Mon Apr 9 18:02:42 2018 | SETI@home | Sending scheduler request: To report completed tasks.
Machine 3
Mon 09 Apr 2018 06:21:16 PM EDT | SETI@home | Sending scheduler request: To report completed tasks.

Starting to get a bit wonky. My machines seem to have recovered, for now. I don't know how others are doing. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Donkey seemed to have come good as well, until this:

Tue 10 Apr 2018 10:24:43 AEST | SETI@home | Finished upload of 02dc17ac.9698.885.10.37.134_1_r491679463_0
Tue 10 Apr 2018 10:25:35 AEST | | Running CPU benchmarks
Tue 10 Apr 2018 10:25:35 AEST | | Suspending computation - CPU benchmarks in progress
Tue 10 Apr 2018 10:26:06 AEST | | Benchmark results:
Tue 10 Apr 2018 10:26:06 AEST | | Number of CPUs: 1
Tue 10 Apr 2018 10:26:06 AEST | | 3066 floating point MIPS (Whetstone) per CPU
Tue 10 Apr 2018 10:26:06 AEST | | 14697 integer MIPS (Dhrystone) per CPU
Tue 10 Apr 2018 10:26:07 AEST | | Resuming computation
Tue 10 Apr 2018 10:28:11 AEST | SETI@home | Sending scheduler request: To fetch work.
Tue 10 Apr 2018 10:28:11 AEST | SETI@home | Reporting 1 completed tasks
Tue 10 Apr 2018 10:28:11 AEST | SETI@home | Requesting new tasks for NVIDIA
Tue 10 Apr 2018 10:28:14 AEST | SETI@home | Scheduler request completed: got 1 new tasks
Tue 10 Apr 2018 10:28:16 AEST | SETI@home | Started download of blc04_2bit_blc04_guppi_58152_84527_DIAG_PSR_J0613-0200_0007.4705.0.21.44.240.vlar
Tue 10 Apr 2018 10:28:28 AEST | SETI@home | Computation for task 02dc17ac.9698.885.10.37.144_1 finished
Tue 10 Apr 2018 10:28:28 AEST | SETI@home | Starting task 02dc17ac.9698.885.10.37.113_0
Tue 10 Apr 2018 10:28:30 AEST | SETI@home | Started upload of 02dc17ac.9698.885.10.37.144_1_r1160258381_0
Tue 10 Apr 2018 10:28:34 AEST | SETI@home | Finished upload of 02dc17ac.9698.885.10.37.144_1_r1160258381_0
Tue 10 Apr 2018 10:30:14 AEST | | Project communication failed: attempting access to reference site
Tue 10 Apr 2018 10:30:14 AEST | SETI@home | Temporarily failed download of blc04_2bit_blc04_guppi_58152_84527_DIAG_PSR_J0613-0200_0007.4705.0.21.44.240.vlar: transient HTTP error
Tue 10 Apr 2018 10:30:14 AEST | SETI@home | Backing off 00:03:13 on download of blc04_2bit_blc04_guppi_58152_84527_DIAG_PSR_J0613-0200_0007.4705.0.21.44.240.vlar
Tue 10 Apr 2018 10:30:16 AEST | | Internet access OK - project servers may be temporarily down.
Tue 10 Apr 2018 10:31:46 AEST | SETI@home | Computation for task 02dc17ac.9698.885.10.37.113_0 finished
Tue 10 Apr 2018 10:31:46 AEST | SETI@home | Starting task 02dc17ac.9698.885.10.37.120_1
Tue 10 Apr 2018 10:31:48 AEST | SETI@home | Started upload of 02dc17ac.9698.885.10.37.113_0_r1275940128_0
Tue 10 Apr 2018 10:31:52 AEST | SETI@home | Finished upload of 02dc17ac.9698.885.10.37.113_0_r1275940128_0
Tue 10 Apr 2018 10:33:29 AEST | SETI@home | Started download of blc04_2bit_blc04_guppi_58152_84527_DIAG_PSR_J0613-0200_0007.4705.0.21.44.240.vlar
Tue 10 Apr 2018 10:34:20 AEST | SETI@home | Sending scheduler request: To fetch work.
Tue 10 Apr 2018 10:34:20 AEST | SETI@home | Reporting 2 completed tasks
Tue 10 Apr 2018 10:34:20 AEST | SETI@home | Requesting new tasks for NVIDIA
Tue 10 Apr 2018 10:34:23 AEST | SETI@home | Scheduler request completed: got 0 new tasks
Tue 10 Apr 2018 10:34:23 AEST | SETI@home | No tasks sent
Tue 10 Apr 2018 10:34:23 AEST | SETI@home | Tasks for CPU are available, but your preferences are set to not accept them
Tue 10 Apr 2018 10:34:23 AEST | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Tue 10 Apr 2018 10:34:23 AEST | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
Tue 10 Apr 2018 10:34:23 AEST | SETI@home | This computer has reached a limit on tasks in progress
Tue 10 Apr 2018 10:35:04 AEST | SETI@home | Computation for task 02dc17ac.9698.885.10.37.120_1 finished
Tue 10 Apr 2018 10:35:04 AEST | SETI@home | Starting task 02dc17ac.9698.885.10.37.97_0
Tue 10 Apr 2018 10:35:06 AEST | SETI@home | Started upload of 02dc17ac.9698.885.10.37.120_1_r238800797_0
Tue 10 Apr 2018 10:35:10 AEST | SETI@home | Finished upload of 02dc17ac.9698.885.10.37.120_1_r238800797_0
Tue 10 Apr 2018 10:38:22 AEST | SETI@home | Computation for task 02dc17ac.9698.885.10.37.97_0 finished
Tue 10 Apr 2018 10:38:22 AEST | SETI@home | Starting task 02dc17ac.9698.2112.10.37.70_1
Tue 10 Apr 2018 10:38:24 AEST | SETI@home | Started upload of 02dc17ac.9698.885.10.37.97_0_r1440566348_0
Tue 10 Apr 2018 10:38:27 AEST | SETI@home | Finished upload of 02dc17ac.9698.885.10.37.97_0_r1440566348_0
Tue 10 Apr 2018 10:38:30 AEST | | Project communication failed: attempting access to reference site
Tue 10 Apr 2018 10:38:30 AEST | SETI@home | Temporarily failed download of blc04_2bit_blc04_guppi_58152_84527_DIAG_PSR_J0613-0200_0007.4705.0.21.44.240.vlar: transient HTTP error
Tue 10 Apr 2018 10:38:30 AEST | SETI@home | Backing off 00:06:42 on download of blc04_2bit_blc04_guppi_58152_84527_DIAG_PSR_J0613-0200_0007.4705.0.21.44.240.vlar
Tue 10 Apr 2018 10:38:32 AEST | | Internet access OK - project servers may be temporarily down.

. . Such is life ... :( Stephen |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
So it begins; . . It started here @ about 8:55 am AEST. But after a couple of those "no tasks sent" there was a correction, then one good RPC, then back to the "No tasks sent". . . I stopped and restarted BOINC and it was fine after that and still is on Bertie, but while Donkey was OK too it only lasted until a very different episode which I posted separately. Stephen :( |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
So it begins;

Keith-Windows7

335 SETI@home 4/9/2018 18:21:49 [sched_op] Starting scheduler request
336 SETI@home 4/9/2018 18:21:49 Sending scheduler request: To fetch work.
337 SETI@home 4/9/2018 18:21:49 Reporting 5 completed tasks
338 SETI@home 4/9/2018 18:21:49 Requesting new tasks for NVIDIA GPU
339 SETI@home 4/9/2018 18:21:49 [sched_op] CPU work request: 0.00 seconds; 0.00 devices
340 SETI@home 4/9/2018 18:21:49 [sched_op] NVIDIA GPU work request: 210899.30 seconds; 0.00 devices
341 SETI@home 4/9/2018 18:21:53 Scheduler request completed: got 0 new tasks
342 SETI@home 4/9/2018 18:21:53 [sched_op] Server version 709
343 SETI@home 4/9/2018 18:21:53 No tasks sent
344 SETI@home 4/9/2018 18:21:53 This computer has reached a limit on tasks in progress
345 SETI@home 4/9/2018 18:21:53 Project requested delay of 303 seconds

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Numbskull

331 SETI@home 4/9/2018 18:32:49 [sched_op] Starting scheduler request
332 SETI@home 4/9/2018 18:32:49 Sending scheduler request: To fetch work.
333 SETI@home 4/9/2018 18:32:49 Reporting 6 completed tasks
334 SETI@home 4/9/2018 18:32:49 Requesting new tasks for CPU and NVIDIA GPU
335 SETI@home 4/9/2018 18:32:49 [sched_op] CPU work request: 463922.62 seconds; 0.00 devices
336 SETI@home 4/9/2018 18:32:49 [sched_op] NVIDIA GPU work request: 198533.61 seconds; 0.00 devices
337 SETI@home 4/9/2018 18:32:51 Scheduler request completed: got 0 new tasks
338 SETI@home 4/9/2018 18:32:51 [sched_op] Server version 709
339 SETI@home 4/9/2018 18:32:51 No tasks sent
340 SETI@home 4/9/2018 18:32:51 This computer has reached a limit on tasks in progress
341 SETI@home 4/9/2018 18:32:51 Project requested delay of 303 seconds

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.