The Server Issues / Outages Thread - Panic Mode On! (117)
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Anonymous Platform here. . . Exactly the same here, I was only able to report by using NTT, but once work fetch is turned back on "no tasks". Stephen :( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . I have tried reconfiguring the location for this machine to accept only CPU work and 'other work when requested work not available' but it still comes back with 'no tasks'. . . I guess the servers as they are will not talk to anything it has identified as 'anonymous platform'. Stephen :( |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13871 Credit: 208,696,464 RAC: 304 |
"I guess the servers as they are will not talk to anything it has identified as 'anonymous platform'." Even running stock, it's hit & miss making contact with the Scheduler, and quite a few responses are "Project has no tasks available" when you do. But when you do get work, you get a lot of it. I gave up on running stock because when I got some SoG WUs the downloads errored out, & after that I got almost nothing but CUDA42 work, with the odd CUDA50. With runtimes on current hardware on par with those of 10 year old hardware, it didn't make much sense to continue with that. Grant Darwin NT |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
"I'm looking into the problem. Grrrrr....." Thanks Eric for looking into this on the weekend |
Jimbocous Joined: 1 Apr 13 Posts: 1857 Credit: 268,616,081 RAC: 1,349 |
"I'm looking into the problem. Grrrrr....." +1 !! |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Debugging the server is virtually impossible. If anyone wants to help.... The setiathome_server branch is at https://github.com/BOINC/boinc/tree/setiathome_server/sched Something goes wrong in the function SCHED_SHMEM::no_work. `bool SCHED_SHMEM::no_work(int pid) { if (!ready) return true; for (int i=0; i<max_wu_results; i++) { if (wu_results[i].state == WR_STATE_PRESENT) { wu_results[i].state = pid; return false; } } return true; }` This function works properly unless the requesting computer has anonymous platform apps, for which it always returns true. How could that be? I don't know, despite an additional 500 lines of debugging code. It's almost as if something else is pausing anonymous platform requests until the queue is empty. Well it's bed time now. :( @SETIEric@qoto.org (Mastodon) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Can't you just reload the previous server level 709 code instead of the level 715 code? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Good night Eric. Get some rest. Rested eyes are much more likely to see something than tired eyes. Thank you for taking the time to look at this. Hopefully one of the others here will have some ideas. |
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Unfortunately there's a database change that renders the 7.09 server inoperable. :( Really going to bed this time. @SETIEric@qoto.org (Mastodon) |
rob smith Joined: 7 Mar 03 Posts: 22606 Credit: 416,307,556 RAC: 380 |
Angela has called ;-) I've had a quick look at the bowl of spaghetti, sorry "code", that surrounds the lines that Eric posted. It is one of the many tangled bits, and it is quite probable that there is another process lurking around that sets one of the triggers that stops the delivery of tasks to anonymous applications - and I haven't got any of my track&trace tools with me. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13871 Credit: 208,696,464 RAC: 304 |
"It's almost as if something else is pausing anonymous platform requests until the queue is empty. Well it's bed time now. :(" It's not just Anonymous Platform: stock client requests are also taking ages. It's just that Anonymous Platform systems get hit even harder, resulting in the host getting nothing. The normal response time for a Scheduler response is 2-3sec. Ever since this issue began, Scheduler response times have been 30sec to 2min or so (Scheduler timeout response). With the Stock application, while you still get some Scheduler error responses, they aren't as frequent. Nor are the "Project has no tasks available" messages if you do get a valid response, and when you do get work you get a lot of it with a single request. So while many requests on a Stock system still result in no work, you do get enough good requests to keep your cache full. The fact that this delay issue stops Anonymous Platform systems from getting work is a bug in its own right, but the delay is also hindering Stock systems in contacting the Scheduler & getting work when they do. Sorting out what is causing the delay won't fix the underlying bug affecting Anonymous Platform systems, but at least it should get work flowing regularly again for all systems. Grant Darwin NT |
rob smith Joined: 7 Mar 03 Posts: 22606 Credit: 416,307,556 RAC: 380 |
The other day Eric posted this: "This problem may be affecting the rate at which the main project can handle results, so the validation and assimilation queues are getting large, which may affect the rate of work generation." Eric, ever the master of understatement. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
wujj123456 Joined: 5 Sep 04 Posts: 40 Credit: 20,877,975 RAC: 219 |
"Unfortunately there's a database change that renders the 7.09 server inoperable. :(" I also saw this from a different project just now: https://universeathome.pl/universe/forum_thread.php?id=486 I don't know how BOINC projects are operated, but if that's not a coincidence, there seem to be some changes forcing every project to upgrade now? Even if, in the worst case, everyone just dragged their feet on upgrades for months/years, it's still not very nice to set a deadline or force it to happen around major holidays... |
rob smith Joined: 7 Mar 03 Posts: 22606 Credit: 416,307,556 RAC: 380 |
Each project has its own servers, administrators etc., so this is just a sad coincidence. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
Some quick thoughts to go with the first coffee of the day. 1) Eric mentioned a database change meaning the old code couldn't be used. That means it was a deliberate upgrade, and everything everyone said about not doing that on a pre-holiday Friday needs hanging up in neon fairylights. 2) I think the excessive delays tie in with the re-enabling of 'Resend Lost Results'. Let's treat that as a separate problem. Maybe the upgrade put in a default configuration setting (easy), or maybe it broke the 'off' switch. 3) Eric has given us a code snippet to work from, and a symptom: 'always returns true for anonymous platform'. I looked at the code yesterday, and I looked at the history of changes to that and related files. Just two caught my eye: the addition of keyword support for Science United, and some code to allow tasks to be processed by a specific version number of the science application. Just maybe, one or both of those were added for stock apps, but not for anon plat? It'll give me something to read when I wake up... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
"Each project has its own servers, administrators etc., so this is just a sad coincidence." But they all use the same server code, from BOINC. There are occasional problems like security alerts which prompt a mass updating, but unfortunately the Universe administrator doesn't say WHY the server 'must be updated' or WHY it had to happen 'today'. I haven't seen any email traffic to that effect. |
elec999 Joined: 24 Nov 02 Posts: 375 Credit: 416,969,548 RAC: 141 |
Anyone else not getting new work? Sun 22 Dec 2019 04:51:43 AM EST | SETI@home | Sending scheduler request: To fetch work. Sun 22 Dec 2019 04:51:43 AM EST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU Sun 22 Dec 2019 04:52:51 AM EST | SETI@home | Scheduler request completed: got 0 new tasks Sun 22 Dec 2019 04:52:51 AM EST | SETI@home | Project has no tasks available Sun 22 Dec 2019 04:51:42 AM EST | | Starting BOINC client version 7.16.3 for x86_64-pc-linux-gnu Sun 22 Dec 2019 04:51:42 AM EST | | log flags: file_xfer, sched_ops, task Sun 22 Dec 2019 04:51:42 AM EST | | Libraries: libcurl/7.65.3 OpenSSL/1.1.1c zlib/1.2.11 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.39.2 librtmp/2.3 Sun 22 Dec 2019 04:51:42 AM EST | | Data directory: /var/lib/boinc-client Sun 22 Dec 2019 04:51:43 AM EST | | CUDA: NVIDIA GPU 0: GeForce RTX 2060 SUPER (driver version 440.36, CUDA version 10.2, compute capability 7.5, 4096MB, 3970MB available, 7311 GFLOPS peak) Sun 22 Dec 2019 04:51:43 AM EST | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 440.36, CUDA version 10.2, compute capability 6.1, 4096MB, 3968MB available, 8186 GFLOPS peak) Sun 22 Dec 2019 04:51:43 AM EST | | CUDA: NVIDIA GPU 2: GeForce GTX 1050 Ti (driver version 440.36, CUDA version 10.2, compute capability 6.1, 4040MB, 3978MB available, 2138 GFLOPS peak) Sun 22 Dec 2019 04:51:43 AM EST | | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 SUPER (driver version 440.36, device version OpenCL 1.2 CUDA, 7979MB, 3970MB available, 7311 GFLOPS peak) Sun 22 Dec 2019 04:51:43 AM EST | | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 440.36, device version OpenCL 1.2 CUDA, 8120MB, 3968MB available, 8186 GFLOPS peak) Sun 22 Dec 2019 04:51:43 AM EST | | OpenCL: NVIDIA GPU 2: GeForce GTX 1050 Ti (driver version 440.36, device version OpenCL 1.2 CUDA, 4040MB, 3978MB available, 2138 GFLOPS peak) Sun 22 Dec 2019 04:51:43 AM EST | SETI@home | Found app_info.xml; using anonymous platform Sun 22 Dec 2019 04:51:43 AM EST | | [libc detection] gathered: 2.30, Ubuntu GLIBC 2.30-0ubuntu2 Sun 22 Dec 2019 04:51:43 AM EST | | Host name: seti-AB350-Gaming Sun 22 Dec 2019 04:51:43 AM EST | | Processor: 16 AuthenticAMD AMD Ryzen 7 2700X Eight-Core Processor [Family 23 Model 8 Stepping 2] Sun 22 Dec 2019 04:51:43 AM EST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca Sun 22 Dec 2019 04:51:43 AM EST | | OS: Linux Ubuntu: Ubuntu 19.10 [5.3.0-24-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)] |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14681 Credit: 200,643,578 RAC: 874 |
"Anyone else not getting new work?" Try reading, instead of writing. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I'm looking into the problem. Grrrrr....." . . Thanks Eric, and lotsa luck! Stephen . . |
Freewill Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
"Debugging the server is virtually impossible. If anyone wants to help.... The setiathome_server branch is at" Until someone can figure out why and fix it, is it possible to hard code it to return false if the requesting computer has anonymous platform apps? Barring that, does anyone have ideas on how to make the client side hide that it is running anonymous platform? All my PCs were running great until this happened, and I don't want to mess them up by going back to stock, especially given how slow it is. If someone can show me it is really simple to switch to/from stock, I would be willing to try on my slowest box at least to be running something. |
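
For anyone who wants to experiment with the logic Eric posted, here is a minimal standalone sketch of `SCHED_SHMEM::no_work`. It is not the BOINC server code: `WuResult`, `SchedShmemSketch` and `count_present` are hypothetical stand-ins, and the numeric values given to `WR_STATE_EMPTY` and `WR_STATE_PRESENT` are assumptions made only for illustration. What it shows is that the posted function reads nothing but `ready` and the slot states, so it can only return true when the segment is not ready or when no slot is in the present state at the moment of the call.

```cpp
#include <cassert>
#include <vector>

// Assumed values for illustration only; the real constants are defined in
// the BOINC server headers.
constexpr int WR_STATE_EMPTY = 0;
constexpr int WR_STATE_PRESENT = 1;

// Hypothetical stand-in for one slot of the shared work array.
// Only the field that no_work() touches is modelled.
struct WuResult {
    int state = WR_STATE_EMPTY;
};

// Hypothetical stand-in for the shared-memory segment.
struct SchedShmemSketch {
    bool ready = false;
    std::vector<WuResult> wu_results;

    // Same logic as the posted function: claim the first present slot by
    // writing the caller's pid into it, or report that there is no work.
    bool no_work(int pid) {
        if (!ready) return true;
        for (auto& slot : wu_results) {
            if (slot.state == WR_STATE_PRESENT) {
                slot.state = pid;
                return false;
            }
        }
        return true;
    }
};

// Hypothetical debugging helper: how many slots are still unclaimed.
int count_present(const SchedShmemSketch& shmem) {
    int n = 0;
    for (const auto& slot : shmem.wu_results) {
        if (slot.state == WR_STATE_PRESENT) ++n;
    }
    return n;
}
```

Logging something like `count_present` just before the call, for both stock and anonymous platform requests, would be one way to tell whether the slots are already empty by the time an anonymous platform request gets there, which is what Eric's "pausing until the queue is empty" guess would predict.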
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.