The Server Issues / Outages Thread - Panic Mode On! (119)

AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035615 - Posted: 4 Mar 2020, 10:58:56 UTC - in response to Message 2035614.  
Last modified: 4 Mar 2020, 11:00:32 UTC

If you're not going to process them, why download them??? Why not just set No New Tasks??? Or just set that system to another Venue & disable GPU processing?????
Because it is only that program that runs buggy on my system, and I'm not about to make a 4th configuration to satisfy that particular program.
If you are running Anonymous Platform, then just remove the SoG app from your app_info.xml. If you are not running anonymous, then do this:

Stop Boinc, then edit cc_config.xml to add this line to the <options> section:

<dont_check_file_sizes>1</dont_check_file_sizes>

and copy your best working GPU app executable over the SoG app executable in the project directory. Then restart Boinc.

Now you are actually using this other app to crunch the tasks the scheduler sends for SoG app.
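For anyone who would rather script the cc_config.xml step quoted above than edit the file by hand, here is a minimal sketch in Python. It only shows how the `<dont_check_file_sizes>` line ends up inside the `<options>` section. The function name `enable_dont_check_file_sizes` is made up for illustration, and `<use_all_gpus>` in the sample is only a stand-in for whatever options a real file already contains. It assumes BOINC is stopped first, as the instructions say, and it does not locate or write the real file.

```python
import xml.etree.ElementTree as ET


def enable_dont_check_file_sizes(xml_text: str) -> str:
    """Return cc_config.xml text with dont_check_file_sizes set to 1.

    Hypothetical helper: creates <options> if the file has none and
    leaves any options that are already present untouched.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"

    return ET.tostring(root, encoding="unicode")


# Sample input; the existing option here is only a placeholder.
sample = "<cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>"
updated = enable_dont_check_file_sizes(sample)
print(updated)
```

Running it a second time on its own output changes nothing, so it should be safe to re-run. Copying the executable over the SoG app is still a manual step.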


Na. Stock BOINC/SAH configuration. They'd shut this plan class down, and were getting ready to upgrade it recently because they acknowledged the bugginess. Alas, the move to 8.23 won't happen. It's just as easy to set them to abort. They only come one time during the day. I have a gig connection, so it isn't a drain on my system, but more on theirs.
AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035619 - Posted: 4 Mar 2020, 11:08:42 UTC

Well, looks like delivery is going to be FUBAR the rest of the night. I got one download of significance, but I've been stuck on "got 0 new tasks" since. Got my two chapter exams out of the way, so it is time for shuteye. This will sort out sooner or later.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035627 - Posted: 4 Mar 2020, 11:41:13 UTC - in response to Message 2035615.  

It's just as easy to set them to abort.
Easy for you but it makes you part of the database size problem.

Any errored or aborted task is adding an extra result to the database because it gets resent. Even when the splitters have been stopped because the database is too big. Resends adding stuff when splitters have been stopped is a bit like decay heat in a shut down nuclear reactor. If the heat can't be removed (assimilator problem), the result is a meltdown.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035631 - Posted: 4 Mar 2020, 11:48:02 UTC - in response to Message 2035619.  

Well, looks like delivery is going to be FUBAR the rest of the night. I got one download of significance, but I've been stuck on "got 0 new tasks" since. Got my two chapter exams out of the way, so it is time for shuteye. This will sort out sooner or later.
How are you able to download anything when all I'm getting is this (the times are UTC+2):

04-Mar-2020 08:17:40 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 08:22:21 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 08:26:36 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 08:36:28 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 08:48:27 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 09:12:37 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 09:58:12 [SETI@home] Scheduler request failed: Failure when receiving data from the peer
04-Mar-2020 11:08:53 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 11:53:27 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 11:56:57 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 12:35:11 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 12:37:56 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 12:39:41 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 12:42:31 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 12:48:06 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 13:02:13 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 13:21:34 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 13:42:33 [SETI@home] Scheduler request failed: HTTP internal server error
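A log like the one above is easier to read once the failures are grouped by reason. The following is a rough Python sketch of that, assuming the event log lines have exactly the `date time [project] Scheduler request failed: reason` shape shown here. The function name `tally_failures` and the shortened sample are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
# 04-Mar-2020 08:17:40 [SETI@home] Scheduler request failed: Timeout was reached
FAILURE_RE = re.compile(
    r"^\S+ \S+ \[[^\]]+\] Scheduler request failed: (?P<reason>.+)$"
)


def tally_failures(log_text: str) -> Counter:
    """Count scheduler request failures by reason (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            counts[match.group("reason")] += 1
    return counts


# A few lines copied from the log above, not the full list.
sample_log = """\
04-Mar-2020 08:17:40 [SETI@home] Scheduler request failed: Timeout was reached
04-Mar-2020 08:22:21 [SETI@home] Scheduler request failed: HTTP internal server error
04-Mar-2020 09:58:12 [SETI@home] Scheduler request failed: Failure when receiving data from the peer
04-Mar-2020 11:08:53 [SETI@home] Scheduler request failed: Timeout was reached
"""

failures = tally_failures(sample_log)
for reason, count in failures.most_common():
    print(count, reason)
```

Lines that do not match the pattern are skipped, so other event log messages can be left in the input.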
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2035636 - Posted: 4 Mar 2020, 11:57:16 UTC - in response to Message 2035631.  

How many tasks are you reporting at once? Usually best to report a maximum of 64 tasks (Windows) or 128 tasks (Linux) to keep the file size down (the current Windows apps put an absurd amount of diagnostic data in std_err.txt, bloating the report size). And if that doesn't cure the problem by itself, set 'No New Tasks' while reporting. Then when you're clear, start by requesting small amounts - say 1 hour. All of mine have reported now (~2,000 tasks), and they're starting to refill.
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2035637 - Posted: 4 Mar 2020, 11:59:45 UTC - in response to Message 2035631.  

Since I woke up I've been able to report tasks using No New Tasks; last night that wasn't working. A couple of my machines have then been able to download a few tasks every so often, but it's mostly "Project Has No Tasks Available".
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035650 - Posted: 4 Mar 2020, 12:34:24 UTC - in response to Message 2035636.  

How many tasks are you reporting at once? Usually best to report a maximum of 64 tasks (Windows) or 128 tasks (Linux) to keep the file size down
One hundred at once. Running Linux.
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2035653 - Posted: 4 Mar 2020, 12:43:29 UTC

I've had zero problems reporting 250 at once, and I've been doing that for about a year now. That is near the max: you can try to report 1000 or whatever at once, but only ~250 or so will make it through each time.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035654 - Posted: 4 Mar 2020, 12:46:54 UTC - in response to Message 2035640.  

But, I'm in no hurry, I will go out to my local Café instead. I get what I want there without any
"Scheduler request completed: got 0 new tasks" replies :-)
- One coffee please.
- Coffee request failed: Timeout was reached
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2035665 - Posted: 4 Mar 2020, 13:16:34 UTC

I just received this message:

mié 04 mar 2020 08:13:11 EST | SETI@home | Message from server: Your app_info.xml file doesn't have a usable version of AstroPulse v7.


I haven't done AP for years, and I haven't received that message before (at least not in the last few months).

Is somebody still playing with the server code? Or are the servers just recovering from today's hangover?
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035674 - Posted: 4 Mar 2020, 13:28:22 UTC - in response to Message 2035665.  

I just received this message:

mié 04 mar 2020 08:13:11 EST | SETI@home | Message from server: Your app_info.xml file doesn't have a usable version of AstroPulse v7.

I haven't done AP for years, and I haven't received that message before (at least not in the last few months).
Sounds like a server glitch. Possibly a failed database query when the scheduler was trying to determine if your host wants AP or not.
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2035711 - Posted: 4 Mar 2020, 15:03:53 UTC

Alert: The splitter is spitting out errors for the Arecibo MB files. I doubt they are looking at the boards, so I hope they notice.
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 2035720 - Posted: 4 Mar 2020, 15:52:19 UTC

I sent word to Eric about the RTS, the creation rate, and the MB splitter errors.
We'll see if he's available today and if he can kick anything back into order.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2035778 - Posted: 4 Mar 2020, 18:16:45 UTC

I seem to be now getting slightly more new tasks on average than I report.

But I have to babysit my queues to keep them from drying out; otherwise the scheduler fills my GPU queue and lets the CPU starve, or vice versa.
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2035779 - Posted: 4 Mar 2020, 18:20:12 UTC - in response to Message 2035778.  

I’m getting trickles. But nothing sustainable, even on the slowest host.

I changed my cache to 0.05 days (just over an hour) but I’m never getting more than 10-50 tasks at a time, followed by several rounds of “no tasks available”.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2035782 - Posted: 4 Mar 2020, 18:33:17 UTC
Last modified: 4 Mar 2020, 18:33:48 UTC

Results returned and awaiting validation is now at an eye-watering 15.25 million, which might have something to do with the worse-than-usual recovery problems.

Deadlines for all new work should be 2 weeks. Deadlines for all resends should be 3 days. Get the database back under control before it grinds to a complete halt well before the end of the month.
Grant
Darwin NT
AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2035790 - Posted: 4 Mar 2020, 18:48:21 UTC - in response to Message 2035627.  
Last modified: 4 Mar 2020, 18:56:32 UTC

It's just as easy to set them to abort.
Easy for you but it makes you part of the database size problem.

Any errored or aborted task is adding an extra result to the database because it gets resent. Even when the splitters have been stopped because the database is too big. Resends adding stuff when splitters have been stopped is a bit like decay heat in a shut down nuclear reactor. If the heat can't be removed (assimilator problem), the result is a meltdown.


Then they shouldn't have started the plan class back up until they had the replacement ready to go. I didn't make the decision to release it, and it makes zero difference if they error out or I abort them, except that they're sent out sooner, and don't spend so much time in the database, so it actually helps the database by shortening the amount of time they're actually IN the database.
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2035793 - Posted: 4 Mar 2020, 19:04:24 UTC - in response to Message 2035782.  

Results returned and awaiting validation is now at an eye-watering 15.25 million, which might have something to do with the worse-than-usual recovery problems.

Deadlines for all new work should be 2 weeks. Deadlines for all resends should be 3 days. Get the database back under control before it grinds to a complete halt well before the end of the month.


The system is having a hard time with assimilation. I have WUs in my queue where both results of the pair were validated 3-4 days ago. While I would like them to get all the data done before the end of March, I don't think a shorter deadline will help much. They are getting results faster than the system can handle them.
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2035798 - Posted: 4 Mar 2020, 19:13:53 UTC - in response to Message 2035790.  

I have a list of similar Hosts I look through every so often just to see how things are going. If you look through the list you will see Most people aren't having any trouble with the Mac SoG App. In fact, More people are having trouble with the non-SoG App & the CPU App than the SoG App. I doubt very seriously the Admin will risk changing Apps when the App is working so well just to satisfy ONE person that insists on running settings that don't work on his machine. So, if you like Aborting tasks go right ahead, it makes You look much worse than the App when you do so.

I'm not going to bother making them links,
https://setiathome.berkeley.edu/results.php?hostid=8708447
https://setiathome.berkeley.edu/results.php?hostid=8839551
https://setiathome.berkeley.edu/results.php?hostid=8710275
https://setiathome.berkeley.edu/results.php?hostid=8724350
https://setiathome.berkeley.edu/results.php?hostid=8570864
https://setiathome.berkeley.edu/results.php?hostid=8726177
https://setiathome.berkeley.edu/results.php?hostid=8248568
https://setiathome.berkeley.edu/results.php?hostid=7988896
https://setiathome.berkeley.edu/results.php?hostid=8820607
https://setiathome.berkeley.edu/results.php?hostid=8458558
https://setiathome.berkeley.edu/results.php?hostid=8727894
https://setiathome.berkeley.edu/results.php?hostid=8766223
https://setiathome.berkeley.edu/results.php?hostid=8836942
https://setiathome.berkeley.edu/results.php?hostid=8224001
https://setiathome.berkeley.edu/results.php?hostid=8766223
https://setiathome.berkeley.edu/results.php?hostid=8184290
https://setiathome.berkeley.edu/results.php?hostid=8803118
https://setiathome.berkeley.edu/results.php?hostid=8555993
https://setiathome.berkeley.edu/results.php?hostid=8363810
https://setiathome.berkeley.edu/results.php?hostid=7341185
https://setiathome.berkeley.edu/results.php?hostid=7341185
https://setiathome.berkeley.edu/results.php?hostid=2991797
https://setiathome.berkeley.edu/results.php?hostid=8732186
https://setiathome.berkeley.edu/results.php?hostid=8644421
https://setiathome.berkeley.edu/results.php?hostid=8514780
https://setiathome.berkeley.edu/results.php?hostid=8449550
https://setiathome.berkeley.edu/results.php?hostid=8357828
https://setiathome.berkeley.edu/results.php?hostid=8402831
https://setiathome.berkeley.edu/results.php?hostid=8453762
https://setiathome.berkeley.edu/results.php?hostid=5404420
https://setiathome.berkeley.edu/results.php?hostid=8315618
https://setiathome.berkeley.edu/results.php?hostid=8835998
https://setiathome.berkeley.edu/results.php?hostid=7465217
https://setiathome.berkeley.edu/results.php?hostid=8692870
https://setiathome.berkeley.edu/results.php?hostid=8461718
W-K 666 (Project Donor)
Volunteer tester
Joined: 18 May 99
Posts: 19093
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2035815 - Posted: 4 Mar 2020, 20:55:29 UTC - in response to Message 2035793.  

The system is having a hard time with assimilation.

I've been saying that for days, but no one listens, or they say, "no, it's the RAM being swamped."
Fix the Assimilator and the RAM problem will go away.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.