Panic Mode On (111) Server Problems?

juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1926899 - Posted: 28 Mar 2018, 15:45:44 UTC - in response to Message 1926895.  

Works just fine here.

You win the lotto! Still nothing here. Even if I reduce the number of reported WUs to 16.
ID: 1926899 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1926902 - Posted: 28 Mar 2018, 15:58:03 UTC - in response to Message 1926900.  
Last modified: 28 Mar 2018, 15:58:26 UTC

Works just fine here.

You win the lotto! Still nothing here. Even if I reduce the number of reported WUs to 16.

Hehe, I edited out "Works just fine here", after I realized that it didn't really work fine at all :-)

Just got my first completed scheduler request. Still no new WUs, but that will take a while.

Wed 28 Mar 2018 10:55:52 AM EST | SETI@home | Scheduler request completed: got 0 new tasks
Wed 28 Mar 2018 10:55:52 AM EST | SETI@home | Project has no tasks available

ID: 1926902 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1926903 - Posted: 28 Mar 2018, 15:58:41 UTC - in response to Message 1926899.  

I finally got everything reported and started catching a very few loose splatters from the splitters.
ID: 1926903 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 1926906 - Posted: 28 Mar 2018, 16:05:11 UTC

I'm not an "anarchist", but is this project about done, lacking any leadership and having lost any partner respect? The web pages are dated or wrong and are almost never updated. The statistics are at best delayed and unkempt, and generally seem untrustworthy. For example, what's with the 71 V7 results waiting to be purged (for many months)? Weekly downtimes are typically well in excess of the handful of hours advertised on the project page, which hasn't changed its wording in years, I think. There is almost no interaction with, or education of, the partner network on anything. For example, as of today there hasn't been a post on the "News" board for 19 days. Looking at the number of complaints posted on the bulletin board, those related to SETI management seem to be rising. Is anybody home? Are they changing something to try to be "better"? Improved server software architecture? Client software? Metrics? I've been turned off for a while and have added a second backup project. I did so in part to find a project that appreciates the meager resources I can offer: just a desktop or two with no GPUs. But I find that the backup project is becoming the main project and has reversed roles with SETI.

Please, Administrators, don't let this venerable project fall apart unless that is a reasoned decision, not one based on a lack of continued enthusiasm.
ID: 1926906 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926907 - Posted: 28 Mar 2018, 16:06:41 UTC - in response to Message 1926889.  

Looks just like mine Juan. I am now up to 3 hour backoffs.

For HTTP internal server error only:

Try reducing the number of tasks reported at once. In cc_config.xml, set

<max_tasks_reported>N</max_tasks_reported>
Report at most N tasks per scheduler RPC. Try N=50 or N=64 if your computer has lots of tasks to report and is having trouble completing a scheduler RPC.
Reduced to 64, still doesn't work..... FYI, my host has more than 2000 WUs to report.

Wed 28 Mar 2018 09:27:47 AM EST | SETI@home | Sending scheduler request: Requested by user.
Wed 28 Mar 2018 09:27:47 AM EST | SETI@home | Reporting 64 completed tasks
Wed 28 Mar 2018 09:27:47 AM EST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Wed 28 Mar 2018 09:28:57 AM EST | SETI@home | Scheduler request failed: HTTP internal server error
Keep reducing the number reported until you can start whittling down the backlog.

Last time I suggested this, I reported that batches of 150 and 191 had failed, while batches of 64 were getting through. Much the same happened today.

Take a look at the size of sched_request_setiathome.berkeley.edu.xml (in your BOINC data directory). Again from last time, 2.4 MB failed, 845 KB succeeded. See if you can get yours below 1 MB.
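
The two steps above (lowering the report batch size, then checking the request file size) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names are made up for illustration, the paths must point at your own BOINC data directory, the setting is assumed to live under the <options> element of cc_config.xml, and the 1,000,000-byte threshold is just a stand-in for the "below 1 MB" rule of thumb.

```python
import os
import xml.etree.ElementTree as ET


def set_max_tasks_reported(config_path, n):
    """Set <options><max_tasks_reported> to n in cc_config.xml.

    Hypothetical helper: creates the file and any missing elements,
    and leaves other existing options untouched.
    """
    if os.path.exists(config_path):
        tree = ET.parse(config_path)
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    item = options.find("max_tasks_reported")
    if item is None:
        item = ET.SubElement(options, "max_tasks_reported")
    item.text = str(n)

    tree.write(config_path)


def request_is_small_enough(request_path, limit_bytes=1_000_000):
    """Return True if the scheduler request file is under the limit.

    The default limit is an assumption taken from the "below 1 MB"
    suggestion in this thread, not a documented server threshold.
    """
    return os.path.getsize(request_path) < limit_bytes
```

For example, call set_max_tasks_reported on the cc_config.xml in your data directory with 64, have the client re-read its config files (or restart it), trigger an update, and then run request_is_small_enough on sched_request_setiathome.berkeley.edu.xml. If it still returns False, try a smaller N.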
ID: 1926907 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1926908 - Posted: 28 Mar 2018, 16:15:06 UTC

The status page got a good update. Looks like someone is working on the problems.

Truthfully, I'd like all hands on fixing the problem instead of informational updates. It isn't like you can predict when the system will be up and working again anyway.
ID: 1926908 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9957
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1926913 - Posted: 28 Mar 2018, 16:47:02 UTC - in response to Message 1926909.  

OK, getting new tasks now:

SETI@home 2018-03-28 18:16:05 Requesting new tasks for NVIDIA GPU and Intel GPU
SETI@home 2018-03-28 18:16:11 Scheduler request completed: got 50 new tasks


Same here, trick is now to actually download them!!
ID: 1926913 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1926915 - Posted: 28 Mar 2018, 16:52:34 UTC
Last modified: 28 Mar 2018, 16:55:18 UTC

There's an issue with the splitters: the governor hasn't kicked in, so there are over 1.2 million WUs available; and, when I first looked, the splitters were still going at 59 WUs produced per second! (They've backed down to 8/sec on a second look.) Normally (unless they've quite recently changed something) the governor kicks in at around 0.5 million. This may have something to do with why people are getting the dreaded "Project has no tasks available", as the "next to send" queue has only 400 slots (IIRC) available, and may not be refilling them when empty, due to the excessive I/O of the splitters on the database.
.

Hello, from Albany, CA!...
ID: 1926915 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1926918 - Posted: 28 Mar 2018, 17:05:36 UTC - in response to Message 1926915.  

The server cache is 600k. When it is overfull (like now), it is because of all the resends the system has identified and inserted.
ID: 1926918 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1926919 - Posted: 28 Mar 2018, 17:09:14 UTC - in response to Message 1926918.  
Last modified: 28 Mar 2018, 17:11:05 UTC

Results ready to send	0	3,713	1,109,512

We can only speculate, but 1 MM???
All the 0 (zero) Tapes done, maybe another source of the problems.
I have > 100 WUs ready to DL but they won't start. HTTP error....
Time to go for something cool to refresh myself; it is 32C and rising here.
ID: 1926919 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926921 - Posted: 28 Mar 2018, 17:20:31 UTC - in response to Message 1926913.  

trick is now to actually download them!!
I can't connect to 119, getting nothing but header errors from 127. Gonna be a long night.
ID: 1926921 · Report as offensive
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1926922 - Posted: 28 Mar 2018, 17:21:26 UTC

I got maybe 30 tasks downloaded before I started having download issues
ID: 1926922 · Report as offensive
mmonnin
Volunteer tester

Joined: 8 Jun 17
Posts: 58
Credit: 10,176,849
RAC: 0
United States
Message 1926924 - Posted: 28 Mar 2018, 17:27:20 UTC - in response to Message 1926906.  

I'm not an "anarchist", but is this project about done, lacking any leadership and having lost any partner respect? The web pages are dated or wrong and are almost never updated. The statistics are at best delayed and unkempt, and generally seem untrustworthy. For example, what's with the 71 V7 results waiting to be purged (for many months)? Weekly downtimes are typically well in excess of the handful of hours advertised on the project page, which hasn't changed its wording in years, I think. There is almost no interaction with, or education of, the partner network on anything. For example, as of today there hasn't been a post on the "News" board for 19 days. Looking at the number of complaints posted on the bulletin board, those related to SETI management seem to be rising. Is anybody home? Are they changing something to try to be "better"? Improved server software architecture? Client software? Metrics? I've been turned off for a while and have added a second backup project. I did so in part to find a project that appreciates the meager resources I can offer: just a desktop or two with no GPUs. But I find that the backup project is becoming the main project and has reversed roles with SETI.

Please, Administrators, don't let this venerable project fall apart unless that is a reasoned decision, not one based on a lack of continued enthusiasm.


There's a difference between the site science and project pages not being updated and the BOINC server backend being updated. As far as I'm concerned the latter is the important part and the admins are updating that. There are many, many BOINC projects that are not on the BOINC site version that moved the links to the top/username on the right. Many still have the 'Main page · Your account · Message boards' links at the bottom.
ID: 1926924 · Report as offensive
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 1926925 - Posted: 28 Mar 2018, 17:30:35 UTC - in response to Message 1926924.  

There's a difference between the site science and project pages not being updated and the BOINC server backend being updated. As far as I'm concerned the latter is the important part and the admins are updating that. There are many, many BOINC projects that are not on the BOINC site version that moved the links to the top/username on the right. Many still have the 'Main page · Your account · Message boards' links at the bottom.


And I suspect that may be why we had a 24-hour outage, since there were some changes to server-side code that may have broken something, and since S@H is the guinea pig for untested BOINC code.........
ID: 1926925 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1926928 - Posted: 28 Mar 2018, 18:54:42 UTC - in response to Message 1926907.  

Keep reducing the number reported until you can start whittling down the backlog.

Last time I suggested this, I reported that batches of 150 and 191 had failed, while batches of 64 were getting through. Much the same happened today.

Take a look at the size of sched_request_setiathome.berkeley.edu.xml (in your BOINC data directory). Again from last time, 2.4 MB failed, 845 KB succeeded. See if you can get yours below 1 MB.


Thanks, Richard. It worked. After getting 50 to upload each time, it went into No Task Available.

It has finally gotten up to a nearly full cache without my having to babysit the download process.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1926928 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926933 - Posted: 28 Mar 2018, 19:27:04 UTC - in response to Message 1926928.  

Yes, all looking like it's got up to speed now - both work assignment and downloads. I'm starting to wind down the backup projects and set normal running caches. Then some beer to celebrate.
ID: 1926933 · Report as offensive
mmonnin
Volunteer tester

Joined: 8 Jun 17
Posts: 58
Credit: 10,176,849
RAC: 0
United States
Message 1926935 - Posted: 28 Mar 2018, 19:44:16 UTC - in response to Message 1926925.  

There's a difference between the site science and project pages not being updated and the BOINC server backend being updated. As far as I'm concerned the latter is the important part and the admins are updating that. There are many, many BOINC projects that are not on the BOINC site version that moved the links to the top/username on the right. Many still have the 'Main page · Your account · Message boards' links at the bottom.


And I suspect that may be why we had a 24-hour outage, since there were some changes to server-side code that may have broken something, and since S@H is the guinea pig for untested BOINC code.........


We'd have to wait for an admin to say why it was down extra long.

There is a SETI Beta project where BOINC server versions are tested before moving to SETI proper.
ID: 1926935 · Report as offensive
Profile JL

Joined: 15 Apr 00
Posts: 5
Credit: 63,734,518
RAC: 38
United States
Message 1926936 - Posted: 28 Mar 2018, 19:45:15 UTC

Keep getting "transient HTTP error" when my machines try to download work.
The "Backing off" time keeps getting longer.
I'll just wait for the issues to subside, eventually I'll get them downloaded.
ID: 1926936 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1926943 - Posted: 28 Mar 2018, 20:20:20 UTC - in response to Message 1926936.  

Keep getting "transient HTTP error" when my machines try to download work.
The "Backing off" time keeps getting longer.
I'll just wait for the issues to subside, eventually I'll get them downloaded.

I suggest you read more from the recent posts to this thread.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1926943 · Report as offensive
Profile JL

Joined: 15 Apr 00
Posts: 5
Credit: 63,734,518
RAC: 38
United States
Message 1926953 - Posted: 28 Mar 2018, 21:15:10 UTC - in response to Message 1926943.  

Keep getting "transient HTTP error" when my machines try to download work.
The "Backing off" time keeps getting longer.
I'll just wait for the issues to subside, eventually I'll get them downloaded.

I suggest you read more from the recent posts to this thread.

My problem isn't reporting, it's downloading work.
I guess I could set <http_debug>, but that only gives me more verbose errors?
ID: 1926953 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.