The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Argh - I have some news to pass on, and spent some time composing it - in the old thread. Hang on while I re-write it. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
|
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
Since I am using the All-In-One, I don't even have a stock to revert to. I'd need to archive the BOINC folder, download/install the detested Repository version, reconnect to SETI, download/install all the setup/apps/work including some ancient and slow CUDA50 that takes 10x as long to finish if it doesn't crash, then when this is fixed (which with my luck will happen exactly when I have completed this) wait for the work to complete, uninstall it, unpack the All-In-One back and hope for the best... ... on eight computers. Or I could just connect to Einstein. Takes about ten seconds apiece. Much easier. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
The LHC project at CERN now has the responsibility for testing and releasing new server code. Their website has a "Server upgrade" notice. Despite that "BOINC server release 1.2", their server reports '22/12/2019 12:51:47 | LHC@home | [sched_op] Server version 715', same as ours. I'll be having words about that. But I ran a test. Cleaned up my account, attached again, and got the statutory single initial task. Then, I wrapped up their application in an app_info.xml file (losing that initial task to a few finger-fumbles along the way - we all do it!), eventually getting Anonymous Platform working properly. First off, I got my initial task back as a 'resent lost task' - exactly as I should have done. That bit is installed and working. But at every work request since then, the LHC server has responded 'internal server error'. It only happens when work is requested and a task is already running. Bingo! We have a reproduction of the problem here, on an independent project, without all the congestion and delays. And that project is well resourced, and has a vested interest in getting the problem sorted. I'll be writing to the guys once I've got this posted. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Since I am using the All-In-One, I don't even have a stock to revert to. I'd need to archive the BOINC folder, download/install the detested Repository version, reconnect to SETI, download/install all the setup/apps/work including some ancient and slow CUDA50 that takes 10x as long to finish if it doesn't crash, then when this is fixed (which with my luck will happen exactly when I have completed this) wait for the work to complete, uninstall it, unpack the All-In-One back and hope for the best..." You aren't quite correct. It's Very simple to switch from Anonymous platform to Stock even with the All-In-One. All you have to do is change the Names of the two files app_info.xml & app_config.xml to something like app_info1.xml & app_config1.xml; that will revert you to Stock. To change back to Anonymous platform, rename the files to the original names app_info.xml & app_config.xml. That's All that needs to be done, Nothing Else...NADA. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
|
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
"Since I am using the All-In-One, I don't even have a stock to revert to. I'd need to archive the BOINC folder, download/install the detested Repository version, reconnect to SETI, download/install all the setup/apps/work including some ancient and slow CUDA50 that takes 10x as long to finish if it doesn't crash, then when this is fixed (which with my luck will happen exactly when I have completed this) wait for the work to complete, uninstall it, unpack the All-In-One back and hope for the best..." That is simple enough. If one does that, gets some tasks, can one then return these two files to their original names and run the tasks with the special sauce apps? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
app_config shouldn't be a problem. You may get some warning messages about unrecognised applications, but you can ignore those. app_config can be used in either stock or AP mode. More importantly, you need to Restart the client (so it forgets about app_info). I always find I also need to reset the project - which deletes all files, with recent versions of BOINC. Hence my advice to take a backup... |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I may be an old head, but why doesn't someone back out the changes that caused this catastrophe in the first place, bounce the servers, and then sandbox the new code until the bug is rectified? At this moment I've 18 hrs (33 CPU tasks) left & 19 tasks that have not been able to report. Uploads have not been a problem. GPUs have been assigned to Milkyway. I don't buy computers, I build them!! |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"That is simple enough. If one does that, gets some tasks, can one then return these two files to their original names and run the tasks with the special sauce apps?" The Tasks are assigned to a certain <platform>XXXXXXXXXXX</platform>, <version_num>XXX</version_num> and <plan_class>XXXXX</plan_class> in the file client_state.xml. Once the tasks are assigned in the client_state.xml you CAN NOT change any of those values without trashing the Tasks. So NO, you can't simply change from Stock to Anonymous platform IF You have EXISTING Tasks assigned in the client_state.xml file. The ONLY way you can change is to make SURE the 3 values in the client_state.xml file match the values in the app_info.xml file EXACTLY. |
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
"That is simple enough. If one does that, gets some tasks, can one then return these two files to their original names and run the tasks with the special sauce apps?" You can, if the platform, version and plan_class strings in app_info match the values for the stock tasks you have received. That's how the Lunatics installer worked: all known platform, version and plan_class combinations were covered in the supplied app_info files. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Thanks, TBar. Once the T3500 finishes its small quota of stock, I'm going back to special sauce and wait for the fix. Need to get Einstein fired up to keep the house warm until then." I really don't see the difference between running Stock tasks at SETI or Stock tasks at Einstein. All that is required to go from Anonymous platform to Stock is to change the name on Two files, NOTHING ELSE. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
"At this moment I've 18 hrs (33 CPU tasks) left & 19 tasks that have not been able to report..." Setting "No new tasks" and then hitting or allowing Update will resolve this. The cause is that the scheduler is now checking for and resending lost tasks on every connection, which is causing connects to fail, but only if work is requested. If that still doesn't resolve it, then edit cc_config.xml in your main BOINC folder and ensure there's a line <max_tasks_reported>##</max_tasks_reported>. The largest possible value for ## is 255 (anything over that is scaled back to 255 without warning). I would try maybe 50 at first. Edit: I've also tried renaming app_config.xml and app_info.xml, resetting the project and restarting the client. On 2/3 machines it still doesn't work... No tasks available even with half a million in the RTS queue. :^( Some setting is persisting. I guess I detach and reattach next. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
I have written to the Server Release Manager at CERN, under the title 'Internal server errors in BOINC server release 1.2' - which should get his attention. Copies to Eric and David so they're kept in the loop. The server guy at CERN has a young family, so he won't be desperately keen to receive a report like this during the holiday season. Progress will be slow, but I'll report any feedback I get. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
"Edit: I've also tried renaming app_config.xml and app_info.xml, resetting the project and restarting the client. On 2/3 machines it still doesn't work... No tasks available even with half a million in the RTS queue. :^( Some setting is persisting. I guess I detach and reattach next." I think you have to restart the client, then reset the project - in that order - for it to work. Even then, new work isn't guaranteed while the server is being thrashed within an inch of its life, but provided no errors are being reported in the Event Log, it should come through eventually. Detaching certainly isn't necessary. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Edit: I've also tried renaming app_config.xml and app_info.xml, resetting the project and restarting the client. On 2/3 machines it still doesn't work... No tasks available even with half a million in the RTS queue. :^( Some setting is persisting. I guess I detach and reattach next." How long did you wait for tasks? It can take 20 to 30 minutes to get the first tasks. Renaming the files Only works perfectly fine on all my Macs and Linux machines. I guess I just have those special machines, Oh Well. Yes, you do have to restart BOINC, as you do anytime you make a change to the app_info.xml... but I thought everyone knew that. |
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
|
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
"I have written to the Server Release Manager at CERN, under the title 'Internal server errors in BOINC server release 1.2' - which should get his attention. Copies to Eric and David so they're kept in the loop." Thanks for the update and effort, Richard. The following is just me venting and not directed at anyone. I would be sympathetic to the holiday part, but some yahoo decided to do this right before a weekend, and a two-week holiday for some. I've spent probably 4 hours of my vacation time paying attention to an issue which shouldn't exist for this hobby. The anonymous platform problem was apparently well known on the beta, and yet someone thought it was a good idea to shove this into production. If I did that on the industrial pilot unit PLCs for which I'm responsible, I'd be in trouble. Even my T3500 stock was having trouble getting new jobs. I suspect the system is growing more unstable based on that and the growing replica lag. That's why I'm not going to chase it by converting my other boxes. I'll wait till they fix it and help Einstein in the meanwhile. *rant off* |
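
For anyone who would rather script the rename that TBar describes, here is a minimal Python sketch. The function names and the `project_dir` argument are made up for illustration; only the four file names come from the posts above. It does not restart the client or reset the project, both of which Richard and TBar say may still be needed, and per Richard's note app_config.xml does not strictly have to be renamed.

```python
from pathlib import Path

# The two files named in the thread; removing them from view reverts to Stock.
ANON_FILES = ("app_info.xml", "app_config.xml")


def parked_name(name):
    """app_info.xml -> app_info1.xml, the naming suggested in the thread."""
    p = Path(name)
    return p.stem + "1" + p.suffix


def switch_to_stock(project_dir):
    """Rename the anonymous-platform files so the client no longer finds them."""
    project_dir = Path(project_dir)
    renamed = []
    for name in ANON_FILES:
        active = project_dir / name
        parked = project_dir / parked_name(name)
        # Never overwrite an already parked copy.
        if active.exists() and not parked.exists():
            active.rename(parked)
            renamed.append(name)
    return renamed


def switch_to_anonymous(project_dir):
    """Undo switch_to_stock by restoring the original file names."""
    project_dir = Path(project_dir)
    restored = []
    for name in ANON_FILES:
        active = project_dir / name
        parked = project_dir / parked_name(name)
        if parked.exists() and not active.exists():
            parked.rename(active)
            restored.append(name)
    return restored
```

Stop BOINC before calling either function, point `project_dir` at the SETI project folder of your own install, and take a backup of the BOINC folder first, as advised earlier in the thread.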
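
TBar and Richard agree that existing tasks survive a switch back to Anonymous platform only if the platform, version_num and plan_class values match exactly. A rough Python sketch of that check follows. It assumes both files keep these three values inside `<app_version>` elements and that both parse as plain XML; it also ignores which project an entry belongs to. The function names are hypothetical, so treat this as a sanity check and not a guarantee.

```python
import xml.etree.ElementTree as ET


def version_triples(xml_text):
    """Collect (platform, version_num, plan_class) from every <app_version>."""
    root = ET.fromstring(xml_text)
    triples = set()
    for app_version in root.iter("app_version"):
        triples.add((
            (app_version.findtext("platform") or "").strip(),
            (app_version.findtext("version_num") or "").strip(),
            (app_version.findtext("plan_class") or "").strip(),
        ))
    return triples


def unmatched_triples(client_state_xml, app_info_xml):
    """Triples present in client_state.xml with no exact match in app_info.xml."""
    return version_triples(client_state_xml) - version_triples(app_info_xml)
```

Read both files into strings and pass them in. An empty result means every app version known to the client has an exact counterpart in app_info.xml; anything returned is a combination that would need adding before renaming the files back.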
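
Mr. Kevvy's cc_config.xml suggestion can also be applied programmatically. The sketch below assumes the setting lives inside the `<options>` section of cc_config.xml (my understanding of the client configuration layout, not something stated in the thread) and caps the value at the 255 limit he reports. The function name is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Upper limit as reported in the thread; larger values are said to be scaled back.
MAX_REPORTED_CAP = 255


def set_max_tasks_reported(cc_config_xml, value):
    """Return cc_config.xml text with <max_tasks_reported> set to value."""
    value = min(int(value), MAX_REPORTED_CAP)
    root = ET.fromstring(cc_config_xml)
    # Assumption: the setting belongs under <options>; create it if missing.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    node = options.find("max_tasks_reported")
    if node is None:
        node = ET.SubElement(options, "max_tasks_reported")
    node.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Pass in the current file contents, write the returned text back to cc_config.xml in the main BOINC folder, then have the client re-read its config files or restart it. Starting at 50, as suggested above, seems a sensible first try.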