Message boards : Number crunching : Panic Mode On (115) Server Problems?
mrchips (Joined: 12 Dec 04, Posts: 17, Credit: 26,590,842, RAC: 8)
I pay with electric and computer usage.
Stephen "Heretic" ![]() ![]() ![]() ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 ![]() ![]() |
> I pay with electric and computer usage.

. . We all contribute that, some more than others. But do you really need that cup of coffee every day? $10 from each volunteer would really give the project the financial power to fix things in short order if they go bung.

Stephen ??
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
Upload server is having issues, it seems.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
(Joined: 29 Jun 99, Posts: 11451, Credit: 29,581,041, RAC: 66)
Einstein has had that problem for a while, maybe it is contagious? |
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
Einstein still does. Shouldn't be bothering Seti. I have <max_file_xfers>8</max_file_xfers> and <max_file_xfers_per_project>8</max_file_xfers_per_project>, which should be enough to handle my 3 projects.

[Edit] Maybe so. I forced some of the stuck Einstein uploads to clear at 100% and the stuck Seti uploads went through instantly.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
Had to increase <max_file_xfers>N</max_file_xfers> to 16 to prevent the stalled Einstein uploads from impacting my other projects.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14690, Credit: 200,643,578, RAC: 874)
> Had to increase <max_file_xfers>N</max_file_xfers> to 16 to prevent the stalled Einstein uploads from impacting my other projects.

Stalled uploads for one project don't (shouldn't) impact other projects. I had 20 Einstein tasks uploading - my limit is at 8 - when this happened:

08/03/2019 22:30:52 | SETI@home | [sched_op] NVIDIA GPU work request: 6298.76 seconds; 0.00 devices
08/03/2019 22:30:55 | SETI@home | Scheduler request completed: got 7 new tasks
08/03/2019 22:30:55 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 6621 seconds

An Einstein upload failure should only affect Einstein work fetch.
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
> > Had to increase <max_file_xfers>N</max_file_xfers> to 16 to prevent the stalled Einstein uploads from impacting my other projects.
>
> Stalled uploads for one project don't (shouldn't) impact other projects. I had 20 Einstein tasks uploading - my limit is at 8 - when this happened:

I agree in theory, but that's not what I am observing. I am getting stuck Seti downloads when the Einstein tasks are retrying. It doesn't make sense that Einstein would affect other projects, but if I understand the <max_file_xfers>N</max_file_xfers> parameter for cc_config.xml, that is the global number of file transfers for all of BOINC. Since I run Seti, Einstein, MilkyWay and GPUGrid concurrently on 4 of 5 machines, I regularly exceed 8 simultaneous connections in the client when all projects are crunching, reporting and uploading. The docs say 8 is the default for that parameter. So I took a gamble and increased it to 16. It seems to have worked.

[Edit] I wasn't talking about work fetch in your example. I was talking about stalled downloads after a Seti work fetch. They were getting stalled and put into backoffs. Work fetch is fine for all projects, except of course for Einstein with its stalled uploads.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
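(For anyone wanting to try the same change, a minimal cc_config.xml sketch using the values discussed here - 16 global, 8 per project. Both tags are documented BOINC client options, with defaults of 8 and 2 respectively; the client picks the file up after a restart or via the Manager's "Read config files" command:)

```xml
<cc_config>
  <options>
    <!-- global cap on simultaneous file transfers (BOINC default: 8) -->
    <max_file_xfers>16</max_file_xfers>
    <!-- per-project cap on simultaneous transfers (BOINC default: 2) -->
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>
```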
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14690, Credit: 200,643,578, RAC: 874)
Even so, if you leave <max_file_xfers_per_project> at its default of 2, the Einstein stall should leave 6 comms slots free for the other projects to share - and even those two should only be blocked for the 60-second Einstein gateway timeout.

Edit - Gary says the Einstein upload restart worked about 10 minutes ago, and all mine had cleared by the time I looked.
Stephen "Heretic" ![]() ![]() ![]() ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 ![]() ![]() |
> > Had to increase <max_file_xfers>N</max_file_xfers> to 16 to prevent the stalled Einstein uploads from impacting my other projects.
>
> Stalled uploads for one project don't (shouldn't) impact other projects. I had 20 Einstein tasks uploading - my limit is at 8 - when this happened:

. . But if you have your max transfers set to 8 and there are 8 stalled Einstein uploads, would that not prevent any further uploads or downloads for any project?

Stephen ? ?
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
> > Had to increase <max_file_xfers>N</max_file_xfers> to 16 to prevent the stalled Einstein uploads from impacting my other projects.
>
> Stalled uploads for one project don't (shouldn't) impact other projects. I had 20 Einstein tasks uploading - my limit is at 8 - when this happened:

That is apparently what was happening. I have per-project transfers set to 8 because of my large download counts and many computers vying for bandwidth. The sooner one computer can get the 100-200 tasks on each request, the sooner the other computers can get access to the download pipe unencumbered. The default of 2 transfers leaves the connection in perpetual download with the number of tasks I download and the number of hosts contacting all the project servers.

[Edit] And things go south fast when I need to upload a GPUGrid task. I only have a nominal 1 Mbit/s upload pipe and it takes 20+ minutes to upload the big 80 MB result file for each finished task.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
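(A quick back-of-the-envelope check of that upload time, assuming the nominal line rate and no protocol overhead:

$$t = \frac{80\ \text{MB} \times 8\ \text{bits/byte}}{1\ \text{Mbit/s}} = 640\ \text{s} \approx 10.7\ \text{min},$$

so with TCP/HTTP overhead and three other projects' transfers competing for the same pipe, 20+ minutes is entirely plausible.)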
(Joined: 25 Dec 00, Posts: 31357, Credit: 53,134,872, RAC: 32)
> > Had to increase <max_file_xfers>N</max_file_xfers> to 16 to prevent the stalled Einstein uploads from impacting my other projects.
>
> Stalled uploads for one project don't (shouldn't) impact other projects. I had 20 Einstein tasks uploading - my limit is at 8 - when this happened:

That is correct. Of course, until the transfer times out it does hold one of the slots. Depending on backoffs it might appear as if another project is also being held, as I believe BOINC will retry failed downloads before it moves on to others, on the theory that they are closer to their timeouts. I'm not sure how it handles the situation if it also doesn't get a ping back from Google and goes into not-connected-to-the-internet mode.
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
I had my timeouts set at 90 seconds, but I had my transfers set at 8 and 8, so not enough transfers were allotted with 10 or more Einstein tasks trying to upload. Reconfiguring to 16 and 8 was enough to get things flowing again.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14690, Credit: 200,643,578, RAC: 874)
> . . But if you have your max transfers set to 8 and there are 8 stalled Einstein uploads, would that not prevent any further uploads or downloads for any project?

No. An upload which is stalled (in either transfer backoff or project backoff) doesn't affect anything except the project concerned. Every time a task completes, BOINC will try the uploads for that task (but not the others) just once, then go back into backoff. But if that new upload gets through, it'll start retrying the others. It's only when uploads are 'active' (but possibly heading for a timeout) that they get in the way.
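(To make that concrete, here is a toy Python sketch - not BOINC source, and every name in it is invented for illustration - of the retry behaviour Richard describes: a failed upload backs off and blocks only its own project, and one success triggers immediate retries of the project's other backed-off uploads:)

```python
import random

class Upload:
    def __init__(self, name):
        self.name = name
        self.backed_off = False

class Project:
    def __init__(self, name):
        self.name = name
        self.uploads = []

def try_upload(up):
    # Stand-in for the real HTTP transfer; succeeds at random here.
    ok = random.random() < 0.5
    print(f"{up.name}: {'uploaded' if ok else 'failed, backing off'}")
    return ok

def on_task_completed(project, new_uploads):
    for up in new_uploads:
        project.uploads.append(up)
        # Try only the new task's uploads, once each...
        if try_upload(up):
            # ...and a success ends the backoff: retry the project's
            # other backed-off uploads immediately.
            for other in project.uploads:
                if other.backed_off:
                    other.backed_off = not try_upload(other)
        else:
            # A failure puts this upload into project-level backoff;
            # other projects' transfers are untouched.
            up.backed_off = True

einstein = Project("Einstein@Home")
on_task_completed(einstein, [Upload("result_1"), Upload("result_2")])
```

(The key point, matching Richard's description, is that a backed-off upload holds no transfer slot; only an 'active' transfer, possibly waiting out a timeout, occupies one.)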
(Joined: 29 Apr 01, Posts: 13164, Credit: 1,160,866,277, RAC: 1,873)
So it was me repeatedly hitting Retry on the stuck Einstein uploads, keeping them all "Active", that prevented the others from uploading or downloading and pushed them into backoff.

Seti@Home classic workunits: 20,676, CPU time: 74,226 hours. A proud member of the OFA (Old Farts Association)
Richard Haselgrove (Joined: 4 Jul 99, Posts: 14690, Credit: 200,643,578, RAC: 874)
Sounds plausible. |
Speedy (Joined: 26 Jun 04, Posts: 1647, Credit: 12,921,799, RAC: 89)
Lots of data files are sitting complete at (128). It would be nice if they didn't load any more data after the outage, so we can burn through some of the smaller files. Just my thoughts, in an ideal world.

It would be kind of nice if we could have a feature that told us how many files we have processed in the last 24 hours. I am aware that you can work it out with basic maths, but you would need to check at the same time each day, and it is also hard to get an accurate number if they have added more data. I am aware that this probably will not happen, as it would put more strain on the database.
(Joined: 1 Dec 99, Posts: 2786, Credit: 685,657,289, RAC: 835)
If I remember right, Eric said they take in ~1.2 TB of data from Green Bank per day. So 1.2 TB x 1024 = 1228.8 GB, and 1228.8 GB / 52.39 GB per file ≈ 23.5 files per day. My best guess there ....
Speedy (Joined: 26 Jun 04, Posts: 1647, Credit: 12,921,799, RAC: 89)
> If I remember right, Eric said they take in ~1.2 TB of data from Green Bank per day.

Thank you for the information, very interesting. Now if only we could see what files have been processed before they are removed. As I said in my previous post, there are approximately 9 files sitting at (128); I guess in a way this is showing me they are complete. :) I have a feeling these may be stuck.
(Joined: 1 Apr 13, Posts: 1859, Credit: 268,616,081, RAC: 1,349)
> Thank you for the information, very interesting. Now if only we could see what files have been processed before they are removed. As I said in my previous post, there are approximately 9 files sitting at (128); I guess in a way this is showing me they are complete. :) I have a feeling these may be stuck.

No, it means the SSP (Server Status Page) is borked, as it has been intermittently for a while. If you look, it will also say that there are 28 GBT splitters running (channels in progress) when only 14 are provisioned. If you see that, pretty much everything else is possibly inaccurate as well. This seems to include failing to drop off the page files that have been completely split. No way to know for sure, but my suspicion is that when they redid the throttle process a while back they failed to account for updating some portion of the SSP, as this seems to happen after the throttle kicks in.