Manually uploading client_state.xml to s@h

-= Vyper =- (Volunteer tester, Sweden)
Message 1034197 - Posted: 18 Sep 2010, 21:47:28 UTC

I just sent it to him.

I was away during the evening here, and it kept retrying until just recently.
Files that size simply can't be uploaded properly to S@H, and if Mr. Ageless here hadn't been so kind as to report it to Dave, I would have had to push the detach button and waste approximately half a million in credit.

Hats off to all, and especially Ageless.

Thank you!


_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
kittyman (Volunteer tester, United States)
Message 1034209 - Posted: 18 Sep 2010, 22:43:32 UTC - in response to Message 1034197.

> I just sent it to him.
>
> I was away during the evening here, and it kept retrying until just recently.
> Files that size simply can't be uploaded properly to S@H, and if Mr. Ageless here hadn't been so kind as to report it to Dave, I would have had to push the detach button and waste approximately half a million in credit.
>
> Hats off to all, and especially Ageless.
>
> Thank you!

Yikes... hope DA can sort this one out.
I had a lot of trouble getting my backed-up results reported this time around too; I was almost in the same boat as you.
Don't hit that detach button just yet.

Meowa meowa zoom.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Jord (Volunteer tester, Netherlands)
Message 1034245 - Posted: 18 Sep 2010, 23:43:16 UTC - in response to Message 1034197.

> Hats off to all, and especially Ageless.

You're welcome. Let's hope it's fixed for you before you have all those 27,000+ tasks ready to report. ;-)
Jord (Volunteer tester, Netherlands)
Message 1034435 - Posted: 19 Sep 2010, 11:20:14 UTC

[trac]changeset:22389[/trac] says:

> scheduler: fix crashing bug when client reports a large # (1000+) of results (256KB not enough for query in this case)


Now you still need available bandwidth.
-= Vyper =- (Volunteer tester, Sweden)
Message 1034439 - Posted: 19 Sep 2010, 11:33:08 UTC

Well, it still hasn't been able to send it through to the servers.

I sent the file to Dave, and zipping it with 7-Zip took it from a staggering 31 MB down to 349 KB!

Why doesn't BOINC compress the files in the core client before sending them to the servers?

That would save an enormous number of resends on their congested connection to the internet!

For now the file is up to 8,906 WUs to report, and since last night I have set No New Tasks just to be safe and not congest my side of the bandwidth :)
Hope this gets sorted out in the long run.
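
For anyone curious, here's a quick way to check the ratio yourself — a minimal Python sketch, assuming a client_state.xml in the current directory. gzip won't match 7-Zip's LZMA ratio, but the XML is repetitive enough to shrink dramatically either way:

    import gzip, os, shutil

    # Compress a copy of client_state.xml and compare the sizes.
    src = "client_state.xml"
    with open(src, "rb") as fin, gzip.open(src + ".gz", "wb") as fout:
        shutil.copyfileobj(fin, fout)
    print(os.path.getsize(src), "bytes ->", os.path.getsize(src + ".gz"), "bytes")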

Kind regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Jord (Volunteer tester, Netherlands)
Message 1034442 - Posted: 19 Sep 2010, 11:51:57 UTC - in response to Message 1034439.

> Why doesn't BOINC compress the files in the core client before sending them to the servers?
>
> That would save an enormous number of resends on their congested connection to the internet!

Yes, but it would add enormous overhead on the server, which would need to decompress all those files before being able to read them, and probably recompress the reply back to you as well (sched_reply*.xml).

It would also require a rewrite of the client. It is currently capable of (de)compressing project files only (as zip and tar.gz). This could easily be done, were it not that extra overhead on an already hammered server isn't something you want to add.
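
To put a rough number on that overhead, a small timing sketch (it assumes a gzipped client_state.xml like the one from the earlier example; multiply the result by requests per second to see the cost add up):

    import gzip, time

    # Time a single decompression to gauge the per-request CPU cost.
    t0 = time.perf_counter()
    with gzip.open("client_state.xml.gz", "rb") as f:
        data = f.read()
    print(f"decompressed {len(data):,} bytes in {time.perf_counter() - t0:.3f} s")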
-= Vyper =- (Volunteer tester, Sweden)
Message 1034504 - Posted: 19 Sep 2010, 15:47:14 UTC - in response to Message 1034442.

> > Why doesn't BOINC compress the files in the core client before sending them to the servers?
> >
> > That would save an enormous number of resends on their congested connection to the internet!
>
> Yes, but it would add enormous overhead on the server, which would need to decompress all those files before being able to read them, and probably recompress the reply back to you as well (sched_reply*.xml).
>
> It would also require a rewrite of the client. It is currently capable of (de)compressing project files only (as zip and tar.gz). This could easily be done, were it not that extra overhead on an already hammered server isn't something you want to add.

That's true, but while servers are gaining huge parallelism, the interconnection isn't!

So it's a battle over where the bottleneck ends up:

* No compression = the interconnection, with all 300,000 clients trying to resend all their data.
* Compression = the servers' CPUs somewhat loaded with decompressing data, with an added CRC to verify the information wasn't corrupted on the way. Of course there would be resends there too, but not to this extent.

So the question is only where we want the bottleneck to be: dropped packets and resends, or slower data processing within the upload/download server?

Kind regards Vyper

P.S. 9,064 WUs for the time being & the sched_request is 34 MB in size atm. D.S.

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
-= Vyper =- (Volunteer tester, Sweden)
Message 1034523 - Posted: 19 Sep 2010, 17:04:39 UTC

I'm starting to wonder.

Is there any way I can chop it up into pieces myself, reporting 999 at a time: connect so the BOINC client clears those 999, then shut it down again.

Edit it again, next 999, and so forth?

Is it even possible to do this without clearing out all completed WUs?

Regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
jason_gee (Volunteer developer, Australia)
Message 1034525 - Posted: 19 Sep 2010, 17:37:37 UTC - in response to Message 1034523.

> I'm starting to wonder.
>
> Is there any way I can chop it up into pieces myself, reporting 999 at a time: connect so the BOINC client clears those 999, then shut it down again.
>
> Edit it again, next 999, and so forth?
>
> Is it even possible to do this without clearing out all completed WUs?

You could test on a lower-throughput host, with just a few results to report, to see what happens when you tamper with the scheduler request before it goes up. That way the consequences would be less severe if it all went horribly wrong (which is likely ;) ). How you keep the scheduler request from going up before the test edits are made would probably be a manual exercise, and a bit fiddly on the test machine.

AFAICT, one thing that seems to take up a large proportion of the scheduler request is the in-progress results. Since we know from the ghosts situation that the server doesn't process those, removing them from the scheduler request *may not* break it.

My theory is that if you remove all but a small number of completed results from the scheduler request, while maintaining a properly formed request (hard to do, because it doesn't seem to have DOS-style line endings!), only those left in the request will be acknowledged as reported, and the next (normally) generated request should be that much shorter.

I'm glad I don't have to mess around myself to see if such a process might actually work... If it doesn't, and it looks like the server-side fix will be a long time coming, then I'll examine the option of fiddling with the report sequence in the BOINC client (though I'd rather not!).
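
If someone does want to experiment, this untested Python sketch is the rough idea. It assumes the saved request parses as well-formed XML and that in-progress tasks appear as <other_result> elements; both assumptions need checking against a real request:

    import xml.etree.ElementTree as ET

    # Strip the (assumed) in-progress entries from a saved scheduler request.
    tree = ET.parse("sched_request_setiathome.berkeley.edu.xml")
    root = tree.getroot()
    for other in root.findall("other_result"):  # in-progress tasks, if this tag is right
        root.remove(other)
    tree.write("sched_request_trimmed.xml")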

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Gundolf Jahn (Germany)
Message 1034532 - Posted: 19 Sep 2010, 17:53:53 UTC - in response to Message 1034525.

> My theory is that if you remove all but a small number of completed results from the scheduler request, while maintaining a properly formed request (hard to do, because it doesn't seem to have DOS-style line endings!)...

Aren't there tools like unix2dos and dos2unix available for Windows?

Regards,
Gundolf
jason_gee (Volunteer developer, Australia)
Message 1034537 - Posted: 19 Sep 2010, 18:00:01 UTC - in response to Message 1034532.

> > My theory is that if you remove all but a small number of completed results from the scheduler request, while maintaining a properly formed request (hard to do, because it doesn't seem to have DOS-style line endings!)...
>
> Aren't there tools like unix2dos and dos2unix available for Windows?

Sure. Since I don't generally do that kind of thing myself, I can't recommend a specific tool for the job, though I have certainly used editors that can read both kinds and preserve the style used in the file.
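
For the record, the conversion itself is also trivial to script. A minimal sketch (the filenames are just examples):

    # Normalize to LF first so existing CRLF pairs aren't doubled, then to CRLF.
    with open("client_state.xml", "rb") as f:
        data = f.read()
    data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    with open("client_state_dos.xml", "wb") as f:
        f.write(data)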
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
kittyman (Volunteer tester, United States)
Message 1034542 - Posted: 19 Sep 2010, 18:16:55 UTC

I have to wonder.........

Is this something that internet protocols just cannot handle? I mean, they can handle dropped connections and such....

Or is it just something the SETI servers are unable to cope with?

I mean... dropped comms have existed since day 1 on the internet.
S*** happens on the road.

Why does it result in such bad things here?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

-= Vyper =- (Volunteer tester, Sweden)
Message 1034551 - Posted: 19 Sep 2010, 18:49:07 UTC

Well, the internet can, but remember there are things called DDoS attacks, which can more or less render whole websites inoperable.
That is, in a certain way, what happens when the servers come back online: all hosts try to get through at once.
It's like fitting a river into a straw: some water has to stay put while the rest pours through.
But with compression there would be much less water trying to get through to the servers. Shrink the lake to about 1.1% of its original size and the straw to about 50% of its original size, and you quickly see that the lake is no longer that big: draining the river takes much less time and bandwidth, letting more work out of the servers instead of a lot of bad transmissions and resends.
This is only my view as a consultant and technician; we have an urge to make something good even better.
I don't know how it would best be programmed, but it would be better for the small straw :)

Kind regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Josef W. Segur (Volunteer developer, United States)
Message 1034632 - Posted: 19 Sep 2010, 23:31:41 UTC

The core client builds the sched_request just before sending it; I can't think of any way to pause between those two actions to allow editing it. A proxy could potentially trap the request and allow editing before sending it on, but nobody has written that kind of proxy AFAIK.

What can be successfully edited is client_state.xml. It wouldn't be easy; there's a <file_info> and <workunit> for each WU, plus the files to which those refer, and a <result> for each finished (done and uploaded) task. But it would be possible to make multiple copies of the BOINC data directory hierarchy, each of which has a subset of the information.

As preparation you'd want to set No New Tasks, suspend all unstarted tasks and let those running finish and upload, then disable network activity. After that, shut BOINC down and do the editing and file moving.

To use the subsets after all that editing and moving of files, you'd restart BOINC with one of the subsets in place, allow network activity, and do an Update to report that subset. Then disable network activity and shut down again. Check what <rpc_seqno> has been updated to and put that in the next subset when you move it in.

Consider that a sketch; I could easily have missed some details. There may be possible simplifications too; for instance, I'm not totally sure the WU files need to be kept, though BOINC doesn't delete them until their results have been reported.
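
To illustrate just the chunk-extraction step, an untested Python sketch (it ignores the <file_info>/<workunit> bookkeeping entirely, so each subset would still have to be merged into a full client_state.xml by hand):

    import re

    # Pull the finished <result> blocks out of client_state.xml and
    # dump them in chunks of 999, one file per subset.
    with open("client_state.xml", encoding="utf-8") as f:
        state = f.read()

    results = re.findall(r"<result>.*?</result>", state, re.DOTALL)
    CHUNK = 999
    for i in range(0, len(results), CHUNK):
        name = f"results_subset_{i // CHUNK}.xml"
        with open(name, "w", encoding="utf-8") as out:
            out.write("\n".join(results[i:i + CHUNK]))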
                                                             Joe
Josef W. Segur (Volunteer developer, United States)
Message 1034761 - Posted: 20 Sep 2010, 5:09:24 UTC

I presume that, since the limited "resend lost tasks" feature has been seen in action, the Scheduler code has been rebuilt and probably includes the changeset 22389 fix to allow reporting large numbers of results. I have my fingers crossed for those of you who need it!
                                                               Joe
-= Vyper =- (Volunteer tester, Sweden)
Message 1034849 - Posted: 20 Sep 2010, 14:50:42 UTC

It hasn't been a problem reporting at least 3,000 WUs, from what I have experienced.
That hasn't been an issue so far, it seems. Is there perhaps a limit at 4095? That's the only number that would add up in my mind, since the next one after it is hex 1000.

Regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Josef W. Segur (Volunteer developer, United States)
Message 1034883 - Posted: 20 Sep 2010, 16:24:48 UTC - in response to Message 1034849.

> It hasn't been a problem reporting at least 3,000 WUs, from what I have experienced.
> That hasn't been an issue so far, it seems. Is there perhaps a limit at 4095? That's the only number that would add up in my mind, since the next one after it is hex 1000.

I don't think it's one exact number. The reports are rewritten as a batch update to the database; the old method had a limit of 256KB for that query. The new method builds the query in a string, and I don't know whether there's any specific limit on its length. In either case, each result name is included, and the names vary in length.

The question is whether the host with over 7,000 results can be successfully transferred and handled since the changes yesterday. I can't imagine Dr. Anderson not including that bug fix when rebuilding the Scheduler code.
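
A back-of-the-envelope check shows why a few thousand results is about where a 256KB query would top out; both figures below are guesses, not measured values:

    # Rough capacity of the old 256 KB query buffer.
    avg_name_len = 30  # typical length of an S@H result name
    sql_overhead = 50  # assumed per-result quoting/SQL boilerplate
    print(256 * 1024 // (avg_name_len + sql_overhead))  # roughly 3200 results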
                                                                Joe
-= Vyper =- (Volunteer tester, Sweden)
Message 1035014 - Posted: 20 Sep 2010, 22:11:35 UTC - in response to Message 1034883.


> I don't think it's one exact number. The reports are rewritten as a batch update to the database; the old method had a limit of 256KB for that query. The new method builds the query in a string, and I don't know whether there's any specific limit on its length. In either case, each result name is included, and the names vary in length.
>
> The question is whether the host with over 7,000 results can be successfully transferred and handled since the changes yesterday. I can't imagine Dr. Anderson not including that bug fix when rebuilding the Scheduler code.

Hi, it's working now, and my hunch about a 4095 limit turned out fairly correct: more than that made the request grow to as much as 1 GB, making the scheduler run out of memory.

So in a way I was right after all :D
Man, I'm so excited my hunch was fairly spot on.

Well, for now it's chugging along, reporting in 1,000-WU chunks.

Thanks, everyone, for digging out this frustrating issue. And a big warm thank you to Dave, who addressed it so quickly and painlessly.

Kind regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
kittyman (Volunteer tester, United States)
Message 1035057 - Posted: 21 Sep 2010, 0:17:23 UTC - in response to Message 1035014.


> Hi, it's working now, and my hunch about a 4095 limit turned out fairly correct: more than that made the request grow to as much as 1 GB, making the scheduler run out of memory.
>
> Well, for now it's chugging along, reporting in 1,000-WU chunks.

So, BOINC is now splitting the file up somehow and reporting smaller batches, if I understand you correctly?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

-= Vyper =- (Volunteer tester, Sweden)
Message 1035146 - Posted: 21 Sep 2010, 5:59:43 UTC - in response to Message 1035057.


> So, BOINC is now splitting the file up somehow and reporting smaller batches, if I understand you correctly?

Yup, though it doesn't split the file as such: the scheduler just grabs the first 1,000, ticks them as reported, and reports that back to the client. The next time your client reports, the remaining amount has shrunk by approximately 1,000 as well.

Regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group