Message boards : Number crunching : disk usage

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 12436
Credit: 14,820,956
RAC: 4,970
United States
Message 1344184 - Posted: 8 Mar 2013, 20:24:29 UTC

I was rather surprised this morning when both Einstein and Seti told me that one of my rigs had no tasks in progress. Checking the messages, they are both saying I don't have enough room on my hard drive. It's a rather small drive, but I don't store a lot on it and don't use the machine for anything other than BOINC, MagicJack, and my radioreference.com feed.

Do Boinc projects ever leave stuff on hard drives that they don't need any more? Is there a way I can clean it out? Or should I be looking elsewhere for a culprit?

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24860
Credit: 34,402,309
RAC: 14,654
Germany
Message 1344229 - Posted: 8 Mar 2013, 22:15:51 UTC

You are totally right, Mark.
Einstein leaves a lot on the hard drive.

____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8755
Credit: 52,703,900
RAC: 30,664
United Kingdom
Message 1344239 - Posted: 8 Mar 2013, 22:32:20 UTC - in response to Message 1344229.

> You are totally right, Mark.
> Einstein leaves a lot on the hard drive.

Yes, that's a deliberate feature called "locality scheduling".

You can get a lot of different workunits out of the same set of data files, so once you've downloaded them, you may not need to download any more for several days.

But if you just delete the files, BOINC will download them again at the next restart. Search the Einstein forums (or ask Gary Roberts) how to clean your client_state.xml file so you don't waste that download bandwidth.
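Richard's advice to inspect client_state.xml rather than deleting files blindly can be approximated with a short read-only script. This is only a sketch: it assumes the file parses as plain XML and that each <file_info> entry carries <name> and <nbytes> children (the usual layout in a BOINC client state file); it lists the biggest files and touches nothing.

```python
import xml.etree.ElementTree as ET

def largest_files(client_state_path, top=10):
    """List the biggest files recorded in a BOINC client_state.xml.

    Read-only sketch: assumes the file parses as XML and that each
    <file_info> element has <name> and <nbytes> children.
    """
    root = ET.parse(client_state_path).getroot()
    files = []
    for fi in root.iter('file_info'):
        name = fi.findtext('name', default='?')
        try:
            nbytes = float(fi.findtext('nbytes', default='0'))
        except (TypeError, ValueError):
            nbytes = 0.0
        files.append((nbytes, name))
    # Biggest first, so the space hogs are at the top of the list.
    files.sort(reverse=True)
    return files[:top]
```

Run it against a copy of your client_state.xml and compare the biggest entries with what actually sits in the projects folder before asking on the Einstein forums how to clean the file safely.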

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5466
Credit: 313,408,366
RAC: 171,687
Brazil
Message 1344242 - Posted: 8 Mar 2013, 22:35:25 UTC

E@H uses big WUs, 8 MB or more, so if you have a large cache (10 days, for example) it will use a lot of HDD space.

If you run both projects at the same time, the best thing to do is use a small cache in days (the 100 WU limit still applies on SETI anyway) to avoid that.

I use a 1 day cache on my hosts, which allows SETI to download all 100 WU and keeps the number of E@H WU within a comfortable margin.
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8755
Credit: 52,703,900
RAC: 30,664
United Kingdom
Message 1344247 - Posted: 8 Mar 2013, 22:44:55 UTC

Note that Einstein has different types of WU, just as we have MB and AP.

The WUs for GPUs, 'BRP4' (Binary Radio Pulsar), use 'one time only' data files - a big download (although they've compressed them recently, so it's less than 16 MB per WU to download) - but they clean up after themselves. You don't need a big cache at Einstein; they have reliable servers.

It's the CPU tasks (Gravitational Wave S6 LineVeto search) which hang on to the data files for re-use.

Ageless
Joined: 9 Jun 99
Posts: 12392
Credit: 2,664,512
RAC: 1,052
Netherlands
Message 1344250 - Posted: 8 Mar 2013, 22:53:11 UTC - in response to Message 1344229.

It doesn't just leave a lot on the hard drive; it keeps the files for when you return to run a task using similar sub-sets of data, so you don't have to download all that data again. Don't forget that Einstein uses locality scheduling, which minimizes the amount of data transferred to hosts by preferentially sending jobs to hosts that already have some or all of the input files those jobs require.
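The scheduling idea described here can be sketched in a few lines. This is a toy model, not the actual Einstein server code: the job list and host file set are hypothetical structures chosen for illustration, and the "scheduler" simply picks the job that would cost the host the fewest new download bytes.

```python
def pick_job(jobs, host_files):
    """Toy locality scheduler: choose the job needing the fewest new
    bytes downloaded, given the files the host already holds.

    jobs: list of (job_id, {filename: size_in_bytes}) pairs.
    host_files: set of filenames already present on the host.
    """
    def new_bytes(job):
        _job_id, inputs = job
        # Only files the host does NOT already have cost bandwidth.
        return sum(size for name, size in inputs.items()
                   if name not in host_files)
    return min(jobs, key=new_bytes)

# A host already holding h1_0100.dat is given the job that reuses it
# (filenames here are made up, loosely echoing Einstein's GW data files):
jobs = [("wu_a", {"h1_0100.dat": 8_000_000}),
        ("wu_b", {"h1_0200.dat": 8_000_000})]
chosen = pick_job(jobs, {"h1_0100.dat"})  # -> ("wu_a", ...)
```

The side effect the thread is discussing falls straight out of this design: the scheme only pays off if hosts keep their data files around, so the client hangs on to them instead of deleting them after each task.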
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4586
Credit: 121,521,651
RAC: 58,155
United States
Message 1344330 - Posted: 9 Mar 2013, 3:44:10 UTC

Sometimes I do find stray files from tasks that BOINC has run and, for whatever reason, not deleted once they finished. Normally I only look on machines when they fall back to one of their backup projects. At most it only turns out to be a few tasks.

I would probably just check the size of the BOINC folder on that drive and see if it seems larger than expected.
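Checking the folder size can also be done with a few lines of Python instead of clicking through Properties dialogs. A minimal sketch; the BOINC data path in the comment is only an example and varies by platform and version.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            # Skip symlinks so nothing is double-counted.
            if os.path.isfile(fp) and not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

# Example (adjust to wherever your BOINC data directory lives):
# print(dir_size_bytes(r"C:\ProgramData\BOINC") / 2**20, "MiB")
```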
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

MarkJ (Project donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 944
Credit: 25,163,942
RAC: 2,860
Australia
Message 1344332 - Posted: 9 Mar 2013, 3:56:53 UTC

The quickest and easiest way to clean up Einstein is to detach and reattach (or, as it's now known, remove and then add).

Quite often they have moved on through the data files but won't clean up until you do the above or the entire run finishes.
____________
BOINC blog

Ageless
Joined: 9 Jun 99
Posts: 12392
Credit: 2,664,512
RAC: 1,052
Netherlands
Message 1344465 - Posted: 9 Mar 2013, 9:28:42 UTC - in response to Message 1344332.
Last modified: 9 Mar 2013, 9:34:13 UTC

> Quite often they have moved through files but won't clean up until you do the above or the entire run finishes.

Quite often? Weird then that I have never seen this happen in all the years I've run Einstein. Got links to all the many threads on their forums where you and others complain about it, and where moderators/admins answer that they know it happens quite often? Or is the "quite often" just a personal feeling (because you don't understand LS)?

Edit: it may have happened once, when they switched from the S5 search to the S6 search, but since then all searches have been a form or continuation of the S6 search, so the data files have been the same. Thus there was no need to remove them when e.g. S6Bucket stopped and S6LV started.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

MarkJ (Project donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 944
Credit: 25,163,942
RAC: 2,860
Australia
Message 1344482 - Posted: 9 Mar 2013, 10:09:51 UTC - in response to Message 1344465.
Last modified: 9 Mar 2013, 10:12:03 UTC

>> Quite often they have moved through files but won't clean up until you do the above or the entire run finishes.
>
> Quite often? Weird then that I have never seen this happen in all the years I've run Einstein. Got links to all the many threads on their forums where you and others complain about it and moderators/admins answer that they know it happens quite often? Or is the "quite often" just a personal feeling (because you don't understand LS)?
>
> Edit: it may have happened once, when they switched from the S5 search to the S6 search, but since all searches since then have been a form or continuation of the S6 search, the data files have been the same. Thus no need to remove them when e.g. S6Bucket stops and S6LV starts.

The WUs are for a particular frequency, and as far as I understand it they work their way up the band. Even though they may be at the upper end of the band, the data files for the lower frequencies will still be left on your machine. As I said, these would not normally be deleted unless the search finished or the user were to detach/reattach.

While there may be no need to remove them, the question that was asked was how to.

I don't need to remove them; I have plenty of disk space. I understand how locality scheduling works, so I have no reason to complain about this - I was merely answering the question.
____________
BOINC blog

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8732
Credit: 61,607,842
RAC: 59,562
United Kingdom
Message 1344492 - Posted: 9 Mar 2013, 10:22:17 UTC

Sorry to say, Ageless, Einstein is still "bad" at clearing up after itself, particularly when it suffers time-outs.
One cruncher on which I ran Einstein as a "priority 0" backup had over 2 GB of timed-out files, and another has over 200 MB of them. Both were in that state after all visible work had completed, with "No New Tasks" set, so they should have been down to the barest minimum, not with hundreds of MB of "rubbish" left lying around.

____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

SciManStev (Project donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 4893
Credit: 83,860,029
RAC: 21,149
United States
Message 1344503 - Posted: 9 Mar 2013, 11:41:58 UTC

This thread prompted me to look at my Einstein work. I had completed all my tasks a while back, but still had about 6.5 GB of files on my rig. I detached and reattached, and they cleared. I don't know what they were, and I have plenty of hard drive space, so I really didn't care - but I was surprised that there was 6.5 GB.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5466
Credit: 313,408,366
RAC: 171,687
Brazil
Message 1344510 - Posted: 9 Mar 2013, 12:15:24 UTC

I took a look at my E@H data and noticed nothing like this. Maybe something else is going on there.
____________

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 356
Credit: 152,959,857
RAC: 93,670
Australia
Message 1344844 - Posted: 10 Mar 2013, 8:07:39 UTC

Very rarely, I've seen a scheduler request to the Einstein@Home servers result in a 'request from the server' to delete some data files, but they will only be very old ones. In general, they seem to keep data around for future search runs. They also seem very reluctant to delete old application versions.

For the record, I currently have 1.53 GiB in my Einstein directory accumulated since May 2012, compared with 213.61 MiB for SETI@home. I don't process Einstein WUs very much.
____________
Soli Deo Gloria

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 12436
Credit: 14,820,956
RAC: 4,970
United States
Message 1345186 - Posted: 11 Mar 2013, 3:36:25 UTC

I detached and reattached Einstein, with no noticeable difference in the space used on my drive. It also did not solve my problem.

It turns out that, despite my earlier statement, I am storing a lot of files on it. I identified some I could safely delete and did so, then clicked Update for both projects. Seti deferred me and opportunistic Einstein downloaded over 100 tasks. Seti eventually got 100 as well.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Highlander
Joined: 5 Oct 99
Posts: 154
Credit: 31,787,472
RAC: 9,393
Germany
Message 1345208 - Posted: 11 Mar 2013, 5:52:19 UTC

You mean this one?

TreeSize Free Portable -> http://portableapps.com/apps/utilities/treesize-free-portable
____________

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 12436
Credit: 14,820,956
RAC: 4,970
United States
Message 1345317 - Posted: 11 Mar 2013, 14:03:20 UTC - in response to Message 1345187.

> Don't forget to empty the trash bin if you have not already done so.

That was the first and last thing I did.

> I also usually find a number of good sized log and dump files that Windows writes to disk after a crash. You can turn this off.
>
> I used to have a nifty little program that would graphically display the entire hard drive disk usage, but I cannot find it now.

I also ran disk cleanup, but it didn't make a lot of difference either.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.



Copyright © 2014 University of California