Message boards : Number crunching : Setting up Linux to crunch CUDA90 and above for Windows users
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
That's the difference between the Checkpoint problem and the Missed Pulse problem. The Checkpoint problem will give an Immediate Overflow after resuming; the other problem will Miss All Pulses...on occasion. I've seen both not happen all of the time: sometimes the Checkpoint works, sometimes it doesn't. Sometimes you Miss Pulses, sometimes you don't. The only common thing is that when it fails, people want to know why.

Now you're getting me completely confused. I had a similar experience just now: I set checkpointing to 10 seconds on the nominal app and allowed it to run to ~50% (several minutes on a GTX 1650). I killed BOINC, restarted it, and the task restarted from the beginning.

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
TBar | Joined: 22 May 99 | Posts: 5204 | Credit: 840,779,836 | RAC: 2,768
...Now you're getting me completely confused.

Here you go, read up on the comments from one of your old Buds. He seemed annoyed about the Checkpoint problem back then, and I think you are in a few of his threads from that time: https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1906253#1906253 Do a search on his user ID and "checkpoint"; he'll refresh your memory for you. As far as I know, not much has changed since then.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14679 | Credit: 200,643,578 | RAC: 874
OK, I've read the article (well, skimmed it), and confirmed that my system has the default values for lazy writes:

Thanks. But did you also see my comment about checkpoint files from my CPU project tasks appearing, and updating, precisely on cue?

Yes, I did. All that has to happen for the CPU tasks to write out their state is for the application to make the fsync() call. That must be what is happening in that case.

/proc/sys/vm/dirty_expire_centisecs     3000  // 30 seconds
/proc/sys/vm/dirty_writeback_centisecs  500   // 5 seconds

I suspended a task from BOINC Manager 10 seconds after the checkpoint file should have been written: that was at least five minutes ago. Nothing has appeared in the slot folder yet. This machine is currently running off a single 512GB M.2 PCIe SSD: I don't think it's got that long a disk write queue!
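The mechanism under discussion can be sketched in a few lines. This is not the special app's actual checkpoint code (that is C++/CUDA and not shown in this thread); it is a minimal, hypothetical illustration of the write + fsync + atomic-rename pattern that makes a checkpoint durable immediately instead of leaving it to the kernel's periodic dirty-writeback flush. The file name "state.sah" is an assumption for illustration:

```python
import os
import tempfile

def write_checkpoint(path, state):
    """Write a checkpoint durably: write to a temp file, fsync it,
    then atomically rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(state)
        f.flush()             # drain Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # force the kernel to commit the data to the
                              # device now, instead of leaving it in the page
                              # cache for the periodic writeback thread
    os.replace(tmp, path)     # atomic on POSIX: a reader (or a crash) sees
                              # the old checkpoint or the new one, never a
                              # half-written file

# demo in a scratch directory; "state.sah" is a guessed checkpoint name
slot = tempfile.mkdtemp()
ckpt = os.path.join(slot, "state.sah")
write_checkpoint(ckpt, b"progress=0.50\n")
print(open(ckpt, "rb").read())  # prints b'progress=0.50\n'
```

Without the os.fsync() call the data can sit in the page cache for up to dirty_expire_centisecs (30 seconds at the defaults quoted above) before the kernel commits it, which is why an app that skips fsync can look "lazier" than one that calls it.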
TBar | Joined: 22 May 99 | Posts: 5204 | Credit: 840,779,836 | RAC: 2,768
what's the best, most fool-proof way to reproduce this scenario then? the goal posts keep moving, it seems. I have a card with GDDR5 memory and want to try to trigger this. if it's a real issue with the app + GDDR5 and not some other system-specific issue, it should be easily reproducible with a well-defined procedure. i've left the monitor running and rebooted several times, but have been unable to see any missed best pulses so far.

Here's a post from yesterday; I'm fairly certain you read it: "I later found my other machines had the same problem once I turned the monitors on and started actually using the machines." That's fairly clear that the problem doesn't happen when the monitor is OFF. How you went from that to saying having the monitor off is fine for testing is anyone's guess. I'd suggest trying the same configurations the others had when they experienced the problem. Juan was running 4 GDDR5 1070s; the other poster here was running 4 or 5 GDDR5 980s. I've seen the problem when running any number of multiple GDDR5 GPUs, anywhere from 2 to 14. If you are using a single card, you are outside the known sample. I'd say three would be a good start, and most people have one connected to the monitor and the others disconnected.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14679 | Credit: 200,643,578 | RAC: 874
As far as I know, not much has changed since then.

And you probably know more than me, because all the development discussion takes place in a forum I don't have access to.

@GPUUG: any news on whether the checkpointing problem has been addressed since 10 Dec 2017 (except this week, of course)? (I'd better go and read that ReadMe file!)
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
what's the best most fool-proof way to reproduce this scenario then? [...] i've left the monitor running, rebooted several times but have been unable to see any missed best pulses so far.

Here's a Post from Yesterday, I'm fairly certain you read it, "I later found my other machines had the same problem once I turned the monitors on and started actually using the machines."

I don't know why you think I had the monitor off. I stated I left the monitor running = ON. I'm unclear on the monitor being a contributing factor: in one sense you say that the monitor being off prevents it from happening, yet at the same time you say it only affects the cards NOT hooked to the monitor? Can you explain that? I'll look into shuffling some cards around, or picking up a cheap GDDR5 card, to try to replicate this with multi-GPU. Not looking hopeful though; both Juan and Keith seem to have been unsuccessful as well.
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
As far as I know, not much has changed since then.

And you probably know more than me, because all the development discussion takes place in a forum I don't have access to.

I recompiled an app with a modification to the code in an attempt to remove checkpointing. Juan instructed me which line to change (he doesn't yet know how to compile the special app, so I did that for him). But I haven't really seen different behavior between the default and modified apps: I'm not seeing it checkpoint. I would assume that if I set the checkpoint interval to 10 seconds it would checkpoint, but even the default, unchanged app still starts over from the beginning.
TBar | Joined: 22 May 99 | Posts: 5204 | Credit: 840,779,836 | RAC: 2,768
This; I believe that's YOU telling Vyper that testing without a monitor is fine? It certainly looks that way.

Does this host https://setiathome.berkeley.edu/results.php?hostid=8570185 fit the criteria of the problem?

According to TBar, yes it fits. Do a system reboot and see what happens.

what's the best most fool-proof way to reproduce this scenario then? [...] i've left the monitor running, rebooted several times but have been unable to see any missed best pulses so far.

Here's a Post from Yesterday, I'm fairly certain you read it, "I later found my other machines had the same problem once I turned the monitors on and started actually using the machines."
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14679 | Credit: 200,643,578 | RAC: 874
(I'd better go and read that ReadMe file!)

Which says, in its entirety:

6) The App may give Incorrect results on a restarted task. One way to avoid restarted tasks is to set the checkpoint higher than the longest task's estimated run-time, and also avoid suspending/resuming a task.

The ReadMe itself has a datestamp of 07 December 2019, and a reference to the CUDA 10.2 app, so I think it's current (again, before this week's changes).
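For reference, the checkpoint interval the ReadMe is talking about is BOINC's "write to disk at most every N seconds" computing preference, which is `<disk_interval>` in the client's preferences XML. A hedged sketch of setting it client-side via a global_prefs_override.xml in the BOINC data directory (whether the special app actually honors this preference is exactly what the thread is trying to establish, so treat the effect as an assumption):

```xml
<global_preferences>
   <!-- checkpoint at most once per hour, i.e. longer than a short task -->
   <disk_interval>3600</disk_interval>
</global_preferences>
```

Then have the running client re-read it with `boinccmd --read_global_prefs_override`, or via Options in BOINC Manager.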
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
Well, it's simple: I was unaware that the monitor's presence was so important, since you claimed it only affected cards not attached to the monitor. Oh well *shrug*. Any comment on why that is? Since it doesn't affect the cards attached to the monitor, why would it matter if one wasn't plugged in at all? This issue is getting more and more fringe with each additional nugget you add to the exact reproduction.
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
(I'd better go and read that ReadMe file!)
Which says, in its entirety,

Oh, that's another case: suspend/resume on a task that had been running for >5 minutes caused it to restart from the beginning, with the checkpoint set to 10 seconds and running unmodified code (as far as checkpointing is concerned). See the results: https://setiathome.berkeley.edu/result.php?resultid=8530898546 No immediate overflow. Hmm.

<core_client_version>7.16.1</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 1650, 3908 MiB, regsPerBlock 65536
  computeCap 7.5, multiProcs 14
  pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 1650 is okay
SETI@home using CUDA accelerated device GeForce GTX 1650
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
---------------------------------------------------------
SETI@home v8 enhanced x41p_V0.99b1p3, CUDA 10.2 special
-------------------------------------------------------------------------
Modifications done by petri33, Mutex by Oddbjornik. Compiled by Ian (^_^)
-------------------------------------------------------------------------
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.013941
Sigma 97
Sigma > GaussTOffsetStop: 97 > -33
Thread call stack limit is: 1k
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 1650, 3908 MiB, regsPerBlock 65536
  computeCap 7.5, multiProcs 14
  pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 1650 is okay
SETI@home using CUDA accelerated device GeForce GTX 1650
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
---------------------------------------------------------
SETI@home v8 enhanced x41p_V0.99b1p3, CUDA 10.2 special
-------------------------------------------------------------------------
Modifications done by petri33, Mutex by Oddbjornik. Compiled by Ian (^_^)
-------------------------------------------------------------------------
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.013941
Sigma 97
Sigma > GaussTOffsetStop: 97 > -33
Thread call stack limit is: 1k
namedMutex: Previous mutex lock holder died in a bad way.
namedMutex: mutex is now consistent and the lock has been acquired.
Acquired CUDA mutex at 13:35:45,303
Spike: peak=25.86527, time=6.711, d_freq=1420131089.24, chirp=0, fft_len=128k
Spike: peak=26.89025, time=20.13, d_freq=1420122652.43, chirp=0, fft_len=128k
Spike: peak=24.61807, time=6.711, d_freq=1420131089.25, chirp=0.00092426, fft_len=128k
Spike: peak=27.19312, time=20.13, d_freq=1420122652.44, chirp=0.00092426, fft_len=128k
Spike: peak=25.89173, time=6.711, d_freq=1420131089.23, chirp=-0.00092426, fft_len=128k
Spike: peak=24.62306, time=6.711, d_freq=1420131089.23, chirp=-0.0018485, fft_len=128k
Spike: peak=27.94842, time=20.13, d_freq=1420122652.45, chirp=-0.0027728, fft_len=128k
Spike: peak=26.68917, time=20.13, d_freq=1420122652.43, chirp=0.003697, fft_len=128k
Spike: peak=25.5317, time=20.13, d_freq=1420122652.43, chirp=-0.003697, fft_len=128k
Spike: peak=25.04365, time=20.13, d_freq=1420122652.44, chirp=0.0046213, fft_len=128k
Spike: peak=25.17622, time=6.711, d_freq=1420122297.96, chirp=-0.0064698, fft_len=128k
Spike: peak=27.33359, time=20.13, d_freq=1420122652.45, chirp=-0.0064698, fft_len=128k
Spike: peak=24.86638, time=20.13, d_freq=1420122652.43, chirp=0.0073941, fft_len=128k
Spike: peak=26.38151, time=6.711, d_freq=1420122297.95, chirp=-0.0073941, fft_len=128k
Spike: peak=26.73647, time=6.711, d_freq=1420122297.95, chirp=-0.0083183, fft_len=128k
Spike: peak=26.15466, time=6.711, d_freq=1420122297.94, chirp=-0.0092426, fft_len=128k
Spike: peak=24.64791, time=6.711, d_freq=1420122297.93, chirp=-0.010167, fft_len=128k
Spike: peak=25.53647, time=20.13, d_freq=1420122652.45, chirp=-0.010167, fft_len=128k
Spike: peak=24.43285, time=6.711, d_freq=1420131089.24, chirp=-0.011091, fft_len=128k
Pulse: peak=4.565244, time=53.74, period=13.04, d_freq=1420128421.21, score=1.025, chirp=-14.74, fft_len=1024
Pulse: peak=0.9206985, time=53.71, period=0.8667, d_freq=1420127920.98, score=1.147, chirp=24.411, fft_len=512
Pulse: peak=6.264369, time=53.7, period=16.99, d_freq=1420123459.99, score=1.07, chirp=25.878, fft_len=256
Pulse: peak=4.276228, time=53.7, period=10.35, d_freq=1420126806.94, score=1.092, chirp=-38.951, fft_len=256
Pulse: peak=11.43044, time=53.7, period=34.73, d_freq=1420125784.12, score=1.093, chirp=-69.364, fft_len=256
Pulse: peak=6.072789, time=53.7, period=13.55, d_freq=1420130468.28, score=1.045, chirp=70.431, fft_len=256
Pulse: peak=1.214941, time=53.7, period=1.475, d_freq=1420125425.59, score=1.143, chirp=-80.303, fft_len=256
Pulse: peak=7.049133, time=53.7, period=20.71, d_freq=1420130855.44, score=1.037, chirp=81.903, fft_len=256
Pulse: peak=7.853018, time=53.74, period=24.61, d_freq=1420130212.63, score=1.008, chirp=-97.11, fft_len=1024
Normal release of CUDA mutex after 275.928 seconds at 13:40:21,231
Best spike: peak=27.94842, time=20.13, d_freq=1420122652.45, chirp=-0.0027728, fft_len=128k
Best autocorr: peak=17.38795, time=87.24, delay=6.4449, d_freq=1420126627.44, chirp=-3.7331, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.122e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=0.9206985, time=53.71, period=0.8667, d_freq=1420127920.98, score=1.147, chirp=24.411, fft_len=512
Best triplet: peak=0, time=-2.122e+11, period=0, d_freq=0, chirp=0, fft_len=0
Spike count: 19
Autocorr count: 0
Pulse count: 9
Triplet count: 0
Gaussian count: 0
13:40:21 (15874): called boinc_finish(0)
</stderr_txt>
]]>
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14679 | Credit: 200,643,578 | RAC: 874
oh that's another case.

Yes, that's to be expected. GPU apps are never kept in memory when suspended (whatever the setting of LAIM, "Leave Applications In Memory"). So the application always starts from cold; but whether the task resumes from checkpoint - well, that's what we're discussing here.
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
And this one: paused it at ~85%, and it restarted from the beginning. Same app and checkpoint settings as before; again, no overflow or signs of an obvious problem. I'll have to keep an eye out to see if they validate. https://setiathome.berkeley.edu/result.php?resultid=8530898585

<core_client_version>7.16.1</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 1650, 3908 MiB, regsPerBlock 65536
  computeCap 7.5, multiProcs 14
  pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 1650 is okay
SETI@home using CUDA accelerated device GeForce GTX 1650
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
---------------------------------------------------------
SETI@home v8 enhanced x41p_V0.99b1p3, CUDA 10.2 special
-------------------------------------------------------------------------
Modifications done by petri33, Mutex by Oddbjornik. Compiled by Ian (^_^)
-------------------------------------------------------------------------
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.010006
Sigma 72
Sigma > GaussTOffsetStop: 72 > -8
Thread call stack limit is: 1k
Acquired CUDA mutex at 13:34:15,703
Spike: peak=24.24003, time=62.99, d_freq=7768989285.06, chirp=13.136, fft_len=128k
Spike: peak=25.17315, time=62.99, d_freq=7768989285.06, chirp=13.15, fft_len=128k
Spike: peak=25.2738, time=62.99, d_freq=7768989285.05, chirp=13.151, fft_len=128k
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 1650, 3908 MiB, regsPerBlock 65536
  computeCap 7.5, multiProcs 14
  pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 1650 is okay
SETI@home using CUDA accelerated device GeForce GTX 1650
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
---------------------------------------------------------
SETI@home v8 enhanced x41p_V0.99b1p3, CUDA 10.2 special
-------------------------------------------------------------------------
Modifications done by petri33, Mutex by Oddbjornik. Compiled by Ian (^_^)
-------------------------------------------------------------------------
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.010006
Sigma 72
Sigma > GaussTOffsetStop: 72 > -8
Thread call stack limit is: 1k
Acquired CUDA mutex at 13:40:21,231
Spike: peak=24.24003, time=62.99, d_freq=7768989285.06, chirp=13.136, fft_len=128k
Spike: peak=25.17315, time=62.99, d_freq=7768989285.06, chirp=13.15, fft_len=128k
Spike: peak=25.2738, time=62.99, d_freq=7768989285.05, chirp=13.151, fft_len=128k
Pulse: peak=1.465625, time=45.84, period=2.009, d_freq=7768995643.74, score=1.034, chirp=18.717, fft_len=512
Pulse: peak=5.816671, time=45.84, period=15.09, d_freq=7768992597.84, score=1.015, chirp=21.51, fft_len=512
Autocorr: peak=17.91966, time=28.63, delay=1.8598, d_freq=7768994010.02, chirp=22.084, fft_len=128k
Pulse: peak=4.197519, time=45.81, period=8.652, d_freq=7768993221.95, score=1.008, chirp=-26.817, fft_len=32
Pulse: peak=7.609076, time=45.99, period=19.45, d_freq=7768990617.88, score=1.03, chirp=-42.025, fft_len=4k
Pulse: peak=9.224659, time=46.17, period=26.49, d_freq=7768989424.6, score=1.004, chirp=-46.739, fft_len=8k
Pulse: peak=3.718736, time=45.9, period=9.06, d_freq=7768993476.43, score=1.015, chirp=73.853, fft_len=2k
Pulse: peak=6.40499, time=45.99, period=17.45, d_freq=7768991108.61, score=1.038, chirp=-76.734, fft_len=4k
Pulse: peak=6.402134, time=45.99, period=17.45, d_freq=7768991109.83, score=1.037, chirp=-76.769, fft_len=4k
Pulse: peak=1.796793, time=45.82, period=2.734, d_freq=7768991086.68, score=1.045, chirp=82.688, fft_len=128
setiathome_CUDA: Found 1 CUDA device(s):
  Device 1: GeForce GTX 1650, 3908 MiB, regsPerBlock 65536
  computeCap 7.5, multiProcs 14
  pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 1650 is okay
SETI@home using CUDA accelerated device GeForce GTX 1650
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
---------------------------------------------------------
SETI@home v8 enhanced x41p_V0.99b1p3, CUDA 10.2 special
-------------------------------------------------------------------------
Modifications done by petri33, Mutex by Oddbjornik. Compiled by Ian (^_^)
-------------------------------------------------------------------------
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.010006
Sigma 72
Sigma > GaussTOffsetStop: 72 > -8
Thread call stack limit is: 1k
namedMutex: Previous mutex lock holder died in a bad way.
namedMutex: mutex is now consistent and the lock has been acquired.
Acquired CUDA mutex at 13:44:28,979
Spike: peak=24.24003, time=62.99, d_freq=7768989285.06, chirp=13.136, fft_len=128k
Spike: peak=25.17315, time=62.99, d_freq=7768989285.06, chirp=13.15, fft_len=128k
Spike: peak=25.2738, time=62.99, d_freq=7768989285.05, chirp=13.151, fft_len=128k
Pulse: peak=1.465625, time=45.84, period=2.009, d_freq=7768995643.74, score=1.034, chirp=18.717, fft_len=512
Pulse: peak=5.816671, time=45.84, period=15.09, d_freq=7768992597.84, score=1.015, chirp=21.51, fft_len=512
Autocorr: peak=17.91966, time=28.63, delay=1.8598, d_freq=7768994010.02, chirp=22.084, fft_len=128k
Pulse: peak=4.197521, time=45.81, period=8.652, d_freq=7768993221.95, score=1.008, chirp=-26.817, fft_len=32
Pulse: peak=7.609076, time=45.99, period=19.45, d_freq=7768990617.88, score=1.03, chirp=-42.025, fft_len=4k
Pulse: peak=9.224659, time=46.17, period=26.49, d_freq=7768989424.6, score=1.004, chirp=-46.739, fft_len=8k
Pulse: peak=3.718736, time=45.9, period=9.06, d_freq=7768993476.43, score=1.015, chirp=73.853, fft_len=2k
Pulse: peak=6.40499, time=45.99, period=17.45, d_freq=7768991108.61, score=1.038, chirp=-76.734, fft_len=4k
Pulse: peak=6.402134, time=45.99, period=17.45, d_freq=7768991109.83, score=1.037, chirp=-76.769, fft_len=4k
Pulse: peak=1.796793, time=45.82, period=2.734, d_freq=7768991086.68, score=1.045, chirp=82.688, fft_len=128
Pulse: peak=3.353529, time=45.84, period=6.621, d_freq=7768990246.3, score=1.008, chirp=93.583, fft_len=512
Normal release of CUDA mutex after 248.982 seconds at 13:48:37,961
Best spike: peak=25.2738, time=62.99, d_freq=7768989285.05, chirp=13.151, fft_len=128k
Best autocorr: peak=17.91966, time=28.63, delay=1.8598, d_freq=7768994010.02, chirp=22.084, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=1.796793, time=45.82, period=2.734, d_freq=7768991086.68, score=1.045, chirp=82.688, fft_len=128
Best triplet: peak=0, time=-2.124e+11, period=0, d_freq=0, chirp=0, fft_len=0
Spike count: 3
Autocorr count: 1
Pulse count: 10
Triplet count: 0
Gaussian count: 0
13:48:38 (16567): called boinc_finish(0)
</stderr_txt>
]]>
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
oh that's another case.

Yes, that's to be expected. GPU apps are never kept in memory when suspended (whatever the setting of LAIM). So the application always starts from cold, but whether the task resumes from checkpoint - well, that's what we're discussing here.

Well, that's what I'm trying to even get to happen. So far I've been unsuccessful in getting anything to resume from a checkpoint; it always just starts over from the beginning (which, if I'm not mistaken, is the behavior we want anyway, right?). Can you give instructions on how you're getting a task to restart from a checkpoint? Do I need to just kill BOINC unexpectedly?
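One low-tech way to settle the "is a checkpoint ever written?" question is to watch the slot directory directly while suspending and resuming. The sketch below is a generic mtime poller, not part of any BOINC tooling; the file name state.sah and the Debian/Ubuntu slot path in the comment are assumptions, so adjust them to whatever actually appears in your slot folder:

```python
import os
import time

def watch_for_checkpoint(slot_dir, names=("state.sah",), timeout=10.0, poll=0.5):
    """Return (name, 'created'|'updated') the first time one of the named
    files appears in slot_dir or its mtime changes after this call starts;
    return None if the timeout expires with nothing seen."""
    def snapshot():
        out = {}
        for name in names:
            try:
                out[name] = os.stat(os.path.join(slot_dir, name)).st_mtime_ns
            except FileNotFoundError:
                pass  # checkpoint not written yet
        return out

    before = snapshot()
    deadline = time.time() + timeout
    while time.time() < deadline:
        time.sleep(poll)
        now = snapshot()
        for name in names:
            if name in now and name not in before:
                return (name, "created")
            if name in now and now[name] != before[name]:
                return (name, "updated")
    return None

# e.g. watch_for_checkpoint("/var/lib/boinc-client/slots/0", timeout=60)
# (that path is the Debian/Ubuntu boinc-client default; yours may differ)
```

If this never reports anything across a full task run, the app is not writing a checkpoint at all, which is a different failure from "checkpoint written but ignored on restart".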
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14679 | Credit: 200,643,578 | RAC: 874
well that's what i'm trying to even get to happen. so far i've been unsuccessful in getting anything to resume from a checkpoint. [...] can you give instruction on how you're getting a task to restart from a checkpoint? do i need to just kill boinc unexpectedly?

No, I can't - and I can't see any sign of a checkpoint being created in the file system. I haven't tried the <checkpoint_debug> Event Log flag yet, but then I always prefer to rely on direct observation rather than fallible instrumentation. So I'm beginning to think that some programmer anticipated this discussion by anything up to two years, and simply forgot to update the ReadMe (or tell anyone else what they'd done).
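For anyone who does want the instrumentation: <checkpoint_debug> is a standard BOINC client log flag. Enabling it in cc_config.xml in the BOINC data directory makes the client log a line each time a running task reports that it has checkpointed, which would distinguish "the app never checkpoints" from "a checkpoint is written but not resumed from":

```xml
<cc_config>
  <log_flags>
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
</cc_config>
```

Re-read it with `boinccmd --read_cc_config` (or Options in BOINC Manager); no client restart is needed.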
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
I’m starting to think the same thing based on my observations so far. Maybe petri already fixed this and didn’t mention it.
-= Vyper =- | Joined: 5 Sep 99 | Posts: 1652 | Credit: 1,065,191,981 | RAC: 2,537
No need to finger-point, etc. So a monitor attached seems like one common attribute. TBar, is this behaviour consistent across different drivers? Linux flavours? I'm running Debian 10. Do you have a checklist so we can pinpoint this behaviour? Perhaps running GNOME vs KDE, or anything? I don't have all the variables needed, and in the Linux world there are a lot: kernels, etc. The list is huge. :-/
_________________________________________________________________________
Addicted to SETI crunching! Founder of GPU Users Group
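A quick way to collect the variables being asked about here, so reports in the thread are directly comparable. This is a generic sketch: the nvidia-smi query flags are standard, but treat the desktop-environment detection as best-effort (XDG_CURRENT_DESKTOP is only set inside most desktop sessions, so it reads "unknown" over SSH or on a console):

```python
import os
import platform
import shutil
import subprocess

def crunch_env_report():
    """Gather the variables under discussion: kernel, distro, desktop
    environment, and GPU/driver info when nvidia-smi is available."""
    distro = "unknown"
    try:
        with open("/etc/os-release") as f:  # present on systemd-era distros
            for line in f:
                if line.startswith("PRETTY_NAME="):
                    distro = line.split("=", 1)[1].strip().strip('"')
    except OSError:
        pass

    gpus = "unknown (nvidia-smi not found)"
    if shutil.which("nvidia-smi"):
        r = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True)
        if r.returncode == 0:
            gpus = r.stdout.strip()

    return {
        "kernel": platform.release(),
        "distro": distro,
        "desktop": os.environ.get("XDG_CURRENT_DESKTOP", "unknown"),
        "gpus": gpus,
    }

print(crunch_env_report())
```

Posting the four values this prints, alongside the GPU count and memory type, would give TBar the checklist-style comparison he's asking for.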
TBar | Joined: 22 May 99 | Posts: 5204 | Credit: 840,779,836 | RAC: 2,768
The last I heard anything about the Checkpoint from Petri was on 25 Apr 2019. The exchange went something like:

Me: BTW, how did removing the Checkpoint work out? We need to get Raistmer to post the new code to svn before it can be recommended to Eric for beta.
Petri: I haven't tested the checkpoint.

Nothing since.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14679 | Credit: 200,643,578 | RAC: 874
Well, there is one checkpoint still created - at the very end of the run, too late to be of any use except in the most unlikely of circumstances. If that one could be removed too (AND TESTED!), we could put this side of the conversation to bed - permanently. And concentrate on the monitors. Edit - Juan did send me a new test build this afternoon, saying he had taken out the [should we say remaining?] checkpointing. I haven't tried it yet, because I was too busy trying to find the problem I was supposed to be solving. I'll try and test it tomorrow morning. |
Ian&Steve C. | Joined: 28 Sep 99 | Posts: 4267 | Credit: 1,282,604,591 | RAC: 6,640
Sounds like he sent you the build that I compiled for him. Let us know if it acts any differently (i.e., not even creating that final checkpoint at the very end).
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.