Fatal error in last minute of WU, but still reports success. Admin, please examine!
Author | Message |
---|---|
Joined: 11 Dec 05 Posts: 6 Credit: 1,468,014 RAC: 0 |
I've just finished one which reported success, but the details in the results file suggest otherwise, and I didn't see anything uploaded. 3 months' processing and all this in the last minute... I'd like to know if it really is OK, and if the files can be salvaged and uploaded somehow.
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6312426
<core_client_version>5.5.0</core_client_version>
<stderr_txt>
(null): cannot open input file dataout/atmos_restart.day
(null): cannot open input file dataout/ocean_restart.day
... [deleted] ...
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pjk6c10 to netcdf format.
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pik6c10 to netcdf format.
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pfk6c10 to netcdf format.
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.phk6c10 to netcdf format.
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.pgk6c10 to netcdf format.
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.pek6c10 to netcdf format.
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.pdk6c10 to netcdf format.
(null): cannot open input file dataout/ocean_restart.day
Model crashed: umshell1.f: READ_FLH: I/O error
(null): cannot open input file dataout/ocean_restart.day
Model crashed: umshell1.f: READ_FLH: I/O error
(null): cannot open input file dataout/ocean_restart.day
Model crashed: umshell1.f: READ_FLH: I/O error
(null): cannot open input file dataout/ocean_restart.day
Model crashed: umshell1.f: READ_FLH: I/O error
Fatal crash! :-(
</stderr_txt> |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
Yes Andy, it did finish, well done - result and graph here. Version 5.15 of the climate software shocks everyone by reporting every single error message since the beginning of the model, when the model completes! Looks as if you restored it from a backup at some point? (If so, well done for that too!) Now that I look at it again, you haven't actually been granted the usual amount of credit for it, so maybe there's a missing trickle or something, but your graph is certainly showing a complete run. ;-) |
Joined: 30 Aug 04 Posts: 142 Credit: 9,936,132 RAC: 0 |
My latest completed model (on Vista) has the same type of error messages (a bit of a shock for me too, at first). When it finished, the last trickle was not credited immediately. Things got sorted out on the next database update. Since the database was down for a while starting yesterday afternoon till this morning and trickles haven't been updated since then, I can see why you could be missing more. There used to be missing trickles issues some time ago. If I remember correctly, they are taken into account as soon as the next one comes in and/or the database is updated. |
Joined: 11 Dec 05 Posts: 6 Credit: 1,468,014 RAC: 0 |
"Yes Andy, it did finish, well done - result and graph here. Version 5.15 of the climate software shocks everyone by reporting every single error message since the beginning of the model, when the model completes! Looks as if you restored it from a backup at some point? (If so, well done for that too!)" Thanks very much for your reassurance - I have another one on the other core to finish in 5 hours' time, so looking forward to that too! I'm surprised that it didn't seem to upload everything on completion. Yes, it is quite an achievement to actually complete a WU. I tried a few times on my overclocked Core2, but after a few weeks a crash would happen, something would get corrupted and the WU would abort :-( This time I ran it on my Linux production webserver, which is quite lightly loaded and stable (as it has to be!), and the WUs survived. I tried to just leave it alone as much as possible, and not even sneeze in the general vicinity! It might be a good idea to award a substantial credit prize on successful completion. I hadn't restored a backup on the machine, but upgraded the kernel a few times, which required a reboot. Once the last 5.15 WU has completed, should I detach and reattach to clean out the folder and prepare for the new app? Cheers, Andy. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Nice idea but it can't be done. CPDN awards credit as the Run progresses. It is intended that all BOINC Projects award the same amount of credit for equal amounts of work. So... If a CPDN Run bombs somewhere along the way, the participant still gets par-value credit for work done. If a Run finishes, full credit will have been given. If CPDN then tossed in a bonus, the theoretical balance among Projects would be skewed. It might draw additional participants to CPDN but I doubt it would please leaders of other Projects. Congratulations on your success. Your effort contributed significantly to the science. Thanks for participating and I hope we see you around for more. (Note: New options are being tested, shorter-running than the current Coupled Model. Other Models are in various stages of planning and development, so it will be an interesting place to be for quite a while.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 11 Dec 05 Posts: 6 Credit: 1,468,014 RAC: 0 |
"Nice idea but it can't be done. CPDN awards credit as the Run progresses. It is intended that all boinc Projects award the same amount of credit for equal amounts of work. So..." Well, in principle yes, but in practice I suspect a little more generosity wouldn't go amiss, as these WUs are about 1000x longer than any others and require a lot of patience, commitment and stamina to see through!
Yes, but this is also a negative thing which doesn't give so much incentive to take care of the task!
Well, perhaps avoiding churn and keeping the existing participants interested may be more significant than bringing in new ones that then just drop out after a while!
Thanks - I have a nice warm fuzzy feeling now at actually having got all the way through two of these monsters... :-) The sulphur runs last year were much shorter, but I still had difficulty keeping a machine stable enough to run continuously while trying to survive occasional power outages, developing applications which could do all sorts of unpredictable things, and rendering animations etc. I think a greater degree of granularity would help overall, say distributing 10 year pieces - you can combine them as they come in, though there isn't the same magnitude of satisfaction on completion! ;-) Also, optimisation for the significant numbers of SSEn+ enabled processors (of course without losing sight of accuracy) and maybe even a PS3 version, which I think would be a major feat! Conceivably they could do a WU in about a week, if single precision could be fudged to produce acceptable results, though it would still be useful in double precision mode. I guess the next version of the Cell will do DP just as quickly as current SP anyway, so worth a thought! I'm very concerned about climate change, and look forward to learning about your developments and of any improvements in model capability and code optimisation. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The "granularity" is at 40 years. This is what the restart dumps are for. If you want a preview of new models that are coming soon, you can join the beta testing, or get a vague idea from the post in this page, dated Tue Feb 27, 2007 11:12 pm. The optimised models are already available as version 5.40 TCMs. As for getting more credits for your models, dream on. It's not going to happen. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Andy. The 10-year or 40-year sections of the model can't just be run separately and then combined. Every model has start conditions, which are the values for the parameters, and these are different for each model. But as the model progresses, the conditions change and the changes are cumulative. So to run a model from, say, Dec 2000 you need the results up to the end of Nov 2000 (i.e. the restart dump), and you'd need to wait until another computer had completed it up to that point. May as well do it all on one computer. |
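The point in the last post, that later sections of a run depend on the state left by earlier ones, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `step_year`, `run_segment`, the state fields and the numbers are hypothetical and have nothing to do with the actual CPDN model code or its restart-dump format.

```python
# Toy illustration only: step_year, run_segment and the state layout are
# hypothetical; this is not the CPDN model or its restart-dump format.

def step_year(state, forcing):
    """Advance a toy state by one year; the new value depends on the old one."""
    return {
        "year": state["year"] + 1,
        "temp": state["temp"] + forcing - 0.1 * (state["temp"] - 14.0),
    }

def run_segment(restart, years, forcing=0.05):
    """Run `years` steps starting from a restart dump and return the next dump."""
    state = dict(restart)
    for _ in range(years):
        state = step_year(state, forcing)
    return state

# Arbitrary start conditions for the toy run.
initial = {"year": 1920, "temp": 14.0}

# One machine, run straight through.
full = run_segment(initial, 80)

# Two segments chained through a restart dump: same result, but the second
# segment cannot start until the first one has finished.
dump = run_segment(initial, 40)
chained = run_segment(dump, 40)

# A "second" segment started from the initial conditions instead of the dump
# is just a repeat of the first segment, not a continuation of the run.
independent = run_segment(initial, 40)

print(full == chained)      # chaining through the dump reproduces the full run
print(independent == full)  # starting without the dump does not
```

In this sketch the chained run matches the straight-through run only because the second segment is fed the dump from the first, which is why splitting a run across machines would save no wall-clock time: each machine would have to wait for the previous one.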