Sulphur units constantly failing
Author | Message |
---|---|
Joined: 16 Dec 05 Posts: 27 Credit: 242,905 RAC: 1,153 |
The crashing definitely has to do with your computer. I think my last model crashed once, but it had to have been a fluke. I have used so many different applications at the same time as the model without a crash that I don't know when to expect one. So if I am going to do a backup, it will probably be once a month. My question is: what needs to be backed up, and how should I back it up? Should I just copy the files to a different location on my C drive? If your system is unstable with high CPU usage, then good luck. I wouldn't play a graphics-intensive game while modelling. Also, make sure your computer can breathe. What does a crash look like for those of you who are having crashes? |
Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
"The crashing definitely has to do with your computer. I think my last model crashed once, but it had to have been a fluke. I have used so many different applications at the same time as the model without a crash that I don't know when to expect one." I'm sorry, but my computer is perfectly stable. I can run climateprediction without problems for many hours. I tried Prime95 alongside climateprediction, splitting CPU time 50/50, for 24 hours without a problem. The problem is that when a higher-priority program requires 100% CPU, the climateprediction application gets out of sync (remember that climateprediction runs at a very low priority). I know this now, so I stop BOINC every time I know that another application will need 100% CPU time. There is also a post about this issue on the BOINC developers' mailing list. |
Joined: 30 Aug 04 Posts: 77 Credit: 1,785,934 RAC: 0 |
Yep, the Sulphur clients are still highly unstable in certain standard situations. (I know because I paused after the last disaster and only recently fired it up again. Results appear unchanged: hardly any of my Sulphur models will ever complete.) Scientific Network : 44800 MHz - 77824 MB - 1970 GB |
Joined: 16 Dec 05 Posts: 27 Credit: 242,905 RAC: 1,153 |
I am wondering if there is any way to verify the data before the phase is up. I don't know what happens internally, but maybe every 5% it should try to verify the data and make sure that the program didn't run into some sort of error. This is my input on the matter. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's constantly verifying the data. If it has a problem, it rewinds a day and tries again. Then it will rewind a month, and then a year. If it still has a problem, then it quits. You can see the files in the dataout folder of the model: restart, restart.month, restart.year |
Joined: 16 Dec 05 Posts: 27 Credit: 242,905 RAC: 1,153 |
Ahh, OK, because it just did that to me today: it was at 9.02%, then it jumped back to 9.00% when it had a problem late yesterday. The BOINC client said 0% progress after the benchmarks failed. Cool, thanks. |
Joined: 31 Aug 04 Posts: 13 Credit: 134,268 RAC: 0 |
I just had one die halfway through phase 4. Any help? This is the second of my sulphur units to fail. It was after a restart, but I suspended CPDN first. No backup. :( Result # 1754289 <core_client_version>5.2.13</core_client_version> <message><file_xfer_error> <file_name>sulphur_j55s_100893152_0_4.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> <file_xfer_error> <file_name>sulphur_j55s_100893152_0_5.zip</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> </message> |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
christo: Sorry, no help. This is why big businesses make daily backups of their computer data. |
Joined: 31 Aug 04 Posts: 13 Credit: 134,268 RAC: 0 |
"This is why big businesses make daily backups of their computer data." So does my small business... Unfortunately, after the last run failed, I didn't want to touch this WU at all. However, I had to restart the computer, and it failed within about 6 hours. |
Joined: 16 Dec 05 Posts: 27 Credit: 242,905 RAC: 1,153 |
I have a question. I haven't been keeping close track of how much time I am crunching, but I have a feeling that it's crunching faster than it did in the beginning. Is it possible that the crunching rate changes as the percent complete increases? If so, can anyone tell me why, or could this be related to failing work units? I think this might be happening, but I don't have actual data to calculate the rate difference. Just wanted input. Thanks. |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
Speed does change over the lifetime of a model, but not normally by much. Bear in mind that s/TS is averaged over the entire progress of the model, so the most usual reason for a fall is that something happened earlier to cause a slowdown such as a rewind to an earlier point (which may well be what happened in your case). |
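
Two points from the thread can be put into a small sketch: the rewind order (a day, then a month, then a year, then quit) and the way a rewind shows up in an averaged s/TS figure. This is only an illustration in Python, not the model's actual code. The file names come from the post above; the function names `recover`, `try_resume` and `average_s_per_ts`, and all of the numbers, are assumptions made up for the example. It also assumes that s/TS is simply total CPU seconds divided by net timesteps completed.

```python
# Restart files named in the thread, in the order the model is said to try them.
RESTART_LEVELS = ["restart", "restart.month", "restart.year"]


def recover(try_resume):
    """Try each restart file in turn (hypothetical helper).

    try_resume is a caller-supplied function that takes a file name and
    returns True if the model could resume from it. Returns the name of the
    first file that worked, or None if every level failed (the model quits).
    """
    for name in RESTART_LEVELS:
        if try_resume(name):
            return name
    return None


def average_s_per_ts(cpu_seconds, timesteps_completed):
    """Seconds per timestep, averaged over the whole run so far (assumed)."""
    return cpu_seconds / timesteps_completed


# Invented numbers: a model that really runs at 2.0 s per timestep.
cpu = 1000 * 2.0                      # 1000 timesteps done, 2000 s of CPU
before = average_s_per_ts(cpu, 1000)  # average before any problem

cpu += 100 * 2.0                      # rewind 100 timesteps and redo them
after = average_s_per_ts(cpu, 1000)   # same progress, more CPU time spent

cpu += 1000 * 2.0                     # carry on for another 1000 timesteps
later = average_s_per_ts(cpu, 2000)   # average drifts back toward 2.0

print(before, after, later)
```

Under these assumptions the average jumps when the rewind happens and then falls slowly as normal crunching continues. That falling figure can look like the model speeding up even though the real rate never changed.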
©2024 cpdn.org