Message boards : Number crunching : WUs constantly failing
Author | Message |
---|---|
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
You could try removing BOINC from the startup folder, so that, when your brother turns on the computer to play games, BOINC doesn't start. Yes, I know :) But for now it's fine this way; I make a BOINC backup every morning ;) I want to let the model keep crashing, so that maybe I can find something useful to help the programmers fix issues like this. Keeping BOINC from starting, or making backups, is a workable approach for experts, but not for normal users, especially when the workunits last several months :) Now I have to find a way to prevent the cleanup after a model crash, so that I can find the error in yabds.out. In the working model, yabds.out contains errors similar to those in db's post; if I remember, I'll copy them here tomorrow. |
Send message Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
I don't think this is the case. If you look at the machine (325133), you will see that the most recent crash was on a model issued on 16th Jan, long after the batch of bad WUs was resolved. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
This is due to a batch of bad WUs sent out previously. I think Tolu meant another bad batch, but I could be wrong. |
Send message Joined: 6 Aug 04 Posts: 42 Credit: 3,693,897 RAC: 3,475 |
Looks like I've been hit with these as well. This host. |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Me too ... I just killed two .. don't feel bad, I notice that they both failed for someone else too ... |
Send message Joined: 30 Aug 04 Posts: 77 Credit: 1,785,934 RAC: 0 |
Pretty easy for me: of 24 machines, not a single one so far has managed to process Sulphur correctly. Effectively I've suspended CPDN until they fix the recent, enormous problems with their clients; nothing else to do (huge waste of resources otherwise) :( Scientific Network : 44800 MHz - 77824 MB - 1970 GB |
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
Why has my post been deleted? Maybe it was too long? Or should I post it in the phpBB forum? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Regretfully you mentioned a certain name associated with a certain, soon to be released, project. Mods & Admins have been notified that this name & project are not to be mentioned, due to legal requirements, so I had to delete your post. Lots of mine, from before I found out, have also been deleted. Actually, they are not "deleted", just hidden, and in a week or two, when it is officially announced, I'll go through them, and 'return' them to view. As for your offer to help with the testing, this is being done by people with known records of being able to complete models. There is no credit, just 'destructive testing' of all the options with a computer that is known to work with spinup, which is also a difficult testing process. After this, the user has the option to continue, to produce some starting data for "IT", or return to spinup. It is hoped that mentions of "IT" on sites outside the control of this site do not attract undue attention. And I hope that people reading these boards do not persist in posting about this matter. Anyone whose computers can't complete a sulphur model can wait for the coupled model in a month or two, or concentrate on other projects for a while. |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Well, I have been doing pretty well I thought. But one computer now seems to have had several in a row. I am going to try to get one more to see ... I had been completing and had completed several SLab models with that computer and the first of the three deaths seems to be the cross from phase one to two ... I think that was mentioned as an issue. The last two were "819" traps on start up. Which I find interesting as I don't run the graphics and that error trap is USUALLY an indication of a video card issue ... well I will try one more I guess ... ==== edit Forgot to mention that someone else also tried the last two models and also got an "819" error, though one DID run for a bit ... interesting ... Really strange ... |
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
OK, I didn't know this :) So I'm copying, filtering and pasting the old message, which you can delete or keep hidden if you wish :) http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=1108852 Same workunit :) As I said before, I let this workunit crash to gather some useful info for avoiding crashes :). After one crash the yabsd.out was still present, and its last part contained this: FIXED LENGTH HEADER ------------------- Dump format version-32768 UM Version No 401 Atmospheric data On hybrid levels Over global domain Ancillary dataset Exp No =-32768 Run Id =-32768 360-day calendar Arakawa B grid Year Month Day Hour Min Sec DayNo Data time = 0 1 16 0 0 0 0 Validity time = 0 12 16 0 0 0 0 Creation time = 0 1 0 0 0 0 0 Start 1st dim 2nd dim 1st parm 2nd parm Integer Consts 257 15 15 Real Consts 272 6 6 Level Dep Consts -32768 1 1 1 1 Row Dep Consts -32768 1 1 1 1 Column Dep Consts -32768 1 1 1 1 Fields of Consts -32768 1 1 1 1 Extra Consts -32768 1 1 History Block -32768 1 1 CFI No 1 -32768 1 1 CFI No 2 -32768 1 1 CFI No 3 -32768 1 1 Lookup Tables 278 64 912 64 912 Model Data 58881 6391296 6391296 LOOKUP TABLE 58368 64-bit words long ANCILLARY_STEPSim(s_im) 5 INITMOS : MOS_OUTPUT_LENGTH = 1129 im,sm,ngroup,new_im,new_sm 1 1 48 T F PPCTL: Opening preattached file on unit 60 PPCTL: Opening preattached file on unit 61 PPCTL: Opening preattached file on unit 62 PP_CTL: Error Buffering in Fixed length Header Empty PP File in Climate Mode? 
Error code = 0.00 Length requested = 0 Length actually transferred = 256 PPCTL: Opening preattached file on unit 63 PPCTL: Opening preattached file on unit 64 PPCTL: Opening preattached file on unit 65 PPCTL: Opening preattached file on unit 66 PPCTL: Opening preattached file on unit 67 PPCTL: Opening preattached file on unit 68 After the last crash there was only the stderr_um.txt file, containing this: BUFFIN: C I/O Error - Return code = 16 Naturally I back up everything, so the climate model continues to advance and, as you can see, my machine continues to trickle :) I don't think it is a workunit problem, but an application problem that should be solved, because how can you tell normal people that before playing a game or doing anything else with a heavy load they must back up the BOINC folder or shut down BOINC? Best Regards Luigi |
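Picking the failure messages out of a long dump like the one above can be automated. The following is only a minimal Python sketch: the patterns are taken from the two messages quoted in this post ("Error Buffering" and "I/O Error ... Return code"), and the list is an assumption about what counts as an error, not an official CPDN catalogue. The function names are hypothetical.

```python
import re

# Patterns based on the messages quoted in the post above. This list is an
# assumption for illustration, not a complete set of model error messages.
ERROR_PATTERNS = [
    re.compile(r"Error Buffering", re.IGNORECASE),
    re.compile(r"I/O Error", re.IGNORECASE),
    re.compile(r"Return code\s*=\s*\d+"),
]


def find_error_lines(text):
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_file(path):
    """Read a log file leniently (bad bytes are replaced) and scan it."""
    with open(path, "r", errors="replace") as handle:
        return find_error_lines(handle.read())
```

Calling `scan_file` on a saved copy of yabsd.out or stderr_um.txt would list only the suspicious lines with their line numbers, which is easier to paste into a bug report than the whole dump.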
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
It did not take long ... error and it failed for the other participant on start up ... Are we sure we are done with the bad work units pending? Well tomorrow is another day. I looked in my account, I thought I had done more sulfur, but so far have only successfully completed one. But, I do have another that is only 2 days from completion and it runs continuously, so, theory says that should be a good estimate (though I am not sure ... probably will take 3-4 days). Regardless, I have one more coming, the next one after that has 16 days to run ... |
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
I'm a programmer, so I know that when reporting a possible bug it's better to give more details :) My computer is an Athlon 64 3200+ (754-pin, 0.13 µm, 2 GHz) on an Abit KV8 motherboard with the latest BIOS and 1 GB of RAM (2 DDR400 modules). Add-on boards: a SkyStar 2 DVB-S board, a Pinnacle PCI-500 board and a standard Realtek Ethernet board; no sound card, I'm using the integrated one. Clocks are standard, and the memory timings come from the SPD settings. I have no power supply problems; I have an Enermax power supply (I don't remember the model :) ). No problems with CPU overheating; I'm using a Hyper 6 from Cooler Master, 950 g of laminated copper :) I have the latest stable drivers for everything, and the system is completely stable. I'm running BOINC version 5.2.13 with a normal installation (no service) and automatic start. The OS is Windows XP Pro SP2 without any additional updates. How to reproduce this issue is simple: 1) You need a brother (maybe it's not strictly necessary :) ) 2) Turn on the computer 3) Wait until the Windows logon appears 4) Log on 5) Start a standard game, in this case Splinter Cell 1 6) After 30 min or 1 hour you exit from the game 7) Model crashed. This weekend I'll try to reproduce the model crash myself to gather more specific details. I also want to see whether I can reproduce it on another computer. |
Send message Joined: 28 Aug 04 Posts: 90 Credit: 2,736,552 RAC: 0 |
> 5) Start a standard game, in this case Splinter cell 1 > 6) After 30 min or 1 hour you exit from game > 7) Model crashed I would recommend that you shut down BOINC, or at least suspend computation, while gaming! |
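The "suspend computation while gaming" advice can be wrapped around the game launch so nobody has to remember it. This is a hedged Python sketch, assuming a BOINC client whose command-line tool is called `boinccmd`, is on the PATH, and accepts `--set_run_mode`; older clients such as the 5.2.x series mentioned in this thread may name or behave differently, so check your own installation. The function names and the game command are hypothetical.

```python
import subprocess

# Assumption: the BOINC command-line tool is on PATH under this name.
BOINCCMD = "boinccmd"


def run_mode_command(mode, boinccmd=BOINCCMD):
    """Build the argument list that sets BOINC's run mode."""
    if mode not in ("always", "auto", "never"):
        raise ValueError(f"unknown run mode: {mode}")
    return [boinccmd, "--set_run_mode", mode]


def play_with_boinc_suspended(game_command, runner=subprocess.run):
    """Suspend computation, run the game until it exits, then resume."""
    runner(run_mode_command("never"), check=True)
    try:
        # check=False: a crashing game should not prevent the resume below.
        runner(game_command, check=False)
    finally:
        # Hand scheduling back to the user's preferences afterwards.
        runner(run_mode_command("auto"), check=True)
```

Pointing the game's desktop shortcut at a small script that calls `play_with_boinc_suspended([...])` would give a non-expert user the same protection without having to open the BOINC manager.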
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Well, I wish mine were that simple. Stable, single use platform, BOINC only ... heck, it is so single use I usually only look at it through RealVNC as there is no need to go local ... :) |
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
> 5) Start a standard game, in this case Splinter cell 1 Yes, I know that, at least for climateprediction. You, I and several thousand other people can do this, but there are millions out there who can't. For the BOINC platform to be more attractive to normal users, it must be more reliable, even while you are playing a game :). The average computer user is capable of surfing the internet, writing an email and installing a program; he doesn't care about anything else. I look at the forums, the server status, the results pages and so on several times a day; you could say that I'm a BOINC addict :) An example of a hypothetical non-expert user: 1) A friend tells him that he can use his spare CPU time for something useful. 2) He thinks "why not?" 3) He installs the BOINC client (if he is capable). 4) He chooses the projects he likes (now it is better than before, but I'm waiting for account managers :) ). 5) He is sure that it doesn't need his attention, and he forgets about BOINC's existence for a month. 6) After that, because he plays his favourite game for 1 hour a day, in a month he has lost 30 climate models: lost time, wasted server resources, and no science done. 7) BOINC deleted and user lost. My first DC project was UD, and I liked that it was an install-and-forget program; then, over a year ago, I switched to BOINC because I liked its philosophy. As an example, I keep the UD client on a friend's computer to which I have very infrequent access. I would like to install BOINC there as soon as I can, but for now managing a remote client with a dynamic IP is a **** ** *** *** P.S. I'm sorry for my bad English :( |
Send message Joined: 28 Aug 04 Posts: 90 Credit: 2,736,552 RAC: 0 |
> but for now to manage a remote client with dynamic ip it's a **** ** *** *** Have you heard about dynamic DNS services like http://dyn.dns.org? |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Perhaps for your friend a different project might be more appropriate. I know this sounds like heresy ... but, not all projects are suitable for all computers and all people. I have had decent luck running CPDN on all my PCs, occasional model crashes for various reasons, but, a pretty decent track record. Heck I am about to complete my second Sulfur model in a couple of days (1 day 12 hours). But, though you would think that it would be a better computer to run CPDN I have yet to complete a model on my PowerMac G5 ... bad computer? Bad program, gremlins? Who knows. But, I just stopped and now run other projects on the PowerMac, it really shines at Einstein@Home ... Again, this is the beauty of BOINC ... Oh, and WCG uses the UD program if you like, or you can run their two projects under BOINC like I do ... |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I occasionally run Doom while BOINC is running. The only problem is when it starts to benchmark. Then Doom slows right down and movement gets jerky. At least for me. The baddies seem to keep going. :( When/if I wake up to it, I suspend Doom until the benchmark is finished. |
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
Perhaps for your friend a different project might be more appropriate. I know this sounds like heresy ... but, not all project are suitable for all computers and all people. For now I run climateprediction only on my home computer, where there is no internet access; it's the best project for a computer like this. I only have to back up the BOINC folder every day and, once in a while, carry it to work on a CD-RW. On my friend's computer I think I'll install BOINC with WCG and Einstein; I certainly won't install climateprediction. I like this project, but it needs too much user attention. BTW, with my home computer I managed to complete an old slab model without backups and with a lot of gaming ;) http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=250218 |
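The daily "back up the BOINC folder" routine described above could be scripted. The sketch below is one possible way in Python, under stated assumptions: the BOINC client is stopped before copying (so files are not captured mid-write), and the directory locations are supplied by the user. The function name, the `boinc-` folder prefix and the `keep` parameter are all hypothetical choices for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_boinc(data_dir, backup_root, keep=3, stamp=None):
    """Copy data_dir into a timestamped folder under backup_root.

    Only the newest `keep` backups are retained. Stop the BOINC client
    first so that model files are not copied while being written.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    data_dir, backup_root = Path(data_dir), Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-{stamp}"
    shutil.copytree(data_dir, target)
    # Timestamped names sort chronologically, so the oldest come first.
    backups = sorted(
        p for p in backup_root.iterdir() if p.name.startswith("boinc-")
    )
    for old in backups[:-keep]:
        shutil.rmtree(old)
    return target
```

Keeping a few generations rather than a single copy means that a backup taken after an unnoticed crash does not overwrite the last good state.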
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
I occasionally run Doom while BOINC is running. The only problem is when it starts to benchmark. Then Doom slows right down and movement gets jerky. At least for me. The baddies seem to keep going. :( Not every game eats 100% CPU; for example, when I play PES 5 the model continues to advance because the game doesn't need much CPU time :) Many games simply do this: while (1) { continue; } // :) |
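The busy loop joked about above never yields the CPU, which is why a low-priority job such as a climate model gets no time while such a game runs. As an illustrative contrast only (not how any particular game is written), here is a Python sketch of a frame loop that sleeps away the unused part of each frame's time budget. The function names and the `target_fps` parameter are hypothetical.

```python
import time


def remaining_frame_time(frame_start, now, target_fps=60):
    """Seconds left in the current frame's time budget (never negative)."""
    budget = 1.0 / target_fps
    return max(0.0, budget - (now - frame_start))


def run_frames(update, frames, target_fps=60,
               clock=time.perf_counter, sleep=time.sleep):
    """Call update() once per frame and sleep for the unused budget."""
    for _ in range(frames):
        start = clock()
        update()
        # Sleeping hands the CPU back to the OS scheduler, so background
        # work can run instead of being starved by a spinning loop.
        sleep(remaining_frame_time(start, clock(), target_fps))
```

With a loop like this, a game that needs only a few milliseconds per frame leaves most of each frame free for other processes.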
©2024 cpdn.org