Thread 'models not avail for Linux - AMD x86_64 or Intel EM64T CPU'

Author	Message
bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51612 - Posted: 12 Mar 2015, 22:13:28 UTC Last modified: 12 Mar 2015, 22:16:58 UTC
Hi, I'm running CPDN under Ubuntu 14.04 LTS 64-bit, and BOINC worked even without the 32-bit libraries. I had a few crashes, so I installed the 32-bit libraries as suggested by various forum members (and the sticky). In the last few days I successfully ran a few models (see here). Then I started getting this message: "UK MET Office HadAM3P-HadRM3P Europe is not available for Linux running on an AMD x86_64 or Intel EM64T CPU", and no work is fetched. I see quite a few HadAM3P-HadRM3P tasks available for download. Any ideas about the cause of this message?
ID: 51612

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 Message 51613 - Posted: 12 Mar 2015, 22:24:17 UTC - in response to Message 51612.
I had this recently on a machine that has since been retired, and if memory serves it was another 32-bit lib missing. Sorry, I can't remember which one it was. Did you also install the ones I added to get the graphics to work? (See my post before last in the thread at the top of this section.)
ID: 51613

bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51614 - Posted: 12 Mar 2015, 22:50:51 UTC - in response to Message 51613.
Hi, this is what I have:
libGL1-mesa-dri:i386 / glx:i386: have
LibXmu6: have, but not the :i386 version (not in repo)
LibXt6:i386: have
libXi6:i386: have
libX6:i386: not in repo
ID: 51614

Thyme Lawn Volunteer moderator Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 Message 51615 - Posted: 13 Mar 2015, 1:58:46 UTC - in response to Message 51612.
"Any ideas about the cause of this message?"
The project team need to get Linux-only MOSES II tasks returned as soon as possible. The UK MET Office HadAM3P-HadRM3P Europe application has temporarily been restricted to Windows only in an attempt to achieve this. The applications page shows the current model availability for each platform, and the server status page shows what work is ready to be sent (I've asked the project team to change the hadam3p_eu label to make it clear that it's currently Windows only).
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 51615

bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51617 - Posted: 13 Mar 2015, 7:36:36 UTC - in response to Message 51615.
Ohh, the MOSES ones. I think it would be a waste of time, resources and the models themselves to run them on my two Linux machines (4 cores in total), as I shut them down regularly and I'm pretty sure the models will lose zips and report computation errors (one already did). They also need more than 450 hours to complete. But I can try again, can't I? Or will I just waste the model? Thanks
ID: 51617

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 Message 51620 - Posted: 13 Mar 2015, 10:51:30 UTC - in response to Message 51617.
Worth trying again, I think. When I shut my machines down I always use suspend to disk/hibernate, and I find this greatly reduces the crashes. I get one every couple of months that might be attributable to the closing down, though I only close down a couple of times a week on one machine and up to once a day on the other.
ID: 51620

bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51622 - Posted: 13 Mar 2015, 11:59:24 UTC - in response to Message 51620.
So you are not shutting down but hibernating or suspending your machines? Or do you do what I do: suspend tasks after a checkpoint, exit the manager and BOINC, wait some time, then shut down the machines? (I have "Leave application in memory when suspended" checked, though this should not help when shutting down, as RAM should be cleared.)
Got one with TRIFFID. It says 200h, but in the 3rd hour it is at only 0.7% progress, so I expect double the estimate. 3 weeks at best. Will see.
ID: 51622
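The "double the estimate" guess above can be checked with a simple linear extrapolation. This is only a rough sketch: it assumes progress accrues at a constant rate and that the machine runs continuously, neither of which is guaranteed for these models, and the function name `estimate_total_hours` is made up for illustration.

```python
def estimate_total_hours(elapsed_hours, fraction_done):
    """Linearly extrapolate total run time from progress so far.

    Assumes a constant progress rate, which is a simplification.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures quoted in the post: 0.7% progress after 3 hours.
total_hours = estimate_total_hours(3, 0.007)
total_days = total_hours / 24
print(f"about {total_hours:.0f} hours, or {total_days:.0f} days of continuous running")
```

With the quoted figures this comes to roughly 430 hours, a little over twice the 200h shown by BOINC, and about 18 days if the machine never stops. A machine that is shut down regularly would take longer, which fits the "3 weeks at best" remark.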

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 Message 51623 - Posted: 13 Mar 2015, 12:01:39 UTC - in response to Message 51622.
Using Kubuntu's suspend to disk function. My laptop has two disks, with an SSD for the OS and swap, and it opens up very quickly from suspend to disk. At some point I will do the same for my desktop.
ID: 51623

bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51625 - Posted: 13 Mar 2015, 13:28:23 UTC - in response to Message 51623.
Do you suspend or exit BOINC prior to hibernating? I have a similar setup (Ubuntu), but I do not hibernate, both to protect the SSD and because the Wi-Fi sometimes does not wake up (and then needs a full reboot).
P.S. Moved the BOINC data to the HDD, as fstrim was going mad the last 5 days.
ID: 51625

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 Message 51626 - Posted: 13 Mar 2015, 14:11:31 UTC - in response to Message 51625.
I just hibernate; I don't suspend or exit BOINC. BOINC runs on the conventional hard disk, as I have trashed one in the past through the large number of writes BOINC performs. On the newer machine, which is the laptop, I have 8GB and swap hasn't been used any time I have checked.
ID: 51626

geophi Volunteer moderator Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 Message 51627 - Posted: 13 Mar 2015, 19:01:07 UTC - in response to Message 51623. Last modified: 13 Mar 2015, 19:03:22 UTC
Dave,
Any time a MOSES model is removed from memory (exiting BOINC or rebooting the system), that model will return no more trickles for the model year it was running at that time, and the zip file will not be created for that year. You can look through your currently running MOSES models and find a "skip" in the trickles expected for that year. At the end of the run, the task will have a status of error because it won't upload a yearly file for that year.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17907699
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17880399
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17770278
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17784754
So those are four you are currently running which will have a result of error on the task webpage.
Here's an example of one of yours that obviously ran to the end, but has an outcome of Client Error because of 3 missed upload files. So the model was removed from memory sometime during each of three model years.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17814513
ID: 51627

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 Message 51630 - Posted: 14 Mar 2015, 10:23:36 UTC - in response to Message 51627.
Thanks for that, it helps me understand a little better. Looking at the dates when the skips occurred, they all seem to relate to when I shut BOINC down and rebooted in order to apply kernel updates from Ubuntu. I shall try leaving the updates until none of that type of task are running, to check whether hibernation produces the same result.
ID: 51630

Eirik Redd Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 Message 51631 - Posted: 14 Mar 2015, 10:44:29 UTC - in response to Message 51630.
"Thanks for that, helps me understand a little better. - looking at the dates when the skips have occurred, they all seem to relate to when I have shut BOINC down and rebooted in order to apply kernel updates from ubuntu. I shall try leaving them till none of that type of task are running to check out whether hibernation produces the same result."
Yeah, unfortunately, that's just how it is. Those MOSES global things can't stand most kinds of restart. If "hibernation", i.e. suspending to a state image but without losing memory, doesn't cause upload loss -- that might help. Hopefully.
ID: 51631

bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51633 - Posted: 14 Mar 2015, 11:18:36 UTC - in response to Message 51631.
"Yeah, unfortunately, that's just how it is. Those MOSES global things can't stand most kinds of restart. If "hibernation" i.e. suspending to a state image but without losing memory doesn't cause upload loss -- that might help. Hopefully."
Can they stand any kind of restart? If not, then it should be clearly written somewhere on the page: "These models need to run 24/7 until done. If you cannot do that, do not run these models on your machine."
At the moment I do micromanagement, and
a) it will not even help in running a successful MOSES model
b) it goes against the idea of DC
c) I waste time, energy and models
Isn't there any way things get better besides me investing in a super cruncher?
By the way, how do you determine that there is a skip in trickles and zips before stderr is reported?
ID: 51633

Eirik Redd Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 Message 51634 - Posted: 14 Mar 2015, 13:11:08 UTC Last modified: 14 Mar 2015, 13:18:20 UTC
Yeah, well, actually. Probably the intermediate uploads and trickles are worth something. It might be good if the various distributed project teams had better communication with the crunchers. I figure that, however misconfigured and vulnerable the models are, at least they keep my house warmer (all those watts? yeah?) -- but winter is over here in North America. :)
I've actually run several of these MOSES globals to "successful" completion. Now, the MOSES globals running on my machines will almost all predictably "fail", not because of being interrupted, but because they are misconfigured to run only 9 uploads. That part is fine, but when the 10th doesn't happen, because it wasn't programmed, BOINC sees a failure. Naah, the data that gets uploaded is worthwhile; ignore the BOINC infrastructure failure reports -- but complain.
The problem is with the submitters, who sometimes can't get the complexities of the BOINC infrastructure -- hope somebody out there can help. Hope they are better at climate science than they are with the BOINC infrastructure. Do the "scientists" know 9 from 10? Probably not. Hope this is not offensive to the "scientists", but really, it annoys me no end running these things and seeing the possibly true, or probably not, "failed" after a week or two.
Anyhow, I keep on running these models, despite the BOINC failure idiocies -- but it makes it hard to recruit new users.
ID: 51634

geophi Volunteer moderator Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 Message 51635 - Posted: 14 Mar 2015, 15:13:23 UTC - in response to Message 51633.
"Can they stand any kind of restart?"
No.
"If not then it should be clearly written somewhere on the page - These models need to run 24/7 until done. If one cannot do it do not run these models on your machine."
I agree.
"At the moment I do micro management and a) it will even not help running a successful MOSES model b) it goes against the idea of DC c) I waste time, energy and models"
It's hard to argue with any of that.
"Isn't there any way things get better besides me investing in super cruncher?"
Maybe things get better if they hire another IT/programmer person. If you ask me, they are stretched way too thin on that side, and known problems don't get fixed, or get fixed only after a very long time. A shortage of IT/support/programmer staff has always been somewhat of a problem with this project. It's gotten considerably worse lately, at least in part because of the number of different types of models being run now.
"By the way how do you determine there is a skip in trickles and zips, until stderr is not reported?"
For the MOSES II (global only) models, if you look at the trickle listing on the webpage of a successfully completed model (for example, http://cpdnbeta.oerc.ox.ac.uk/result.php?resultid=12895915), you'll see trickles spaced at 2880 timesteps, with every 12th one at 5760 timesteps. If you see a model with more than 5760 timesteps between trickles in the listing, then no doubt it will return a result status of error at the end, because the yearly upload file for that interrupted year will not have been generated and uploaded.
ID: 51635
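The check described above can be sketched in a few lines of Python. This is only an illustration, not a project tool: it assumes you have copied the cumulative timestep value of each trickle by hand from the task's result page, and the function name `find_trickle_skips` is made up for this example. The 5760 threshold is the largest normal gap mentioned in the post.

```python
MAX_NORMAL_GAP = 5760  # largest gap between trickles in a healthy run, per the post


def find_trickle_skips(timesteps, max_gap=MAX_NORMAL_GAP):
    """Return (previous, current) timestep pairs whose gap exceeds max_gap.

    `timesteps` is a list of cumulative timestep counts, one per trickle,
    typed in by hand from the result page (an assumption of this sketch).
    """
    ordered = sorted(timesteps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# A healthy listing: gaps of 2880, 2880 and 5760 timesteps.
healthy = [2880, 5760, 8640, 14400]
# An interrupted listing: the last gap is 11520 timesteps.
interrupted = [2880, 5760, 8640, 20160]

print(find_trickle_skips(healthy))
print(find_trickle_skips(interrupted))
```

An empty list means no gap larger than 5760 was found. Any pair returned marks a stretch where trickles are missing, which by the description above suggests the yearly upload for that model year will also be missing and the task will finish with an error status.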

Eirik Redd Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 Message 51638 - Posted: 14 Mar 2015, 17:45:44 UTC Last modified: 14 Mar 2015, 18:02:31 UTC
So, the question that comes to my mind is -- am I insane to keep crunching these misconfigured models that report fail? Possibly so, possibly not. (I've a few dozen running on my minifarm, and won't kill them.) Some of these wu's will report OK; many will fail (supposedly, in BOINC, not in the science, and not because of restarts, ooh no) because some low-level idiot misconfigured the BOINC interface for these jobs. If only that klutz would show up on these forums and apologize -- won't happen.
This site is a public site not only for us volunteer crunchers, but also for the sometimes good, sometimes slobs who submit crunch jobs here. Some of the "researchers" submit clumsy misconfigured wu's here. Stand up and say so, and apologize!! I know you won't.
The very limited staff at CPDN provide a service to climate researchers all around the world. These "researchers" might be funded idiots, or more likely are into their climate models, and delegate the BOINC and HAD interface to undergrads with no clue, and don't check back for a year or so, and then complain to Myles and all -- "Your service didn't get me what I want?" Because your undergrad can't count to ten. (Provably so, with the MOSES models referred to elsewhere -- where? Look at your results: 9 isn't 10, got it?) I think that's what's happening with some of these obviously misconfigured models. But there's no way for us volunteers to reach back to the funded total slobs thru the public service we crunchers provide, and say so, because we don't know what stupid uncaring drunk idiot submitted these stupid things that report to us "failed". Myles -- please feed back. Some of the scientist users of this site are slacker slobs who don't give a ra about the crunchers who care.
Yes, it's not the site, it's not the few underpaid supporters of the infrastructure. The problem that is about to turn me, and others, off is that it's free, so "researchers" using my CPU time don't have to give a damn about getting their wu's even close to passable. They don't care. Costs them nothing. OK? Got a clue? Us crunchers feel exploited. Clear?
I apologize if I have offended any crunchers. As for the (edit - no offense) persons or institutions that submit these wu's that appear to fail (I don't know if they fail or not, "I am not a scientist"): it's getting kind of old, these "BOINC reports failure". Should I worry that my CPU time is wasted?
Anyhow, rant done. I don't know who to blame, but this many misconfigured(?) wu's is getting really really old. And I don't apologize for this rant. Needed to be said. I'm no drunker than the clowns that submitted the last 15000 misconfigured wu's that I've seen.
ID: 51638

bernard_ivo Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 Message 51639 - Posted: 15 Mar 2015, 16:09:09 UTC - in response to Message 51638.
Thanks geophi and Eirik.
"Welcome to the world's largest climate modelling experiment" no comment...
Perhaps we crunchers could use Twitter, which is visible on the CPDN front page, and start asking whether CPDN is properly backed with the IT staff, programmers or whatever else is needed, so this experiment really gets the crunchers it needs to achieve the goals it set. Or write a letter to the scientists, the BBC or Oxford?! Perhaps we can start a discussion in the forum and then move it towards the staff and scientists. Any other grassroots ideas? Do we need the 7th or 8th IPCC report to be out before climate work gets the attention it needs?!
ID: 51639
