Message boards : Number crunching : News and Announcements
Author | Message |
---|---|
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
CPDN main project I am afraid we have been forced to take the independent climateprediction.net message board (the phpbb forum) offline for investigation and maintenance. On the evening of Wednesday 20th March a hidden iframe redirect was found on a number of pages on that message board. We are currently looking into this security issue. The main portion of the CPDN website is also hosted on this server, and so this portion of the website is also offline. We hope to resolve this issue soon and restore normal services. This problem does not affect the availability or download of climate models and the upload servers are available as usual. The CPDN Team Cpdn news |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
CPDN main project The problem explained in the last post, which made it necessary to shut down some climateprediction.net server programs, has caused an additional problem which has just come to light. New members can still join the project, but at the moment it is not possible to attach computers to climateprediction.net. If you cannot attach your computer, you cannot download new climate models for the time being. If you are affected by this problem you can attach to other projects to keep your computer busy. In BOINC Manager, open the Tools menu, select Attach to project and then choose a project. Cpdn news |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
CPDN main project It is again possible to attach computers to the project and download work when it is available. Thank you, Jonathan! Reminder: you can subscribe to this thread by pressing the button at the top and receive an email whenever a new notification is posted. Cpdn news |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The recent batch of hadcm3n models (issued around 5 April) has a problem which causes FORTRAN errors, as discussed in a Windows thread. This is not the same as the FORTRAN errors that have occurred intermittently over the years. This one is fatal. On Macs and Linux the models will self-abort shortly after starting, but on Windows systems they just sit there not running, and will continue to do so until they time out. This means that they'll never return a trickle_up file, so the server can't distinguish between them and successfully running models in order to send a "Killer trickle". e.g. I have 2 models from a December batch which are around 85% complete. Other people will also have some that are OK. That means that on Windows, it has to be diy. :) There are a few ways to tell good from bad: 1) No trickle_up files returned. But this also depends on the mix of projects being run. 2) The date near the top of the model's page, and also near the top of the workunit page (the bad batch was issued around 5 April). 3) No progress, either in the BOINC Manager window or in the Show graphics window. The project apologises for the problem. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Upload server uploader1.atm is down but will probably be up again soon. Very few model files upload to this server at the moment so not many CPDN members will be affected by the outage. Cpdn news |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
All the servers are now running. Cpdn news |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
As part of the ongoing problems with attaching here, it's becoming obvious that there's a very important file on everyone's computer. This is: account_climateprediction.net.xml It has the information that the project needs to identify you. Please make a copy (or several), and keep it in a safe place for when hardware problems occur. Keeping the equivalent for ALL of your projects is a good idea for the same reason. |
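For anyone who would rather script the copy than do it by hand, here is a minimal sketch in Python. It assumes the account files sit directly in the BOINC data directory and are all named account_*.xml, like the one above. The function name backup_account_files and both directory arguments are made up for illustration, and the data directory location varies by system (often /var/lib/boinc-client on Linux or C:\ProgramData\BOINC on Windows, but check your own install).

```python
import shutil
from pathlib import Path


def backup_account_files(data_dir, backup_dir):
    """Copy every account_*.xml file from data_dir into backup_dir.

    Both arguments are paths chosen by the caller; nothing here knows
    where BOINC is installed. Returns the list of copies written.
    """
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    # Create the backup folder (and any missing parents) if needed.
    backup_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(data_dir.glob("account_*.xml")):
        dest = backup_dir / src.name
        shutil.copy2(src, dest)  # copy2 also keeps the file's timestamps
        copied.append(dest)
    return copied
```

Calling backup_account_files with your BOINC data directory and, say, a folder on a USB stick would copy the account file for every attached project in one go. Existing copies with the same name in the backup folder are overwritten.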
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The latest unplanned outage was caused by a hard disk failure in climateapps2, which is the BOINC server as well as an upload server. This disk was replaced and the RAID system slowly rebuilt itself. Then the 2nd disk was replaced, which needed another rebuild. The server is still trying to re-sync itself, but is being hammered by all of the computers pushing and shoving in their attempts to get their data back to the project, and to try for more work. And it seems that there is at least one new problem, resulting in lines of error messages appearing on some or all pages. This has been reported. |
Joined: 28 Mar 11 Posts: 35 Credit: 82,588 RAC: 0 |
Planned downtime: 2 - 5 August 2013. The server room in which CPDN resides is due to be shut down for electrical testing on 2 August 2013. The testing will take place over the weekend, and CPDN servers will be brought back online on 5 August 2013 (assuming everything goes to plan). There will be NO CPDN service during this time from any Oxford machines: ClimatePrediction.net, Climateapps2.oerc.ox.ac.uk [this forum], cpdn-upload2.oerc.ox.ac.uk, charybdis.oerc.ox.ac.uk, cpdntrickle.oerc.ox.ac.uk, uploader.oerc.ox.ac.uk, uploader1.atm.ox.ac.uk, cpdnbeta.oerc.ox.ac.uk, trillionthtonne.org, seacourt.oerc.ox.ac.uk, glaaki.oerc.ox.ac.uk, kraken.oerc.ox.ac.uk, wyrm.oerc.ox.ac.uk |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Climateapps2, the BOINC server, is still having problems. It's very old, and is having difficulty remembering to keep all of its volumes mounted. Plans are under way to retire it, but in the meantime, it will help to keep it up and running if people minimise their poking and prodding, e.g. constantly looking at the server status for work or trying to upload data. If possible, keep the Network set to off. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Problems are larger than initially thought. Currently, Climateapps2 is processing neither Trickles nor requests to check Server Status. Andy is working on it. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And Andy knows about the "key file" problem as well. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
1) Climateapps2 is up and running, and known major bugs were squashed. (Andy has been busy -- he's watching the store alone these days because Jonathan is away.) 2) Andy wrote that he has requests from three scientists to generate more work. He'll get to it as soon as he can but we don't have a timeline yet. (He's busy cleaning up after a long stint in 'firefighting' mode.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Main database server power-supply failed. Attempts to Trickle-up, for example, return: "Server error: feeder not running". Jonathan is working on it. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
The credit generation system is still not working after the climateapps2 server rebuild. The administrators are aware and have been investigating for quite some time. Once it is resolved, everyone will get the outstanding credit for work done since the original server failed (30th July). I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
The credit system is running now: models are being marked with credit based on trickles, and all work since the old server crashed looks like it has been credited. The export process will send this to external statistics sites within the next day or so.
I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
You may or may not have noticed that the number of RAPID (hadcm3n) tasks available from the download server decreased considerably today. Andy B gave the following as a reason for this decrease: "In case there is a query on the boards: I have been asked to pause the current workunits in the queue in and put out another batch of workunits, the scientists want this other batch of workunits computed before the current workunits in the queue, so you will shortly see a drop of the queue to 2200." |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
The CPDN BOINC webpages are back up and trickles are being accepted again. Jonathan says the 403 access forbidden problem was caused by a failure to mount the project's NFS partitions after an unexpected reboot at 0300 UTC yesterday (Sunday 22 September): The servers running on our VM infrastructure seem to have rebooted at 4 am BST on 22 Sept. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The zips for hadcm3n (RAPID) models upload to a server external to Oxford Uni. This is currently out of space. The relevant people have been asked to urgently increase storage space. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Message from staff: Hi All, "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
©2025 cpdn.org