Message boards : Number crunching : Project has no tasks available
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Interesting message in the event log today: [error] No start tag in scheduler reply. I guess it makes a change from No tasks available. The machine in question is currently stocked up with WCG tasks for the next four or five days. Just interested in the meaning of the message. |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
Just interested in the meaning of the message. It means that the scheduler reply didn't contain a <scheduler_reply> tag. It's supposed to start with that, so the reply must have been corrupt or (more likely) empty. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 24 Nov 05 Posts: 2 Credit: 309,254 RAC: 0 |
Looks like the lack of WUs has ended. Now I am getting 3 or 4 at a time and they are processing faster than they did with my old 2 CPU configuration. Also, more of them are completing without ABENDs despite the instability of Windows 8, which ABENDs 4 or 5 times a day lately, probably because of outdated device drivers. I thought going to WIN 8 was going to make my system more stable; boy, was I ever wrong! I've gone from a bit over 122,000 credits to over 242,000 in just a couple of months, so it looks like I'm finally making progress, even though I am also processing tasks for SETI, LHC, Rosetta, World Community Grid (with half a dozen different sub-projects), Cosmology, Einstein, Milkyway and Lattice Project in addition to Climate Predict. They each send dozens of WUs at a time (except LHC, which still doesn't send a lot of work), and still I have been able to earn as many credits in the past 2-3 months as I had in the previous SEVEN YEARS I have been processing CP projects. The processing speed of my 4 CPU configuration is really amazing despite WIN 8 being so shaky that I have had to cease processing anything for days at a time instead of running 24/7 as previously. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
While it is true that a failed task is of some use to the project, a completed task is much more valuable - and more satisfying for volunteers as well. Despite a lot of effort by that machine, it has finished only three out of 31 tasks. There must be something seriously wrong if the success rate is only 10%. If the machine is crashing four or five times a day then it really isn't going to do well with tasks of CPDN's size and duration. It would be a very good idea to find out what the problem is and to resolve it. Quite apart from any concerns about distributed computing, no machine should crash that often: the machine itself looks very capable. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Yes, all those crashes mean there's a problem in that computer. I'd be very surprised if the instability is due solely or even mainly to Win 8, which was made for desktops and laptops just as much as for mobile devices. I've looked through the error codes and stderr reports of nearly all the crashed models on the first two pages of your computer's results. In many cases I've also looked at the pages for the workunits to see how other computers managed with the same models. http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1261663 Could I make a few points and suggestions? * To see the stderr report of a model, go to a Task page then click on stderr+ to see the details. * Don't spend time trying to look at the Task pages of Hadam regional models as they often won't open up. We can get plenty of info from looking at the Hadcm Task pages. * The models that crashed with exit code 25 will almost certainly have ended because your computer crashed. * A small number of the models that crashed with exit code 22 have 5 or 6 instances of INVALID THETA at the end of the report. This is almost always due to the model itself producing impossible climate conditions, so that's the fault of the models, not your computer. * However, quite a few models with exit code 22 did not crash with impossible climate. I think the problem in these cases is probably instability of the computer. * Two or three models crashed with code 193. Jorden explains this error in his FAQs. This seems to point to a memory or RAM problem. Your computer has lots of RAM. Test it. Windows 7 has its own memory-testing/diagnostics program and I expect Win 8 has too. If it fails the test using all RAM modules together, rerun the test with each of the modules in turn. If you find a faulty module, download MEMTEST and use that to double-check. Are all the RAM modules the same type? A previous computer of mine ran beautifully with type A and beautifully with type B. But A + B produced crashes. 
Please let us know what's happening. Cpdn news |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Just to echo Mo's point, I have had two computers in the past that didn't like a mix of memory types that were each fine on their own, yet in another computer the same modules were completely happy together. I never did work out what was different about the computer that let them play together. Playing with memory bus speed didn't seem to make any difference; in fact, the one that let them play was a Duron processor, which still let them play with a substantial overclock. |
Send message Joined: 30 Jan 12 Posts: 38 Credit: 10,197,388 RAC: 0 |
Well, 3600 new jobs and counting, European Region models, I hope they all work okay (I'm sure they will). Edit: Over 5000 now, life is good ATM. |
Send message Joined: 20 Mar 07 Posts: 1 Credit: 432,097 RAC: 0 |
April 29th and no work again? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
As has been said often, the work on this project isn't continuous. It's in batches of a few thousand, and then there's a wait until enough results are returned for the climate physicists to decide what they want to do next. And with about 30 thousand computers attached, not everyone will get work when it IS available. Also, as mentioned in the News posts, and also in several discussion threads, there are a few problems at present. Backups: Here |
Send message Joined: 27 Aug 04 Posts: 14 Credit: 763,720 RAC: 0 |
I had four tasks running on my four cores, until they all crashed... Bye bye CPDN. I came back last week to try it again, this time only with my Win7 laptop. The 4 WUs got to about 50% and the next thing I saw was that they had all crashed. What a waste of cpu-time and energy again. Again, bye bye. Don't think I will ever return if this is the level of quality you can offer. |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... I came back last week to try it again, this time only with my Win7 laptop. The 4 WUs got to about 50% and the next thing I saw was that they had all crashed. What a waste of cpu-time and energy again. Again, bye bye. Don't think I will ever return if this is the level of quality you can offer. The tasks haven't reported back yet & hence we can't see why they crashed. If they all crashed simultaneously that usually implies something environmental (such as a power cut or windows shutting down before Boinc has exited). If it was a laptop, perhaps it tried to hibernate the tasks. Because they run for so long, CPDN tasks do require some TLC. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I have found that the tasks usually survive an accidental trip through hibernation. Last week there was a power cut during the middle of the night. Both of my laptops went into hibernation when their batteries were exhausted. (Batteries only last about 2 hours when the machine is being flogged as hard as Boinc does.) Automatic shutdown is set for 20% charge. When the machines were restarted all the models had survived. I know that using hibernation is not a good idea with Boinc, but it is a lot better than just having the computer run until the battery is dead and then crash. That's just about a guaranteed model killer. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Getting new work. There have been a lot of posts here about the fact that the project often has no work available. If you look at the "Server Status" page you will see that it reads "0" except for a few Hadam3's that are almost certainly "zombies" that will fail in the download phase. Despite the fact that technically the project has no work, I have picked up 3 hadcm3n WU's in the last few days. These WU's are all reissues of WU's that failed on other machines. They all end in _3 or _4. They have been around the block a couple of times already. In order to get these it is necessary that Boinc be running (duh). If you don't have any work from CPDN, run something. Run 24/7 (with an internet connection) if you can. The reissues are generated in very small numbers as they time out and they are snapped up just as fast. The more you run, the better the chance. Running 24/7, the chances of catching one are greatly increased. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Thanks for the suggestions, Jim. Some extra suggestions: * increase the work buffer to 10 days (though you have to make sure you don't get too much work from too many projects). * if you see that tasks are available and you're really keen to grab some you can temporarily suspend tasks from OTHER projects * if for any reason you suspend work from a project, BOINC prevents that project from fetching new tasks * if you can't get hold of new CPDN tasks, do consider joining other projects as well. Find them in the Tools menu of BOINC Manager. The projects listed there are all considered safe and reputable by the people in charge of BOINC at the Uni of California at Berkeley * check in the climateprediction.net preferences of your account that you've enabled all the model types you want. At the moment the model types are Hadcm3m, which is longish, and all regions of Hadam3p (Europe, Pacific North West, South Africa and we hope some Australia & NZ). If you want anything available just enable them all * if you're running BOINC tasks on a laptop make sure you're not letting it overheat by simultaneously running too many tasks for the machine's fans to cope with. Check temps by downloading (for example) Core Temp and also, if you're running GPU tasks for another project, GPU Temp. If you don't like the look of the temperatures, members here will advise you about what to do Cpdn news |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
It's also important to remember that CPDN has reduced the number of times a computer can request extra work from the server to once per hour. We cannot change this setting which was decided to limit the load on the server. The countdown to the next hourly attempt is shown in the Projects tab of BOINC Manager. Do not try to ask for work now by clicking the Update button as this will reset the time to 60 minutes. Patience rules! And of course to see what models are available go to the Server Status link in the blue menu to the left. Cpdn news |
Send message Joined: 18 Feb 11 Posts: 44 Credit: 9,975,761 RAC: 0 |
It's also important to remember that CPDN has reduced the number of times a computer can request extra work from the server to once per hour. We cannot change this setting which was decided to limit the load on the server. The countdown to the next hourly attempt is shown in the Projects tab of BOINC Manager. Do not try to ask for work now by clicking the Update button as this will reset the time to 60 minutes. Patience rules! I always wondered why a request automatically resulted in a one-hour delay. Now I'm enlightened! : ) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Too many computers, not enough models, too many people trying to grab large numbers of them. This way the work is shared a bit better. |
Send message Joined: 12 Feb 08 Posts: 66 Credit: 4,877,652 RAC: 0 |
If a climate scientist working on computational models does not know how to use free computing capacity, I would fire him. But as it is, we'll just wait for work... |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
The way I understand it, it is not free to the Scientists. They have to pay the people at CPDN to generate the models for them and manage the data collection. Only the running of the WU's by us is free. |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Hi Mo, I'm running BOINC Manager 7.2.38. I do not see the countdown information you reference in the Project Tab...The fields are: Project, Account, Team, Work Done, Average Work Done, Resource Share, and Status (which is blank). Where is the countdown time displayed? Art Masson |
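
On the "[error] No start tag in scheduler reply" message explained near the top of this thread: the check can be pictured roughly as below. This is only an illustrative Python sketch, not the BOINC client's actual code. The function name `has_start_tag` and the sample replies are made up for the example; the only detail taken from the thread is that a valid reply is supposed to contain a `<scheduler_reply>` tag, and that an empty or corrupt reply will not.

```python
def has_start_tag(reply_text: str) -> bool:
    """Return True if the reply contains a <scheduler_reply> start tag.

    Hypothetical helper for illustration only; the real client's parsing
    is more involved than a simple substring search.
    """
    return "<scheduler_reply>" in reply_text


# Made-up sample replies, not captured from a real server.
good_reply = "<scheduler_reply>\n</scheduler_reply>"
empty_reply = ""
corrupt_reply = "<html><body>server error page</body></html>"

for name, reply in [("good", good_reply),
                    ("empty", empty_reply),
                    ("corrupt", corrupt_reply)]:
    status = "ok" if has_start_tag(reply) else "No start tag in scheduler reply"
    print(f"{name}: {status}")
```

An empty reply fails the check in the same way as a corrupt one, which fits the suggestion above that an empty reply is the more likely cause.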
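
For anyone going through their own failed tasks, the exit-code advice given earlier in this thread (codes 25, 22 and 193) can be summarised as a small lookup. This is a rough Python sketch that only restates that advice; it is not an official list of CPDN or BOINC exit codes, and the function name `likely_cause` and the sample stderr text are invented for the example.

```python
def likely_cause(exit_code: int, stderr_text: str = "") -> str:
    """Map an exit code to the likely cause suggested in this thread.

    Hypothetical helper: the wording paraphrases the forum advice and
    is not taken from any official documentation.
    """
    if exit_code == 25:
        return "the computer probably crashed while the model was running"
    if exit_code == 22:
        # INVALID THETA at the end of the stderr report points at the model.
        if "INVALID THETA" in stderr_text:
            return "the model produced impossible climate conditions"
        return "possibly instability of the computer"
    if exit_code == 193:
        return "possibly a memory (RAM) problem, so test the RAM"
    return "not covered by the advice in this thread"


# Made-up stderr snippet, for illustration only.
print(likely_cause(22, "... INVALID THETA ..."))
print(likely_cause(22))
print(likely_cause(193))
```

Treat the result as a starting point for diagnosis only; the stderr report on the Task page is still the place to look for details.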