Bad BOINC, bad model or bad server?
Author | Message |
---|---|
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I have this HADSM slab running on this single-core AMD with BOINC 6.2.6. Yesterday at about midday UTC the model reached the end of phase 2, and at 66.666% progress the CPU time stopped rising, which seemed normal while the zip file was being created. The file and associated trickle were uploaded and the graphs look normal. Several hours later I looked at the model again and saw that the CPU time was still stuck at the same figure and Progress still said 66.666%. In the graphics window the globe showed the climate was advancing, but no text was visible. Suspending and restarting the model and BOINC activity had no effect: CPU time and Progress remained stuck. So I exited and then restarted BOINC. CPU time and Progress leapt ahead, showing 3 or 4 extra hours of work done, the graphics displayed fully and Task Manager confirmed 100% CPU usage. The model immediately sent another trickle, which is the one marked 17.56.21. So while the figures were stuck in BM, the model had in fact been crunching. The model has continued crunching for 24 hours since then, and there are BOINC Manager messages about 4 more trickle uploads, none of which appear on the model's web page. I'm wondering whether I have a problem with my BOINC, a problem model, or there's a server problem displaying some computers' trickles. I'll probably just have to wait and see. Cpdn news |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"I'm wondering whether I have a problem with my BOINC, a problem model, or there's a server problem for some trickle uploads. I'll probably just have to wait and see." Nope. I've seen this too sometimes. But I don't run Windows slab very often, so I kind of forgot about it. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I'm glad someone else has seen this bizarre behaviour. Immediately after posting I saw that the model had just sent up another trickle. This seems to have spurred the server into action, as 4 more trickles now suddenly show, all with earlier timestamps. Problem not really diagnosed but still solved, which is good enough for me. Cpdn news |
Joined: 9 Aug 04 Posts: 33 Credit: 168,775 RAC: 0 |
Sometimes BOINC Manager 'stops' updating the messages tab, even though the messages are being created by BOINC. It's a bug. Live long and BOINC! |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Nice to see some SETI people also crunching here. Cpdn news |
Joined: 10 May 08 Posts: 5 Credit: 263,778 RAC: 0 |
Not had any problems with CPDN, but I often get lock-ups with SETI. I restart BM and all is fine. I've not left it long enough to judge whether it is still running, though. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Eluk, I've looked at your computers and models. On the first computer on your list your climate models are crashing after just a couple of minutes, almost always with code 6 and the message 'can't allocate shared memory'. Have a look at Mike's advice in the forum Mac section. Cpdn news |
Joined: 10 May 08 Posts: 5 Credit: 263,778 RAC: 0 |
Hi Eluk Thanks for this info. I hadn't seen this. I'm looking at your link and attempting a solution. I did note that these were all before 13 July, and I still have 4 tasks running. |
©2024 cpdn.org