Thread 'Bad BOINC, bad model or bad server?'

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34325 - Posted: 20 Jul 2008, 16:18:32 UTC
Last modified: 20 Jul 2008, 16:31:23 UTC

I have this HADSM slab running on this single-core AMD with BOINC 6.2.6.

Yesterday at about midday UTC the model reached the end of phase 2, and at 66.666% progress the CPU time stopped rising, which seemed normal while the zip file was being created. The file and associated trickle were uploaded and the graphs look normal.

Several hours later I looked at the model again and saw that the CPU time was still stuck at the same figure and Progress still said 66.666%. In the graphics window the globe showed the climate was advancing, but no text was visible.

Suspending and restarting the model and BOINC activity had no effect. CPU time & Progress remained stuck.

So I exited and then restarted BOINC. CPU time and Progress leapt ahead, showing 3 or 4 extra hours of work done; the graphics displayed fully and Task Manager confirmed 100% CPU usage. The model immediately sent another trickle, which is the one marked 17.56.21. So while the figures were stuck in BOINC Manager (BM), the model had in fact been crunching.

The model has continued crunching for 24 hours since then and there are BOINC Manager messages about 4 more trickle uploads, none of which appear on the model's web page.

I'm wondering whether I have a problem with my BOINC, a problem model, or there's a server problem displaying some computers' trickles. I'll probably just have to wait and see.
ID: 34325

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 34326 - Posted: 20 Jul 2008, 16:42:41 UTC - in response to Message 34325.  

I'm wondering whether I have a problem with my BOINC, a problem model, or there's a server problem for some trickle uploads. I'll probably just have to wait and see.

Nope. I've seen this too sometimes. But I don't run Windows slab very often, so I kind of forgot about it.
ID: 34326

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34327 - Posted: 20 Jul 2008, 16:49:05 UTC

I'm glad someone else has seen this bizarre behaviour.

Immediately after posting I saw that the model had just sent up another trickle. This seems to have spurred the server into action, as 4 more trickles now suddenly show, all with earlier timestamps.

Problem not really diagnosed but still solved, which is good enough for me.
ID: 34327

old_user633
Joined: 9 Aug 04
Posts: 33
Credit: 168,775
RAC: 0
Message 34329 - Posted: 20 Jul 2008, 20:18:27 UTC
Last modified: 20 Jul 2008, 20:18:58 UTC

Sometimes BOINC Manager 'stops' updating the messages tab, even though the messages are still being created by BOINC. It's a bug.

Live long and BOINC!
ID: 34329

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34331 - Posted: 20 Jul 2008, 20:40:35 UTC

Nice to see some SETI people also crunching here.
ID: 34331

old_user516912
Joined: 10 May 08
Posts: 5
Credit: 263,778
RAC: 0
Message 34346 - Posted: 22 Jul 2008, 20:32:57 UTC

Not had any problems with CPDN, but I often get lock-ups with SETI. I restart BM and all is fine. I've not left it long enough to judge whether it is still running, though.
ID: 34346

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34352 - Posted: 23 Jul 2008, 9:03:05 UTC
Last modified: 23 Jul 2008, 9:07:01 UTC

Hi Eluk

I've looked at your computers and models. On the first computer on your list, your climate models are crashing after just a couple of minutes, almost always with code 6 and the message 'can't allocate shared memory'.

Have a look at Mike's advice in the forum Mac section.
ID: 34352

old_user516912
Joined: 10 May 08
Posts: 5
Credit: 263,778
RAC: 0
Message 34396 - Posted: 25 Jul 2008, 18:03:51 UTC - in response to Message 34352.  

Hi Eluk

I've looked at your computers and models. On the first computer on your list, your climate models are crashing after just a couple of minutes, almost always with code 6 and the message 'can't allocate shared memory'.

Have a look at Mike's advice in the forum Mac section.


Thanks for this info. I hadn't seen this. Looking at your link and attempting a solution.

I did note that these were all before 13 July and I still have 4 tasks running.
ID: 34396
