climateprediction.net (CPDN) home page
Thread 'Miscellaneous problems'

MossyRock

Joined: 4 Oct 13
Posts: 27
Credit: 2,301,681
RAC: 7,632
Message 54112 - Posted: 16 May 2016, 13:27:32 UTC - in response to Message 54110.  

Les,

I followed your advice for the steps to take for a reboot/restart and my WAH2 tasks survived.

Thank you.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 54115 - Posted: 16 May 2016, 17:30:19 UTC

And, because of the huge number of data sets waiting to be downloaded at present, there's no need for a large queue.


And now there are more tasks in the queue than there are running! Does the project need some publicity to get some more crunchers?
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 54116 - Posted: 16 May 2016, 18:20:37 UTC - in response to Message 54115.  

And now there are more tasks in the queue than there are running! Does the project need some publicity to get some more crunchers?

If I may take the opportunity to offer a suggestion, allowing more control in the areas of interest would induce some people to do more. At present, it is all-or-nothing for WAH2 (and WAH2-ri, whatever that is). It is normally good marketing to give the customer more choice, if it doesn't cost anything. And here, it should be just some digital bits in the account preferences that are required.
WB8ILI

Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 54117 - Posted: 16 May 2016, 18:30:07 UTC

Jim1348 -

If you go to Your Account, then climateprediction.net preferences, then Edit, you can turn -ri (which stands for region independent) on or off separately from the rest of WAH2.
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 54118 - Posted: 16 May 2016, 18:36:55 UTC - in response to Message 54117.  
Last modified: 16 May 2016, 18:50:24 UTC

WB8ILI,

Yes, I can do that. But you can't distinguish between Europe and Asia, and I think South/Central America is coming. It would be nice to have a choice, as is the case for the other projects. Or am I missing something? I am not sure of how the geographic areas are distributed between WAH2 and ri anyway.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54119 - Posted: 16 May 2016, 19:09:14 UTC

Jim

I think that it's to do with NOT giving people the chance to avoid certain batches.
And some of the data sets are small test batches, just before a big batch is released.

The new ones will be for the Mexico area.

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 54121 - Posted: 16 May 2016, 19:46:30 UTC - in response to Message 54119.  

OK, that is a marketing decision. If avoiding certain batches means hard ones, those are the ones I like. If I could select the _1 and _2, I would be happy to do so. In fact, I would recommend setting up a "reliable PC" program, for sending those to machines that normally can do them.

If it is a choice based on some other criteria they want to avoid, then they will have to judge on that. But reducing choices to the consumer could of course mean that the choice is "0".
Alex Plantema

Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 54122 - Posted: 16 May 2016, 20:18:23 UTC - in response to Message 54110.  

The full answer depends on several things, which need to be worked out by individuals.

Some of these, not necessarily needed here, are:
Suspend Network access (in the menu)
Suspend the project (in the Projects tab)
Suspend all pending models FIRST, then the running ones.
That's too much trouble for me. And if you suspend tasks, others will be downloaded once you restart BOINC.
I simply close BOINC first when I need to reboot; it doesn't cause any crashes.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54123 - Posted: 16 May 2016, 20:29:16 UTC - in response to Message 54122.  

One other option that I forgot is "No new tasks", which is in the Projects tab.
I always have mine set this way, so that I can get better control of downloads.
Which adds to all of the decisions about which lever to have in which position.
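For anyone who would rather script these levers than click through the Manager, here is a minimal sketch, not a recommended procedure. It assumes the `boinccmd` command-line tool that ships with BOINC is on the PATH, and `PROJECT_URL` is a placeholder that must be replaced with the project URL your own client shows. It works at project level rather than suspending individual models, and it only builds the command lines; nothing runs unless you call `run_all` yourself.

```python
import subprocess

# Placeholder (assumption): substitute the project URL shown in your BOINC Manager.
PROJECT_URL = "https://example.invalid/project/"


def pre_reboot_commands(project_url):
    """Return boinccmd invocations, in order, to quiet BOINC before a reboot."""
    return [
        # Stop network activity so nothing is transferred during shutdown.
        ["boinccmd", "--set_network_mode", "never"],
        # Equivalent of "No new tasks" for this project.
        ["boinccmd", "--project", project_url, "nomorework"],
        # Suspend the project's tasks.
        ["boinccmd", "--project", project_url, "suspend"],
        # Ask the client to exit cleanly.
        ["boinccmd", "--quit"],
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first one that fails."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

These settings persist across the restart, so they have to be reversed afterwards with the matching `resume` and `allowmorework` operations and `--set_network_mode auto`.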

There are ways to handle this matter, but it does involve work, so it's best left to individuals.

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54124 - Posted: 16 May 2016, 20:50:29 UTC - in response to Message 54121.  

Jim

It's sort of "You can have any banana/peach/pear that you like from this store, but you don't get to choose where we source our products."

It would be nice if BOINC could be set to choose "good" computers, and especially to block work from 64 bit Linux computers without the required 32 bit libraries. (And it's nearly always just one specific library that's missing.)
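Until something like that exists, a Linux user can check their own machine. The following is a rough sketch, assuming the standard `ldd` tool is available and that you know the path to the model executable in your BOINC project directory (that path is not given here, so `check_binary` takes it as a parameter). It lists any shared libraries that `ldd` reports as "not found".

```python
import subprocess


def missing_libraries(ldd_output):
    """Return names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A missing dependency is printed as: "libexample.so.1 => not found"
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path):
    """Run ldd on the given executable and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

An empty list from `check_binary` only means `ldd` resolved everything it looked for; it is not a guarantee that the model will run.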

But probably the main thing to keep in mind is that the researchers are climate physicists, NOT "BOINC people". The ones that are, are the handful from Oxford's OeRC who "work here" from time to time.
This project is "marketed" to the researchers something along the lines of "This is an easy way to get massive amounts of climate data without having to buy your own super computers."

One of the reasons it takes so long to get through the huge line of tasks is the group I just mentioned: the 64 bit Linux people. Some of them have a lot of processors, and are "serial model killers". They don't seem to notice anything wrong with what their computers are doing, and are DEFINITELY not interested in credits. :)
Which results in data sets being recycled for a while, and so appearing in the total available for longer than they should.


rbpeake

Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 54125 - Posted: 16 May 2016, 21:13:14 UTC - in response to Message 54118.  

I understand WAH2 Region Independent. What is the other WAH2? Confused.

Thanks!
Regards,
Bob P.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54126 - Posted: 16 May 2016, 21:39:13 UTC - in response to Message 54125.  
Last modified: 16 May 2016, 21:39:45 UTC

I think that it's just a fixed region size, with a fixed number of run months.
There are bound to be other differences, but I don't think there were ever any detailed descriptions of the 2 WaH types.
ritterm

Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 54133 - Posted: 17 May 2016, 16:28:55 UTC

My first compute error in a month. Posting in case it's relevant:

Workunit 10440044

Error was generated at or near completion. Stderr output includes:

"Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH"

BOINC client included some typical "output file absent" messages.
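For anyone who wants to pick this kind of line out of a long stderr listing, here is a small sketch. It assumes the crash reason always follows the literal text "Model crashed:" on a single line, which is inferred only from the one message quoted above.

```python
import re

# Matches lines such as:
#   Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH
CRASH_PATTERN = re.compile(r"Model crashed:\s*(.+)")


def crash_reasons(stderr_text):
    """Return the text following each 'Model crashed:' marker, one entry per match."""
    return [
        match.group(1).strip().strip('"')
        for match in CRASH_PATTERN.finditer(stderr_text)
    ]
```

Pasting the task's stderr text into `crash_reasons` gives a list of reasons, or an empty list if the marker never appears.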
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,809,647
RAC: 9,110
Message 54134 - Posted: 17 May 2016, 16:34:45 UTC - in response to Message 54124.  


It would be nice if BOINC could be set to choose "good" computers, and especially to block work from 64 bit Linux computers without the required 32 bit libraries. (And it's nearly always just one specific library that's missing.)


Hi Les,
(Sorry if this has been discussed) Is there any way for the CPDN project to send messages to Linux BOINC crunchers via the BOINC notices system - displayed in the manager - that they should check whether they have all the libraries, and provide them with the link to the sticky here?

Cheers
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 54135 - Posted: 17 May 2016, 16:58:30 UTC - in response to Message 54134.  

Hi Les,
(Sorry if this has been discussed) Is there any way for the CPDN project to send messages to Linux BOINC crunchers via the BOINC notices system - displayed in the manager - that they should check whether they have all the libraries, and provide them with the link to the sticky here?

Cheers

BOINC tends to avoid sending standardized mailings to large numbers of people because the last time they did that, several ISPs began blocking them as spammers. It took weeks to fix.


geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 54136 - Posted: 17 May 2016, 17:53:53 UTC - in response to Message 54135.  

JIM,

Bernard is suggesting using the BOINC messaging capability that posts a notice to BOINC Manager, which has nothing to do with e-mail. WCG uses it all the time for various things. I don't know how effective it would be to periodically send that message to, hopefully, Linux-only BOINC Manager installations, but it's an interesting idea.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54137 - Posted: 17 May 2016, 20:12:29 UTC - in response to Message 54133.  

Ritterm

REPLANCA is a problem with the data set.
One of the input files may have been set for 12 months instead of 13.

Not to worry; the researcher(s) will be keeping an eye on what happens to their tasks, and when they start seeing failures, will probably search for any unsent tasks and pull them, then replace them with correct datasets.

Chris

Joined: 9 Apr 12
Posts: 10
Credit: 2,700,404
RAC: 0
Message 54139 - Posted: 18 May 2016, 2:59:12 UTC

Is there something wrong with the validation? I have a bunch waiting for validation, which I don't remember seeing before.

[but then I rarely check the details here... and these shorter work units mean I have more than 1 done at a time...]
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54140 - Posted: 18 May 2016, 4:10:17 UTC - in response to Message 54139.  

I think that's due to the changes of when the scripts run, which is now once per week.
See the 2nd last post in the News & Announcements thread.

Vicki

Joined: 28 Nov 15
Posts: 50
Credit: 4,099,809
RAC: 0
Message 54152 - Posted: 21 May 2016, 10:16:11 UTC

Hi all.
I just found that 3 tasks crashed simultaneously. A screenshot of one error message is here: https://onedrive.live.com/redir?resid=DA0E938BD05FB3F3!613&authkey=!AIoZBfFtmFq5FIk&v=3&ithint=photo%2cpng

The other 2 messages were the same except for the task names.

Any thoughts on what could make them fail together are welcome. The OS is reported as Windows 10 Home, and the drivers are up to date.
Look forward to reading your ideas,

Vicki

