climateprediction.net (CPDN) home page
Thread 'Miscellaneous problems'

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50729 - Posted: 3 Nov 2014, 23:00:29 UTC
Last modified: 27 Apr 2016, 22:00:57 UTC

Reusing an old post of mine, with a new title.

Use this for any general problems not about credits or uploads.

Hopefully the "can't create a new thread" problem will be solved before too long.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54025 - Posted: 27 Apr 2016, 22:04:43 UTC

Project's front page is down.
Reported.
Time for stronger string to hold everything together. :)

KWSN THE Holy Hand Grenade!
Joined: 9 Apr 07
Posts: 7
Credit: 1,630,807
RAC: 0
Message 54039 - Posted: 5 May 2016, 2:26:34 UTC

1) Reporting finished WUs takes about 20-30 seconds; it used to take 5 seconds at most...

2) Report of a finished WU does not erase the WU from your computer...

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 54044 - Posted: 5 May 2016, 13:35:45 UTC - in response to Message 54039.  

Because I only see reporting after I have had network activity suspended, I haven't noticed 1 yet. I haven't noticed 2 recently either, though it always happened with the "short models" on Linux. Currently the uploading problems seem to be preventing a task from reporting, or perhaps a task just won't report as finished until all its uploads have gone?

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 54046 - Posted: 5 May 2016, 14:55:40 UTC

[KWSN THE Holy Hand Grenade! wrote:] 2) Report of a finished WU does not erase the WU from your computer...


Could you give the batch numbers of the ones that are leaving the folders behind? I am assuming you mean the folders are left, rather than the task staying on the list in BOINC after being reported. I have experienced neither problem recently, so I assume it is just some batches. (I will check all my machines when they next complete tasks, in case I have to get into hat eating.)

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 54047 - Posted: 5 May 2016, 17:44:23 UTC

Yes, some tasks are not cleaning up after themselves when they finish. Task wah2_eu25_m304_202312_12_316_010289235 has finished and reported. All zip files and trickles have uploaded. It is gone from the list in BOINC Manager, but it is still present in the “Projects” folder.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 54048 - Posted: 5 May 2016, 17:53:00 UTC - in response to Message 54047.  

Just checked all my machines and there are no batch 316s on any of them. I will wait a day to see if any more batch numbers are reported as affected and let Andy know. Of course the real problem is the people who don't check the fora and find themselves running out of disk space. Those of us in the know can just delete the folders.
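
For anyone who wants to check for leftover folders before deleting anything by hand, here is a minimal Python sketch. It only lists candidates and deletes nothing. It assumes that each model leaves a folder in the project directory named after its workunit (as in the report above) and that a folder is still in use if an active task name starts with the folder name. The function name, the `wah2_` prefix filter, and the idea of passing in the active task names yourself are all illustrative choices, not anything provided by BOINC or CPDN.

```python
from pathlib import Path


def leftover_task_dirs(project_dir, active_tasks, prefix="wah2_"):
    """List model folders in project_dir that no active task refers to.

    Assumption: folders are named after the workunit, and task names are the
    workunit name plus a suffix such as _0 or _1.
    """
    active = list(active_tasks)
    leftovers = []
    for entry in sorted(Path(project_dir).iterdir()):
        if not entry.is_dir() or not entry.name.startswith(prefix):
            continue  # ignore files and anything that is not a model folder
        still_in_use = any(task.startswith(entry.name) for task in active)
        if not still_in_use:
            leftovers.append(entry.name)
    return leftovers
```

Compare the returned names against the task list in BOINC Manager before removing anything; the sketch cannot know about tasks you did not pass in.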

Vicki
Joined: 28 Nov 15
Posts: 50
Credit: 4,099,809
RAC: 0
Message 54079 - Posted: 12 May 2016, 20:19:54 UTC

Hi all
Last night 3 tasks crashed simultaneously, displaying some sort of Visual Fortran error, during an AVG upgrade. Somewhat strange, as the tasks are on an SD card (known as drive H on my desktop) and that drive is on AVG's exception list.
I can't recall the task names this morning, but I know that 1 ended with _0 and the others ended in _1.
Is this a known bug? Should I be using a different antivirus product?

Any thoughts welcome.

Enjoy your day
Vicki

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 54080 - Posted: 12 May 2016, 23:02:45 UTC - in response to Message 54079.  

[Vicki wrote:] Should I be using a different antivirus product?

I avoid anti-viruses as much as possible, which is easy since all my CPDN work is on dedicated machines that are not exposed to the usual modes of transmission. But on my main PC, I have tried various AVs with minimal success. They are always monitoring something in the system, even with the "exceptions". At present, I am just using Microsoft Security Essentials, which generally does not cause problems (Win7 64-bit).

I am of the opinion that if you need a very aggressive AV, then you are doing something wrong and should change your habits anyway.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54081 - Posted: 12 May 2016, 23:31:51 UTC

[Jim1348 wrote:] I am of the opinion that if you need a very aggressive AV, then you are doing something wrong and should change your habits anyway.


Or just don't know the level of protection needed, what each one really does, etc.

Vicki
Generally it's a good idea to Suspend everything one by one, and then Exit from BOINC before doing ANY upgrades.
The Visual Fortran error is because the main climate program is written in Visual Fortran.

Years ago I used AVG, until an upgrade started deleting models, in spite of all the excluding I tried. So within a few hours of downloading it, I got rid of it.

I did a lot of reading and thinking, before deciding to use Microsoft Security Essentials, based mostly on a feeling that I was being careful anyway, and having other programs for checking everything.
It might not have been the best, but it did do one thing very well: stop Windows from complaining about no AV.

Towards the end, I even found out that I could keep changing the day of the week when it ran, so that it NEVER ran, and didn't complain about not having been run. :)

MossyRock
Joined: 4 Oct 13
Posts: 27
Credit: 2,301,681
RAC: 7,632
Message 54096 - Posted: 14 May 2016, 16:19:35 UTC

I had 2 WAH2 tasks fail back-to-back after I restarted BOINC Manager following a machine reboot for Windows updates.

I always perform a normal, routine, controlled BOINC Manager shutdown before a reboot: File > Exit BOINC > Stop running tasks when exiting the BOINC Manager (checked).

The failed tasks are WUs 10371899 and 10351844.

Outcome: Client error
Client State: Compute error
Validate State: Invalid.

Here is what is in the Boinc Manager Event Log:

5/11/2016 7:40:32 AM | climateprediction.net | Message from task: 0
5/11/2016 7:40:32 AM | climateprediction.net | Computation for task wah2_eu25_d756_193612_13_366_010351844_0 finished
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_3.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_4.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_5.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_6.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_7.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_8.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_9.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_10.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_11.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_12.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_13.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:40:32 AM | climateprediction.net | Output file wah2_eu25_d756_193612_13_366_010351844_0_14.zip for task wah2_eu25_d756_193612_13_366_010351844_0 absent
5/11/2016 7:41:02 AM | climateprediction.net | Message from task: 0
5/11/2016 7:41:02 AM | climateprediction.net | Computation for task wah2_eu25_i76p_198612_13_366_010371899_1 finished
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_2.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_3.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_4.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_5.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_6.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_7.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_8.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_9.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_10.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_11.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_12.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_13.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 7:41:02 AM | climateprediction.net | Output file wah2_eu25_i76p_198612_13_366_010371899_1_14.zip for task wah2_eu25_i76p_198612_13_366_010371899_1 absent
5/11/2016 8:40:49 AM | climateprediction.net | Sending scheduler request: To report completed tasks.
5/11/2016 8:40:49 AM | climateprediction.net | Reporting 2 completed tasks
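
A log like this is easier to summarise with a few lines of Python. The sketch below counts the "Output file ... absent" messages per task. The regular expression is based only on the message wording shown in the log above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches Event Log lines of the form shown above:
# "... | Output file <file> for task <task> absent"
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def count_absent_outputs(log_lines):
    """Return a dict mapping task name to the number of absent output files."""
    counts = Counter()
    for line in log_lines:
        match = ABSENT.search(line)
        if match:
            counts[match.group(2)] += 1
    return dict(counts)
```

Fed the lines above, it would report 12 absent files for the task ending in 010351844_0 and 13 for the task ending in 010371899_1.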

That's a lot of processing time down the drain.

Any ideas why this happened?

Thanks.

MossyRock
Joined: 4 Oct 13
Posts: 27
Credit: 2,301,681
RAC: 7,632
Message 54097 - Posted: 14 May 2016, 16:24:11 UTC - in response to Message 54096.  

Ah, I just read Les' recommendation to suspend everything before exiting Boinc Manager.

Would doing this prevent the crashes I experienced?

Thanks.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,901,585
RAC: 2,106
Message 54103 - Posted: 14 May 2016, 22:39:02 UTC - in response to Message 54097.  
Last modified: 14 May 2016, 22:40:17 UTC

[MossyRock wrote:] Ah, I just read Les' recommendation to suspend everything before exiting Boinc Manager.

Would doing this prevent the crashes I experienced?

Thanks.

Not necessarily. The same thing just happened to a suspended WAH2 model on one of my Windows 10 machines after one of these forced updates: they are an absolute curse. "3 AM seems a good time to reboot", says the idiotic Windows Update. Actually, that's for me to decide IMHO.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 54104 - Posted: 14 May 2016, 23:33:37 UTC - in response to Message 54103.  

Iain,

Can't you choose to have Windows notify you when an update requires a reboot? Then schedule it for some time in the future, but reboot before that time when you have a chance to shut down BOINC cleanly? I don't have any systems on Windows 10 yet, but likely will in the near future.

MossyRock
Joined: 4 Oct 13
Posts: 27
Credit: 2,301,681
RAC: 7,632
Message 54105 - Posted: 15 May 2016, 0:20:23 UTC - in response to Message 54103.  

Iain,

Thanks for your response.

Was BOINC Manager up and running at the time of the reboot, even though the WAH2 model was suspended?

There's a setting in Win 10 to tell it NOT to reboot after updates until you explicitly give it the OK to do so. I have it set on my Win 10 machine that doesn't do any BOINC, and it has never rebooted on its own, at least not yet.

Have you set this option on yours? If you have, are you saying that it went ahead and rebooted on its own without your input?

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,901,585
RAC: 2,106
Message 54107 - Posted: 15 May 2016, 23:17:49 UTC - in response to Message 54104.  

[geophi wrote:]Can't you choose to have Windows notify you when an update requires a reboot? Then schedule it in the future some time, but reboot before that time when you have a chance to shutdown boinc cleanly? I haven't any system on Windows 10 yet, but likely will in the near future.

That's pretty much what I do. However, the recent WAH2 casualty on restart makes me rather resent the required intervention: all WAH2 tasks will now have to be completed before manually rebooting. I'll do nothing next patch Tuesday and look at the event log to see what if anything happens automatically and remind myself of the time periods offered. Perhaps I've interpreted the language as being stricter than it is ...

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,901,585
RAC: 2,106
Message 54108 - Posted: 15 May 2016, 23:24:38 UTC - in response to Message 54105.  

[MossyRock wrote:]Was BOINC Manager up and running at the time of the reboot, even though the WAH2 model was suspended?

No. I suspend the models individually (i.e. I don't suspend the project) then close the BOINC client and exit BOINC Manager before a reboot. This is because starting two WAH2 models at the same time can crash one of them (i.e. the model fails on my machine but the reissue has sometimes succeeded on a similar machine, arguing that the model was not an inevitable failure), so on restarting I start each WAH2 then wait a few minutes then start the next etc. This has worked up until recently.
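
The staggered restart described here can be sketched in Python as follows. This is only an illustration of the idea: `resume` is a placeholder callable that you would supply yourself (for example, something that clicks nothing and instead runs `boinccmd --task <project URL> <task name> resume`), and the 300-second default wait is a guess at "a few minutes", not a tested value.

```python
import time


def resume_staggered(task_names, resume, wait_seconds=300, sleep=time.sleep):
    """Resume tasks one at a time, pausing between consecutive starts.

    resume: caller-supplied function taking a task name (hypothetical hook).
    sleep:  injectable so the pacing can be checked without really waiting.
    """
    for index, name in enumerate(task_names):
        if index > 0:
            sleep(wait_seconds)  # give the previous model time to get going
        resume(name)
```

No wait is inserted before the first task or after the last one, so two models need one pause and three models need two.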

MossyRock
Joined: 4 Oct 13
Posts: 27
Credit: 2,301,681
RAC: 7,632
Message 54109 - Posted: 16 May 2016, 0:47:25 UTC - in response to Message 54108.  
Last modified: 16 May 2016, 0:48:15 UTC

Suspending work units individually causes new work units, that are ready to start, to begin running to "fill in the hole."

This can cause quite a mess, especially if there are new CPDN models in your queue that are ready to start. You will end up with the ones that were running originally, now suspended, plus the new ones that start that you have to suspend also.

Is there a way to prevent new work units from starting as you go about suspending work units individually?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 54110 - Posted: 16 May 2016, 3:46:11 UTC - in response to Message 54109.  

"Individually suspend work units" is only the short form of the answer, to jog people's memory.

The full answer depends on several things, which need to be worked out by individuals.

Some of these, not necessarily needed here, are:
Suspend Network access (in the menu)
Suspend the project (in the Projects tab)
Suspend all pending models FIRST, then the running ones.
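
For those who prefer the command line, the same sequence can be written down as `boinccmd` calls. The Python sketch below only builds the command lists in the order given above (network, optionally the project, pending models, then running models, then exit); it does not run them. The project URL you pass in must match the one your client shows for CPDN, and the task names and the function name are placeholders, so treat all of them as assumptions.

```python
def shutdown_commands(project_url, pending_tasks, running_tasks,
                      suspend_project=False):
    """Build boinccmd invocations for a careful shutdown, in order.

    pending_tasks are suspended before running_tasks, so that nothing new
    starts to fill the gap while the running models are being suspended.
    """
    commands = [["boinccmd", "--set_network_mode", "never"]]
    if suspend_project:
        commands.append(["boinccmd", "--project", project_url, "suspend"])
    for name in list(pending_tasks) + list(running_tasks):
        commands.append(["boinccmd", "--task", project_url, name, "suspend"])
    commands.append(["boinccmd", "--quit"])  # ask the client to exit
    return commands
```

Each list could be handed to `subprocess.run` once you have checked the names, but printing them first and running them by hand is the safer way to try this out.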

And, because of the huge number of data sets waiting to be downloaded at present, there's no need for a large queue.

I wait until my current models finish before downloading more.
This lets me see what's available, which may be newer than what would have been downloaded way back, and some of them may be of more interest, for lots of reasons.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 54111 - Posted: 16 May 2016, 7:12:42 UTC
Last modified: 16 May 2016, 7:14:53 UTC

[Iain Inglis wrote:] so on restarting I start each WAH2 then wait a few minutes then start the next etc.


Interesting. I will try this next time I reboot one of my two machines running tasks directly under Linux, as opposed to the other two that are pretending to be Windows machines. I have had some tasks fail when I have restarted two tasks of the same type within seconds of each other on both of these machines.

At least Linux just gives me a message telling me that I need to reboot to use my updated software.

©2024 cpdn.org