climateprediction.net (CPDN) home page
Thread 'Stopped sending trickles 5 days ago, but still running. Should I let it run or abort?'

candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 62702 - Posted: 10 Sep 2020, 10:04:47 UTC

Hi everyone
I have had three tasks running since last week.
They stopped sending trickles 5 days ago but are still running. Should I let them run or abort them?
Thanks
Candido

ID: 62702
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 62703 - Posted: 10 Sep 2020, 11:12:09 UTC - in response to Message 62702.  


As there is no other work from CPDN at the moment, I would be inclined to let them run. The science information is contained in the zip files, which are uploaded at the same time as the trickle-ups; the trickle-ups themselves are used to calculate credit. Are the tasks attempting to send the zip files? If so, it may well be that the server down under has a problem and they will go when it comes back up.
If you look in the event log under the Tools menu, it will tell you if the files are failing to go. If so, let us know and I will send a message to Andy, who manages that side of things.
ID: 62703
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 62704 - Posted: 10 Sep 2020, 11:13:03 UTC

My trickles are also going somewhere, but after the ones on the first of this month, none are showing up. Well, if they are going somewhere then it is a problem on the CPDN site.
ID: 62704
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62705 - Posted: 10 Sep 2020, 11:27:08 UTC

You need to be careful with labels.

Trickles are very small files that don't show up in the Transfers tab.
The files that do are zips, and are lots of megabytes, some over a hundred megs each.
These are the data files, which don't show up anywhere you can see once they've been uploaded.

************

Then it depends on whether the zips are being created.
You can see both zips and trickle_up files in the data folder if they're being created, but not uploaded.
Also, the seconds counters for Elapsed & Remaining (estimated) should be changing if the models are still running.

One possibility for "jogging it free" is to Suspend each of the models in turn, then Suspend BOINC.
After a few seconds to allow all open files to be closed, Exit from BOINC, and then restart BOINC.
Then Unsuspend everything and see if they run.

If you're also running tasks for other projects, then it gets tricky, and it's best not to do this.
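
For anyone who would rather check the data folder from a script than by eye, here is a minimal Python sketch of the "are the zips and trickle_up files being created" check described above. The function name `pending_files` is made up for illustration, and matching on `.zip` and on `trickle_up` in the file name is an assumption based on the names mentioned in this post, not a documented BOINC convention. You pass in the path to your own project folder.

```python
from datetime import datetime
from pathlib import Path


def pending_files(project_dir):
    """List zip and trickle_up files in a folder, newest first.

    Returns (name, size_in_bytes, modified_time) tuples.
    The name matching is an assumption, see the note above.
    """
    found = []
    for path in Path(project_dir).iterdir():
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".zip") or "trickle_up" in name:
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime)
            found.append((path.name, info.st_size, modified))
    # Newest first, so recently created files are easy to spot.
    return sorted(found, key=lambda row: row[2], reverse=True)
```

Calling `pending_files(...)` on the project folder and printing the rows shows whether new files are still appearing. Files that sit there for a long time without going are the ones to mention on the board.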
ID: 62705
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 62706 - Posted: 10 Sep 2020, 11:37:42 UTC - in response to Message 62703.  

Hi Dave
Thanks for your reply
The three tasks are sending the zip files; I just checked.
The last time they sent was on 8 Sept, 20:15. They started and finished sending with no problems on my side.
They didn't send any yesterday because I suspended them until this morning to do Beta tasks for WCG.

The log also shows this one: 08/09/2020 20:15:50 | climateprediction.net | Sending scheduler request: To send trickle-up message.
Should there be a confirmation of sending the trickle?

Earlier that day there were some zip files called "....restart.zip". Is that normal?
Full log for 8 o'clock:

08/09/2020 07:40:34 | climateprediction.net | Started upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_restart.zip
08/09/2020 07:40:47 | climateprediction.net | Sending scheduler request: To send trickle-up message.
08/09/2020 07:40:47 | climateprediction.net | Requesting new tasks for CPU
08/09/2020 07:40:49 | climateprediction.net | Scheduler request completed: got 0 new tasks
08/09/2020 07:40:49 | climateprediction.net | Project has no tasks available
08/09/2020 07:40:49 | climateprediction.net | Project requested delay of 3636 seconds
08/09/2020 07:40:54 | climateprediction.net | Started upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_20.zip
[...unrelated logs...]
08/09/2020 07:52:20 | climateprediction.net | Finished upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_20.zip
08/09/2020 07:52:20 | climateprediction.net | Started upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_restart.zip
08/09/2020 08:02:53 | climateprediction.net | Finished upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_restart.zip
08/09/2020 08:02:53 | climateprediction.net | Started upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_20.zip
08/09/2020 08:14:00 | climateprediction.net | Finished upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_20.zip
08/09/2020 08:14:00 | climateprediction.net | Started upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_restart.zip
08/09/2020 08:14:39 | climateprediction.net | Finished upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_restart.zip
08/09/2020 08:14:39 | climateprediction.net | Started upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_20.zip
08/09/2020 08:25:46 | climateprediction.net | Finished upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_20.zip
08/09/2020 08:31:12 | climateprediction.net | Finished upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_restart.zip
08/09/2020 08:41:27 | climateprediction.net | Sending scheduler request: To send trickle-up message.
08/09/2020 08:41:27 | climateprediction.net | Requesting new tasks for CPU
08/09/2020 08:41:30 | climateprediction.net | Scheduler request completed: got 0 new tasks
08/09/2020 08:41:30 | climateprediction.net | Project has no tasks available
08/09/2020 08:41:30 | climateprediction.net | Project requested delay of 3636 seconds

Thanks
Candido
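
A log like the one above can also be checked mechanically. The following is a rough Python sketch, not a BOINC tool: it assumes each event log line has the `timestamp | project | message` layout shown in this post and pairs every "Started upload of" with its "Finished upload of". The function name `unfinished_uploads` is hypothetical.

```python
def unfinished_uploads(log_lines):
    """Return {file_name: start_timestamp} for uploads with no matching finish.

    Assumes lines shaped like 'timestamp | project | message',
    as in the excerpt above. Other lines are ignored.
    """
    started = {}
    for line in log_lines:
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # e.g. blank lines or elided sections
        timestamp, _project, message = parts
        if message.startswith("Started upload of "):
            started[message[len("Started upload of "):]] = timestamp
        elif message.startswith("Finished upload of "):
            # The upload completed, so it is no longer outstanding.
            started.pop(message[len("Finished upload of "):], None)
    return started
```

Run against the excerpt above, the result should be empty, since each of the six "Started upload" lines has a matching "Finished upload" line. Anything left in the result would be a file that began uploading but never finished within the log you fed in.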
ID: 62706
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62707 - Posted: 10 Sep 2020, 12:02:35 UTC - in response to Message 62706.  
Last modified: 10 Sep 2020, 12:06:33 UTC

The trickle_ups won't show on the models page if something is wrong at Oxford, but the server status page says that the server is running.
Still, could be a script problem.

The restart.zip is the one containing all the info needed to continue running the next section of the model if it's needed.
This is usually created near the finish of the run, but some models last year did this early on, near the middle of the run.
It all depends on what the researchers have in mind.

****************

The confirmation of a Scheduler request is: Scheduler request completed
It doesn't say what the request was for.

( It can also say that it's failed, as in this one from one of mine earlier in the year: Scheduler request failed: Timeout was reached )
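
To make the pairing of request and confirmation concrete, here is a small Python sketch that matches each "Sending scheduler request" line with the next "Scheduler request completed" or "Scheduler request failed" line for the same project. It assumes the `timestamp | project | message` layout from the log posted earlier in the thread. The function name `scheduler_outcomes` and the "no reply logged" label are invented for illustration.

```python
def scheduler_outcomes(log_lines):
    """Pair each scheduler request with its logged outcome.

    Returns (timestamp, reason, outcome) tuples, where outcome is
    'completed', 'failed' or 'no reply logged'.
    """
    pending = {}   # project name -> (timestamp, reason) awaiting a reply
    outcomes = []
    for line in log_lines:
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue
        timestamp, project, message = parts
        if message.startswith("Sending scheduler request:"):
            reason = message.split(":", 1)[1].strip()
            pending[project] = (timestamp, reason)
        elif message.startswith("Scheduler request completed") and project in pending:
            outcomes.append(pending.pop(project) + ("completed",))
        elif message.startswith("Scheduler request failed") and project in pending:
            outcomes.append(pending.pop(project) + ("failed",))
    # Requests still waiting at the end of the log never got a reply line.
    for entry in pending.values():
        outcomes.append(entry + ("no reply logged",))
    return outcomes
```

As noted above, "completed" only says the scheduler answered. It does not prove that the trickle was processed and shown on the models page.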
ID: 62707
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 62708 - Posted: 10 Sep 2020, 12:37:33 UTC - in response to Message 62707.  
Last modified: 10 Sep 2020, 12:38:30 UTC

Hi Les
Thanks for your replies and info conveyed
Regarding your earlier message:
The seconds counters for Elapsed and Remaining are still changing, even after suspending as suggested and then restarting the computer.
I have 5 LHC virtual machines running at the same time, and even those restarted without issues.

I know the machine sends the files every six hours, so I can look for the files.
What folder should they appear in, the slots or the project folder?
Thanks
Candido

ID: 62708
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62709 - Posted: 10 Sep 2020, 12:48:50 UTC - in response to Message 62708.  

They'll be in: projects/climateprediction.net

It's good that everything kept going.
It's a nuisance that we can't see the zips to be sure they got there after they're sent, but they go to servers all over the world.
The ANZ ones that you're working on go to a big data centre in Hobart, the capital of Australia's island state of Tasmania.
ID: 62709
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 62710 - Posted: 10 Sep 2020, 12:51:41 UTC

I think in the slots folder, but the only tasks I am running at the moment are testing ones under Linux, which only produce 4 zips over about 18 days, so there is a bit of a wait till I can check. But as the zips are going, it looks like it is just a script that needs restarting at Oxford.
ID: 62710
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 62713 - Posted: 10 Sep 2020, 14:17:59 UTC

Great, I will keep them running!
Thanks everyone
Candido
ID: 62713
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 62715 - Posted: 10 Sep 2020, 16:43:07 UTC - in response to Message 62709.  
Last modified: 10 Sep 2020, 16:43:38 UTC

I have just seen a trickle file in the project data directory change its extension from 'xml' to something like '...sent'.
I assume it is sending the trickles, so it is probably the script mentioned earlier not running properly on the server...
Candido
ID: 62715
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 62716 - Posted: 10 Sep 2020, 18:00:28 UTC - in response to Message 62715.  

I have sent a message to Andy. If it can be sorted remotely, it should change quite quickly. If not it may be a few days or a week. I suspect the former.
ID: 62716
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 62718 - Posted: 11 Sep 2020, 10:09:12 UTC - in response to Message 62716.  

Trickles should be showing again now.
ID: 62718
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 62722 - Posted: 12 Sep 2020, 22:12:21 UTC - in response to Message 62718.  

Yes! All trickles are now showing!
Many thanks!
Candido

ID: 62722
Rayburner
Joined: 17 Jan 05
Posts: 10
Credit: 23,525,643
RAC: 0
Message 62733 - Posted: 21 Sep 2020, 16:26:11 UTC

Hi!

Looks like new trickles aren't showing again.

BR
Rayburner
ID: 62733
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 62734 - Posted: 21 Sep 2020, 18:28:39 UTC - in response to Message 62733.  
Last modified: 21 Sep 2020, 18:31:16 UTC

I will send another email. Based on last time I think it is a script Andy can start remotely. Not sure why it keeps stopping though.

Edit: Done.
ID: 62734
Rayburner
Joined: 17 Jan 05
Posts: 10
Credit: 23,525,643
RAC: 0
Message 62735 - Posted: 22 Sep 2020, 10:28:05 UTC - in response to Message 62734.  


Trickles showing up again :-).

Thank You

Rayburner
ID: 62735

©2024 cpdn.org