Stopped sending trickles 5 days ago, but still running. Should I let it run or abort?
Author | Message |
---|---|
Joined: 15 Nov 10 Posts: 43 Credit: 6,118,949 RAC: 0 |
Hi everyone, I have had three tasks running since last week. They stopped sending trickles 5 days ago but are still running. Should I let them run or abort them? Thanks, Candido |
Joined: 15 May 09 Posts: 4540 Credit: 19,013,957 RAC: 21,195 |
Hi everyone. As there is no other work from CPDN at the moment, I would be inclined to let them run. The science information is contained in the zip files, which are uploaded at the same time as the trickle-ups; the trickle-ups are what is used to calculate credit. Are the tasks attempting to send the zip files? If so, it may well be that the server down under has a problem, and they will go when it comes back up. The event log, under the Tools menu, will tell you if the files are failing to go. If so, let us know and I will send a message to Andy, who manages that side of things. |
Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
My trickles are also going somewhere, but after the ones on the first of this month, none are showing up. If they are going somewhere, then it is a problem on the CPDN site. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You need to be careful with labels. Trickles are very small files that don't show up in the Transfers tab. The files that do show up there are the zips, which are many megabytes each, some over a hundred MB. These are the data files, and they don't show up anywhere you can see once they've been uploaded. ************ Then it depends on whether the zips are being created. If they're being created but not uploaded, you can see both zips and trickle_up files in the data folder. Also, the seconds counters for Elapsed and Remaining (estimated) should be changing if the models are still running. One possibility for "jogging it free" is to Suspend each of the models in turn, then Suspend BOINC. Wait a few seconds to allow all open files to be closed, Exit from BOINC, and then restart BOINC. Then Unsuspend everything and see if they run. If you're also running tasks for other projects, it gets tricky, and it's best not to do this. |
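For anyone who prefers the command line, the "jogging it free" steps above can be sketched with boinccmd, the command-line tool that comes with BOINC. This is a minimal sketch, not an official procedure. The project URL and task names are hypothetical placeholders (copy the real ones from BOINC Manager), "Suspend BOINC" is assumed to correspond to `--set_run_mode never`, and switching back to `auto` assumes you normally run "based on preferences". Depending on your setup, boinccmd may need `--passwd`, or to be run from the BOINC data directory so it can find gui_rpc_auth.cfg. Restarting the client after `--quit` is left to you, since that depends on how BOINC is installed, and the same warning about other projects applies.

```python
import subprocess
import time

# Hypothetical placeholder: use the master URL shown in BOINC Manager's Projects tab.
PROJECT_URL = "https://climateprediction.net/"

def jog_free_commands(project_url, task_names):
    """boinccmd calls for: suspend each task, suspend BOINC, then exit the client."""
    cmds = [["boinccmd", "--task", project_url, name, "suspend"] for name in task_names]
    cmds.append(["boinccmd", "--set_run_mode", "never"])  # assumed equivalent of "Suspend"
    cmds.append(["boinccmd", "--quit"])                   # exit the BOINC client
    return cmds

def resume_commands(project_url, task_names):
    """boinccmd calls to run after you have restarted the client yourself."""
    cmds = [["boinccmd", "--set_run_mode", "auto"]]       # assumption: your normal mode
    cmds += [["boinccmd", "--task", project_url, name, "resume"] for name in task_names]
    return cmds

def run(cmds, dry_run=True, settle_seconds=5):
    """Print the commands (dry run) or execute them, pausing before --quit
    so the models have time to close their open files."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
            continue
        if cmd[-1] == "--quit":
            time.sleep(settle_seconds)
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    tasks = ["TASK_NAME_1", "TASK_NAME_2"]  # hypothetical; copy from the Tasks tab
    run(jog_free_commands(PROJECT_URL, tasks))  # dry run: only prints
```

The dry-run default is deliberate: check the printed commands first, then call `run(..., dry_run=False)`.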
Joined: 15 Nov 10 Posts: 43 Credit: 6,118,949 RAC: 0 |
Hi Dave, thanks for your reply. The three tasks are sending the zip files; I just checked. The last time they sent was on 8 Sept at 20:15. They started and finished sending with no problems on my side. They didn't send any yesterday because I suspended them until this morning to do Beta tasks for WCG. The log also shows this one:
08/09/2020 20:15:50 | climateprediction.net | Sending scheduler request: To send trickle-up message.
Should there be a confirmation of sending the trickle? Earlier that day there were some zip files called "....restart.zip". Is that normal? Full log for 8 o'clock:
08/09/2020 07:40:34 | climateprediction.net | Started upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_restart.zip
08/09/2020 07:40:47 | climateprediction.net | Sending scheduler request: To send trickle-up message.
08/09/2020 07:40:47 | climateprediction.net | Requesting new tasks for CPU
08/09/2020 07:40:49 | climateprediction.net | Scheduler request completed: got 0 new tasks
08/09/2020 07:40:49 | climateprediction.net | Project has no tasks available
08/09/2020 07:40:49 | climateprediction.net | Project requested delay of 3636 seconds
08/09/2020 07:40:54 | climateprediction.net | Started upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_20.zip
[...unrelated logs...]
08/09/2020 07:52:20 | climateprediction.net | Finished upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_20.zip
08/09/2020 07:52:20 | climateprediction.net | Started upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_restart.zip
08/09/2020 08:02:53 | climateprediction.net | Finished upload of wah2_anz50_306i_208912_32_872_012025222_2_r1861266388_restart.zip
08/09/2020 08:02:53 | climateprediction.net | Started upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_20.zip
08/09/2020 08:14:00 | climateprediction.net | Finished upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_20.zip
08/09/2020 08:14:00 | climateprediction.net | Started upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_restart.zip
08/09/2020 08:14:39 | climateprediction.net | Finished upload of wah2_anz50_2031_208912_32_871_012021947_2_r1714502974_restart.zip
08/09/2020 08:14:39 | climateprediction.net | Started upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_20.zip
08/09/2020 08:25:46 | climateprediction.net | Finished upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_20.zip
08/09/2020 08:31:12 | climateprediction.net | Finished upload of wah2_anz50_31ho_209412_32_872_012026920_2_r1674205441_restart.zip
08/09/2020 08:41:27 | climateprediction.net | Sending scheduler request: To send trickle-up message.
08/09/2020 08:41:27 | climateprediction.net | Requesting new tasks for CPU
08/09/2020 08:41:30 | climateprediction.net | Scheduler request completed: got 0 new tasks
08/09/2020 08:41:30 | climateprediction.net | Project has no tasks available
08/09/2020 08:41:30 | climateprediction.net | Project requested delay of 3636 seconds
Thanks, Candido |
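To check a log like the one pasted above for uploads that never completed, one option is to pair each "Started upload of" line with its "Finished upload of" line. This is a minimal Python sketch, assuming the lines are copied from the Event Log as in that post. It only looks for those two phrases, so the timestamp and project prefix in front of them don't matter. A file still listed at the end has started but has no "Finished" line in the text you gave it, which can also just mean the excerpt was cut off (as with the "[...unrelated logs...]" gap).

```python
import re

# Only the "Started/Finished upload of <file>" phrase is matched, so the
# timestamp and project prefix in front of it can be in any format.
UPLOAD_RE = re.compile(r"(Started|Finished) upload of (\S+)")

def unfinished_uploads(log_lines):
    """Map each file that has a 'Started upload' line but no later
    'Finished upload' line to the log line where its upload started."""
    pending = {}
    for line in log_lines:
        m = UPLOAD_RE.search(line)
        if not m:
            continue
        action, name = m.groups()
        if action == "Started":
            pending[name] = line.strip()   # a retry simply overwrites the entry
        else:
            pending.pop(name, None)        # finished: no longer pending
    return pending
```

Usage: paste the Event Log lines into a text file and pass the open file (or a list of lines) to `unfinished_uploads()`. An empty result means every upload that started in that excerpt also finished.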
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The trickle_ups won't show on the models page if something is wrong at Oxford, but the server status page says that the server is running. Still, it could be a script problem. The restart.zip is the file containing all the information needed to continue the model into its next section, if that's needed. It is usually created near the end of the run, but some models last year created it earlier, around the middle of the run. It all depends on what the researchers have in mind. **************** The confirmation of a Scheduler request is: "Scheduler request completed". It doesn't say what the request was for. (It can also say that it has failed, as in this one from one of mine earlier in the year: "Scheduler request failed: Timeout was reached".) |
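Since "Scheduler request completed" doesn't say what the request was for, one way to read the log is to attach each completed or failed line to the most recent "Sending scheduler request: ..." line before it. This is a minimal Python sketch, assuming the lines contain the same phrases as the excerpts in this thread and that requests from different projects aren't interleaved; with several projects attached you would want to split the lines by project first.

```python
import re

# "Sending scheduler request: <reason>." (trailing full stop dropped)
SEND_RE = re.compile(r"Sending scheduler request: (.+?)\.?\s*$")
# "Scheduler request completed: <detail>" or "Scheduler request failed: <detail>"
DONE_RE = re.compile(r"Scheduler request (completed|failed)(?:: (.*))?$")

def pair_scheduler_requests(log_lines):
    """Return (reason, outcome, detail) for each completed/failed line, where
    reason comes from the latest preceding 'Sending scheduler request' line
    (None if the excerpt doesn't include one)."""
    results = []
    reason = None
    for line in log_lines:
        line = line.strip()
        m = SEND_RE.search(line)
        if m:
            reason = m.group(1)
            continue
        m = DONE_RE.search(line)
        if m:
            results.append((reason, m.group(1), m.group(2) or ""))
            reason = None  # this reply has been accounted for
    return results
```

Run on the 07:40 block of Candido's log, this would report the trickle-up request as completed with "got 0 new tasks".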
Joined: 15 Nov 10 Posts: 43 Credit: 6,118,949 RAC: 0 |
Hi Les, thanks for your replies and the info. Regarding your earlier message: the seconds counters for Elapsed and Remaining are running, even after suspending as suggested and then restarting the computer. I have 5 LHC virtual machines running at the same time, and even those restarted without issues. I know the machine sends the files every six hours, so I can look for the files then. Which folder should they appear in, the slots folder or the project folder? Thanks, Candido |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
They'll be in the projects/climateprediction.net folder. It's good that everything kept going. It's a nuisance that we can't see the zips to be sure they got there after they're sent, but they go to servers all over the world. The ANZ ones that you're working on go to a big data centre in Hobart, the capital of Australia's island state of Tasmania. |
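If you'd rather not browse for them by hand, a short Python sketch like this lists the zips still sitting in that folder, with their sizes. The path is an assumption (a typical Debian/Ubuntu location); point it at your own BOINC data directory. The name filter is there because the folder may hold other zip files besides the ones waiting to upload; "wah2_anz50" is simply the prefix of the task files in this thread.

```python
from pathlib import Path

# Hypothetical path: a typical Debian/Ubuntu location. On Windows the BOINC data
# directory is often C:\ProgramData\BOINC; check your own install.
PROJECT_DIR = Path("/var/lib/boinc-client/projects/climateprediction.net")

def waiting_zips(project_dir, name_filter=""):
    """Return (file name, size in MB) for each zip in the project folder whose
    name contains name_filter, sorted by name."""
    return [
        (p.name, round(p.stat().st_size / 1_000_000, 1))
        for p in sorted(Path(project_dir).glob("*.zip"))
        if name_filter in p.name
    ]

if __name__ == "__main__":
    if PROJECT_DIR.is_dir():
        for name, mb in waiting_zips(PROJECT_DIR, "wah2_anz50"):
            print(f"{name}: {mb} MB")
```

A zip that stays in the list across several upload cycles is the one worth asking about; ones that appear and then disappear have presumably been sent.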
Joined: 15 May 09 Posts: 4540 Credit: 19,013,957 RAC: 21,195 |
I think they're in the slots folder, but the only tasks I am running at the moment are testing ones under Linux, which only produce 4 zips over about 18 days, so I'll have a bit of a wait before I can check. But as the zips are going, it looks like it is just a script that needs restarting at Oxford. |
Joined: 15 Nov 10 Posts: 43 Credit: 6,118,949 RAC: 0 |
Great, I will keep them running! Thanks, everyone. Candido |
Joined: 15 Nov 10 Posts: 43 Credit: 6,118,949 RAC: 0 |
I have just seen a trickle file in the project data directory change its extension from 'xml' to something like '...sent'. I assume it is sending the trickles, so it is probably the script mentioned above not running properly on the server... Candido |
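The same check can be scripted. This sketch assumes, from what Candido saw, that trickle-up files have names starting with "trickle_up" and that a file's extension changes from .xml to .sent once the client has sent it; check the actual names in your own project folder. Files moving from the first list to the second suggest the client side is doing its job, which is the conclusion drawn in that post.

```python
from pathlib import Path

def trickle_status(project_dir):
    """Split trickle-up files into (waiting, sent) lists of names.

    Assumption (from the observation in this thread): names start with
    'trickle_up' and end in .xml until sent, then in .sent."""
    waiting, sent = [], []
    for p in sorted(Path(project_dir).glob("trickle_up*")):
        if p.suffix == ".xml":
            waiting.append(p.name)
        elif p.suffix == ".sent":
            sent.append(p.name)
    return waiting, sent
```

Calling it a few minutes apart and comparing the two lists is enough to see whether files are moving.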
Joined: 15 May 09 Posts: 4540 Credit: 19,013,957 RAC: 21,195 |
I have sent a message to Andy. If it can be sorted remotely, it should change quite quickly. If not, it may take a few days or a week. I suspect the former. |
Joined: 15 May 09 Posts: 4540 Credit: 19,013,957 RAC: 21,195 |
Trickles should be showing again now. |
Joined: 15 Nov 10 Posts: 43 Credit: 6,118,949 RAC: 0 |
Yes! All trickles are now showing! Many thanks! Candido |
Joined: 17 Jan 05 Posts: 10 Credit: 23,525,643 RAC: 0 |
Hi! Looks like new trickles aren't showing again. BR Rayburner |
Joined: 15 May 09 Posts: 4540 Credit: 19,013,957 RAC: 21,195 |
Hi! I will send another email. Based on last time, I think it is a script Andy can start remotely. Not sure why it keeps stopping, though. Edit: Done. |
Joined: 17 Jan 05 Posts: 10 Credit: 23,525,643 RAC: 0 |
Trickles showing up again :-). Thank You Rayburner |
©2024 cpdn.org