Message boards : Number crunching : TRICKLE CANNOT UPLOAD, BUT, SERVER SHOWS IT ALREADY HAS
Author | Message |
---|---|
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I have a rather strange problem. WU hadam3P_eu_xuv6_1977_1_007045210_2 (Task ID#12462018) has had a trickle stuck in my transfer tab for 2 days now. The problem is that I cannot get this trickle to upload. I think it is the first trickle. I know that other trickles for this WU have uploaded since it was created 2 days ago. At first I put this down to all of the server problems we have been having, but now I don’t think so. The WU is presently crunching its way through late March of 1978. It should have produced 3 trickles by now. Checking the WU under “computers” in “my account”, I find that 3 trickles are listed as having been received. Does the server really have the entire trickle? If so, how do I get rid of the pseudo trickle stuck in the transfer tab that keeps trying (and failing) to upload? Should I use the "abort transfer" button to get rid of it? Is this a sign that the WU is flawed and should be aborted, or can it be saved? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It may simply be due to the change of IP of all of our servers that are on the OERC network, as per the News thread. Until someone shows up who can fix the problem, a lot of trickles and zips for various file types will be stuck in people's Transfer queues. As for your trickle, is it essential that you remove it from your queue? Backups: Here |
Send message Joined: 21 Jun 06 Posts: 26 Credit: 8,397,236 RAC: 0 |
Hello, I have the same problem, only not with 1 trickle but with 73 on one machine alone! If there is an IP problem, somebody needs to let us know what to do about it, as me sitting here filling up my hard drive is no solution. Best regards, Ian ----> Please Join team Scotland HERE |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Thanks for the reply, Les. So you think that the failure of the zip files to upload (there are now 2, 1.zip and 4.zip, stuck in the transfer tab) is server related and not an inherent problem in the WU. No real need to clear the transfer tab immediately. I will just suspend network activity to keep it from trying to upload every few hours. I will keep running it and hope that the gods of IT get the server problem sorted out before the WU finishes in about a week. The one thing that I don’t get is why the “Trickle Information for Result # 12462018” page shows the zip files as having been uploaded. Does the server have them or not? |
Send message Joined: 12 Feb 08 Posts: 66 Credit: 4,877,652 RAC: 0 |
Trickles and zip files are two different things. A trickle is just a small piece of data that lets the server know the model is still alive. A zip file is several megabytes of scientific data the model has generated. They usually go to different servers so you may be able to trickle but not upload zips. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Jim, if the IP change occurred after your computer uploaded the trickle data, but before the server could return an Ack message, then this would produce the effect that you see. Backups: Here |
Send message Joined: 21 Jun 06 Posts: 26 Credit: 8,397,236 RAC: 0 |
Hello, I realize that the trickles and zip files are different, the zip files being 5.24 meg in size, for instance. I now have 76 zip files, equaling nearly 400 meg, stored up so far and increasing steadily. Something has to be done to allow these zip files to be uploaded, as I don't plan on having an upload on my broadband connection of a gig or something ridiculous in one go! Best regards, Ian ----> Please Join team Scotland HERE |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Ian, there are no programmers working for cpdn at present, and apparently no one else around at Oxford Uni who knows how to fix the problem. The people who are running the current projects know about it. One was in the process of moving data off Kraken to another server when the change occurred. This is why Kraken is still showing as offline. As for what users can do, the only solution is to go into the Projects tab, set cpdn to No new tasks, then go into the Tasks tab, and Suspend all cpdn models. The server for the beta test site is also affected. This means that testers can't return data, meaning, in turn, that the release of new models is going to be delayed even further. Backups: Here |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
I suspect that they know how to fix it, but as the IP addresses have been switched earlier than I expected it means that various machines are now not connected to the network, and the only way to fix them now is to go into the OeRC machine room. No current CPDN staff have access to this room, though the new staff will. Meanwhile, I'll have to do it next week. |
Send message Joined: 5 May 10 Posts: 69 Credit: 1,169,103 RAC: 2,258 |
"i now have 76 zip files equaling nearly 400 meg of zip files stored up so far and increasing steadily" Just think of the hit the server's going to take when it's reconnected! :) As I write, I have twenty .zips from two current FAMOUS tasks waiting for upload. (A trifle under 105 MB.) If the tasks error before the files are uploaded, the files will simply be deleted (by the BOINC client, I believe). That happened when the server was down over Christmas too. Presumably the data were lost. NG |
Send message Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
You are an angel Milo :-) |
Send message Joined: 21 Jun 06 Posts: 26 Credit: 8,397,236 RAC: 0 |
Hi, I now have 206 zip files between my 2 machines, equalling more than a gig of files to be uploaded, so I suspect the servers, even when back, will be under a very severe load, if not too high a load, when accepting all this data. Ian ----> Please Join team Scotland HERE |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Nigel, there's a very simple solution to your worry: BACKUPS! Make one now while all of the models are still running, and use this to get all of the zips back to Oxford if they crash near the end. Make a new one every day. Having lots of backups isn't a crime! It's also possible to do as I mentioned earlier, and Suspend all of the models until the servers are back. The first step, of course, is to go to the Projects tab, and set ALL projects to No new tasks, run down ALL work other than cpdn, and THEN make the backups. Then get more work from your other projects if desired. Backups: Here |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
As I don't run my computer 24/7 and only have 2 cores I don't have to worry about running out of space unless the zip files go for months without being able to upload. Thanks again Milo for keeping things working in your own/your new job's time when you could have abandoned the project. Dave |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
You're welcome. Kraken is now fixed but, as expected, heavily overloaded. So, don't be surprised if it appears to be down when you try to connect. Some people are getting through as files are arriving. |
Send message Joined: 5 May 10 Posts: 69 Credit: 1,169,103 RAC: 2,258 |
Thanks, Les. I suspended Climateprediction.net last night rather than do anything which might interfere with my other projects. The files are uploading now, in no particular order, as I write. Only three and two halves to go now. I'll unsuspend when they're all gone. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I have a question about the Hadam3p_pnw WUs and the present server problems. I know that they upload 12 monthly zip files and that they go to servers at the University of Oregon. They have uploaded just fine. It is the 13th zip file that I am worried about. I know that it goes to Oxford, but I am not sure which server it goes to. Does it go to the OeRC server that is presently offline? Should I suspend before it finishes? I don’t need any more zip files stuck in the transfer tab. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I haven't got any PNW models myself so I can't check to make sure, but on the Server status page it says: "Upload server (restart dumps) climateapps1.oucs". File _13 for all the regional models is a restart dump. The _13 file for EU models goes to climateapps1.oucs. The _13 file for SA and PNW models must either go to the same server as files 1 - 12 or, much more probably, to climateapps1. I don't think the people in Oregon or Penn State Uni will want restart dumps to put together all the models in the time series, so climateapps1 is more likely. I don't think they'd want to use uploader.oerc (the server that's still down) for any of the restart dumps because it's being used as an ordinary upload server. So I'd let the _13 PNW file upload. Cpdn news |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Thanks, Mo. I will let it run to the end and hope that it can upload. |
Send message Joined: 9 Apr 07 Posts: 7 Credit: 1,630,807 RAC: 0 |
I'm having the same problem, but with a kicker... some of my .zip files are getting through, but others are not! For example, on WU Hadam3p_eu_xml2_1997_1_007010478_1, the zips 1, 4, 7 & 10 are stuck, but 12 and 13 (that I've noticed...) have uploaded... (13 is uploading as I type...) |
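
The lost-acknowledgement explanation given earlier in the thread (data reaches the server, but the IP change happens before the Ack gets back) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not BOINC or CPDN code, and `ToyServer`, `attempt_upload`, `data_path_up` and `ack_path_up` are hypothetical names invented for the illustration.

```python
# Toy model only: this is NOT BOINC's or CPDN's real upload code.
# All class, function and parameter names here are hypothetical.


class ToyServer:
    """Remembers every file it has fully received."""

    def __init__(self):
        self.received = set()

    def store(self, name):
        self.received.add(name)


def attempt_upload(server, name, data_path_up, ack_path_up):
    """Return True only if the client sees an acknowledgement."""
    if not data_path_up:
        return False        # nothing reached the server
    server.store(name)      # the server now lists the file as received
    return ack_path_up      # the ack can still be lost on the way back


server = ToyServer()
pending = ["trickle_1"]     # the client's transfer queue

# Attempt 1: the data arrives, then the address change cuts off the ack.
if attempt_upload(server, "trickle_1", data_path_up=True, ack_path_up=False):
    pending.remove("trickle_1")

# Attempt 2: the server is now unreachable, so the retry fails outright.
if attempt_upload(server, "trickle_1", data_path_up=False, ack_path_up=False):
    pending.remove("trickle_1")

print("server has:", sorted(server.received))
print("client still queues:", pending)
```

In this model the server lists the trickle as received while the client keeps it in its queue and keeps retrying, which matches the symptom described in the first post. Whether that is what actually happened on the real servers is not something the sketch can confirm.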