Message boards : Number crunching : Upload failures
Author | Message |
---|---|
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
OK, thanks! I have 75 files to upload (around 5 GB of data). Good thing the time limit is long then ;) I also had problems with bad proxy software that seemed to transfer the files even when the server would not accept them, or something like that. I have changed proxy software now. It does not start to transfer anything and ends after a few minutes with "transient HTTP error", so it seems to be OK :) The old proxy software worked fine while transfers were working, but now that the project has issues, the problem with the old proxy shows :) The catch is that I cannot leave the proxy software running for too long, since it's not allowed on the network. I will have to start it once in a while and see whether it transfers :) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Has anyone been able to upload anything to Jasmin recently? ANZ stands for Australia/New Zealand. These files go to a data center in the south of Tasmania, which is an island state in the SE corner of Australia. To get zip files to go there, just allow access to the internet. There are no problems with their servers. However ... If you have lots of zips going to various places, then there IS a problem: files are uploaded in the order in which they were created. So, first you have to wade through all of the files that were created before the ANZ files, THEN the ANZ files get a turn. And that means waiting for each file going to Jasmin to either upload or time out. That is part of the reason that the magic word for this project is Patience. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
OK, thanks, now I know. I'm in no rush. It's only that I had problems before the disk space problems started. But the issues seem to have been resolved now by changing proxy software. I will try to send the work in 3 weeks, when I get back from my vacation :) |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
Damn it. It seems I still have issues uploading ANZ50 files. I don't get it. I have now tried 3 different proxy programs, and I changed the project from HTTP to HTTPS, deleting all my file transfers for that one, but it seems I still cannot upload: 2019-07-03 11:57:22 | climateprediction.net | Temporarily failed upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip: transient HTTP error The ANZ50 server should accept files without problems, I guess? I don't get why I have issues here. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
Here is the log for this attempt with the ANZ50 file. I don't know whether anything useful can be seen in it: 2019-07-03 12:05:11 | climateprediction.net | Started upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip 2019-07-03 12:05:11 | climateprediction.net | [file_xfer] URL: http://upload4.cpdn.org/cpdn_cgi/file_upload_handler 2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 1210 bytes 2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 2819 bytes 2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 3523 bytes 2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 3596 bytes 2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 1734 bytes 2019-07-03 12:05:12 | | Internet access OK - project servers may be temporarily down. 2019-07-03 12:05:12 | | [http_xfer] [ID#232] HTTP: wrote 98 bytes 2019-07-03 12:05:13 | climateprediction.net | [file_xfer] http op done; retval 0 (Success) 2019-07-03 12:05:13 | climateprediction.net | [file_xfer] parsing upload response: <data_server_reply> <status>0</status> <file_size>262144</file_size></data_server_reply> 2019-07-03 12:05:13 | climateprediction.net | [file_xfer] parsing status: 0 2019-07-03 12:05:13 | climateprediction.net | [fxd] starting upload, upload_offset 262144 2019-07-03 12:05:15 | | Project communication failed: attempting access to reference site 2019-07-03 12:05:15 | climateprediction.net | [file_xfer] http op done; retval -184 (transient HTTP error) 2019-07-03 12:05:15 | climateprediction.net | [file_xfer] file transfer status -184 (transient HTTP error) 2019-07-03 12:05:15 | climateprediction.net | Temporarily failed upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip: transient HTTP error |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Continuing to get this error: 7/3/2019 8:04:28 AM | climateprediction.net | Started upload of wah2_sam50_n6hw_201612_25_822_011884425_0_r639342217_2.zip 7/3/2019 8:04:30 AM | | Project communication failed: attempting access to reference site 7/3/2019 8:04:31 AM | | Internet access OK - project servers may be temporarily down. 7/3/2019 8:04:52 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n6hw_201612_25_822_011884425_0_r639342217_2.zip: transient HTTP error |
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
same here: 2019-07-03 07:33:21 | climateprediction.net | Temporarily failed upload of wah2_sam50_n6uw_201612_25_822_011884293_0_r566787151_13.zip: transient HTTP error |
Send message Joined: 14 Feb 06 Posts: 31 Credit: 4,507,116 RAC: 2,013 |
Continuing to get this error: That's the error that we're all getting due to the problems that the project is having. Just let BOINC keep trying. Eventually it will upload. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Yes, it could take another week to clear everything. I DID suggest that people suspend running models until the problem was fixed, but it looks like no one listens anymore. |
Send message Joined: 28 May 17 Posts: 49 Credit: 17,313,889 RAC: 7,078 |
CPDN has a track record of computation errors when suspending tasks so I sure ignored that part. I don't blame any one else for it either. I left the 2 that had not yet started suspended, but I let the ones that had started complete, even if the uploads will take a while. |
Send message Joined: 30 Mar 10 Posts: 12 Credit: 2,609,109 RAC: 87 |
Personally I suspended CPDN new tasks :) |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,020,584 RAC: 20,684 |
CPDN has a track record of computation errors when suspending tasks so I sure ignored that part. I don't blame any one else for it either. Never seen a problem from suspending tasks if BOINC isn't stopped and restarted. It is also a long time since I lost a Windows task even when doing that. Just to reiterate, so the information stays near the top of the thread: clearing the data and the backlog of people still uploading, some of whom have several hundred gigabytes, means that it could easily be a week or more before the problems stop completely. Also, there is no need to suspend any tasks other than sam50s, as the others go to different servers. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
CPDN has a track record of computation errors when suspending tasks so I sure ignored that part. I don't blame any one else for it either. I have two dual-booted Mac systems and a Dell with Windows 7, all of which are running CPDN. The two Macs have Fusion virtual machines running CPDN under Windows 7 (when there are no native Mac jobs available). I have to swap between the OSX versions on the Macs relatively frequently and have never had any problems with errors of any sort, except once at the very beginning when I started to do this. I've now been doing this for years whilst testing variations of RCA software. The one problem I had at the beginning occurred when shutting down Fusion before suspending CPDN, which was in the middle of a zip upload. Since then I've always made sure there are no uploads running before suspending CPDN, and only then suspending Fusion. (I always suspend the tasks before suspending the project, although I don't have any obvious reason to think that this is strictly necessary.) Also, I've never had a problem on the Dell with suspending, after taking similar suitable precautions. |
Send message Joined: 14 Feb 06 Posts: 31 Credit: 4,507,116 RAC: 2,013 |
I DID suggest that people suspend running models until the problem was fixed, but it looks like no one listens anymore. Do we have any idea how many of the 8,945 users with recent credit visit the message boards, and so would see your message? I assumed that it was going to be a very small proportion, so I didn't see the point in suspending things. So not so much not listening, just not seeing the point. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
I do believe that Les understands the figures. It seems to me that he was simply replying to those who do come onto the boards but ignore his advice and continue to complain. Actually there have been 5410 visits to this post at this point, which is a lot more than the number of replies, so it would appear that many people probably have seen the post and may very well have acted accordingly. |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
I have a safr50 job stuck as well, should I also suspend safr50 as well as the sam50 tasks? |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
I do check this thread numerous times a day, which I guess, combined with other regulars, also contributes to the high number of visits. I suggested that more channels be used to spread the message, but I haven't seen it elsewhere, so no one listened. I did suspend the ones going to Jasmin for almost a week, but again we have no clear info on how things are going and what is to be expected. Once the queues cleared I started a few, and uploads fail again. Additionally, CPDN has started to require micromanagement because of numerous issues, and yet info is scarce and we have to be patient. I think many of us are patient and persistent, perhaps above average. Yes, CPDN is the most demanding project, but I have started to feel I should not complain or ask for info, because things are being dealt with and they are fixed eventually. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,020,584 RAC: 20,684 |
I have a safr50 job stuck as well, should I also suspend safr50 as well as the sam50 tasks? Just checked in client_state.xml, as I have a safr50 that is suspended to allow testing tasks to run. I can confirm that safr50s also go to Jasmin, so it is worth suspending those as well. And it isn't just clearing space on the servers that needs to happen before the problems stop. One person has posted saying they have 290 GB to upload, an amount of data that would take over 10 days from my connection, so there will be many computers competing for the limited number of connections the servers can accept uploads on. Even once the whole backlog has been cleared from the servers through which the data reaches the data centre, I don't see the problems clearing in under a week. To check whether a task goes to Jasmin, search for the string "upload_url" in your client_state.xml file and go through the matches until you find the one for the task in question. You should find something like the following: wah2_safr50_n0ym_198912_13_820_011866056_0_r5511092_1.zip</name><nbytes>0.000000</nbytes><max_nbytes>150000000.000000</max_nbytes><status>0</status><upload_url>http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler Anything other than Jasmin is either OK or has a different problem. |
Send message Joined: 28 May 17 Posts: 49 Credit: 17,313,889 RAC: 7,078 |
CPDN has a track record of computation errors when suspending tasks so I sure ignored that part. I don't blame any one else for it either. How soon you forget. You started this thread. https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8701#59554 CPDN tasks are some of the most fragile tasks of all the BOINC projects. Most have no issues suspending or at least going back to the last checkpoint. Even if they did go back to the last checkpoint, no one wants to lose several days of work. There's a higher chance of losing work from suspending than from a task trickle upload being lost. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,020,584 RAC: 20,684 |
Reminder, it is likely to take at least a week till things are back to normal! How soon you forget. You started this thread. Not forgetting anything. That thread is specifically to do with a Linux batch. I know the problem with suspending and restarting tasks when BOINC has been exited has not been resolved on the hadcm3 Linux tasks. It remains to be seen how much of a problem it is with the HADAM4 and openifs tasks, which will at some point be coming to Linux boxen. |
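
For anyone who would rather not search client_state.xml by hand, the upload_url check described in the thread can be roughly sketched in Python. This is only a sketch under assumptions: it relies on each file's `<name>` appearing before its `<upload_url>`, as in the fragment quoted in the thread, and the `<file>` wrapper element, the sample text and the function names are all made up for illustration. It scans the text with a regular expression rather than a full XML parser, so treat the result as a hint and not a definitive answer.

```python
import re
from urllib.parse import urlparse

# Matches <name>...</name> and <upload_url>...</upload_url> in document order.
TOKEN_RE = re.compile(r"<(name|upload_url)>\s*([^<]*?)\s*</\1>")

# Hypothetical sample shaped like the fragment quoted in the thread.
# The <file> wrapper is an assumption; the two URLs appear in the thread.
SAMPLE = """
<file>
<name>wah2_safr50_n0ym_198912_13_820_011866056_0_r5511092_1.zip</name>
<nbytes>0.000000</nbytes>
<max_nbytes>150000000.000000</max_nbytes>
<status>0</status>
<upload_url>http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler</upload_url>
</file>
<file>
<name>wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip</name>
<upload_url>http://upload4.cpdn.org/cpdn_cgi/file_upload_handler</upload_url>
</file>
"""


def upload_hosts(state_text):
    """Map each named entry to the host of the upload_url that follows it."""
    hosts = {}
    current_name = None
    for match in TOKEN_RE.finditer(state_text):
        tag, value = match.group(1), match.group(2)
        if tag == "name":
            # Remember the most recent name; entries without an
            # upload_url are simply never recorded.
            current_name = value
        elif current_name is not None:
            hosts[current_name] = urlparse(value).hostname
    return hosts


def files_going_to(state_text, host_fragment="jasmin"):
    """Return the names whose upload host contains host_fragment."""
    return sorted(
        name
        for name, host in upload_hosts(state_text).items()
        if host and host_fragment in host
    )


if __name__ == "__main__":
    for name in files_going_to(SAMPLE):
        print(name)
```

To try it on a real machine, read the text of your own client_state.xml (its location depends on how BOINC was installed) and pass it to `files_going_to` in place of `SAMPLE`.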
©2024 cpdn.org