Thread 'Upload problem'

old_user69295
Joined: 6 Apr 05
Posts: 17
Credit: 744,057
RAC: 0
Message 40658 - Posted: 12 Sep 2010, 5:03:12 UTC - in response to Message 40657.  
Last modified: 12 Sep 2010, 5:04:35 UTC

| What bugs me about it is that I live 15 minutes away from Oregon State.


Well, now I have to apologize to the folks at OSU. It seems that it's not their upload server that's having the problem:

Sat Sep 11 20:50:13 2010 climateprediction.net [file_xfer_debug] URL: http://climateapps1.oucs.ox.ac.uk/cgi-bin/file_upload_handler
Sat Sep 11 20:50:14 2010 [http_debug] [ID#22] Info: timeout on name lookup is not supported
Sat Sep 11 20:50:14 2010 [http_debug] [ID#22] Info: About to connect() to climateapps1.oucs.ox.ac.uk port 80 (#0)
Sat Sep 11 20:50:14 2010 [http_debug] [ID#22] Info: Trying 163.1.13.16...
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Connection refused
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Failed connect to climateapps1.oucs.ox.ac.uk:80; No such file or directory
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Expire cleared
Sat Sep 11 20:50:21 2010 [http_debug] [ID#22] Info: Closing connection #0
Sat Sep 11 20:50:21 2010 [http_debug] HTTP error: Couldn't connect to server
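The "Connection refused" lines above mean the TCP connection to port 80 was rejected before any HTTP was exchanged. A minimal sketch, assuming Python 3 is installed, for checking the same thing outside BOINC; `can_connect` is a hypothetical helper name and not part of BOINC:

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-lookup failures alike.
        return False
```

Calling `can_connect("climateapps1.oucs.ox.ac.uk")` would then return False while the server is refusing connections. The check only shows whether the port accepts connections; it says nothing about whether the upload handler itself is working.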

=Mike

ID: 40658
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40659 - Posted: 12 Sep 2010, 6:07:28 UTC

The last zip file (13), which is created about 10 minutes after the last zip that goes to OSU, goes to Oxford because it contains the data needed to join this model to the next in the series for that parameter set.

ID: 40659
old_user69295
Joined: 6 Apr 05
Posts: 17
Credit: 744,057
RAC: 0
Message 40662 - Posted: 12 Sep 2010, 15:30:34 UTC - in response to Message 40659.  
Last modified: 12 Sep 2010, 15:42:56 UTC

Thanks ... good to know. Also, I saw the post about the server.

BTW: Who authorized giving Milo weekends off? :D

=Mike


    ID: 40662
    Milo Thurston
    Volunteer moderator
    Volunteer developer
    Joined: 2 Mar 06
    Posts: 253
    Credit: 363,646
    RAC: 0
    Message 40663 - Posted: 13 Sep 2010, 11:08:20 UTC

    I've now put a small NAS unit in the server room where climateapps1 is stored and I'm slowly copying data to it. This is not an ideal solution but it is small and cheap so I was actually able to get hold of it in a matter of days rather than months.

    Hopefully the server can be re-started later today.
    ID: 40663
    Darmok
    Joined: 29 Dec 09
    Posts: 34
    Credit: 18,395,130
    RAC: 0
    Message 40670 - Posted: 15 Sep 2010, 9:24:19 UTC
    Last modified: 15 Sep 2010, 9:25:05 UTC

    Uploads are working well, but BOINC Manager has not updated credits. I'm not very concerned about it, but I don't recall seeing this when all the CPDN servers were running. Is this part of the current issue?
    ID: 40670
    old_user601544
    Joined: 12 Nov 09
    Posts: 5
    Credit: 6,176
    RAC: 0
    Message 40741 - Posted: 21 Sep 2010, 21:01:29 UTC

    For several weeks I have been unable to upload my results.
    Here are the messages I regularly receive:

      21/09/2010 21:32:53 climateprediction.net Started upload of famous_u0y8_599_200_006634083_0_14.zip
      21/09/2010 21:32:53 climateprediction.net Started upload of famous_u0y8_599_200_006634083_0_17.zip
      21/09/2010 21:34:23 Project communication failed: attempting access to reference site
      21/09/2010 21:34:23 climateprediction.net Temporarily failed upload of famous_u0y8_599_200_006634083_0_14.zip: HTTP error
      21/09/2010 21:34:23 climateprediction.net Backing off 1 hr 34 min 3 sec on upload of famous_u0y8_599_200_006634083_0_14.zip
      21/09/2010 21:34:26 Internet access OK - project servers may be temporarily down.
      21/09/2010 21:34:43 Project communication failed: attempting access to reference site
      21/09/2010 21:34:43 climateprediction.net Temporarily failed upload of famous_u0y8_599_200_006634083_0_17.zip: HTTP error
      21/09/2010 21:34:43 climateprediction.net Backing off 3 hr 16 min 53 sec on upload of famous_u0y8_599_200_006634083_0_17.zip
      21/09/2010 21:34:45 Internet access OK - project servers may be temporarily down.



    ID: 40741
    mo.v
    Volunteer moderator
    Joined: 29 Sep 04
    Posts: 2363
    Credit: 14,611,758
    RAC: 0
    Message 40742 - Posted: 21 Sep 2010, 22:53:56 UTC

    Credits are not related to uploads (except that if you can't upload your trickles you won't receive credit for them until you do!).

    There's a script that runs once a day and puts our credit into our accounts. Another script, which also runs once a day though I think at a different time, exports a record of our credit to the external stats sites like BoincStats. Occasionally these scripts have to be disabled because other jobs are being done on the server, or someone turns a script off and then forgets to re-enable it. The data will all still be there, though; nothing is lost.

    I don't think Milo has access to all this at the moment so we may just have to be patient.
    ID: 40742
    old_user71841
    Joined: 23 Apr 05
    Posts: 1
    Credit: 398,113
    RAC: 0
    Message 40781 - Posted: 27 Sep 2010, 5:47:49 UTC
    Last modified: 27 Sep 2010, 5:49:06 UTC

    Hi, I have been having upload problems here for some days now:

    27.09.2010 07:11:01 climateprediction.net Started upload of hadam3p_pnw_v3bl_1993_1_006722916_0_13.zip
    27.09.2010 07:33:07 Project communication failed: attempting access to reference site
    27.09.2010 07:33:07 climateprediction.net Temporarily failed upload of hadam3p_pnw_v3bl_1993_1_006722916_0_13.zip: HTTP error
    27.09.2010 07:33:07 climateprediction.net Backing off 3 hr 54 min 30 sec on upload of hadam3p_pnw_v3bl_1993_1_006722916_0_13.zip
    27.09.2010 07:33:09 Internet access OK - project servers may be temporarily down.

    Trickles from other WUs running on my machine were uploaded without any problem.
    Does anybody have an idea how to solve it?
    ID: 40781
    old_user206925
    Joined: 4 Nov 06
    Posts: 10
    Credit: 1,717,441
    RAC: 0
    Message 40824 - Posted: 9 Oct 2010, 1:57:17 UTC

    I thought it better to post here rather than start another "upload trouble" thread...

    I am having the same problems as legolas; I have 10 zip files sitting here that have been trying to upload for around a fortnight now.

    I stopped the client, created a cc_config file, restarted and read the config file for the HTTP option, but it had no effect.
    I have also rebooted this machine.

    Here are the models, versions and file numbers (3 different WUs):

    All models are FAMOUS 6.11; the files are _5, _12 (x2), _13 (x2), _14 (x2), _15 (x2), _16, _20.

    Two of these FAMOUS models are completed (one says it is ready to report, one says it is uploading) and one "computation errored" out at 160-odd hours.
    Another model is almost complete.


    Any idea how I can get the work/files up to the servers? CPDN is one of my fave projects and I don't want to stop crunching it.

    Thanks
    Veebee
    ID: 40824
    JIM
    Joined: 31 Dec 07
    Posts: 1152
    Credit: 22,363,583
    RAC: 5,022
    Message 40825 - Posted: 9 Oct 2010, 5:04:00 UTC - in response to Message 40824.  

    Dear Veebee:

    Check the server status page. The server has been down for the past 3 days. The scheduler, transitioner and feeder are all not running, so you cannot report completed tasks or get new ones.

    Read more in the "News and Announcements" thread at the top of Number Crunching.

    ID: 40825
    mo.v
    Volunteer moderator
    Joined: 29 Sep 04
    Posts: 2363
    Credit: 14,611,758
    RAC: 0
    Message 40826 - Posted: 9 Oct 2010, 5:57:32 UTC

    The completed model can't report until climateapps2 has all (or more) of its programs up and running, but both these FAMOUS v.11 models should be able to upload all their files in spite of the outage. All these files should upload to kraken which has had no recent outages.

    Veebee, I think this must be a problem at your end, not with the server.

    The timeout on file uploads was lengthened to 90 days so there's no rush from that point of view but I think each file is only allowed 100 upload attempts. Don't keep repeating manual retries (the Retry Now button) until someone can suggest more ideas.
    ID: 40826
    old_user206925
    Joined: 4 Nov 06
    Posts: 10
    Credit: 1,717,441
    RAC: 0
    Message 40827 - Posted: 9 Oct 2010, 9:45:05 UTC - in response to Message 40826.  
    Last modified: 9 Oct 2010, 9:51:08 UTC

    Quote from mo.v:
    Veebee, I think this must be a problem at your end, not with the server.
    end quote.

    I don't know ... I have two "identical" machines (i7-920's) and they are both crunching and up/ downloading other projects without issue.
    THIS machine is one I downloaded a few extra Climate models on to cover work shortages on a chosen project, the other only has the one model running but hasn't (as yet) had a zip file sit there unable to upload.
    (mind you, that one is a HADSM3 slab model - just had a "close look" nearly 1200 hours so far !!! :O )

    I shall avoid manually retrying uploads on them, but I am having that sinking feeling that all that CPU time is gonna be wasted .. :`(

    BTW: two of the zip files do say they are at 100% uploaded, and a few of the others get to a certain point and stop...
    ID: 40827
    Thyme Lawn
    Volunteer moderator
    Joined: 5 Aug 04
    Posts: 1283
    Credit: 15,824,334
    RAC: 0
    Message 40828 - Posted: 9 Oct 2010, 11:13:33 UTC - in response to Message 40824.  
    Last modified: 9 Oct 2010, 14:37:30 UTC

    Veebee wrote:
    Any idea how I can get the work/ files up to the servers as CPDN is one of my fave projects and I don't want to stop crunching it.

    If BOINC is making simultaneous attempts to upload the files it's possible that you're hitting a 5 minute inactivity timeout on the files. That's most likely if you have a relatively slow connection, are restricting BOINC's upload bandwidth or have a busy connection (e.g. more than one computer attempting a large upload at the same time or a large non-BOINC file transfer on the same computer).

    I've found that when BOINC is doing multiple uploads it has a tendency to favour the most recently started upload. That can result in nothing being sent for uploads which are already in progress until the more recent upload has completed. If an upload is locked out in this way for longer than 5 minutes it is timed out and has to be restarted. The restart offset is negotiated with the server but very frequently the server seems to have lost track of how much has already been received (possibly something is causing it to delete the data it has already received?) and restarts from 0.

    The only way I've found of getting round this is to restrict the number of simultaneous uploads (the default is 8 in total and no more than 2 per project) by including the following in cc_config.xml:

    <cc_config>
      <options>
        <max_file_xfers>2</max_file_xfers>
        <max_file_xfers_per_project>1</max_file_xfers_per_project>
      </options>
    </cc_config>

    Depending on your mix of projects you might need to increase <max_file_xfers> (setting it to 1 is possible, but that would prevent other projects from uploading results until all of your CPDN files have been sent).
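For anyone who already has a cc_config.xml and does not want to overwrite its other settings by hand, the same two limits could be set with a short script. This is only a sketch under stated assumptions: `set_transfer_limits` is a hypothetical helper, not a BOINC tool, and the path you pass must be the cc_config.xml in your own BOINC data directory.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def set_transfer_limits(config_path: Path, total: int = 2, per_project: int = 1) -> None:
    """Create or update cc_config.xml so it contains the two transfer limits."""
    if config_path.exists():
        # Keep whatever settings are already in the file.
        tree = ET.parse(str(config_path))
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    for tag, value in (("max_file_xfers", total),
                       ("max_file_xfers_per_project", per_project)):
        elem = options.find(tag)
        if elem is None:
            elem = ET.SubElement(options, tag)
        elem.text = str(value)

    tree.write(str(config_path), encoding="unicode")
```

After running it, the client still has to be told to re-read the config file (or be restarted) before the new limits apply.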
    "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
    ID: 40828
    Ingleside
    Joined: 5 Aug 04
    Posts: 127
    Credit: 24,535,403
    RAC: 12,813
    Message 40831 - Posted: 10 Oct 2010, 11:16:45 UTC - in response to Message 40826.  
    Last modified: 10 Oct 2010, 11:22:28 UTC

    mo.v wrote:
    The timeout on file uploads was lengthened to 90 days so there's no rush from that point of view but I think each file is only allowed 100 upload attempts. Don't keep repeating manual retries (the Retry Now button) until someone can suggest more ideas.

    I'm not aware of any limit on the number of retries, and a quick test showed that manually increasing the count to 11000 connection attempts had no effect; the upload just kept retrying as before.

    Worth remembering, since many CPDN users still seem to use old BOINC clients, is that the increase to 90 days applies only to v6.10.xx and later clients.


    As for checking for connection problems, the first step is always to reboot any modems, routers and so on, and to reboot the affected computer.

    If this doesn't work, try creating or editing a cc_config.xml (placed in the BOINC data directory) containing at minimum the following lines:
    <cc_config>
      <log_flags>
        <file_xfer_debug>1</file_xfer_debug>
        <http_xfer_debug>1</http_xfer_debug>
      </log_flags>
    </cc_config>

    Then just select "Read config file" in BOINC Manager.

    Keeping <file_xfer_debug> always enabled is an advantage, since you'll always get info about which upload server the client is trying to connect to. That makes it easy to check on the server status page whether this server is down, and you don't need to search through client_state.xml manually to get this info. The transfer speed will also be logged if the transfer was successful.

    The second option, on the other hand, will create a lot of extra output, so disabling it again after fixing the problem is recommended. To disable it, just change the 1 to a zero and re-read the config file.

    A couple of other <log_flags> that may also be useful are:
    <http_debug>1</http_debug>
    <proxy_debug>1</proxy_debug>


    edit - I see Gundolf Jahn also did mention some of the log-flags earlier in the thread.
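With <file_xfer_debug> enabled, the server name can also be pulled out of a saved copy of the messages automatically. A rough sketch, assuming the log lines look like the "[file_xfer_debug] URL: ..." line quoted earlier in this thread (other client versions may format them differently); `upload_hosts` is a hypothetical helper name:

```python
import re
from urllib.parse import urlparse

# Matches the URL reported by the file_xfer_debug log flag.
URL_LINE = re.compile(r"\[file_xfer_debug\] URL: (\S+)")


def upload_hosts(log_lines):
    """Return the distinct server host names seen in file_xfer_debug URL lines, in order."""
    hosts = []
    for line in log_lines:
        match = URL_LINE.search(line)
        if match:
            host = urlparse(match.group(1)).hostname
            if host and host not in hosts:
                hosts.append(host)
    return hosts


sample = [
    "Sat Sep 11 20:50:13 2010 climateprediction.net [file_xfer_debug] URL: "
    "http://climateapps1.oucs.ox.ac.uk/cgi-bin/file_upload_handler",
    "Sat Sep 11 20:50:21 2010 [http_debug] HTTP error: Couldn't connect to server",
]
print(upload_hosts(sample))
```

The resulting host names can then be compared against the server status page by hand.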
    ID: 40831
    Iain Inglis
    Volunteer moderator
    Joined: 16 Jan 10
    Posts: 1084
    Credit: 7,860,975
    RAC: 4,768
    Message 40835 - Posted: 10 Oct 2010, 20:18:02 UTC - in response to Message 40827.  

    [Veebee wrote:] ...(mind you, that one is a HADSM3 slab model - just had a "close look" nearly 1200 hours so far !!! :O )
    That model, hadsm3dhet2_k8ob_006620893_7, has become a slow-processing 'iceworld'. Painful though it might be at this stage, the best thing to do with that model is to abort it: it will finish eventually, but the data from the freeze point onwards is invalid. Some efforts have been made to find the cause, which is so far proving elusive.
    ID: 40835
    Darmok
    Joined: 29 Dec 09
    Posts: 34
    Credit: 18,395,130
    RAC: 0
    Message 40839 - Posted: 11 Oct 2010, 14:50:30 UTC
    Last modified: 11 Oct 2010, 14:50:59 UTC

    I read Milo's announcement, but I have still been getting failed downloads for the past several days. Everything else is OK.

    Started download of atmos_v3xe_1199_200_006736082_0.gz
    Project communication failed: attempting access to reference site
    Temporarily failed download of atmos_v3xe_1199_200_006736082_0.gz: HTTP error

    Thanks
    ID: 40839
    astroWX
    Volunteer moderator
    Joined: 5 Aug 04
    Posts: 1496
    Credit: 95,522,203
    RAC: 0
    Message 40840 - Posted: 11 Oct 2010, 23:23:38 UTC - in response to Message 40839.  

    Don't know what's wrong, but I had two files remaining, partially downloaded. After a day or two, I realized that each BOINC attempt downloaded a small bite of bytes. Because the remaining files were relatively small, were already partially downloaded and didn't restart from zero each time, I decided to click 'Retry Now'... and click... and click.... Eventually, the downloads finished. (Pathetic way to get the job done, actually.)

    Why the server permitted that bit of foolishness, when it refused to complete the transaction on its own, one can only guess.
    "We have met the enemy and he is us." -- Pogo
    Greetings from coastal Washington state, the scenic US Pacific Northwest.
    ID: 40840
    Darmok
    Joined: 29 Dec 09
    Posts: 34
    Credit: 18,395,130
    RAC: 0
    Message 40841 - Posted: 12 Oct 2010, 10:13:57 UTC - in response to Message 40840.  

    astroWX wrote:
    I decided to click 'Retry Now'... and click... and click.... Eventually, the downloads finished. (Pathetic way to get the job done, actually.)

    Thanks AstroWX. This would confirm it is a problem with the CPDN download server.

    I also clicked several times, but I had little confidence that the downloaded files were not corrupted and did not want to risk wasting computing time, so I regrettably aborted them. I am holding off on downloads until this issue is resolved or there is confirmation that the models will run properly to the end.
    ID: 40841
    Milo Thurston
    Volunteer moderator
    Volunteer developer
    Joined: 2 Mar 06
    Posts: 253
    Credit: 363,646
    RAC: 0
    Message 40842 - Posted: 12 Oct 2010, 10:25:21 UTC

    I know that there's a problem with downloads from climateapps2 and I am working as best I can to fix it.
    I cannot say how many days it will take to fix as it depends upon many factors.
    ID: 40842
    Milo Thurston
    Volunteer moderator
    Volunteer developer
    Joined: 2 Mar 06
    Posts: 253
    Credit: 363,646
    RAC: 0
    Message 40858 - Posted: 14 Oct 2010, 9:04:06 UTC

    Hiro was able to extract a disk from his cluster, which I have used to add more space to climateapps2. So, there should now be no problem with downloads.
    ID: 40858