climateprediction.net (CPDN) home page
Thread 'Upload problems'

Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 41991 - Posted: 21 Apr 2011, 11:48:39 UTC

Caught one

6.10.58 BOINC client Ubuntu
2.6.32-30-generic #59-Ubuntu SMP Tue Mar 1 21:30:46 UTC 2011 x86_64 GNU/Linux

All 12 uploads corrupted with <url>http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/file_upload_hnndler</url>

Noticed that happens only on my Intel machines, not AMD -- too small a sample to be significant.

this wu
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 41992 - Posted: 21 Apr 2011, 12:51:14 UTC

Gaaaaargh --
This is driving me nuts!
I looked again at the BOINC source code but can't make head nor tail of it.
ONE lousy character in the XML gets changed, sometimes. On some machines. With some models, not others. Sometimes. Only Linux. Maybe only Intel. But not always. Some models are susceptible, but there's no way to figure out which.
And now when I want to do another test, my only other Core 2 snarfed a hadcm3n while I wasn't looking -- good -- need to do those, but --

So -- this is one weird problem -- What to do?
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 41995 - Posted: 21 Apr 2011, 13:48:10 UTC

OK trying SIMAP and Einstein
Warped

Joined: 12 Sep 04
Posts: 34
Credit: 1,017,702
RAC: 0
Message 42016 - Posted: 24 Apr 2011, 7:47:20 UTC - in response to Message 41988.  

My feeling is to agree that it is not merely a PNW issue.
I have found references to malformed URLs on more than one upload server.

I think the best strategy would be for the CPDN sysadmins to search the apache logs on our servers and try to find out which models are failing, and use them to pull out the info about the client types that are failing.
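
A log search of this kind could be sketched roughly as below. This is only an illustration: it assumes the upload servers write Apache's "combined" log format and that a corrupted request shows up as a handler name close to, but not equal to, file_upload_handler. The function names and the 0.8 similarity threshold are made up for the example.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Apache "combined" log format (an assumption; the real server config may differ).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

EXPECTED = "file_upload_handler"


def looks_corrupted(name):
    """True if name is similar to the expected handler name but not identical."""
    if name == EXPECTED:
        return False
    # 0.8 is an arbitrary illustrative threshold, not a tuned value.
    return SequenceMatcher(None, name, EXPECTED).ratio() >= 0.8


def malformed_upload_requests(lines):
    """Yield (path, user_agent) for requests whose handler name looks corrupted."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path")
        last_segment = path.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]
        if looks_corrupted(last_segment):
            yield path, match.group("agent")


def count_by_agent(lines):
    """Count corrupted-looking upload requests per client user-agent string."""
    return Counter(agent for _, agent in malformed_upload_requests(lines))
```

Feeding it the lines of an access log (for example `count_by_agent(open(logfile))`, with `logfile` chosen by the admin) would give a tally of which client platforms and versions are sending the bad URL.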

I do agree that it would be good to find out whether this happens with other BOINC projects.

...so I have another item on my to-do list!

Jonathan
CPDN SysAdmin


Welcome Jonathan!

From what I can gather, the problem seems particularly severe in PNW tasks for Linux. Would it be possible to disable the distribution of these work units to Linux clients until resolved?

I have tried editing the client_state.xml file to no avail. Does the client make any contact with the URL http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/ at any stage prior to upload? Could it be an error emanating from them?
Warped
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42018 - Posted: 24 Apr 2011, 8:45:27 UTC

The models for PNW disappeared about 3 days ago, as per this thread. The application is gone as well.

So someone is doing something. Not sure who or what.


Backups: Here
Ingleside

Joined: 5 Aug 04
Posts: 127
Credit: 24,498,085
RAC: 21,454
Message 42019 - Posted: 24 Apr 2011, 11:50:57 UTC - in response to Message 42016.  
Last modified: 24 Apr 2011, 11:54:05 UTC

From what I can gather, the problem seems particularly severe in PNW tasks for Linux. Would it be possible to disable the distribution of these work units to Linux clients until resolved?

This could fairly easily be done by creating a customized plan class, but given the current less-than-optimal staffing situation I wouldn't expect it to happen at this point.

I have tried editing the client_state.xml file to no avail. Does the client make any contact with the URL http://boinc1.coas.oregonstate.edu/cpdn_cgi_main/ at any stage prior to upload? Could it be an error emanating from them?

The upload server is only accessed when the client tries to upload a file, not before. Since the corruption is already present when the task is first downloaded to the client, it has nothing to do with connections to the upload server.

As for editing client_state.xml, make sure you've completely exited the BOINC client before trying to edit the file. This includes stopping any BOINC service (or whatever it's called under Linux); otherwise the info in client_state.xml will just be overwritten with new info as BOINC runs.

If you have exited BOINC, edited client_state.xml afterwards, and the wrong URL somehow gets re-created on the next start of BOINC, that would be very interesting, since it's much easier to test things that happen on a simple restart of the client than something that only happens on the first download of a task...
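
Once the client is fully stopped, the edit itself could be scripted. The following is a minimal sketch, assuming the only damage is the "hnndler" misspelling reported in this thread; `repair_bytes` and `repair_file` are hypothetical helper names, and the location of client_state.xml depends on the installation.

```python
import shutil
from pathlib import Path

BAD = b"file_upload_hnndler"   # misspelling reported in this thread
GOOD = b"file_upload_handler"


def repair_bytes(data):
    """Return (fixed_data, number_of_replacements) for the raw file contents."""
    return data.replace(BAD, GOOD), data.count(BAD)


def repair_file(path):
    """Fix the file in place, keeping a .bak copy of the original.

    Only meaningful while the BOINC client is not running; a running
    client will overwrite the file with its own state.
    """
    path = Path(path)
    fixed, count = repair_bytes(path.read_bytes())
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(fixed)
    return count
```

Working on raw bytes avoids any encoding or newline changes, so the rest of the file stays exactly as the client wrote it. If the bad URL comes back after a restart, the .bak copy gives something to compare against.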
Greg van Paassen

Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42020 - Posted: 24 Apr 2011, 20:12:05 UTC
Last modified: 24 Apr 2011, 20:35:32 UTC

The problem does occur with PNWs, but it's maddeningly intermittent. I got 6 PNWs on the 22nd. I have "grepped" my client_state.xml and there are no misspellings of file_upload_handler.

I have set up a cron job to regularly check client_state.xml for misspellings of "file_upload_handler".

Details: all 6 of my PNWs are re-dispatched tasks (after the first or second client returned an error). One of them has started processing, another should start in 3 days, then 3 more the day after that. Core i7-2600, Arch Linux 64 bit, my own glibc 2.13 compiled with -O2 -march=native -m32.
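
A periodic check like the cron job mentioned above might look something like the sketch below. It is not the script actually used. It assumes the upload URL sits in a `<url>` element, as in the corrupted entry quoted earlier in the thread, and that a corruption changes one or two characters without changing the length. The function names are invented for the example.

```python
import re

EXPECTED = "file_upload_handler"
URL_TAG = re.compile(r"<url>\s*(.*?)\s*</url>", re.DOTALL)


def differs_slightly(name, expected=EXPECTED):
    """True if name has the expected length but differs in one or two characters."""
    if len(name) != len(expected):
        return False
    mismatches = sum(a != b for a, b in zip(name, expected))
    return 1 <= mismatches <= 2


def suspicious_urls(xml_text):
    """Return every <url> value whose last path segment is a near-miss of EXPECTED."""
    found = []
    for url in URL_TAG.findall(xml_text):
        last_segment = url.rstrip("/").rsplit("/", 1)[-1]
        if differs_slightly(last_segment):
            found.append(url)
    return found


def check_file(path):
    """Read a client_state.xml-style file and return its suspicious URLs."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return suspicious_urls(handle.read())
```

A small wrapper that calls `check_file` on the local client_state.xml and prints or mails any hits would be enough for cron. Unlike a plain grep for the correct spelling, this flags the near-misses directly.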
Warped

Joined: 12 Sep 04
Posts: 34
Credit: 1,017,702
RAC: 0
Message 42023 - Posted: 26 Apr 2011, 6:07:24 UTC - in response to Message 42019.  

As for editing client_state.xml, make sure you've completely exited the BOINC client before trying to edit the file. This includes stopping any BOINC service (or whatever it's called under Linux); otherwise the info in client_state.xml will just be overwritten with new info as BOINC runs.

If you have exited BOINC, edited client_state.xml afterwards, and the wrong URL somehow gets re-created on the next start of BOINC, that would be very interesting, since it's much easier to test things that happen on a simple restart of the client than something that only happens on the first download of a task...


Hi Ingleside.

For security reasons I run BOINC in a user account, not as root. It is only possible to edit the client_state.xml file when logged in as root. I tried everything I could think of, including rebooting and going straight to the root account in order to edit the file, making sure that BOINC was not running at all. I also edited the client_state_prev.xml file. I am sure that I was able to correct all instances of the misspelled "hnndler" in both files. Hence my suspicion that the file is somehow updated from the website in Oregon.

I aborted the task and have changed my preferences to run only the Southern Africa tasks, which happen to be the only type now available. This is running fine, except that the graphics file seems to have been corrupted on download. I can see that the model has not turned to an ice world but that is all (no timestep or other information). This is not a major issue as I can view the checkpoint progress via "Properties" in BOINC.
3rkko

Joined: 12 Feb 08
Posts: 66
Credit: 4,877,652
RAC: 0
Message 42027 - Posted: 26 Apr 2011, 17:07:11 UTC - in response to Message 42023.  

Did you make sure that the BOINC client was not running? The command "sudo /etc/init.d/boinc-client status" tells you whether the client is running or not. You can stop or start it by replacing "status" with "stop" or "start". This works in Ubuntu and should be similar in other flavours of Linux.
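
For anyone scripting this, the same commands could be wrapped as in the sketch below. It assumes the Ubuntu init script path given above and that sudo is available; `init_command` and `run_init` are illustrative names, and what the exit code means depends on the init script itself.

```python
import subprocess

INIT_SCRIPT = "/etc/init.d/boinc-client"  # path from the post above (Ubuntu)
ACTIONS = ("status", "stop", "start")


def init_command(action):
    """Build the argument list for one of the init script actions named above."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["sudo", INIT_SCRIPT, action]


def run_init(action):
    """Run the init script and return (exit_code, combined_output_text)."""
    completed = subprocess.run(
        init_command(action),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return completed.returncode, completed.stdout
```

Calling `run_init("stop")` and then `run_init("status")` before touching client_state.xml gives a record of whether the client had really stopped.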
old_user598777

Joined: 23 Oct 09
Posts: 1
Credit: 1,171,675
RAC: 0
Message 42070 - Posted: 30 Apr 2011, 13:35:06 UTC - in response to Message 41989.  

I am having this problem on Windows 7 with BOINC 6.10.60 (and the previous version).
There are 14 data sets waiting to upload.

Typical messages:


4/30/2011 5:50:55 AM climateprediction.net Started upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip
4/30/2011 5:52:14 AM Project communication failed: attempting access to reference site
4/30/2011 5:52:14 AM climateprediction.net Temporarily failed upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip: HTTP error
4/30/2011 5:52:14 AM climateprediction.net Backing off 14 min 38 sec on upload of hadam3p_pnw_yx88_1967_1_006898128_0_4.zip
4/30/2011 5:52:15 AM Internet access OK - project servers may be temporarily down.
4/30/2011 5:57:24 AM Project communication failed: attempting access to reference site
4/30/2011 5:57:24 AM climateprediction.net Temporarily failed upload of hadam3p_pnw_yx88_1967_1_006898128_0_3.zip: HTTP error
4/30/2011 5:57:24 AM climateprediction.net Backing off 1 hr 47 min 33 sec on upload of hadam3p_pnw_yx88_1967_1_006898128_0_3.zip
4/30/2011 5:57:25 AM Internet access OK - project servers may be temporarily down.
4/30/2011 6:18:12 AM climateprediction.net Sending scheduler request: To send trickle-up message.

This has been going on since the database upgrade started.
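
With 14 files queued, it can help to pull the back-off intervals out of the message log rather than read them by eye. The sketch below assumes the "Backing off ... on upload of ..." wording shown in the messages above; other client versions may phrase it differently, and `backoff_seconds` is a made-up name.

```python
import re

# Matches e.g. "Backing off 1 hr 47 min 33 sec on upload of <file>".
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)


def backoff_seconds(lines):
    """Map each file name to its most recently logged back-off, in seconds."""
    result = {}
    for line in lines:
        match = BACKOFF.search(line)
        if match:
            hours, minutes, seconds, name = match.groups()
            result[name] = (
                int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds or 0)
            )
    return result
```

Steadily growing intervals on every file point at the server side or the network path rather than at any single result file.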

Lewis Shadoff, Ph.D.
Lake Jackson, TX
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42071 - Posted: 30 Apr 2011, 15:37:18 UTC
Last modified: 30 Apr 2011, 15:41:10 UTC

That sounds very much like the problem I described (and gave a workaround for) here, Lewis.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
Greg van Paassen

Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42073 - Posted: 30 Apr 2011, 15:55:58 UTC

Alternatively, it could be a virus scanner problem as described in this thread.