Syncing to Archival Storage
This is a user-level guide for syncing a directory to CAC Archival Storage using Globus.
Prerequisites
- You know how to log into Globus.
- You are a user of a CAC project with the archival storage service enabled. In this document, <CACProject> denotes your CAC project name.
- On the Linux host from which you want to run sync commands (once or on a regular schedule), install the Globus CLI client. The syncing script is a bash shell script, so only Linux is supported.
- Tip: If running pip3 install globus-cli works for you, you can skip the Globus CLI installation documentation altogether.
- If the source directory is not located on an existing Globus Connect Server endpoint, install Globus Connect Personal for Linux, macOS, or Windows on the host where the source directory is located.
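Before continuing, a quick check that the required commands are on your PATH can save a confusing failure later. The sketch below defines a hypothetical helper, missing_commands, that prints each command it cannot find and returns non-zero if any are missing. It assumes pip3 put the globus executable on your PATH; with pip3 install --user that location is often ~/.local/bin, which is not always on PATH by default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every named command that is not on PATH.
# Returns 0 if all commands were found, 1 otherwise.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}
```

For example, missing_commands bash globus prints missing: globus until the Globus CLI is installed and reachable on your PATH.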
Log into Globus using CLI
On the Linux host from which you want to run sync commands:
- Log into Globus using the Globus CLI:
  $ globus login
  Please authenticate with Globus here:
  ------------------------------------
  https://auth.globus.org/v2/oauth2/authorize?...........
  ------------------------------------
  Enter the resulting Authorization Code here:
- Copy the URL https://auth.globus.org/v2/oauth2/authorize?........... into a web browser and log into Globus as instructed. After logging in, copy the authorization code back into the session where you ran the globus login command and press Enter. The CLI then confirms the login:
  You have successfully logged in to the Globus CLI!
  You can check your primary identity with
    globus whoami
  For information on which of your identities are in session use
    globus session show
  Logout of the Globus CLI with
    globus logout
- Verify that you are logged into Globus with the globus whoami command; the output should be your Globus ID:
  $ globus whoami
  shl1@cornell.edu
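Because globus login needs someone to paste an authorization code, it cannot run unattended. For scheduled syncs, a sketch like the following checks for an existing login with globus whoami and fails with a clear message instead of prompting. The function name require_globus_login is hypothetical, and the check assumes globus whoami exits non-zero (or prints nothing) when you are not logged in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the Globus CLI already has a login.
# Never calls `globus login`, so it is safe to use from cron.
require_globus_login() {
  local identity
  # Treat a failing `globus whoami` the same as an empty identity.
  identity=$(globus whoami 2>/dev/null) || identity=""
  if [ -z "$identity" ]; then
    echo "Not logged into Globus; run 'globus login' interactively first." >&2
    return 1
  fi
  echo "Logged into Globus as $identity"
}
```

Calling require_globus_login at the start of a scheduled job makes an expired or missing login show up as an explicit error rather than a job waiting on input.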
Make a Guest Collection on CAC Archive
- In a web browser, log into Globus. In the File Manager, go to the cac#archive02 collection and navigate to the /<CACProject> directory. Optionally, create a new directory there to receive the data copied from the source directory.
- Follow the How To Share Data Using Globus documentation to make the destination directory (the new directory, if you created one) a guest collection.
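Once you know the guest collection's ID (the globus endpoint search command in Locate Source and Destination lists it), you can confirm that the CLI can read it before any sync runs. This sketch wraps globus ls, which lists a path on a collection given as ID:PATH; the function name check_collection_access is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the Globus CLI can list the root of a collection.
# $1 is the collection (endpoint) ID; "/" is the top of a guest collection.
check_collection_access() {
  if globus ls "$1:/" >/dev/null 2>&1; then
    echo "OK: $1 is reachable"
  else
    echo "Cannot list $1:/ (check the guest collection permissions and your Globus login)" >&2
    return 1
  fi
}
```

With the example destination ID from this guide, that would be check_collection_access 606579ae-5b03-11e9-bf32-0edbf3a4e7ee.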
Configure the Source
- If your source directory is located on an existing Globus Connect Server endpoint, you will need to make it a guest collection, just as you did for the destination directory on CAC Archive.
- If the source directory is not located on an existing Globus Connect Server endpoint, install Globus Connect Personal for Linux, macOS, or Windows on the host where the source directory is located, then start the Globus Connect Personal endpoint on that host.
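If Globus Connect Personal runs on the same Linux host as the Globus CLI, the command globus endpoint local-id prints that endpoint's ID, so you do not have to pick it out of a search. The sketch below wraps it in a hypothetical function, local_source_endpoint, that fails clearly when no local endpoint is configured.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the Globus Connect Personal endpoint
# installed on this host, or fail if there is none.
local_source_endpoint() {
  local id
  id=$(globus endpoint local-id 2>/dev/null) || id=""
  if [ -z "$id" ]; then
    echo "No local Globus Connect Personal endpoint found on this host." >&2
    return 1
  fi
  echo "$id"
}
```

When the source directory lives on a different machine than the CLI, this returns nothing useful; use the endpoint search described next instead.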
Locate Source and Destination
- Back in the Globus CLI client, look up the IDs of the source and destination endpoints with the globus endpoint search --filter-scope my-endpoints command:
  $ globus endpoint search --filter-scope my-endpoints
  ID                                   | Owner            | Display Name
  ------------------------------------ | ---------------- | ----------------------
  4c8b5dda-389e-11ea-9710-021304b0cca7 | shl1@cornell.edu | my_source_endpoint
  606579ae-5b03-11e9-bf32-0edbf3a4e7ee | shl1@cornell.edu | cac_archive_endpoint
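To avoid copying IDs by hand, you could pull the ID for a given display name out of the table this command prints. The sketch below is a hypothetical helper that assumes the default text layout with pipe-separated ID, Owner, and Display Name columns; it would break if a display name itself contains a pipe character.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the endpoint search table on stdin and print
# the ID of every row whose Display Name equals $1.
endpoint_id_by_name() {
  awk -F'|' -v name="$1" '
    {
      id = $1; display = $3
      gsub(/^[ \t]+|[ \t]+$/, "", id)       # trim the ID column
      gsub(/^[ \t]+|[ \t]+$/, "", display)  # trim the Display Name column
      if (display == name) print id
    }'
}
```

For example, SOURCE_ENDPOINT=$(globus endpoint search --filter-scope my-endpoints | endpoint_id_by_name my_source_endpoint) captures the source ID. Display names are not guaranteed to be unique, so check that exactly one ID came back.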
Install the cli-sync.sh script
- Download the cli-sync.sh script onto your Linux host.
- Open the cli-sync.sh file and set the following variables to appropriate values:
  - SOURCE_ENDPOINT: ID of your source endpoint
  - DESTINATION_ENDPOINT: ID of your destination endpoint
  - SOURCE_PATH: usually "/"
  - DESTINATION_PATH: usually "/", the top of the guest collection you created on CAC Archive
  - SYNCTYPE: read the comments in the script and choose carefully. checksum is the safest option but also the slowest, because it makes the destination host (CAC Archive) read the copied files back from disk to verify their checksums.
- You can now run the cli-sync.sh script directly from the shell, or as a cron job for scheduled archiving.
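Before the first run or before scheduling it, you may want to sanity-check the values you set. The sketch below is hypothetical: it assumes endpoint IDs are UUIDs like those in the search output, and that SYNCTYPE takes the four sync levels that globus transfer --sync-level accepts (exists, size, mtime, checksum). Check the comments in your copy of cli-sync.sh, since the script itself may differ.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the variables edited in cli-sync.sh.
# Assumption: the script passes them to something like
#   globus transfer "$SOURCE_ENDPOINT:$SOURCE_PATH" \
#     "$DESTINATION_ENDPOINT:$DESTINATION_PATH" --recursive --sync-level "$SYNCTYPE"

# True if $1 looks like an endpoint ID (8-4-4-4-12 hex digits).
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Report every problem found, then return 1 if there was any.
check_sync_config() {
  local status=0
  is_uuid "$SOURCE_ENDPOINT" || { echo "SOURCE_ENDPOINT is not an endpoint ID" >&2; status=1; }
  is_uuid "$DESTINATION_ENDPOINT" || { echo "DESTINATION_ENDPOINT is not an endpoint ID" >&2; status=1; }
  case $SOURCE_PATH in /*) ;; *) echo "SOURCE_PATH should be absolute, e.g. /" >&2; status=1 ;; esac
  case $DESTINATION_PATH in /*) ;; *) echo "DESTINATION_PATH should be absolute, e.g. /" >&2; status=1 ;; esac
  case $SYNCTYPE in
    exists|size|mtime|checksum) ;;
    *) echo "SYNCTYPE must be exists, size, mtime, or checksum" >&2; status=1 ;;
  esac
  return "$status"
}
```

You might set the same values in your shell and run check_sync_config before ./cli-sync.sh. For scheduled archiving, a crontab entry such as 0 2 * * * $HOME/cli-sync.sh >> $HOME/cli-sync.log 2>&1 (a hypothetical nightly 2:00 a.m. run with a hypothetical log path) would run the script once a day; since cron provides no terminal, the Globus CLI login for that user must already be in place.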