Using Git with GEOS-Chem

From Geos-chem

This page describes how to use the Git version control system to download and manage the GEOS-Chem source code package.

Obtaining and installing Git

You will need to make sure that Git is installed on your system. Git is free, open-source software. You may want to ask your sysadmin or IT department to install Git for you. Please see this wiki post for more information.

Downloading a new GEOS-Chem version

Initial download

Code directory

The GEOS-Chem source code repository has now been migrated from CVS to Git, and is now available for remote download. We recommend that you download each new version of GEOS-Chem into a separate source code directory.

All users should use the following syntax:

git clone git://git.as.harvard.edu/bmy/GEOS-Chem/ LOCAL_DIR_NAME

where LOCAL_DIR_NAME is the name of the local directory on your disk into which the GEOS-Chem source code files will be placed. It is up to you to pick LOCAL_DIR_NAME.

For more information, please see Chapter 2.2: Downloading the GEOS-Chem source code in the GEOS-Chem Online User's Guide.

Run directories

Each of the GEOS-Chem run directories is saved as a separate Git repository, rather than all of them being put into a single repository. This is because some of the files (e.g. restart files) can be very large.

The download command looks like this:

git clone git://git.as.harvard.edu/bmy/GEOS-Chem-rundirs/DIR-OPTION LOCAL_DIR_NAME

Where:

  1. LOCAL_DIR_NAME is the name of the local directory on your disk into which the GEOS-Chem run directory files will be placed.
  2. DIR-OPTION specifies the GEOS-Chem run directory for the given met field type, horizontal resolution, and simulation type.

For a complete list of all values of DIR-OPTION, please see Chapter 2.3: Downloading the GEOS-Chem run directories in the GEOS-Chem Online User's Guide.

Shared data directories

The GEOS–Chem shared data directories contain the various met fields, emissions, and other data that GEOS–Chem will read in during the course of a simulation. You must download the shared data directories via FTP or a similar utility (e.g. wget, FireFTP, SecureFX, etc.) The large volume of data makes it impossible to track this directory structure with Git.

For more information about how to download these directories, please see Chapter 2.4: Downloading the GEOS-Chem shared data directories in the GEOS-Chem Online User's Guide.

Ignoring files

Git also allows you to ignore certain types of files that we don't need to track (e.g. anything that can be built from the source code). These typically include:

  • Object files (*.o)
  • Library files (*.a)
  • Module files (*.mod)
  • Autosave files (*~)
  • Executable files (geos, geostomas)

You can tell Git that you don't want these files to be tracked by editing the .git/info/exclude file in your source code directory. Open this file in your favorite text editor and edit it to look like this:

# git-ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
*.[oa]
*.mod
*~
geos
geostomas
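The same edit can also be made from the shell. The sketch below is only an illustration: it works in a throwaway repository created with mktemp (the repository name demo and the dummy file names are hypothetical), appends the patterns above to .git/info/exclude, and then uses git check-ignore and git status to confirm which files Git now ignores.

```shell
set -e
work=$(mktemp -d)                  # scratch area for the example
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"

# Append the patterns from above to the per-repository exclude file.
# (.git/info/exclude is never committed, so it only affects your copy.)
cat >> .git/info/exclude <<'EOF'
*.[oa]
*.mod
*~
geos
geostomas
EOF

# Dummy build products plus one real source file (hypothetical names).
touch main.F chem_mod.o libGeosUtil.a chem_mod.mod main.F~ geos

# Prints each of the given paths that matches an exclude pattern.
git check-ignore main.F chem_mod.o libGeosUtil.a chem_mod.mod main.F~ geos || true

# Only main.F is left as an untracked file.
git status --porcelain
```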

Viewing the revision history

The best way to examine the contents of your Git-backed GEOS-Chem source code is to use the gitk viewer. There are two ways to do this:

(1) Change into the Code.v8-03-01 directory and start gitk as follows:

cd Code.v8-03-01
gitk --all &       # This will show ALL open branches

(2) Or if you are using the git gui GUI browser (more on that below), you can invoke gitk from the Repository/Visualize master's History menu item.

At the top left of the gitk screen, you will see the graph of revisions. Each dot represents a commit, along with the log message that accompanied each commit.

Note that next to the most recent commit (i.e. the line at the very top) there are 2 green boxes, one named master and one named origin:

origin
This was the state of the repository on the remote server when you cloned it. Therefore, this is the "pristine", unchanged code that you got from the download.
master
This is the current state of the local repository now. Since we haven't done anything to the code yet, the master and origin point to the same commit.

If you click on any of the commits in the top left window, in the window below, you will see the log message and a list of changes to the source code. The old code is marked in RED and the new code is marked in GREEN. At right you will also see a list of files that were changed during the commit.

So it's really easy to see how the code has evolved with gitk.
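When no graphical display is available (e.g. over a plain SSH session), git log can draw a rough text version of the gitk graph, and git show prints the log message and diff of a single commit. A minimal sketch in a throwaway repository; the file names and commit messages are hypothetical.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"

echo "line 1" > input.geos
git add input.geos
git commit -q -m "Initial import"
echo "line 2" >> input.geos
git commit -q -a -m "Update input.geos"

# Text version of "gitk --all": one line per commit, with branch names.
git log --graph --oneline --decorate --all

# Like clicking on the newest commit in gitk: log message plus diff.
git show HEAD
```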

Making revisions

Using the GUI browser

We recommend using the git gui for source code management. Start this in your Code.v8-03-01 subdirectory:

cd Code.v8-03-01
git gui &

On the left there are 2 windows:

Unstaged Changes
An unstaged change is a modification that you have not yet staged for the next commit. If you modified any files since the last commit, then they should be displayed in this window. Also, right above this window you will find the name of the currently checked-out branch.
Staged Changes
These are changes that Git will add to the repository the very next time you make a commit.

In general, whenever you need to modify the source code, you should NOT do it on the master branch. Instead, create a new branch for your modifications. Then you can test your modifications ad nauseam until you are sure that everything is functioning as it should. When your modifications are complete, you can merge them back into the master branch and then delete the branch you created.

The advantage of this approach is that if you ever need to start over from scratch, you can just go back to the master branch and you will get back the state of the code before your modifications were added.

Creating branches

To create a new branch, select Branch/Create from the menu (or type CTRL-N). You will get a dialog box that prompts you for the new branch name. Type a unique name and then click OK.

You should pick names that have meaning to you. Some good branch names are:

  • Bug_fix_sulfate_mod
  • CO2_simulation
  • KPP_with_isoprene
  • Methane_simulation

etc. You will be automatically placed into the branch you have just created.
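From the command line, git checkout -b creates a branch and switches to it in one step, which is roughly what the Branch/Create dialog does. A sketch in a throwaway repository; the file name is only an example and the branch name is taken from the list above.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! sulfate code" > sulfate_mod.F
git add sulfate_mod.F
git commit -q -m "Initial import"

# Create the branch and switch to it in one step.
git checkout -q -b Bug_fix_sulfate_mod

# List local branches; the current one is marked with "*".
git branch
```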

Committing

With Git, you should commit frequently, such as when you have completed making revisions to a file or group of files. Commits that are made on one branch will not affect the other branches.

Committing is best done with the git gui. Basically you follow these steps:

  1. To force the git gui to show the latest changes, you can pick Commit/Rescan from the menu (or type the F5 key).
  2. You should get a list of files in the Unstaged Changes window. Clicking on the icon to the left of a file name will move that file to the Staged Changes window, which means that Git will add it to the repository on the very next commit. Note: Clicking on the icon of a file in the Staged Changes window moves the file back to the Unstaged Changes window.
  3. Type a Commit message in the bottom right window. See this example of a good commit message. Some pointers are:
    1. The first line should be 50 characters or less and succinctly describe the commit
    2. Then leave a blank line
    3. Then add more in-depth text that describes the commit
    4. Then click on the Sign Off button. This will append a Signed-off-by line containing your name and email address.
  4. There are two radio buttons above the Commit message window.
    1. New commit: This is the default. Assumes we are making a totally new commit.
    2. Amend last commit: If for whatever reason we need to update the last commit message, pick this button.
  5. Then when your commit message is done, you can click on the Commit button.

Then if you start the gitk viewer, your new commit should be visible.
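The staging and commit steps above can also be sketched on the command line: git add moves a file to the staged state, and git commit -s appends a Signed-off-by line in the same way as the Sign Off button. The repository, file, and commit message below are hypothetical; git commit --amend would play the role of the Amend last commit button.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! original code" > sulfate_mod.F
git add sulfate_mod.F
git commit -q -m "Initial import"

echo "! bug fix" >> sulfate_mod.F
git status --short            # like Rescan: " M sulfate_mod.F" is unstaged
git add sulfate_mod.F         # like clicking the icon: now staged

# Summary line, blank line, longer description; -s adds Signed-off-by.
git commit -q -s -F - <<'EOF'
Fix bug in sulfate_mod.F (hypothetical example)

A longer description of what was changed and why
goes here, after the blank line.
EOF

git log -1
```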

Renaming files

In some instances you may find it necessary to rename files. For example, in GEOS-Chem v9-01-02, we had to rename files ending in .f to .F and .f90 to .F90. If only the name of the file changes, then Git will recognize it as a renamed file in the repository. To rename a file, follow these steps:

  1. Change the name of the file with the Unix mv command. For example: mv myfile.f myfile.F
  2. Open the Git GUI. You will see the two files myfile.f and myfile.F listed in the Unstaged Changes window.
  3. Click on myfile.f and myfile.F; this will move them to the Staged Changes window.
  4. In Staged Changes you will see:
    1. File myfile.f is slated to be removed (i.e. a red "X" is listed next to the file name).
    2. File myfile.F is slated to be added (i.e. a green checkmark is listed next to the file name).
  5. Add a commit message, sign off, and click Commit as described above.
  6. Start the gitk browser. In the lower left window, you should see text such as:
  ---------------- GeosCore/myfile.F --------------------------
  similarity index 100%
  rename from GeosCore/myfile.f
  rename to GeosCore/myfile.F

From this point forward, file myfile.F will use the *.F file extension. However, it will still possess the total revision history from when the file was still named myfile.f. If you merge changes from another repository that still has myfile.f, then these changes will be seamlessly integrated into myfile.F.
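On the command line, git mv performs steps 1 to 4 in one go: it renames the file and stages both the removal and the addition. The sketch below uses a throwaway repository with hypothetical file content; it commits a rename and then uses git log --follow to show that the history of myfile.f carries over to myfile.F.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
mkdir GeosCore
printf '      PROGRAM DEMO\n      END PROGRAM DEMO\n' > GeosCore/myfile.f
git add GeosCore/myfile.f
git commit -q -m "Add myfile.f"

# Rename and stage the change in one command (replaces mv + staging).
git mv GeosCore/myfile.f GeosCore/myfile.F
git commit -q -m "Rename myfile.f to myfile.F"

# Show the last commit; Git reports it as a rename.
git show -M --summary --format=%s HEAD

# --follow continues the history across the rename: both commits are listed.
git log --follow --oneline -- GeosCore/myfile.F
```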

--Bob Y. 10:19, 18 August 2011 (EDT)

Switching between branches

Before you switch from one branch to another (aka "checking out a branch"), we recommend that you commit any outstanding changes to the current branch. Uncommitted changes remain in your working directory even after you check out a different branch, which can lead to confusion.

To check out a different branch, go to Branch/Checkout on the menu and pick the name of the branch you would like to switch to. The current branch name will be displayed just below the menu at top left.

Once you have created your branch and have checked it out, then you may begin making modifications to the source code with your favorite text editor.

We recommend keeping one open branch per new feature that you are adding into GEOS-Chem. This will allow you to test each individual feature separately. After each feature has been validated, you may merge each individual branch back into the master branch.
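The caution about uncommitted changes is easy to see in a throwaway repository. In this sketch (file and branch names are hypothetical), an edit made on a feature branch is still present in the working directory after switching back to master; only after it is committed on the feature branch does master look clean again.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! original" > chem_mod.F
git add chem_mod.F
git commit -q -m "Initial import"

# Start a feature branch and edit a file, without committing.
git checkout -q -b CO2_simulation
echo "! work in progress" >> chem_mod.F

# The uncommitted edit follows you back to master.
git checkout -q master
git status --short          # " M chem_mod.F"

# Commit it on the feature branch instead; master is then clean.
git checkout -q CO2_simulation
git commit -q -a -m "Work in progress on CO2 simulation"
git checkout -q master
git status --short          # no output
```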

Merging

When you are ready to merge your changes back into the mainline master branch, then you can follow this procedure.

  1. Switch back to the master branch by selecting Branch/Checkout from the menu (or type CTRL-O). You will be given a dialog box of available branches. Select master and press OK.
  2. From the menu, pick Merge/Local Merge (or type CTRL-M) and select the branch that contains your changes.

This should merge your changes back into master. If you then open the gitk viewer, the merge you just made should be visible.
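A rough command-line counterpart to the two GUI steps, in a throwaway repository with hypothetical names. Because master has no new commits of its own in this example, Git simply moves master forward to the branch tip (a fast-forward merge) instead of creating a separate merge commit.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! original" > chem_mod.F
git add chem_mod.F
git commit -q -m "Initial import"

git checkout -q -b Methane_simulation
echo "! methane update" >> chem_mod.F
git commit -q -a -m "Add methane update"

# Step 1: switch back to master.  Step 2: merge the feature branch.
git checkout -q master
git merge Methane_simulation

git log --graph --oneline --decorate --all
```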

Resolving conflicts caused by a merge

Sometimes you may encounter a conflict when merging one branch into another branch. This can happen when you are merging code from an older GEOS-Chem version into the latest version. A conflict is just Git saying, "I found some code in a place where I didn't expect to find it. Can you help me figure out which lines of code to keep and which to throw away?"

You will see each file containing a conflict listed in the Unstaged Changes window of the Git Gui. Clicking on each file will display the lines of code where the conflicts are located. You will see one or more slugs in the file. A slug is a block of text that displays the source code from the old branch and the new branch.

<<<<<<< HEAD
! This is old source code that already exists in the branch
...
=======
! This is new source code that is being merged into the branch
...
>>>>>>> 77976da35a11db4580b80ae27e8d65caf5208086

At the top of the slug you see the string <<<<<<< HEAD followed by some source code. This is the "old" code, i.e. the code that existed as of the last commit. A separator line ======= then follows the source code.

Underneath the separator line, you will see the "new" source code, i.e. the code that we are merging into the branch. This source code is followed by the text >>>>>>> 77976da35a11db4580b80ae27e8d65caf5208086. The long hexadecimal string is the SHA1 ID (the internal ID # used by Git) of the commit that we are merging into the branch. Each commit has a unique SHA1 ID.

To resolve a file containing conflicts, do the following:

  1. Open the file in your favorite text editor (vi, emacs, or whatever)
  2. Search for the word HEAD. This will take you to the location of each slug (where conflicts exist).
  3. Decide which code that you want to keep.
  4. Delete the code that you do not want to keep
  5. Delete the marker lines (<<<<<<< HEAD, =======, and >>>>>>> 7797...). If you keep these in the source code you will get compilation errors.
  6. Repeat this process for each conflict that you find.

Once you have resolved the conflicts in each file, stage the files (move them to the Staged Changes window) and commit them back into the repository.
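The whole cycle can be reproduced in a throwaway repository. This sketch, with hypothetical file and branch names, changes the same line differently on two branches, lets the merge stop with a conflict, locates the conflict markers with grep, keeps one version by hand, and then concludes the merge with git add and git commit.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "      X = 1.0" > chem_mod.F
git add chem_mod.F
git commit -q -m "Initial import"

# Change the same line differently on two branches.
git checkout -q -b feature
echo "      X = 2.0" > chem_mod.F
git commit -q -a -m "Set X to 2.0 on feature"
git checkout -q master
echo "      X = 3.0" > chem_mod.F
git commit -q -a -m "Set X to 3.0 on master"

# The merge stops with a conflict ("|| true" keeps "set -e" from exiting).
git merge feature || true

# Which files are conflicted, and where are the markers?
git diff --name-only --diff-filter=U
grep -nE '^(<<<<<<<|=======|>>>>>>>)' chem_mod.F

# Resolve by hand: keep the feature version and drop all marker lines.
echo "      X = 2.0" > chem_mod.F

# Mark the file as resolved and conclude the merge.
git add chem_mod.F
git commit -q --no-edit
```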

--Bob Y. 16:17, 28 July 2011 (EDT)

Tagging

Git also allows you to tag a particular commit with an alphanumeric string for easy reference. Users can then refer to that commit by its tag name (e.g. with git pull, git checkout, or git log) instead of by its SHA1 ID.

Tagging can be done in one of two ways. You can add a tag via the command line:

git tag v8-03-01
git tag v8-03-01-patched

etc. at the Unix command line.

You may also add a tag via the gitk viewer utility, as follows:

  1. Open the gitk browser:
    • Type gitk (to view the current branch), or
    • Type gitk --all to view all branches.
  2. In the top-left window select a commit by clicking on it with the mouse
  3. Right-click to pull up the context menu. Select the Create tag option.
  4. Type in your tag text and hit RETURN

NOTE: Tagging is something that typically only the GEOS-Chem support team will do.
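A short sketch of both kinds of command-line tag in a throwaway repository (the tag names follow the examples above; the commits are hypothetical). git tag NAME creates a lightweight tag on the current commit, while git tag -a stores an annotated tag with its own message; either name can then be used wherever Git expects a commit.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! release code" > chem_mod.F
git add chem_mod.F
git commit -q -m "Release (hypothetical)"

# Lightweight tag on the current commit.
git tag v8-03-01

echo "! post-release fix" >> chem_mod.F
git commit -q -a -m "Post-release patch (hypothetical)"

# Annotated tag: also records the tagger, a date, and a message.
git tag -a v8-03-01-patched -m "v8-03-01 plus post-release patch"

git tag --list
git log --oneline --decorate

# Tag names can stand in for SHA1 IDs in most commands.
git diff --stat v8-03-01 v8-03-01-patched
```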

Deleting branches

Once you have merged your changes back into the master branch, you may delete the branch you just created. In the git gui, go to the Branch/Delete menu item. You will be given a dialog box where you can select the name of the branch you wish to delete.
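The command-line equivalent is git branch -d, which refuses to delete a branch that has not been merged (git branch -D forces the deletion). A sketch in a throwaway repository with a hypothetical file and one of the branch names suggested earlier:

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! original" > chem_mod.F
git add chem_mod.F
git commit -q -m "Initial import"

git checkout -q -b KPP_with_isoprene
echo "! KPP update" >> chem_mod.F
git commit -q -a -m "KPP update"

# Merge into master first, then delete the branch.
git checkout -q master
git merge -q KPP_with_isoprene
git branch -d KPP_with_isoprene   # -d only deletes fully merged branches
git branch
```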

Sharing your revisions with others (and vice versa)

One of the really nice features of Git is that it can create patch files, i.e. files which contain a list of changes that can be imported into someone else's local Git repository. Using patch files removes the need to merge differences between codes manually.

Creating a patch file to share with others

To create a patch file containing the code differences between a branch of your code and your master branch, type:

git format-patch master..BRANCH_NAME --stdout > my-patch-file.diff

where BRANCH_NAME is the name of the branch that you want to compare against the master branch.

You can also create a patch file for a given number of past commits. Typing:

git format-patch -3 --stdout > my-patch-file.diff

will create a patch file for the last 3 commits. If you want only the most recent commit, use -1 instead, etc.

These commands redirect the output of the git format-patch command to a file named by you (in this case my-patch-file.diff, but you may select whatever name you wish). You can then include the patch file as an email attachment and send it to other GEOS-Chem users, or the GEOS-Chem Support Team.
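A self-contained sketch of the first command, using a throwaway repository with a hypothetical branch Bug_fix_aerosol that holds two commits. git format-patch writes each commit as a separate e-mail-style message, so the single output file contains one Subject: line per commit.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! original" > aerosol_mod.F
git add aerosol_mod.F
git commit -q -m "Initial import"

# A hypothetical branch with two commits that master does not have.
git checkout -q -b Bug_fix_aerosol
echo "! fix 1" >> aerosol_mod.F
git commit -q -a -m "First aerosol fix"
echo "! fix 2" >> aerosol_mod.F
git commit -q -a -m "Second aerosol fix"

# One file containing every commit in Bug_fix_aerosol but not in master.
git format-patch master..Bug_fix_aerosol --stdout > "$work/my-patch-file.diff"

# Each commit appears as its own message with a "Subject:" header.
grep '^Subject:' "$work/my-patch-file.diff"
```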

Checking the validity of a patch file

Other users can also send you their source code revisions as patch files. If you want to check the status of a Git patch file (i.e. what it will actually do) before you apply it to your repository, you can use the git apply command as follows:

% git apply --stat my-patch-file.diff

 GeosCore/aerosol_mod.f |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

The sample output listed above indicates that the patch contained 4 insertions and 3 deletions from only one file (aerosol_mod.f).

Note that the git apply --stat command does not apply the patch, but only shows you the stats about what it'll do. For more detailed information about the patch, you can open it in either an emacs or vi window and examine it manually.

You can also use the git apply command to find out whether the patch will apply cleanly to your Git repository, or whether there will be problems:

git apply --check my-patch-file.diff

The most common error is that the patch was made from an earlier version of GEOS-Chem. In that case the situation can usually be rectified by having the sender of the patch do a git pull to the last-released GEOS-Chem version and then create the patch again.
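The following sketch checks a patch without applying it. It assumes two hypothetical throwaway repositories: sender, which produces the patch, and receiver, a clone of sender made before the patch was written. Both git apply --stat and git apply --check leave the receiver's files unchanged.

```shell
set -e
work=$(mktemp -d)

# "sender" is the repository where the patch is made.
git -c init.defaultBranch=master init -q "$work/sender"
cd "$work/sender"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
printf 'line 1\nline 2\nline 3\n' > aerosol_mod.F
git add aerosol_mod.F
git commit -q -m "Initial import"

# "receiver" starts from the same code as the sender.
git clone -q "$work/sender" "$work/receiver"

# The sender commits a change and exports it as a patch file.
echo "line 4" >> aerosol_mod.F
git commit -q -a -m "Add line 4 to aerosol_mod.F"
git format-patch -1 --stdout > "$work/my-patch-file.diff"

# The receiver inspects the patch; neither command modifies any files.
cd "$work/receiver"
git apply --stat "$work/my-patch-file.diff"
git apply --check "$work/my-patch-file.diff" && echo "Patch applies cleanly"
```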

Reading a patch file into your local repository

To ingest a patch file into your local Git repository you should first make a new branch. Follow this procedure:

  1. Pick Branch/Create from the menu (or type CTRL-N). Give your branch a descriptive name like Updates_from_xxxx that will serve as a mnemonic.
  2. Pick Branch/Checkout from the menu (or type CTRL-O) and switch to the branch you just created.
  3. To ingest the other person's source code changes, type:
     git am < their-patch-file.diff

You can then test the other person's revisions in the separate branch until you are sure they are OK. Then you can merge them back into the master branch as described above.
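A sketch of the same procedure on the command line, with a hypothetical sender/receiver pair of throwaway repositories: the receiver creates the Updates_from_xxxx branch with git checkout -b and then runs git am, which records the patch as a commit that keeps the sender's author name and log message.

```shell
set -e
work=$(mktemp -d)

# The sender's repository (hypothetical) and a clone of it for the receiver.
git -c init.defaultBranch=master init -q "$work/sender"
cd "$work/sender"
git config user.name  "Example User"
git config user.email "user@example.com"
printf 'line 1\nline 2\nline 3\n' > aerosol_mod.F
git add aerosol_mod.F
git commit -q -m "Initial import"
git clone -q "$work/sender" "$work/receiver"

# The sender commits a change and exports it.
echo "line 4" >> aerosol_mod.F
git commit -q -a -m "Add line 4 to aerosol_mod.F"
git format-patch -1 --stdout > "$work/their-patch-file.diff"

cd "$work/receiver"
git config user.name  "Receiving User"     # hypothetical identity
git config user.email "receiver@example.com"
# Steps 1 and 2: create the branch and switch to it.
git checkout -q -b Updates_from_xxxx
# Step 3: record the patch as a commit, keeping the sender's name and message.
git am < "$work/their-patch-file.diff"
git log -1 --format='%an: %s'
```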

--Bob Y. 13:36, 2 October 2013 (EDT)

Invalid email address error

If you get the following error while trying to run the command git am < their-patch-file.diff:

     Patch does not have a valid e-mail address.

Then use this command instead:

     git apply their-patch-file.diff

which should ingest the changes from the patch file into your repository. Why the difference? Long story short:

  • If their-patch-file.diff was created with the git format-patch command, then it will contain the name of the committer plus the commit log message at the top of the file. The git am command uses this information to create the commit message in your repository.
  • If, on the other hand, their-patch-file.diff was created in gitk with the Make patch entry of the right-click context menu, then it will lack the email address of the committer and the log message. This will confuse the git am command. Using git apply will ingest the changes into your repository, but you will have to add the commit message yourself in the git gui.
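The git apply route can be sketched as follows. To keep it self-contained, a plain diff made with git diff stands in for a patch without author or log message (the file names are hypothetical). git apply only changes the files in the working directory, so the commit message has to be supplied afterwards.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
printf 'line 1\nline 2\n' > aerosol_mod.F
git add aerosol_mod.F
git commit -q -m "Initial import"

# Stand-in for a patch with no author or log message: a plain diff.
echo "line 3" >> aerosol_mod.F
git diff > "$work/their-patch-file.diff"
git checkout -q -- aerosol_mod.F      # restore the committed version

git checkout -q -b Updates_from_xxxx
# git apply changes the working tree only; nothing is committed yet.
git apply "$work/their-patch-file.diff"
git status --short                    # " M aerosol_mod.F"
# Supply the commit message yourself (here on the command line).
git commit -q -a -m "Updates from xxxx"
```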

--Bob Y. 13:35, 2 October 2013 (EDT)

More about patch files

For more information about Git patch files, please see the following links:

Sending Patches with Git
A guide how to use the patch feature of Git to send your changes to another user.
How to create and apply a patch with Git
Another nice explanation of how to use Git to send patch files.

Getting updates from the remote repository

When a new GEOS-Chem version is released, we recommend that you download it into a new local directory with the git clone command.

However, there may be times when "patches" (i.e. minor updates to fix bugs or other issues) need to be applied to an existing GEOS-Chem version. The easiest way to obtain patches is to use the git pull command, as follows:

  1. Change to your local code directory (e.g. Code.v8-03-01)
  2. Make a new branch named patch (or something similar).
  3. Check out the patch branch. Now we are ready to obtain the updates from the remote server.
  4. Use the git pull command to download the updated files from the master branch of the remote repository, as follows:
       git pull origin master
  5. Test the code: compile it and run a few time steps to make sure everything is fine.
  6. Check out the master branch.
  7. Merge the patch branch into your master branch.
  8. Delete the patch branch.

This will merge the changes from the master branch of the remote repository into your master branch.
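Steps 2 to 8 can be tried out locally without the real server. In this sketch a bare repository named server.git stands in for the remote server, and a repository named upstream plays the role of the support team publishing a patch; all of these names are hypothetical.

```shell
set -e
work=$(mktemp -d)

# The support team's working copy, with one release commit.
git -c init.defaultBranch=master init -q "$work/upstream"
cd "$work/upstream"
git config user.name  "Support Team"       # hypothetical identity
git config user.email "support@example.com"
echo "! v8-03-01" > chem_mod.F
git add chem_mod.F
git commit -q -m "v8-03-01 release"

# A bare copy stands in for the remote server; the user clones from it.
git clone -q --bare "$work/upstream" "$work/server.git"
git clone -q "$work/server.git" "$work/Code.v8-03-01"

# Later, a post-release patch is pushed to the server.
echo "! post-release fix" >> chem_mod.F
git commit -q -a -m "Post-release patch"
git push -q "$work/server.git" master

# Steps 2-4: in the user's directory, pull the patch into a separate branch.
cd "$work/Code.v8-03-01"
git config user.name  "Example User"
git config user.email "user@example.com"
git checkout -q -b patch
git pull -q origin master
# Step 5 (compile and run a few time steps) would go here.
# Steps 6-8: merge the patch branch into master and delete it.
git checkout -q master
git merge -q patch
git branch -d patch
```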

--Bob Y. 10:59, 17 September 2010 (EDT)

Reverting to an older state of the code

When you clone GEOS-Chem from the remote repository to your local disk space (with the git clone command), the repository will point to the most recent commit. However, you may want to revert to an older state of the code. Git allows you to do this very easily.

Let's assume that the latest version of the code is v8-03-02, but that you want to use the previous version v8-03-01. The procedure is as follows:

  1. Clone GEOS-Chem with git clone git://git.as.harvard.edu/bmy/GEOS-Chem Code.v8-03-02
  2. Open the gitk browser by typing gitk & at the command line.
  3. In the top-left window of gitk, find the commit that you want to revert to. Usually this will be denoted with a yellow tag (e.g. v8-03-01 or v8-03-01-benchmark). However, if there are any post-release patches, be sure you select the oldest one.
  4. Right click with the mouse. This will open a context menu. Select Create new branch.
  5. A new dialog box will pop up asking you to name the branch. Type Code.v8-03-01 and press OK.
  6. Close gitk and open the Git GUI by typing git gui &
  7. From the Git GUI dropdown menu, select Branch / Checkout, and then pick Code.v8-03-01.

That's it! We now have two branches that represent different GEOS-Chem versions.

  1. The master branch represents the state of the code as of the v8-03-02 release
  2. The Code.v8-03-01 branch represents the state of the code as of the v8-03-01 release.

You can work on the v8-03-01 branch as you wish. You can create further branches off of the v8-03-01 branch. The nice thing about this method is that you can always revert to the latest v8-03-02 release by just switching back to the master branch (with Branch / Checkout from the Git GUI dropdown menu).

You can also use this same method to check out older versions of files in any of the GEOS-Chem run directories.
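On the command line, git checkout -b BRANCH TAG creates a branch at a tagged commit and checks it out, which corresponds roughly to steps 3 to 7 above. A sketch in a throwaway repository whose two hypothetical commits are tagged v8-03-01 and v8-03-02:

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
echo "! v8-03-01 code" > chem_mod.F
git add chem_mod.F
git commit -q -m "v8-03-01 release"
git tag v8-03-01
echo "! v8-03-02 code" >> chem_mod.F
git commit -q -a -m "v8-03-02 release"
git tag v8-03-02

# Create a branch at the older tag and switch to it.
git checkout -q -b Code.v8-03-01 v8-03-01
cat chem_mod.F          # only the v8-03-01 line

# Going back to the newest release is just another checkout.
git checkout -q master
cat chem_mod.F          # both lines
```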

--Bob Y. 09:46, 30 June 2011 (EDT)

Adding a patch that was made to a previous version

Let's say you are currently working on GEOS-Chem v9-01-01, and somebody gives you a patch that they added into their own GEOS-Chem v8-03-02 code. You can add the patch into your v9-01-01 code in such a way that the modification will be at the head of the revision history. Here is the procedure.

1. If you haven't done so already, clone the current v9-01-01 repository
   git clone git://git.as.harvard.edu/bmy/GEOS-Chem  Code.v9-01-01
   cd Code.v9-01-01
2. Start the Git GUI interface:
   git gui &
3. Start the Gitk browser. From the dropdown menu in Git Gui, select:
    Repository / Visualize All Branch History
4. In gitk, look for the parent commit of the patch in the revision history (upper-left) window. The parent commit is the one immediately preceding the patch. If you are not sure where the parent commit is, you can ask the person who sent you the patch.
5. We will create a new branch from the parent commit into which the code updates will be placed. Once you have found the parent commit, right click on it with the mouse. This will pop open a context menu. Select:
   Create new branch  (name it patch-install-point and hit OK)
6. We need to checkout (i.e. switch to) the new branch. Go back to the Git GUI. From the dropdown menu, select:
   Branch / Checkout (and then click on patch-install-point)
7. Locate the patch file that contains the update you wish to add to the code. Let's assume that this is called patch-file.diff. We will now apply this to your source code directory:
     git am < ~/my_patch_directory/patch-file.diff
NOTE: patch-file.diff does not have to be in the source code directory. It can be anywhere, as long as you specify the full file path. Here we assume it’s in ~/my_patch_directory.
8. In the Git GUI, press the F5 key to refresh the display.
9. In the GitK browser, press the F5 key to refresh the display.
10. Now we want to merge the master branch into the patch-install-point branch. This will bring in all of the previous commits from the master branch into the patch-install-point branch, while keeping the new commits from the patch at the top of the revision history. Switch to the Git Gui window and pick from the menu:
     Merge / Local Merge (and then click on master)
11. The merge may result in conflicts in some source code files. A conflict is a difference in the source code which Git cannot resolve on its own. Most often conflicts are caused by 2 comments having the same number. Git will add lines like the following to the source code:
   <<<<<<< HEAD
   ... lines of existing code ...
   =======
   ... lines of new code ...
   >>>>>>> master
If there are conflicts, you will have to go through these manually. You can just hand-edit the source code files with your favorite text editor (we recommend Emacs). If there are no conflicts, you can skip ahead to Step 14.
12. Once you have finished resolving all conflicts, commit the modified files to the repository. In the Git Gui, click on each of the files in the “Unstaged Changes” window (this stages them for the next commit). Then click on the “Sign off” button and click on the “Commit” button.
13. At this point you will have 2 branches. The master branch represents the pristine, unmodified code from the remote repository. The patch-install-point branch represents master branch plus the code from the patch that we added.
14. Test the code in patch-install-point to make sure that it is functioning properly (i.e. run a benchmark simulation or a short test run).
15. Once you are certain that the code in patch-install-point is good, merge patch-install-point back into the master branch. From the Git GUI dropdown menu:
  Branch / Checkout    and then click on master                # Switches to the master branch

  Merge / Local Merge  and then click on patch-install-point   # Merges patch-install-point into master

  Branch / Delete      and then click on patch-install-point   # Deletes patch-install-point branch    
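A rough command-line version of the procedure above, in a single throwaway repository. A hypothetical colleague commit is first created on top of an older parent commit and exported with git format-patch; the commands that follow mirror steps 5, 6, 7, 10, and 15. The merge in step 10 is conflict-free here only because the patch touches a different file than the newer commit on master.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"

# History on master: an older commit (the patch's parent) and a newer one.
echo "! chem code" > chem_mod.F
echo "! aerosol code" > aerosol_mod.F
git add chem_mod.F aerosol_mod.F
git commit -q -m "v8-03-02 release"
parent=$(git rev-parse HEAD)
echo "! v9-01-01 changes" >> chem_mod.F
git commit -q -a -m "v9-01-01 release"

# Simulate the colleague's patch, made on top of the older parent commit.
git checkout -q -b colleague "$parent"
echo "! colleague fix" >> aerosol_mod.F
git commit -q -a -m "Colleague's aerosol fix"
git format-patch -1 --stdout > "$work/patch-file.diff"
git checkout -q master
git branch -q -D colleague

# Steps 5-6: new branch at the parent commit, checked out.
git checkout -q -b patch-install-point "$parent"
# Step 7: apply the patch as a commit.
git am -q < "$work/patch-file.diff"
# Step 10: merge master into the branch (no conflicts in this example).
git merge -q --no-edit master
# Step 15 (after testing): merge back into master and delete the branch.
git checkout -q master
git merge -q patch-install-point
git branch -d patch-install-point
```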

--Bob Y. 09:58, 17 August 2011 (EDT)

Cherry-picking individual commits from another branch

Git allows you to cherry-pick commits, that is to pull a single commit from a different branch into the branch you are currently working on. This can prove useful in many situations. For example, a colleague may have committed a critical bug fix into his or her development version of GEOS-Chem. You may want to only grab that particular bug fix into your current branch without also getting all the other changes that your colleague made.

The best way to cherry-pick commits is via the gitk browser. Here is a simple example. Let's assume the following:

  1. You're working in branch my_branch of your local GEOS-Chem code directory.
  2. You've pulled the branch containing your colleague's bug fix update into a new branch of your local code directory called Bug-fix-branch.
  3. You are only interested in merging the commit labeled Remove obsolete LAVHRRLAI and LMODISLAI from input.geos into my_branch.

When you open the gitk browser, you will see this revision history:

(Screenshot: gitk revision history showing my_branch and Bug-fix-branch)


As you can see, my_branch is displayed in boldface, which indicates that it is the currently checked-out branch.

To cherry-pick your desired commit (Remove obsolete LAVHRRLAI and LMODISLAI from input.geos) into my_branch, point to it with your cursor and press the right mouse button. You will see a context menu pop up. Select the Cherry-pick this commit option, as shown in this screenshot:

(Screenshot: gitk context menu with the Cherry-pick this commit option)


This will bring only the selected commit into my_branch, as shown in the following screenshot. Note that my_branch now displays higher than Bug-fix-branch in the gitk browser window, because it now has the most recent commit.

(Screenshot: gitk revision history after the cherry-pick)
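The same cherry-pick can be done with the git cherry-pick command. In this throwaway repository the branch names match the example above, but the file contents and the extra Unrelated change commit are hypothetical; only the selected commit ends up on my_branch.

```shell
set -e
work=$(mktemp -d)
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name  "Example User"       # hypothetical identity
git config user.email "user@example.com"
printf 'LAVHRRLAI\nLMODISLAI\nOTHER\n' > input.geos
git add input.geos
git commit -q -m "Initial import"
git branch my_branch

# The colleague's branch: one unrelated commit, then the wanted fix.
git checkout -q -b Bug-fix-branch
echo "! unrelated change" > other_mod.F
git add other_mod.F
git commit -q -m "Unrelated change"
printf 'OTHER\n' > input.geos
git commit -q -a -m "Remove obsolete LAVHRRLAI and LMODISLAI from input.geos"

# Look up the wanted commit by its log message, then copy only that commit.
fix=$(git log -1 --format=%H --grep="Remove obsolete LAVHRRLAI" Bug-fix-branch)
git checkout -q my_branch
git cherry-pick "$fix"
```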

--Bob Y. 12:10, 11 October 2013 (EDT)

In summary

We recommend using the git gui because of its user-friendly interface. The following operations are best done from the GUI interface:

  1. Creating and checking out branches
  2. Committing code
  3. Merging code
  4. Deleting branches
  5. Examining revision history (you may also use gitk as standalone)

The following operations are best done from the command line:

  1. git clone: Initial download of repository
  2. git push: Send changes to a remote repository
  3. git pull: Get changes from a remote repository
  4. git tag: Attach a label to a particular commit
  5. git format-patch: Create a patch

--Bob Y. 10:14, 19 March 2010 (EDT)