Talk:Python code for GEOS-Chem

From Geos-chem
Revision as of 17:45, 28 May 2013 by Bmy (Talk | contribs)


Benoit Bovy wrote:

To initiate a more detailed discussion on this, I'd like to share my opinion and some suggestions on several points. I look forward to reading yours (I'm sure I've missed other points of discussion that may be relevant here). I apologize for the long text below; I haven't found a better way to share this.

Project goals and code organization

In my opinion, the success of this project will depend on the availability of the GEOS-Chem interface for experienced Python users as well as Python newbies, and even for those who don't want to learn Python. It thus requires a robust and flexible code that allows various uses (basic to advanced) and that also allows easy integration of further GEOS-Chem developments. But at the same time, it would be interesting to share less flexible Python code useful for performing more specific tasks.

To meet these requirements, I thought that a possible solution would be subdividing the project into the following components:

  1. A consistent, "generic" Python library (package), which provides basic modules for interacting with GEOS-Chem related data and for pre/post-processing and visualization (data conversion, plotting...). GEOS-Chem related data encompass inputs/outputs, datasets, the chemistry mechanism and various other parameters, for which the library should provide an interface (Python classes?) and methods or functions to read/write data from/to various formats (e.g., bpch/netCDF4, globchem.dat/KPP...). For easier code maintenance, I would say the fewer dependencies the better (for example, only the standard library, Numpy, Matplotlib and python-netCDF4).

  2. A "repository" of individual contributions that are not suited to be part of the generic library, but that may be useful for GEOS-Chem users (e.g., too specific goals or less flexible code, examples of library usage, Python programs, code not yet "standardized", prototyping...). This repository may store a series of Python modules (scripts, command-line or GUI programs, one-file libraries...) written for various purposes, and which import the generic library and/or other specific dependencies. If needed (it will depend on the repo content), a system for repository management and introspection should keep users and developers from getting lost in a pile of Python modules. Some code may later be included in the library if relevant.

  3. For users who are interested in an interface to GEOS-Chem but who don't want to dive into Python, maybe a good solution is to make "standard" command-line or GUI programs available outside of the repository (a "bin" directory)?

  4. A "GEOS-Chem shell" from which one can easily use the library, run the Python modules of the repository, or even compile and run GEOS-Chem and manage simulations. I think it is worth using IPython here (http://ipython.org/), given its multiple shells (terminal, Qt, notebook), its kernel/client architecture and its high-level framework for parallel computing. Even though it has a non-negligible learning curve for advanced usage, it has become very popular; the console is embedded by default in other projects such as Enthought Canopy or Spyder. I've seen there are extensions for calling Matlab or IDL code, which may be useful for non-Python users. Finally, I think that it won't take much time to turn IPython into a GEOS-Chem interactive shell (just by creating a new IPython profile and a startup script?).
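To make the "profile plus startup script" idea in point 4 more concrete, here is a minimal sketch. IPython executes the Python files placed in a profile's startup directory when the shell starts, and a profile can be created with "ipython profile create <name>". Everything else below is an assumption made for illustration: the profile name "geoschem", the file name, the helper find_run_dirs, the environment variable GC_RUN_ROOT, and the idea that a run directory can be recognized by the presence of an input.geos file.

```python
# Hypothetical startup file, e.g. profile_geoschem/startup/00-geoschem.py
# (assumes a profile created with: ipython profile create geoschem)
import os


def find_run_dirs(root):
    """Return the sorted paths under root that contain an input.geos file."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "input.geos" in filenames:
            found.append(dirpath)
    return sorted(found)


# GC_RUN_ROOT is an assumed environment variable, not a GEOS-Chem convention.
RUN_ROOT = os.environ.get("GC_RUN_ROOT", os.getcwd())
print("GEOS-Chem shell: run directories are searched under %s" % RUN_ROOT)
```

Once the shell is started with "ipython --profile=geoschem", find_run_dirs(RUN_ROOT) would be available at the prompt without any import.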

It looks quite ambitious, though much work has already been done (bpch and globchem interfaces, command-line and GUI programs)!

Coding style:

  • I think everyone agrees we should use a common coding style, at least for writing a common GEOS-Chem Python library.
  • The easiest is to follow PEP 8 (http://www.python.org/dev/peps/pep-0008/) and make the interface as "pythonic" as possible. Defining or following standards for docstring formatting may also be helpful; we can then automatically generate online or PDF documentation using Sphinx (we can take the Numpy, Scipy or Matplotlib docstring formats as examples).
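As an illustration of the docstring point above, here is a minimal sketch of a function documented in the Numpy docstring format (section titles underlined with dashes). The function vv_to_ppbv is a hypothetical example and not part of any existing GEOS-Chem code.

```python
def vv_to_ppbv(mixing_ratio):
    """Convert a volume mixing ratio from v/v to ppbv.

    Parameters
    ----------
    mixing_ratio : float
        Volume mixing ratio in v/v (dimensionless).

    Returns
    -------
    float
        The same mixing ratio in parts per billion by volume.

    Examples
    --------
    >>> vv_to_ppbv(0.5)
    500000000.0
    """
    # 1 v/v corresponds to 1e9 ppbv.
    return mixing_ratio * 1e9
```

The "Examples" section doubles as a doctest, so the documentation and the behaviour can be checked together.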

Project hosting:

  • Using Git? Create a remote git repository on Github or on Harvard's servers?

License:

  • GPL?

Project name:

  • ?

Project maintainer:

  • ?

Development priorities:

  • I would guess that reading/writing data from/to both the bpch and netCDF formats is the feature to work on first.
  • bpch reading (and writing) is implemented differently in our own projects. Although I don't know how it is implemented in Paul Palmer and Liang Feng's code, I think that Barron's approach is the way to go, given the GEOS-Chem migration towards netCDF I/O. At the same time, I find Gerrit's approach very simple and "pythonic", with powerful filtering methods.
  • So I wonder if it is possible to combine elements from these implementations? Moreover, as netCDF will be the default format used by GEOS-Chem, shouldn't we create a single interface for both formats, where the bpch format is viewed as an extension? I mean, for example, creating a "CTMDataset" class which directly inherits from the "Dataset" class of the python-netCDF4 library, to which we add support for bpch + filtering and export methods:
    ctm = CTMDataset('ctm.bpch', 'r', format='BPCH2')
    ... <do_something_with_ctm>
    ctm.export('ctm.nc', format='NETCDF4')
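
To show how one entry point could serve both formats, here is a minimal sketch of the dispatch idea only. It does not inherit from the python-netCDF4 "Dataset" class as suggested above; it uses a plain registry of reader functions instead, so that it runs with the standard library alone. The two readers are stubs returning made-up values, and register_reader, filter and the stub names are hypothetical.

```python
import os


class CTMDataset(object):
    """Hypothetical single entry point for bpch and netCDF files."""

    _readers = {}  # format name -> reader callable
    _extensions = {".bpch": "BPCH2", ".nc": "NETCDF4"}

    def __init__(self, filename, mode="r", format=None):
        if format is None:
            # Guess the format from the file extension.
            ext = os.path.splitext(filename)[1].lower()
            format = self._extensions.get(ext)
        if format not in self._readers:
            raise ValueError("no reader for %r (format %r)" % (filename, format))
        self.filename = filename
        self.mode = mode
        self.format = format
        # A reader returns a dict mapping variable names to data.
        self.variables = self._readers[format](filename)

    @classmethod
    def register_reader(cls, format, reader):
        cls._readers[format] = reader

    def filter(self, prefix):
        """Return the variables whose name starts with prefix."""
        return dict((name, data) for name, data in self.variables.items()
                    if name.startswith(prefix))


def _stub_bpch_reader(filename):
    # Placeholder: a real reader would parse the binary punch file.
    return {"O3": [40.0, 42.0], "NO2": [1.5, 0.5], "NO": [0.2, 0.1]}


def _stub_netcdf_reader(filename):
    # Placeholder: a real reader would delegate to python-netCDF4.
    return {"O3": [40.0, 42.0]}


CTMDataset.register_reader("BPCH2", _stub_bpch_reader)
CTMDataset.register_reader("NETCDF4", _stub_netcdf_reader)

ctm = CTMDataset("ctm.bpch")   # format guessed from the extension
nox = ctm.filter("NO")         # keeps the "NO" and "NO2" entries
```

With this layout, supporting a new format only requires registering one more reader; the calling code stays the same.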
 

Gerrit Kuhlmann replied:

I agree with your point about using a consistent, "generic" Python library (i.e. point #1 under your "Project goals and code organization" heading). Furthermore, I would suggest keeping plotting functions separate from the read/write/convert methods, to remove the dependency on Matplotlib (and Basemap) for users who just want to convert some files. For inexperienced users, installing the Enthought Python Distribution is probably the best way to get all the important dependencies.

Working with BPCH and netCDF files through the same interface would be my preferred approach. Hence, a netCDF-like interface would be the best way. There is a module for reading and writing unformatted Fortran files (uff.py) in my package, which would allow the packing and unpacking to be handled in the background.
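For readers unfamiliar with unformatted Fortran files, here is a minimal sketch of the packing and unpacking involved. It is not the code of uff.py. It assumes sequential records framed by 4-byte length markers and big-endian byte order, which is the usual layout but depends on the compiler and its options; the names write_record and read_record are hypothetical.

```python
import io
import struct


def write_record(stream, payload, endian=">"):
    """Write one record: length marker, payload bytes, length marker."""
    marker = struct.pack(endian + "i", len(payload))
    stream.write(marker + payload + marker)


def read_record(stream, endian=">"):
    """Read one record and return its payload, or None at end of file."""
    head = stream.read(4)
    if not head:
        return None
    if len(head) < 4:
        raise IOError("truncated record marker")
    (size,) = struct.unpack(endian + "i", head)
    payload = stream.read(size)
    tail = stream.read(4)
    if len(payload) != size or len(tail) != 4:
        raise IOError("truncated record")
    if struct.unpack(endian + "i", tail)[0] != size:
        raise IOError("record markers do not match")
    return payload


# Round trip through an in-memory buffer: three 4-byte reals in one record.
buf = io.BytesIO()
write_record(buf, struct.pack(">3f", 1.0, 2.0, 3.0))
buf.seek(0)
values = struct.unpack(">3f", read_record(buf))
```

Checking the trailing marker against the leading one is a cheap way to detect a wrong byte order or a corrupted file early.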

I'm not too familiar with netCDF, but I think it would be nice if datasets were aware of the model domain, so that data could easily be accessed by given arrays of coordinates.
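A minimal sketch of what such coordinate-based access might look like, in plain Python. The class RegularGrid and the function sample are hypothetical, the field is a nested list indexed as field[j][i], and the grid is assumed to be global and strictly regular (the half-size polar boxes of the GEOS-Chem grids are ignored here).

```python
class RegularGrid(object):
    """Hypothetical global lon/lat grid defined by its box centers."""

    def __init__(self, lon0, dlon, nlon, lat0, dlat, nlat):
        self.lon0, self.dlon, self.nlon = lon0, dlon, nlon
        self.lat0, self.dlat, self.nlat = lat0, dlat, nlat

    def index(self, lon, lat):
        """Return (i, j) of the grid box that contains (lon, lat)."""
        # Wrap the longitude so that it falls inside the global domain.
        west_edge = self.lon0 - self.dlon / 2.0
        offset = (lon - west_edge) % 360.0
        i = int(offset // self.dlon) % self.nlon
        # Nearest latitude center, clipped to the valid index range.
        j = int(round((lat - self.lat0) / self.dlat))
        j = min(max(j, 0), self.nlat - 1)
        return i, j


def sample(grid, field, lons, lats):
    """Pick field[j][i] for each (lon, lat) pair."""
    picked = []
    for lon, lat in zip(lons, lats):
        i, j = grid.index(lon, lat)
        picked.append(field[j][i])
    return picked


# A 4 x 5 degree grid: 72 longitude centers from -180, 46 latitude centers from -90.
grid = RegularGrid(-180.0, 5.0, 72, -90.0, 4.0, 46)
# Dummy field whose value encodes its own indices as j * 100 + i.
field = [[j * 100 + i for i in range(72)] for j in range(46)]
values = sample(grid, field, [5.5, 180.0], [50.6, -90.0])
```

A dataset class that carries such a grid object could expose the same lookup as a method, so users never handle the indices themselves.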

Another question is which Python versions we should support. The widest compatibility range would be versions 2.4 to 3.

--Bob Y. 13:45, 28 May 2013 (EDT)