Posts

Showing posts from 2017

How to write a computational climate model with confidence

I have covered many aspects of climate modeling so far, and I have also discussed quite a few examples of model development. I think it is time to wrap up with a post that serves as guidance for those who may be interested in this field.

In short, if either of the following scenarios fits you, you may find this post useful:
- You want to develop a new model, and you don't know where to start;
- You want to modify an existing model, and you have no idea how.

This post is organized mainly in chronological order, with a moderate level of detail. Each step may take a great amount of time, but the effort will be worth it eventually.

You need to know what question you want to answer and whether a promising solution is available. This step is generally considered the literature review. During this process, you also have to estimate whether you are capable of addressing the question with limited resources (time, intelligence, funding, etc.). You will also need to send many ema…

Numerical simulation: ode/pde solver and spin-up

For Earth science model development, I inevitably have to deal with ordinary and partial differential equations (ODEs and PDEs). I have also come across some discussions related to this topic, e.g.,

https://www.researchgate.net/post/What_does_one_mean_by_Model_Spin_Up_Time

In an attempt to answer this question, and to redefine the problem I am dealing with, I decided to organize some materials to illustrate the current state of this topic.

Models are essentially equations. In Earth science, these equations are usually ODEs or PDEs, so I want to discuss this from a mathematical perspective.

Ideally, we want to solve these ODEs/PDEs, given an initial condition (IC) and a boundary condition (BC), using various numerical methods.
https://en.wikipedia.org/wiki/Initial_value_problem
https://en.wikipedia.org/wiki/Boundary_value_problem
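To make the idea of an initial value problem and spin-up concrete, here is a minimal sketch in Python. It is not taken from any particular model: the single-pool equation dC/dt = input - k * C, the forward Euler scheme, the parameter values, and the convergence tolerance are all assumptions chosen for illustration. Starting from an arbitrary initial condition, the loop keeps stepping until the state stops changing, which is one simple way to define the end of spin-up.

```python
def spin_up(pool0, litter_input, decay_rate, dt=1.0, tol=1e-6, max_steps=100_000):
    """Integrate dC/dt = litter_input - decay_rate * C with forward Euler
    until the change per step falls below tol (a simple spin-up criterion).

    All names and default values are hypothetical, for illustration only.
    """
    pool = pool0
    for step in range(1, max_steps + 1):
        change = (litter_input - decay_rate * pool) * dt
        pool += change
        if abs(change) < tol:
            return pool, step
    raise RuntimeError("spin-up did not converge within max_steps")


# Start from an empty pool (the initial condition) and run to steady state.
steady_pool, n_steps = spin_up(pool0=0.0, litter_input=2.0, decay_rate=0.01)
print(f"steady state reached after {n_steps} steps: {steady_pool:.3f}")
```

For this linear equation the steady state is known analytically (input divided by decay rate), so the result can be checked by hand. In a real model with many coupled pools and grid cells there is usually no such shortcut, which is why spin-up time becomes a practical concern.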

Because of the nature of geology, everything is similar to its neighbors. So we can construct a system of equations, which may contain multiple equations for each grid cell. Now we have an array of equation…

Ecosystem development: from process-oriented to object-oriented system

One of my recent projects is to develop a three-dimensional coupled water and carbon cycle ecosystem model. After finishing the first version of the system, I wasn't quite satisfied with the system architecture, partly because of the complexity of its dependencies.

In fact, I have spent a great amount of time trying to design the system to be well structured. Using classes in C++11, I have defined many classes to control different types of algorithms and processes.
The interesting part of the story started to unfold when I began to write and revise the manuscript. When I tried to explain the conceptual model, I realized that even though the current system uses an object-oriented programming (OOP) approach through classes, most components within the system do not actually use OOP concepts at all. In a word, the system still acts like a process-oriented program.
Reviewing several other Earth system models (e.g., the Community Land Model), I realized that unfortunately most of c…

Spatial datasets operations: a hexagon-based discrete grid systems for global simulation

After finishing my three-dimensional coupled water and carbon cycle model, I have been thinking about whether I can apply this approach to a large spatial domain or even at global scale.
During this process, I realized that most (or all) global-scale land surface modeling work is based on the square grid system, which is widely used in Earth science. Each element of this grid is commonly referred to as a pixel or grid cell.

Can we still use a square grid in global-scale land surface model simulation?
Yes and no. If you do not consider lateral flow, then interactions between grid cells are omitted. In this scenario, the grid cell might be the easiest approach.
However, if horizontal interactions are considered, the grid-based structure will fail. This is because a latitude/longitude-based structure creates a singularity in the polar regions.

Because of this distortion, it is impossible to calculate interactions between cells within and across the polar regions.
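The polar distortion described above can be quantified with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km and a regular 0.5 degree latitude/longitude grid; the function name and the sample latitudes are my own choices for illustration. It computes the surface area of a single cell from the latitude of its southern edge.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is assumed


def cell_area_km2(lat_south_deg, dlat_deg=0.5, dlon_deg=0.5):
    """Area of one latitude/longitude cell on a sphere, in square kilometers.

    lat_south_deg is the latitude of the cell's southern edge.
    """
    lat1 = math.radians(lat_south_deg)
    lat2 = math.radians(lat_south_deg + dlat_deg)
    dlon = math.radians(dlon_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


# Compare a cell at the equator with cells closer and closer to the pole.
for lat in (0.0, 45.0, 80.0, 89.5):
    print(f"southern edge {lat:5.1f} deg: {cell_area_km2(lat):10.2f} km^2")
```

Cells of equal angular size shrink toward the poles and degenerate into thin triangles in the last row, so neighboring cells no longer share edges of comparable length. This is one way to see why lateral exchange is awkward to define on such a grid.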

Most maps of various variables at global scale express the …

Scientific writing: from LaTeX to BibTeX

I am not very familiar with the TeX system, but I use it for several purposes.

Producing a high-quality manuscript with LaTeX can be a bit of a challenge, so I am trying to document the process once and for all here.

Here are the steps I generally follow if everything works fine.

1. You will need an Overleaf account, because I prefer to write in a browser.
2. Find the template for the document. If Overleaf has one, use it directly; if not, download it and upload it as a project.
3. Start writing your manuscript as usual.
4. For figures, I suggest using a separate folder and indexed names.
5. Use Mendeley to manage all your references and enable the Mendeley BibTeX feature.
6. Connect Overleaf with Mendeley.
7. Add the bibliography file from Mendeley.
8. Add all citations to your manuscript.
9. Download the whole project, including all output.
10. Install BibTeX on a Linux machine.
11. Upload all the results to the Linux machine.
12. Install the bibexport package.
13. Remove unwanted citations from the Mendeley local file using bibexport.
14. Upload the r…

Spatial datasets operations: an overview of global climate dataset and interpolation

Climate data is arguably the most important data in climate change research. Great efforts have been made over the decades to prepare climate datasets.

In my recent project, I needed to prepare some climate datasets at global scale, so I took some time to review the current state of global climate datasets.

First, I want to emphasize that "global scale" includes both the Arctic and the Antarctic. Because I will use this data in a three-dimensional ecosystem simulation, a special configuration of the data structure will be used, which is completely different from traditional approaches.

Second, where or how can we get the climate data?

The easiest sources to turn to are existing global climate datasets, such as the CRU and NCEP datasets. I will not go into the details of these data, but rather list the sources and state some critical aspects that we have to take into consideration.

CRU:
http://www.cru.uea.ac.uk/data
http://www.ipcc-data.org/observ/clim/cru_climatolo…

High performance computing: data management

When you leave a position, for example when you graduate from a university, you usually have to back up a lot of data, which is usually considered a critical step in data management.

In general, data management is an important process in all scientific research activities. Without proper data management, research efficiency may suffer. In extreme cases, you may lose valuable data because of poor data management. In some cases, it could even be regarded as research misconduct.

From my perspective, data management can be a project in itself, especially when you have to deal with massive amounts of data across different platforms.

In my case, I need to manage different types of data under different environments. I will first introduce the data formats that I am dealing with, and then the file system that will be used to manage the data.

In Earth system research, data can usually be classified along various dimensions.
Based on data properties, these dimensions can be rast…

Ecosystem modeling: uncertainty quantification

I recently read a few news articles expressing skeptical attitudes toward climate change, and I decided to write something about it. I am not trying to promote climate change; instead, I want to point out that there is great uncertainty in our current Earth system modeling.

Uncertainty quantification (UQ) is an important step in most numerical simulation processes. In general, a simulation without uncertainty quantification is less convincing. (I was surprised that an invited speaker in a department seminar said he doesn't care about uncertainty.)

However, how to conduct uncertainty quantification is itself a challenge, especially for highly nonlinear ecosystem modeling.

First, I invite you to read the Wikipedia article on UQ here:
https://en.wikipedia.org/wiki/Uncertainty_quantification
which serves as an overview of the concepts I will discuss below.
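One common way to quantify parameter uncertainty is Monte Carlo sampling: draw a parameter from an assumed distribution many times, run the model for each draw, and summarize the spread of the outputs. The sketch below is only an illustration of that idea. The toy model (a steady-state pool equal to input divided by decay rate), the uniform parameter range, and the sample size are all assumptions, not part of any model discussed in this post.

```python
import random
import statistics


def steady_state_pool(litter_input, decay_rate):
    """Toy model (hypothetical): steady-state size of a single carbon pool."""
    return litter_input / decay_rate


def monte_carlo(n_samples=10_000, seed=42):
    """Propagate uncertainty in decay_rate through the toy model."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outputs = []
    for _ in range(n_samples):
        # Assumed uncertainty: decay rate uniformly distributed in this range.
        decay_rate = rng.uniform(0.008, 0.012)
        outputs.append(steady_state_pool(2.0, decay_rate))
    return statistics.mean(outputs), statistics.stdev(outputs)


mean_pool, sd_pool = monte_carlo()
print(f"pool size: mean {mean_pool:.1f}, standard deviation {sd_pool:.1f}")
```

Even in this one-parameter example the output distribution is not symmetric, because the model is nonlinear in the decay rate. With many interacting parameters and processes, the sampling cost grows quickly, which is part of why UQ is hard for ecosystem models.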

Uncertainty comes from lots of sources and nearly every step we take in our Earth system modeling has uncertainty. We, scientists as well as engineer…

Ecosystem modeling: model evaluation of current implementation in ECO3D 1.0

As I prepare for my defense, the first version of the model in my thesis, ECOSYSTEM 1.0, is finally complete. Looking forward to the next chapter of my life, I think it is time to evaluate ECOSYSTEM 1.0 to see what it is capable of and what still needs to be improved or expanded in the near future.

First, here is a brief introduction to the ECOSYSTEM model:
ECOSYSTEM is a three-dimensional water and carbon cycle terrestrial ecosystem model. Within the ECOSYSTEM model, the water and carbon cycles are seamlessly coupled. The water cycle is based on PRMS, and the carbon cycle is based on TEM. The core idea behind the coupling is that both water and carbon (and potentially nitrogen and others) fluxes can flow in a three-dimensional domain, which is exactly one of the reasons why dissolved organic carbon (DOC) can be observed in stream water.

A lot of improvements have been made upon the original PRMS and TEM models. For example, I have added a new litter pool to consider the ca…

Ecosystem modeling: a review on spatial resolution and lateral flow

We all agree that lateral flow is important in hydrology, so why do most ecosystem models not consider it?

The answer is usually related to spatial resolution. In a large-scale or global-scale GCM simulation, the spatial resolution is usually $0.5^{\circ} \times 0.5^{\circ}$. At this resolution, lateral flow is usually negligible compared with vertical fluxes.

However, omitting lateral flow usually causes problems in the mass balance. First, without lateral flow, freshwater discharge into the ocean cannot be estimated accurately. Second, dissolved nutrient export into oceanic systems cannot be estimated.

So the question is: at what resolution must we actually consider lateral flow?

The answer depends on the fluxes you are looking into. For example, if you are looking into water flow, it is more than likely that you always have to consider it, especially at regional scale. If you are looking into carbon/nitrogen fluxes, the problem becomes slightly more complicated.

First, we will need to evalua…

Surface water hydrology: a reach based approach

In surface watershed hydrology, a stream network is usually represented by a number of connected stream segments. Outflow from an upstream segment is then routed to its downstream segment as inflow at a certain time step.

One important parameter is the travel time, which determines how long the outflow takes to reach the next segment. The Manning equation is usually used to calculate the open-channel flow velocity and flow rate.

In a typical surface hydrology model such as SWAT or PRMS, these parameters are either prepared in advance or calculated for the model. However, there may be great uncertainty in simulations at high spatial resolution.

For example, a segment's travel time may be far less than one hour, and a simulation with a daily time step cannot accurately capture the peak flow at all.
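The travel-time argument can be illustrated with Manning's equation in SI units, V = (1/n) R^(2/3) S^(1/2), where n is the roughness coefficient, R the hydraulic radius, and S the channel slope. The sketch below is a minimal example; the segment length, roughness, hydraulic radius, and slope are hypothetical values chosen only to show the order of magnitude, and the function names are my own.

```python
def manning_velocity(n, hydraulic_radius_m, slope):
    """Mean flow velocity (m/s) from Manning's equation in SI units."""
    return (1.0 / n) * hydraulic_radius_m ** (2.0 / 3.0) * slope ** 0.5


def travel_time_hours(length_m, velocity_ms):
    """Time for water to traverse a segment of the given length, in hours."""
    return length_m / velocity_ms / 3600.0


# Hypothetical short, fairly steep segment in a high-resolution network.
velocity = manning_velocity(n=0.035, hydraulic_radius_m=0.5, slope=0.002)
hours = travel_time_hours(length_m=500.0, velocity_ms=velocity)
print(f"velocity {velocity:.2f} m/s, travel time {hours:.2f} h")
```

With values like these, the travel time is a small fraction of an hour, so a daily time step would route water through many segments within a single step.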

In some scenarios, we are interested in the spatial distribution of flow rate, flow velocity, and dissolved components (DOC/DIC) in the stream. In these cases, a segment-based approach is inappropriate.

An alternative way to …

Ecosystem modeling: challenges in simulating the dissolved organic carbon

Dissolved organic carbon (DOC) is an important component of the carbon budget within ecosystems, especially aquatic ecosystems, including oceanic ecosystems.

Conventional DOC investigations mainly focus on DOC measurements in either the soil profile or stream discharge. These measurements generally cannot explain where DOC comes from or how much DOC will be exported into the ocean. However, these studies have provided much insight into which biogeochemical processes are responsible for DOC dynamics.

So the question is whether we can quantitatively estimate DOC in terrestrial ecosystems based on existing knowledge.

I will provide more information on this topic in the coming months.

A few processes need to be simulated along the path of DOC:

- DOC production, consumption, and transportation on the surface (mainly in litter);
- DOC production, consumption, and transportation in the subsurface (mainly in soil);
- DOC consumption and transportation in the hydrological network;
- In some scenarios, DOC transportation in groundw…

Scientific writing: high quality scalable vector graphics

I recently found a new way to prepare publication-ready, high-quality vector graphics using Google Drive.
Here is the result:



Here are steps you can follow:

1. Prepare a Google Sheets document within Google Drive.
2. Insert a chart from the menu.
3. Define whatever colors or labels you like.
4. Use the developer tools in Google Chrome to find the SVG tag in the HTML source.
5. Open your text editor and paste the SVG tag into it.
6. Add the SVG header; remember to define the SVG version.
7. Save the text file as an SVG file.

You may want to convert the SVG file to a PostScript (PS) file so that you can convert it to any 600 dpi raster figure. The essential idea behind this is that the Google Chart API uses SVG, so we are basically calling the Google Chart API to produce a chart without actually writing any script!

The above figure is copyright protected.
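The "make up the SVG header" step can also be done with a few lines of Python instead of by hand. This is only a sketch under stated assumptions: the function name is hypothetical, it assumes the pasted text starts with an `<svg ...>` opening tag that is not self-closing, and it uses SVG version 1.1 as the declared version (the post does not say which version to use). It adds the XML declaration, the SVG namespace, and the version attribute when they are missing.

```python
SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def make_standalone_svg(svg_fragment, svg_version="1.1"):
    """Turn an <svg> element copied from a web page into a standalone document.

    Assumes the fragment begins with a non-self-closing <svg ...> tag.
    """
    fragment = svg_fragment.strip()
    if not fragment.startswith("<svg"):
        raise ValueError("expected the text to start with an <svg> tag")
    # Split off the opening tag so attributes can be appended to it.
    head, rest = fragment.split(">", 1)
    if "xmlns=" not in head:
        head += f' xmlns="{SVG_NAMESPACE}"'
    if "version=" not in head:
        head += f' version="{svg_version}"'
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + head + ">" + rest


# A tiny stand-in for the tag copied from the browser's developer tools.
copied = '<svg width="100" height="50"><rect width="100" height="50"/></svg>'
print(make_standalone_svg(copied))
```

The returned string can then be written to a file with an `.svg` extension and opened in any vector graphics editor.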

High Performance Computing: Download and prepare data in a batch mode

Over time, I have needed to manipulate a lot of data on a Linux cluster. Some of these manipulations actually read or write data, whereas others are essentially file system operations, such as downloading files.
Here I present a list of such operations suitable for HPC, using a PBS job whenever possible.
I do not attempt to include all possible methods, only the ones that I find useful and that can be prepared in seconds.
Download
The most efficient way to download MODIS-like data on an HPC system:
wget -r --no-parent -R "index.html*" --retr-symlinks -A "*.nc" ftp-url
wget -r --no-parent -R "index.html*" -A "MOD17A2.A2000*.hdf" -A "MOD17A2.A2000*.xml" http-url
wget -r --no-parent -R "index.html*" -A "MOD17*.hdf" -A "MOD17*.xml" http-url
You can set up filters for file type, year, and granule ID.
A live example:
///==========================================================
#!/bin/bash                       
#PBS -l…