Asian Journal of Information Technology

Year: 2009
Volume: 8
Issue: 3
Page No. 88 - 93

Parallel Performance on Solving Lengyel-Epstein Model Using Runge-Kutta Method

Authors : H.A. Bastian and M.A. Kartawidjaja

Abstract: The Lengyel-Epstein model is an Ordinary Differential Equation (ODE) model developed from the Chlorite Iodide Malonic Acid (CIMA) chemical reaction in Initial Value Problem (IVP) form. The stiffness of this model is determined by the rescaling parameter, which depends on the starch concentration. A stiff system is harder to solve and requires a longer computation time; parallelization is therefore applied. In this study, the ODE model is solved using an implicit parallel Runge-Kutta scheme on a number of Personal Computers (PCs) connected in a network. The parallelization is implemented at two levels: one is parallelization across the method to solve the nonlinear equations and the other is parallelization across the system to solve the associated linear equations. The parallel environment is emulated using the Parallel Virtual Machine (PVM) software. The performance of the parallel system is quantified by the speedup compared to a sequential system. The experiments showed that a significant speedup was achieved by implementing two-level parallelization on the Lengyel-Epstein model.

How to cite this article:

H.A. Bastian and M.A. Kartawidjaja, 2009. Parallel Performance on Solving Lengyel-Epstein Model Using Runge-Kutta Method. Asian Journal of Information Technology, 8: 88-93.

INTRODUCTION

Numerical modeling is commonly used in simulating natural phenomena, for example weather prediction, air pollution, earthquakes, chemical reactions, ozone depletion and DNA structure analysis. Modeling a system usually requires a large number of parameters and complex computational methods, which in turn require a large amount of time. One way to circumvent this problem is to use parallel computing, where the computation is broken down into smaller processes that are solved simultaneously on a number of processors.

One model that needs intensive computational power is the Chlorite Iodide Malonic Acid (CIMA) chemical reaction introduced by Lengyel and Epstein. This model is an Initial Value Problem (IVP) involving a number of parameters related to the reactants of the CIMA reaction.

Several researchers have designed and implemented ODE solvers. Their research is mainly based on the construction of new integration formulas that accommodate parallelism. For example, Suhartanto (2007) designed a parallel iterated technique based on a multistep Runge-Kutta formula. Bendtsen (1997) introduced a new formula based on Multiply Implicit Runge-Kutta (MIRK) methods that exploits parallelism across the method. De Swart et al. (1998) designed a parallel iterated technique based on an implicit Runge-Kutta method. Cohen and Hindmarsh (1994, 1996) created PVODE from CVODE, a solver generated from two earlier solvers, VODE (Brown et al., 1989) and VODPK (Byrne, 1992), by accommodating some parallel techniques. The part common to those algorithms is that the step-size iteration (outermost loop) uses a nonlinear iteration (Newton loop) to solve the nonlinear system and the nonlinear iteration in turn uses a linear solver. If an iterative technique is used for the linear solver, there is an additional linear-solver loop, which we name the innermost loop. However, none of those studies investigates the effect of parallelism inside the innermost loop on the next outer loop (Newton loop) and further on the outermost loop (step-size loop). This study aims to investigate the effect of such parallelization on the overall performance of the ODE solver.

We propose a parallel implementation of an ODE solver based on Runge-Kutta method. The parallelization was performed in two levels, across the method to solve the arising nonlinear system and across the system to solve the related linear system.

MATERIALS AND METHODS

Lengyel-Epstein model: The Lengyel-Epstein model can be written as an Ordinary Differential Equation (ODE) system as follows (Fengqi et al., 2008):

∂u/∂t = Δu + a - u - 4uv/(1 + u²)
∂v/∂t = σ[cΔv + b(u - uv/(1 + u²))], x∈Ω     (1)

The domain Ω⊂Rn is the reactor, while u and v denote the particle concentrations of the activator Iodide (I-) and the inhibitor Chlorite (ClO2-), respectively. Parameters a and b relate to the feed concentrations. Parameter c is the ratio of the diffusion coefficients, whereas σ is the rescaling parameter determined by the starch concentration. All parameters a, b, c and σ must be positive. The stability of the Lengyel-Epstein model is determined by parameters a, b and the initial condition of the system, whereas the stiffness of the system is determined by the rescaling parameter σ.

Runge-Kutta method: Generally speaking, there are two classes of numerical methods for solving an IVP: one-step methods and multistep methods. A well-known class of multistage one-step methods is the family of Runge-Kutta methods.

The Runge-Kutta method comes in two different forms, explicit and implicit. The implicit form is commonly used to solve stiff problems. The general equation of an implicit autonomous Runge-Kutta method can be written as (Burrage, 1995):

Y_i = y_n + h Σ_{j=1..s} a_{ij} f(Y_j),  i = 1, ..., s
y_{n+1} = y_n + h Σ_{i=1..s} b_i f(Y_i)     (2)

Where,

A and b^T = The Butcher tableau coefficients
Y = The intermediate approximation (stage) vectors
h = The step size

Each integration step of the ODE system requires solving the resulting nonlinear system iteratively. The iterative method used here is the Newton method, which yields the following equations:

R(Y^(k)) = Y^(k) - (e⊗y_n) - h(A⊗I_m)F(Y^(k)) = 0     (3)

(I_s⊗I_m - hA⊗J) ΔY^(k) = -R(Y^(k)),   Y^(k+1) = Y^(k) + ΔY^(k)     (4)

where J denotes the Jacobian matrix. To reduce the computational cost, the Jacobian matrix is evaluated at a fixed point, for example, the initial integration point y0. This is the modified Newton method described in Burrage (1995). To simplify the discussion, the operator I_s⊗I_m - hA⊗J is denoted by M, so that Eq. 4 can be written as:

M ΔY^(k) = -R(Y^(k))     (5)

This linear equation is then solved with the Generalized Minimal Residual (GMRES) iterative method introduced by Saad (1986).

In advancing through the integration time, we used an adaptive step size instead of a fixed step size. The step size depends on the error of the previous step, so an error estimate is required in each time step. Because the global error is difficult to calculate or estimate (Skeel, 1986), the local error is estimated instead; controlling the local error indirectly bounds the global error.

One way to estimate the local error is the embedding method of Merson (1957). For that purpose, we build a second Runge-Kutta formula that differs by one order from the Runge-Kutta formula being used; usually the embedded formula has the lower order (Dormand et al., 1994). If the approximations produced by the first and second Runge-Kutta formulas are y_n and ŷ_n, the local error can be written as:

e_n = y_n - ŷ_n     (6)

The error is measured with the weighted root-mean-square norm (Hairer et al., 1993):

||e_n|| = √( (1/m) Σ_{i=1..m} (e_{n,i}/ewt_i)² )     (7)

where e_{n,i} denotes the i-th component of the error vector and ewt_i the error weighting factor, defined by the following equation:

ewt_i = atol + rtol·|y_{n,i}|     (8)

Parameter atol is the absolute tolerance value and parameter rtol is the relative tolerance value.

Parallel system speedup: A parallel system can be viewed as a collection of processing elements that communicate and cooperate to solve large problems faster than a single processing element can. In essence, a parallel system is expected to provide better performance than a sequential system. As in single-processor systems, most performance issues in multiprocessors can be addressed by programming techniques, architectural techniques or both. The focus of this study is the performance issues addressed by programming techniques. There are various metrics to evaluate the performance of a parallel system, such as execution time, speedup, efficiency and cost. Nevertheless, the most common metric is speedup (Wu, 1999).

Speedup denotes the performance gain of a parallel system in terms of reduced execution time compared to that of the sequential system. Defining T1 and Tp as the execution times of a parallel algorithm on one and on p processors, respectively, speedup is formulated as:

S_p = T1/Tp     (9)

This speedup is termed as relative speedup. If the execution time on one processor is substituted by execution time of the best sequential algorithm, then an absolute speedup is identified.

The lower bound of the speedup is equal to one and the upper bound is equal to the number of processors in the system. A speedup equal to this upper bound is termed ideal speedup. In practice, the performance of a parallel system sometimes degrades when more than one processor is used and the speedup becomes <1, usually because of excessive communication or a small data size. In some cases, a speedup greater than the number of processors is observed; this super-linear speedup is commonly caused by data that are too large to fit in the main memory of a single processor.

There are two distinct programming models on parallel computing platforms, the shared memory and the distributed memory model. Two popular message-passing packages are commonly used to emulate a parallel platform, i.e., Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). This research was performed using the PVM software (Geist et al., 1994).

Parallel Virtual Machine (PVM): PVM is software that makes a collection of computers appear as one large virtual machine. The set of computers used in the parallel environment must be defined prior to running the programs. PVM can be used on homogeneous or heterogeneous platforms. It handles transparently all message routing, data conversion and task scheduling across a network of computers.

PVM system is composed of two parts. The first part is a daemon that resides in all computers forming the virtual machine. The second part is a library of PVM interface routines that contains a complete collection of primitives that are required for cooperating processes.

PVM allows the creation of any number of processes, independent of the number of processors in the system. Each process is identified by a task ID. The processes are mapped onto processors automatically unless overridden by the programmer.

PVM programs are generally organized in a master-slave arrangement, where a single process, referred to as the master process, is executed first and all other processes, the child processes, are created by the master process when necessary.

Parallelization strategy: The discretization process in the Lengyel-Epstein model resulted in the following equation:

(10)

The parameter values used here are a = 20, b = 1.2 and c = 2, following the recommendation of Fengqi et al. (2008) for a stable system. The stiff condition was acquired by taking the rescaling parameter σ = 8×10³. Two types of initial conditions are used. The constant functions are u0(x) = 2.0 and v0(x) = 11.0; the sinusoidal functions are u0(x) = 3 + sin(x) and v0(x) = 10 + cos(x). The Runge-Kutta matrix used in the investigation is the fourth-order Runge-Kutta matrix introduced by Iserles and Nørsett (1990). This matrix has a decoupled sub-matrix structure, as shown in Fig. 1.

The error estimation uses the embedding technique, forming a second approximation of one order lower than the Runge-Kutta formula used to solve the ODE. The modified matrix is shown in Fig. 2 (Kartawidjaja, 2004).

The absolute and relative tolerances in Eq. 8 are set to 10⁻⁵. The integration step size is calculated with a Proportional Integral Derivative (PID) controller of the H211b class introduced by Söderlind (2003), with the following equation:

h_{n+1} = (ε/r_n)^{1/(bk)} · (ε/r_{n-1})^{1/(bk)} · (h_n/h_{n-1})^{-1/b} · h_n     (11)

Parameter h represents the integration step size, r denotes the error estimate and ε is the product of a safety factor and the allowed tolerance, with 0<TOL<1. Parameter k denotes the order of the method. To acquire a smooth integration step, the filter parameter b = 4 is used, following the recommendation of Söderlind (2003).

In order to balance the workload among processors, the number of processors used in the parallel environment must be even: two, four, six or eight. For clarity of discussion, Fig. 3 shows the working relationship for 2, 4 and 6 processors, where p1 denotes the root processor, whilst p2, p3, p4, p5 and p6 denote child processors. The eight-processor interrelationship can be constructed analogously. The parallel environment is emulated with the PVM software.


Fig. 1: Runge-Kutta table introduced by Iserles and Nørsett (1990)



Fig. 2: Modified Runge-Kutta table for the embedding process

Fig. 3: Working relationships of the processors

The parallelization in this study is done at two levels: across the method to solve the nonlinear system using the Newton method (Kartawidjaja et al., 2004a) and across the system to solve the linear system using the GMRES method (Kartawidjaja et al., 2004b). If only two processors are available, parallelization is applied only to the nonlinear system; with more than two processors, parallelization is also applied to the linear system.

Generally, there are three basic algebra operations in GMRES method, which are Inner Product (IP) operation, Vector Update (VU) operation and Matrix Vector product (MV) operation.

IP and VU are BLAS 1 operations, whereas MV is a BLAS 2 operation, which means the MV operation requires a much larger computation time than the IP or VU operations. For that reason, parallelization of the linear system is implemented only in the MV operation.

RESULTS AND DISCUSSION

The experiment was performed on a cluster of PCs consisting of eight processors with similar characteristics, connected through a 100 Mbit network. Each PC has a 1.5 GHz CPU and 512 MB RAM and the test code was written in C.

The performance of parallel computing is evaluated using the speedup metric. Testing has been done for two initial conditions, one in constant form and one in sinusoid form. The execution times for the initial condition in constant form are presented in Table 1.

It is noted from Table 1 that the execution time increases with more processors for an ODE dimension of 300, but for ODE dimensions above 300 the execution time decreases as the number of processors increases. The same phenomenon appears for the initial condition in sinusoidal form, as shown in Table 2.

By comparing Table 1 and 2, we see that ODE with initial condition in sinusoid form requires larger computation time compared to ODE with initial condition in constant form.

We can calculate the speedup from the execution time in Table 1 and 2 using Eq. 9 and the results are provided in Table 3 and 4.

It was shown that, in general, speedup occurs for ODEs with dimension greater than or equal to 400, both for the initial condition in constant form (Table 3) and in sinusoid form (Table 4). In addition, the speedup for the constant initial condition is larger than that for the sinusoid initial condition, as illustrated in Fig. 4 and 5 for two and eight processors, respectively.

Our observation on a single processor revealed that solving the linear system inside the Newton loop required approximately 60-70% of the overall execution time in each integration step. This large fraction of time translates into better performance when more processors are available, except for the small data size, i.e., 300.


Table 1: Execution time of ODE with initial condition in constant form

Table 2: Execution time of ODE with initial condition in sinusoid form

Table 3: Speedup of ODE with initial condition in constant form

Fig. 4: Speedup of ODE with initial condition in constant and sinusoid form on two processors

Fig. 5: Speedup of ODE with initial condition in constant and sinusoid form on eight processors



Table 4: Speedup of ODE with initial condition in sinusoid form

CONCLUSION

From the test results, we can draw the following conclusions:

For ODEs with dimension greater than or equal to 400, speedup can be acquired by implementing parallelization at two levels

The speedup for an ODE with initial condition in constant form is larger than the speedup for an ODE with initial condition in sinusoid form
