Journal of Modern Mathematics and Statistics

Year: 2009
Volume: 3
Issue: 1
Page No. 22 - 24

Principal Component Analysis of Nutritional Quality of 43 Cassava Varieties

Authors: Nwabueze Joy Chioma and Ereh Trinitas

Abstract: Data on the nutritional quality of fufu flour produced from 43 cassava varieties were analyzed using multivariate methods. The cassava varieties were selected from the newly developed Cassava Mosaic Disease resistant varieties in Onne, Nigeria. The analysis showed that the 1st Principal Component (PC1) explained about 26.37% of the total variation, the 2nd (PC2) about 21.90%, the 3rd (PC3) about 15.43% and the 4th (PC4) about 14.77%. The first 4 principal components together accounted for about 78% of the total variation.

How to cite this article:

Nwabueze Joy Chioma and Ereh Trinitas, 2009. Principal Component Analysis of Nutritional Quality of 43 Cassava Varieties. Journal of Modern Mathematics and Statistics, 3: 22-24.

INTRODUCTION

Agriculture plays an important role in the economic development of Nigeria. It provides food and employment for the population, as well as raw materials and foreign exchange earnings for the development of the industrial sector. Cassava cultivation increased after 1850 in the East African territories as a result of the efforts of Europeans and Arabs who were pushing into the interior and recognized cassava as a safeguard against frequent periods of famine.

Nigeria is one of the largest producers of cassava in the world. Its production is currently put at about 34 million tonnes a year. In addition to being produced primarily for food in the form of garri, lafun and fufu (Sanni et al., 2006), cassava is also processed into several secondary products of industrial market value, which include chips, pellets, flour, adhesives, alcohol and starch. These products, which constitute vital raw materials for the livestock feed, alcohol/ethanol, textile, confectionery, wood, food and soft drinks industries, are tradable in the international market (FAO, 2004).

Cassava varieties recently developed for pest and disease resistance are improved varieties capable of resisting the common cassava disease known as Cassava Mosaic Disease (CMD), a viral disease transmitted by a whitefly vector (IITA, 2005).

This research analyses a set of data on the nutritional composition of fufu flour produced from 43 CMD resistant varieties of cassava planted by the International Institute of Tropical Agriculture (IITA, 2005) at Onne, Port Harcourt, Nigeria (Etudaiye et al., 2008). The nutritional composition of the fufu flours measured on these varieties included moisture content, protein, ash, fat, fiber, carbohydrate and dry matter. This set of data was analyzed using a multivariate method called principal component analysis, which serves both data reduction and data interpretation. Ian (2005) stated that when large multivariate datasets are analyzed, it is often desirable to reduce their dimensionality using the principal component analysis technique. It replaces the original variables by a smaller number of derived variables, the principal components, which are linear combinations of the original variables. Often, it is possible to retain most of the variability in the original variables with a number of components very much smaller than the number of original variables.

A number of researchers have used principal components in data analysis, including Muluneh et al. (2008) and Adebowale and Onitilo (2008). Research on the composition and functional properties of yam germplasm, examining its diversity and potential for food and non-food applications, found among other things that starch content varied from 65.2-76.6% of dry matter, while the protein content ranged between 6.47 and 30.6%. Adebowale and Onitilo (2008) used principal component analysis on the chemical composition of tapioca grits from different cassava varieties and roasting methods. Their results showed that moisture, starch and sugar content accounted for 83% of the variation in the chemical composition of the tapioca samples. The objective of this study was to identify the nutritional compositions that contributed the most variability in the fufu flour processed from the 43 CMD resistant cassava varieties, to detect the structure of the data and to reduce its dimensionality. The study also examined the percentage contribution of each component to the total variation.

MATERIALS AND METHODS

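Theoretical framework: The principal components can be estimated using the population parameters.

$$y_i = L_{i1}x_1 + L_{i2}x_2 + \cdots + L_{ip}x_p = L_i'X, \quad i = 1, 2, \ldots, p \qquad (1)$$

where:

$y_1, y_2, \ldots, y_p$ = The principal components, which are linear combinations of the original $p$ component variables $x_1, x_2, \ldots, x_p$, which in this study are the proximate compositions

$$\operatorname{var}(y_i) = L_i'\Sigma L_i$$

where:

$\Sigma$ = The correlation matrix of the x variables. If $y_i$ and $y_{i'}$ are members of y, then

$$\operatorname{cor}(y_i, y_{i'}) = 0 \quad \text{for all } i \neq i'$$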

The constants $L_{i1}, L_{i2}, \ldots, L_{ip}$ are the elements of the corresponding eigenvectors, normalized so that $L_i'L_i = 1$. Thus, the 1st principal component is defined by the vector $L_1$ that maximizes $\operatorname{var}(L_1'X)$ and is of length 1. Similarly, the ith principal component is defined by the vector $L_i$ that maximizes $\operatorname{var}(L_i'X)$ subject to $L_i'L_i = 1$ and to being uncorrelated with the preceding components.

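The principal components can also be expressed in terms of the eigenvalues and eigenvectors of $\Sigma$; hence, the ith principal component is given by

$$y_i = e_i'X = e_{i1}x_1 + e_{i2}x_2 + \cdots + e_{ip}x_p, \quad i = 1, 2, \ldots, p \qquad (2)$$

where:

$e_i$ = The ith eigenvector of $\Sigma$, the covariance matrix of the compositions (the correlation matrix when the compositions are standardized)

$(\lambda_i, e_i)$ = The ith eigenvalue-eigenvector pair of $\Sigma$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$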
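The eigenvalue-eigenvector formulation above can be sketched in a few lines of Python. This is a minimal illustration that assumes NumPy and uses synthetic random data as a stand-in, since the study's raw measurements are not reproduced here; the function name `principal_components` is hypothetical and is not the authors' code.

```python
import numpy as np

def principal_components(X):
    """PCA from the correlation matrix of X (rows = samples, columns = variables)."""
    # Standardize each variable; the covariance of Z equals the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]   # reorder so lambda_1 >= ... >= lambda_p
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]   # column i is the unit-length eigenvector e_i
    scores = Z @ eigenvectors               # y_i = e_i' z for every sample
    return eigenvalues, eigenvectors, scores

# Synthetic stand-in data: 43 samples, 7 variables (NOT the cassava measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(43, 7))
X[:, 6] = -X[:, 0] + 0.1 * rng.normal(size=43)  # induce some correlation
eigenvalues, eigenvectors, scores = principal_components(X)
```

Because the correlation matrix is used, the eigenvalues sum to the number of variables, and the resulting component scores are mutually uncorrelated with variances equal to the eigenvalues.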

RESULTS AND DISCUSSION

Mean vector and correlation matrix: The compositions constitute the variables for this study,

where:

x1 = Moisture
x2 = Protein
x3 = Ash
x4 = Fat
x5 = Fiber
x6 = Carbohydrate
x7 = Dry matter

The correlations between the variables were calculated and the result is displayed in Table 1. It shows that all the variables except x4 are negatively correlated with moisture content x1. Protein is positively correlated with fat and dry matter, while it is negatively correlated with ash, fiber and carbohydrate. In summary, all the variables are correlated with each other, some negatively and some positively, though not very highly.

The eigenvalues of the correlation matrix were also calculated and the result is shown in Table 2. The 1st column of Table 2 lists the variables used for this study: moisture content (x1), protein (x2), ash (x3), fat (x4), fiber (x5), carbohydrate (x6) and dry matter (x7). The 2nd column gives the eigenvalues, which are 1.84591, 1.53304, 1.08011, 1.03419, 0.81937, 0.50430 and 0.1830 and which sum to 7.0000, the total number of variables. Since the correlation matrix is used, the total variance to be partitioned between the components is equal to the number of variables. The 3rd column gives the proportion of variation associated with each eigenvalue, which is the ratio of the eigenvalue to the total number of variables. The last column shows the cumulative proportion of variation up to each eigenvalue.
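The proportion and cumulative proportion columns described above follow directly from the eigenvalues quoted in the text. The short sketch below recomputes them in plain Python; the variable names are illustrative, and the last eigenvalue is used exactly as printed (0.1830).

```python
# Eigenvalues as quoted in the text for the correlation matrix.
eigenvalues = [1.84591, 1.53304, 1.08011, 1.03419, 0.81937, 0.50430, 0.1830]

# With a correlation matrix, the total variance equals the number of variables (7).
total = len(eigenvalues)

proportions = [ev / total for ev in eigenvalues]

# Running sum of the proportions gives the cumulative column.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for i, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"PC{i}: eigenvalue={ev:.5f}  proportion={p:.4f}  cumulative={c:.4f}")
```

The first four proportions round to 0.2637, 0.2190, 0.1543 and 0.1477, matching the percentages reported for PC1 to PC4, and their cumulative share is about 78%.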

Four principal components were retained for this study because only 4 components have eigenvalues >1 (Kaiser, 1960). The elements of the eigenvectors are termed scoring coefficients because they are used to compute the scores of the principal components. For example, the 1st principal component, which is the most important component, has a score of

$$y_1 = 0.28496x_1 + 0.45267x_2 + \cdots + 0.03858x_7$$

where the coefficients (0.28496, 0.45267, ..., 0.03858) are the elements of the normalized eigenvector.

The other principal component scores are calculated in the same way. From Table 3, the 1st principal component explains 26.37% of the total variation in this study. The 2nd-4th principal components explain, respectively, 21.90, 15.43 and 14.77% of the total variation. The other components, PC5, PC6 and PC7, contribute comparatively little to the total variation (about 21.5% combined) and we recommend that they be neglected.
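The retention rule (Kaiser, 1960) and the share of variation kept or discarded can be checked against the eigenvalues quoted in the text. The sketch below assumes NumPy; the variable names are illustrative only.

```python
import numpy as np

# Eigenvalues of the correlation matrix as quoted in the text.
eigenvalues = np.array([1.84591, 1.53304, 1.08011, 1.03419, 0.81937, 0.50430, 0.1830])

# Kaiser criterion: retain components whose eigenvalue is greater than 1.
retained = eigenvalues > 1.0
n_retained = int(retained.sum())

# Total variance equals the number of variables when the correlation matrix is used.
explained = eigenvalues[retained].sum() / eigenvalues.size
discarded = eigenvalues[~retained].sum() / eigenvalues.size

print(f"retained components: {n_retained}")
print(f"variation explained by retained components: {100 * explained:.1f}%")
print(f"variation in discarded components: {100 * discarded:.1f}%")
```

Under this rule 4 components are retained, which together carry about 78% of the total variation.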

Selection of important variables in the first 4 principal components: Variables with a scoring coefficient of at least 0.50 in absolute value are selected within each of the 4 retained principal components. From the 1st principal component, only carbohydrate (x6) is selected.


Table 1: Correlation coefficients of the variables

Table 2: Eigen values and scoring coefficient of the variables

Table 3: Component loadings of the 7 variables

Moisture content (x1) and dry matter (x7) were selected in the 2nd principal component. In the 3rd and 4th principal components, fat (x4) and ash (x3) were selected, respectively.
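The selection rule used here (keep a variable in a component when its coefficient is at least 0.50 in absolute value) can be sketched as below. The coefficient matrix in this example is hypothetical and purely illustrative, since the values of Table 3 are not reproduced in the text; the function name `select_variables` is likewise an assumption.

```python
import numpy as np

def select_variables(coefficients, names, threshold=0.5):
    """For each component (column), list the variables with |coefficient| >= threshold."""
    selected = {}
    for j in range(coefficients.shape[1]):
        mask = np.abs(coefficients[:, j]) >= threshold
        selected[f"PC{j + 1}"] = [name for name, keep in zip(names, mask) if keep]
    return selected

# Hypothetical coefficients for three variables and two components
# (illustrative values only, NOT taken from Table 3).
names = ["a", "b", "c"]
coefficients = np.array([
    [0.80, -0.10],
    [0.45, 0.55],
    [-0.30, -0.70],
])

selection = select_variables(coefficients, names)
print(selection)
```

Taking the absolute value means that a strongly negative coefficient selects a variable just as a strongly positive one does.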

CONCLUSION

The result of the principal component analysis showed that the variables can be represented adequately in just 4 dimensions, because only 4 principal components were retained. The 1st principal component is associated with protein and carbohydrate because they have high loadings. The 2nd principal component is associated with moisture and dry matter. The 3rd and 4th principal components are associated with fat and ash, respectively. These 4 components explain about 78% of the total variance. The percentage contributions of the 4 principal components to the total variation are 26.37, 21.90, 15.43 and 14.77%, respectively.

The correlation matrix showed that there is a significant negative relationship between carbohydrate (x6) and protein (x2) on the one hand and between carbohydrate and moisture on the other, at the 5% level of significance.
