Wednesday, February 26, 2014

Measure of Central Tendency

We will discuss the measures of central tendency, which are also called 'averages'. An average gives a central value of the given data. It helps us form an idea about the whole data.

Types of Averages:

1. A.M. (Arithmetic Mean)      2. Median      3. Mode      4. G.M. (Geometric Mean)      5. H.M. (Harmonic Mean)

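A.M. (Arithmetic Mean): It is denoted by Me (also commonly written x̄) and is used for quantitative data. It is the sum of the observations divided by the number of observations. The mean is a stable measure.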
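The arithmetic mean can be sketched in a few lines of Python. This is a minimal sketch: the function names and the sample data are illustrative assumptions, not part of the original notes.

```python
def arithmetic_mean(values):
    """Individual series: sum of the values divided by their count."""
    return sum(values) / len(values)

def arithmetic_mean_discrete(xs, fs):
    """Discrete series: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

data = [2, 4, 6, 8, 10]                 # made-up individual series
print(arithmetic_mean(data))            # 6.0

xs, fs = [1, 2, 3], [2, 3, 5]           # made-up values and frequencies
print(arithmetic_mean_discrete(xs, fs))
```

For a continuous series the same discrete formula is usually applied with xi taken as the class mid-values.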
           


Properties of Mean:
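1.         The sum of the deviations of a set of values from their mean is zero.
2.         The sum of the squares of the deviations of a set of values is minimum when the deviations are taken from the mean.
3.         Combined mean exists:
            If the first group has mean x̄1 and number of observations n1, and the second group has mean x̄2 and number of observations n2, then the combined mean is
            x̄12 = (n1 x̄1 + n2 x̄2) / (n1 + n2)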
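The three properties can be checked numerically. The sketch below uses made-up groups and hypothetical helper names; it demonstrates the properties on one example and is not a proof.

```python
def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from the point a."""
    return sum((x - a) ** 2 for x in values)

def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

group1 = [2, 4, 6]      # illustrative data
group2 = [10, 20]
m = mean(group1)

# Property 1: deviations from the mean sum to zero.
print(sum(x - m for x in group1))

# Property 2: squared deviations about the mean are not larger
# than squared deviations about any other point (here m + 1).
print(sum_sq_dev(group1, m) <= sum_sq_dev(group1, m + 1))

# Property 3: the combined mean equals the mean of the pooled data.
print(combined_mean(len(group1), mean(group1), len(group2), mean(group2)))
print(mean(group1 + group2))
```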
                                              
            Uses:
1.     Mean is simple to understand and easy to calculate.
2.     It is rigidly defined and so its value is always definite.
3.     Its computation is based on all the observations.
4.     It is also capable of being handled algebraically, i.e. its further algebraic treatment is possible.
5.     Its value is least affected by sampling fluctuations.
Demerits:
1.     The value of A.M. is highly affected by extreme values of the variable.
2.     A.M. cannot be calculated if even a single observation in the series is missing.
3.     A.M. can be a value that does not exist in the series.
4.     A.M. cannot be determined if the data have open-end classes.

Median: It is denoted by Md. The median is the value of the central item of the arranged data. It is the value of the middle item and divides the series into two equal parts: the number of items less than Md and the number of items greater than Md are the same. The median is a positional average.
            Individual Series:
            Step i) Arrange the data in ascending or descending order.
            Step ii) Identify the number of values N.
            If N is odd then Md = the ((N + 1) / 2)th term.
            If N is even then Md = the average of the (N/2)th and ((N/2) + 1)th terms.
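The two steps for an individual series translate directly into code. This is a minimal sketch; the function name is an assumption, and the heights are the six values from the bivariate example later in these notes.

```python
def median_individual(values):
    data = sorted(values)                  # Step i: arrange in ascending order
    n = len(data)                          # Step ii: number of values N
    if n % 2 == 1:
        # N odd: the ((N + 1) / 2)th term (1-based), i.e. index (N + 1) // 2 - 1
        return data[(n + 1) // 2 - 1]
    # N even: average of the (N/2)th and (N/2 + 1)th terms
    return (data[n // 2 - 1] + data[n // 2]) / 2

print(median_individual([7, 1, 5]))                          # 5
print(median_individual([165, 150, 155, 170, 163, 162]))     # 162.5
```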
            Discrete Series:
            Step i) Arrange the xi’s in ascending order.
            Step ii) Find the less-than cumulative frequencies (L.C.F.).
            Step iii) Identify the L.C.F. just greater than N/2, where N is the total frequency; the corresponding xi is the median.
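            Continuous Series:
            Step i) Arrange the classes in ascending order.
            Step ii) Find the less-than cumulative frequencies (L.C.F.).
            Step iii) Identify the L.C.F. just greater than N/2; the corresponding class is the median class.
            Step iv) Md = l + ((N/2 – c.f.) / f) × h, where l is the lower limit of the median class, c.f. is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and h is the class width.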
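The cumulative-frequency steps for discrete and continuous series can be sketched as follows. The function names and the data are assumptions, and the continuous case assumes the usual textbook interpolation formula Md = l + ((N/2 - c.f.) / f) × h.

```python
def median_discrete(xs, fs):
    half = sum(fs) / 2                     # N/2, with N the total frequency
    cum = 0
    for x, f in sorted(zip(xs, fs)):       # arrange the x_i in ascending order
        cum += f                           # less-than cumulative frequency
        if cum > half:                     # first L.C.F. just greater than N/2
            return x

def median_continuous(classes, fs):
    """classes: list of (lower, upper) limits, already in ascending order."""
    half = sum(fs) / 2
    cum_before = 0                         # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, fs):
        if cum_before + f > half:          # this is the median class
            return lower + (half - cum_before) / f * (upper - lower)
        cum_before += f

print(median_discrete([1, 2, 3, 4], [2, 3, 4, 1]))                    # 3
print(median_continuous([(0, 10), (10, 20), (20, 30)], [5, 10, 5]))   # 15.0
```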
                                                                    
Uses:
1.     It is easy to calculate and understand. In some cases it can be located simply by inspection.
2.     It is a rigidly defined average, as it is the central position of the given data.
3.     Median can be found for qualitative data that can be ranked.
4.     It is not affected by extreme observations.
5.     The value of median can also be located graphically.
6.     Median is especially useful in the case of open-end distribution.
Demerits:
1.     For locating median, data have to be arranged in ascending or descending order which is quite tedious for a long series of observations.
2.     Its determination is not based on all the observations.
3.     It is not capable of further algebraic treatment.
4.     Its value is affected by sampling fluctuations.
5.     In a continuous series, the median has to be estimated by using an interpolation formula.

Mode: It is denoted by Mo. The mode is the value of the variable which occurs most frequently in the distribution, i.e. the value which has the maximum frequency. It is also called the modal value.
I.S.: The mode is the value which occurs with maximum frequency in the data set.
D.S.: The mode is the value of xi which corresponds to the maximum frequency.
C.S.:   Step 1: Identify the modal class, i.e. the class with the highest frequency.
                        Step 2: Mo = l + ((f – f1) / (2f – f1 – f2)) × h
                        where l is the lower limit of the modal class,
                                    f is the frequency of the modal class,
                                    f1, f2 are the frequencies of the classes preceding and succeeding the modal class,
                                    h is the width of the modal class.
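For a continuous series, the two steps can be sketched as below. This assumes the usual textbook formula Mo = l + ((f - f1) / (2f - f1 - f2)) × h with h the class width; the function name and the data are illustrative.

```python
def mode_continuous(classes, fs):
    """classes: list of (lower, upper) limits in ascending order; fs: frequencies."""
    # Step 1: modal class = class with the highest frequency (first one if tied).
    i = max(range(len(fs)), key=lambda k: fs[k])
    lower, upper = classes[i]
    f = fs[i]
    f1 = fs[i - 1] if i > 0 else 0              # preceding frequency (0 if none)
    f2 = fs[i + 1] if i < len(fs) - 1 else 0    # succeeding frequency (0 if none)
    # Step 2: interpolate inside the modal class.
    return lower + (f - f1) / (2 * f - f1 - f2) * (upper - lower)

print(mode_continuous([(0, 10), (10, 20), (20, 30)], [4, 10, 6]))   # 16.0
```

The sketch does not handle the degenerate case 2f = f1 + f2, where the formula is undefined.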
            Uses:
1.               It is comparatively easy to understand.
2.               It is the simplest descriptive measure of average.
3.               It is not affected by extreme items. It can be obtained even if the extreme values are not given.
4.               It can be determined for open-end distributions.
5.               Mode has been defined as the most typical value of a distribution. Therefore, it is a useful average for many practical situations, such as, average size of shoe, average price of a commodity, the average type of dress, average wages and so on.
            Demerits:
1.               It is not precisely defined.
2.           It is not based on all the observations.
3.           It is not capable of being handled algebraically, as its value is not based on all the observations.
4.               The mode does not exist in many cases, while there may be more than one mode in other cases; it is not useful as an average in such situations.
5.               The value of the mode is significantly affected by the size of the class interval, which is the basis of grouping the frequencies.
Note: If the data contain a single mode then the distribution is called unimodal.
If the data contain two modes then it is called bimodal.
If the data contain more than two modes then it is called multimodal.


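Geometric mean: It is denoted by G. It is the n-th root of the product of n positive observations.
            I.S.: G = (x1 × x2 × … × xn)^(1/n)
            D.S.: G = (x1^f1 × x2^f2 × … × xn^fn)^(1/N), where N = Σfi
            C.S.: the same formula as D.S., with xi taken as the mid-value of the i-th class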
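A minimal sketch of the geometric mean, assuming the standard definition G = (x1 × x2 × … × xn)^(1/n) for positive observations. The function names are hypothetical; logarithms are used so that long products do not overflow.

```python
import math

def geometric_mean(values):
    """Individual series: exp of the mean of the logs (all values must be > 0)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def geometric_mean_discrete(xs, fs):
    """Discrete series: each log(x_i) is weighted by its frequency f_i."""
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / sum(fs))

print(geometric_mean([2, 8]))                   # close to 4
print(geometric_mean_discrete([1, 3, 9], [1, 1, 1]))   # close to 3
```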

            Merits:
1.     G.M. is highly useful in averaging ratios, percentages and rates of increase between two periods.
2.     G.M. is important in the construction of index numbers.
3.     It is based on all observations.
4.     It is rigidly defined.
5.     It is less affected by the extreme values.
6.     It is useful in studying economic and social data.
          Demerits:
1.     It is difficult to understand.
2.     Its calculation is difficult for persons without a mathematical background.
3.     It has restricted applications.
4.     If any one of the observations is zero then G.M. is zero.

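Harmonic mean: It is denoted by H.
            I.S.: H.M. is the reciprocal of the average of the reciprocals of the observations: H = n / Σ(1/xi)
            D.S.: H = N / Σ(fi/xi), where N = Σfi
            C.S.: the same formula as D.S., with xi taken as the mid-value of the i-th class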
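The definition "reciprocal of the average of the reciprocals" can be sketched as follows. The function names and the speed example are illustrative assumptions.

```python
def harmonic_mean(values):
    """Individual series: n divided by the sum of reciprocals (no value may be 0)."""
    return len(values) / sum(1 / x for x in values)

def harmonic_mean_discrete(xs, fs):
    """Discrete series: N / sum(f_i / x_i), with N the total frequency."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))

# Equal distances covered at 40 and 60 km/h give an average speed of 48 km/h.
print(harmonic_mean([40, 60]))
print(harmonic_mean_discrete([2, 4], [1, 2]))
```

A zero observation raises `ZeroDivisionError`, which matches the demerit that H.M. fails when any observation is zero.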
            Merits:
1.     It is rigidly defined.
2.     Its computation is based on all the observations.
3.     It is capable of further algebraic treatment.
4.     It is also not affected much by sampling fluctuations.
5.     It is very useful for measuring average relative changes in certain types of rates or ratios.
            Demerits:
1.     It is not easy to understand.
2.     It is rather complicated to calculate.
3.     It gives more weight to small observation and thus may lead to fallacious results. However, in view of this property, the harmonic mean is more useful when more weights are to be given to smaller observations.
4.     If any one of the observations is zero then H.M. cannot be computed.

Relation between Mean, Median and Mode:

i)                If the data is symmetric then Mean = Median = Mode.
ii)              If the data is moderately asymmetric (skewed) then, approximately, Mode = 3 Median – 2 Mean.
Relation between A.M, G.M, H.M:

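i)                If all the observations are the same then A.M. = G.M. = H.M.
If the (positive) observations are different then A.M.  >  G.M. > H.M.
ii)              Generally, A.M. ≥ G.M. ≥ H.M.
Two Numbers:  If a, b are two positive numbers then A.M. = (a + b) / 2, G.M. = √(ab), H.M. = 2ab / (a + b), and so (G.M.)² = A.M. × H.M.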
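For two positive numbers the ordering A.M. ≥ G.M. ≥ H.M. and the identity (G.M.)² = A.M. × H.M. can be checked numerically. The function name and the sample numbers are assumptions made for illustration.

```python
import math

def am_gm_hm(a, b):
    """Return (A.M., G.M., H.M.) of two positive numbers a and b."""
    am = (a + b) / 2
    gm = math.sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm

am, gm, hm = am_gm_hm(4, 9)
print(am, gm, hm)
print(am >= gm >= hm)                      # ordering of the three means
print(math.isclose(gm ** 2, am * hm))      # (G.M.)^2 = A.M. x H.M.
```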

Measure of Dispersion

Dispersion measures the extent to which the items vary from some central value. Dispersion is also called scatter, spread or variation.
Measure of Dispersion: It expresses quantitatively the degree of variation but not the direction of variation. It is called an average of the second order. The measures are
1. Range      2. Quartile Deviation      3. Mean Deviation      4. Variance and Standard Deviation

1.     Range: It is denoted by R.
            I.S.:     R = Max(xi) – Min(xi)
            D.S.:     R = Max(xi) – Min(xi)
            C.S.:     R = Upper bound of last class – Lower bound of first class
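The range formulas are short enough to sketch directly. The function names are assumptions; the heights are the six values used in the bivariate example later in these notes.

```python
def range_individual(values):
    """I.S. and D.S.: largest value minus smallest value."""
    return max(values) - min(values)

def range_continuous(classes):
    """C.S.: upper bound of the last class minus lower bound of the first class."""
    return classes[-1][1] - classes[0][0]

print(range_individual([165, 150, 155, 170, 163, 162]))    # 20
print(range_continuous([(0, 10), (10, 20), (20, 30)]))      # 30
```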
Merits:
1.     It is simple to understand.
2.     It is easy to calculate and provides a broad picture of the scatteredness in the data quickly.
Demerits:
1.     Its computation is not based on all the observations.
2.     It is affected by extreme items.
3.     It is very much influenced by sampling fluctuations.
4.     Range is a crude measure of dispersion. It does not tell us about the variation in the observations relative to the average.
5.     Range cannot be calculated for open end class.

2.     Quartile Deviation: It is denoted by Q.D. It is also called the semi-interquartile range.
Q.D. = (Q3 – Q1) / 2

Quartiles:  The whole data can be divided into 4 equal parts by using the 3 quartiles Q1, Q2 and Q3.
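A sketch of the quartile deviation for an individual series, using `statistics.quantiles` from the Python standard library. Its default "exclusive" method places Qk at the k(N + 1)/4-th item with linear interpolation, which is assumed here to be the convention intended; other textbooks use slightly different positions.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    q1, q2, q3 = quantiles(values, n=4)    # the three quartiles; data are sorted internally
    return (q3 - q1) / 2

data = [7, 1, 3, 5, 2, 6, 4]               # illustrative data, N = 7
print(quantiles(data, n=4))                # [2.0, 4.0, 6.0]
print(quartile_deviation(data))            # 2.0
```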

            




          Merits:
1.     Q.D. is simple to calculate.
2.     Q.D. is easy to understand.
3.     Q.D. is useful for measuring variations in open-end classes.
4.     It is also very useful when extreme items are likely to affect the analysis.
            Demerits:
1.     The calculation of Q.D. is not based on all observations.
2.     It is not possible to give it further algebraic treatment.
3.     It is very much affected by sampling fluctuations.

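3.     Mean Deviation: It is denoted by M.D. It is the arithmetic mean of the absolute deviations of the observations from a central value A (mean, median or mode): M.D. = (1/N) Σ |xi – A|.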
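A minimal sketch of the mean deviation, assuming the usual definition as the average absolute deviation from a chosen central value. The function name, the `about` parameter and the data are illustrative.

```python
from statistics import mean, median

def mean_deviation(values, about=None):
    """Average absolute deviation from `about` (defaults to the arithmetic mean)."""
    a = mean(values) if about is None else about
    return sum(abs(x - a) for x in values) / len(values)

data = [1, 2, 3, 4, 100]                            # made-up data with one extreme value
print(mean_deviation(data))                         # about the mean
print(mean_deviation(data, about=median(data)))     # about the median
```

The two printed values differ, which illustrates that the result depends on the central value used.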
  


           Merits:
1.     M.D. is simple to calculate.
2.     M.D. is easy to understand.
3.     Its computation is based on all the observations.
4.     It is less affected by extreme observations than the standard deviation.
            Demerits:
1.     In the calculation of mean deviation, the algebraic signs of the deviations are ignored, so its definition is non-algebraic. In view of this, mean deviation is not used in further statistical calculations.
2.     It is not a well-defined measure, as any of the central values can be used in its computation. Moreover, the mean deviation calculated from different averages (mean, median or mode) will not be the same.

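4.   Variance:
          It is denoted by σ². It is the mean of the squared deviations of the observations from their mean: σ² = (1/N) Σ (xi – x̄)².

     Standard deviation: It is denoted by σ and is the positive square root of the variance. The concept of S.D. was first introduced by Karl Pearson. It is one of the most popular and important measures of dispersion. It satisfies most of the properties of a good measure of dispersion.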
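Variance and standard deviation can be sketched as below, assuming the population form (division by N). The function names and data are illustrative.

```python
import math

def variance(values):
    """Mean of the squared deviations from the arithmetic mean (divide by N)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up data with mean 5
print(variance(data))                # 4.0
print(standard_deviation(data))      # 2.0
```

The standard library offers the same population measures as `statistics.pvariance` and `statistics.pstdev`.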



Properties:
  1. Combined Variance:
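A sketch of the combined variance of two groups, assuming the usual textbook formula σ² = [n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2), where d1 and d2 are the deviations of the group means from the combined mean. The function name and the numbers are illustrative.

```python
def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of the pooled data from group sizes, means and (population) variances."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n           # combined mean
    d1, d2 = mean1 - mean12, mean2 - mean12          # deviations of the group means
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n

# Group 1 = [2, 4, 6] (mean 4, variance 8/3); group 2 = [10, 20] (mean 15, variance 25).
print(combined_variance(3, 4.0, 8 / 3, 2, 15.0, 25.0))
```

The result agrees with the variance computed directly from the pooled data [2, 4, 6, 10, 20].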


            Merits:
1.     It is rigidly defined.
2.     Its computation is based on all the observations.
3.     It is amenable to further algebraic treatment, which makes it the most important and widely used measure of dispersion. For example, the S.D. is used in computing skewness, correlation etc. It is an important statistical measure in sampling theory.
4.     Among all the measures of dispersion, it is least affected by sampling fluctuations.
            Demerits:
1.     S.D. is comparatively difficult to calculate.
2.     It gives greater weight to extreme observations.
3.     It is an absolute measure of dispersion and cannot be used for comparing variability of two or more distributions expressed in different units.

Relative Measure of Dispersion:

1.     Coefficient of Variation = C.V. = (σ / x̄) × 100, where σ is the standard deviation and x̄ is the arithmetic mean.
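A minimal sketch of the coefficient of variation, assuming the standard definition C.V. = (S.D. / mean) × 100 with the population standard deviation. The function name and data are illustrative.

```python
import math

def coefficient_of_variation(values):
    """Relative dispersion in percent: (S.D. / A.M.) x 100."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / len(values))
    return sd / m * 100

print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))   # S.D. 2, mean 5
```

Because C.V. is a pure number, it can be used to compare the variability of series measured in different units.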


Correlation Analysis

Correlation

Bivariate distribution: A distribution in which two variables are observed at the same time on each unit.
X :       x1, x2, …, xn
Y :       y1, y2, …, yn
Ex: The bivariate distribution of the heights and weights of students is
                        Height cm :     165      150      155      170      163      162
                        Weight kg :     40        45        50        60        48        46

Def: The relation between two variables is called correlation. In other words, if a change in one variable is associated with a change in the other variable, the variables are said to be correlated.
            Ex: i) Height and weight       ii) Price and demand       iii) Yield and rainfall
Types of correlation
1.     Positive correlation: If the two variables deviate/change in the same direction, i.e. if an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other variable, the correlation is said to be direct or positive.
Ex: Income and expenditure, height and weight
2.     Negative correlation: If the two variables deviate/change in opposite directions, i.e. if an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other variable, the correlation is said to be inverse or negative.
Ex: Price and demand, volume and pressure of a perfect gas
3.     Perfect +ve correlation: Both variables change in the same direction and in the same proportion.
4.     Perfect –ve correlation: Both variables change in opposite directions and in the same proportion.
5.     No correlation: If a change in one variable has no effect on the other variable, the variables are said to be uncorrelated.
Measurement of correlation:
1.     Scatter diagram: If one variable X is plotted on the X-axis and the other variable Y on the Y-axis, then each paired observation gives one point on the graph. The diagram of the dots so obtained is called a scatter diagram.
This diagram gives an immediate and fairly good picture of how the two variables are related. If the points lie densely together, the two variables are highly correlated. If the points are widely scattered, the correlation is poor or weak. The diagram shows only the direction of the correlation, not its degree.
The diagram may show positive, negative, perfect or no correlation.
2.     Karl Pearson correlation coefficient: It is denoted by rXY.
It is also called the ‘product moment correlation coefficient’. It gives us both the nature and the degree of correlation: rXY = Cov(X, Y) / (σX σY).
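Karl Pearson's coefficient can be sketched from its usual definition r = Cov(X, Y) / (σX σY). The function name is an assumption; the heights and weights are the six pairs from the bivariate example above.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

heights = [165, 150, 155, 170, 163, 162]   # cm
weights = [40, 45, 50, 60, 48, 46]         # kg
print(pearson_r(heights, weights))

print(pearson_r([1, 2, 3], [2, 4, 6]))     # perfect +ve correlation, close to +1
print(pearson_r([1, 2, 3], [6, 4, 2]))     # perfect -ve correlation, close to -1
```

The sketch fails with a division by zero if either variable is constant, because its standard deviation is then zero.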









Properties:
i.                 K.p.c.c. measures the linear relationship between two variables.
ii.               The value of K.p.c.c. always lies between -1 and +1, i.e. -1 ≤ rXY ≤ +1.
iii.             If rXY > 0 then +ve correlation.                      If rXY < 0 then –ve correlation.
If rXY = +1 then perfect +ve correlation.        If rXY = -1 then perfect –ve correlation.
If rXY = 0 then no (linear) correlation.
iv.             rXY = rYX
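v.               Let the new variables be U = (X – a) / h and V = (Y – b) / k, with h, k > 0. Then rUV = rXY, i.e. K.p.c.c. is independent of change of origin and scale.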
vi.             If X and Y are two independent random variables then rXY = 0, but the converse need not be true.

Merits:

1.     It is the most important and popular method for measuring the relationship between two variables. It gives a precise, quantitative value that indicates the degree of relationship existing between the two variables.
2.     It measures the direction as well as the degree of the relationship between two variables.
Demerits:

1.     The value of the coefficient is affected by extreme items.
2.     Its computational procedure is difficult as compared to other methods.
3.     The coefficient’s value lies between -1 and +1, so its value needs careful interpretation.

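3.          Spearman rank correlation: It is denoted by ρ.
ρ = 1 – (6 Σdi²) / (n(n² – 1)), where di is the difference between the two ranks of the i-th pair and n is the number of pairs.
      Properties:
i.                 The value of S.r.c.c. always lies between -1 and +1, i.e. -1 ≤ ρ ≤ +1.
ii.               S.r.c.c. gives us both the nature and the degree of correlation.
iii.             Repeated ranks: If some values among the xi’s or yi’s are the same, they are given the same rank (the average of the ranks they would otherwise occupy). The basic formula assumes that all ranks are different, so an error arises. The error is rectified by a correction factor (C.F.) of m(m² – 1)/12, added to Σdi² for each group of m tied values. The S.r.c.c. formula becomes
ρ = 1 – 6 [Σdi² + Σ m(m² – 1)/12] / (n(n² – 1))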
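The ranking procedure with repeated ranks can be sketched as follows. This assumes the common textbook convention: tied values share the average of their ranks, and a correction factor m(m² - 1)/12 is added to Σd² for each group of m tied values. Function names and data are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the average rank."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Correction factor: m(m^2 - 1)/12 for every group of m tied values in X or Y.
    cf = sum(m * (m * m - 1) / 12
             for series in (xs, ys)
             for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))

print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))   # -1.0
print(spearman_rho([10, 20, 20, 30], [1, 2, 3, 4]))   # one tie in X
```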
Merits:
1.     The method is simple and easily understandable as compared to the Karl Pearson’s method.
2.     The method is especially useful if data is qualitative.
3.     The method can be applied to irregular data also as it does not assume that the data should be normal.
Demerits:
1.     The method can be applied to ungrouped data only.
2.     The ranking procedure involved in this method ignores the actual magnitude of the data and, as such, the results obtained are only approximate.
3.     The computation procedure becomes difficult as the number of paired observations increases.

Probable Error: It is a measure of the reliability of the correlation coefficient. P.E.(r) = 0.6745 (1 – r²) / √n, where n is the number of pairs of observations.
            If |r| < P.E.(r), the correlation is not significant.

            If |r| > 6 P.E.(r), the correlation is significant.
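The two rules can be sketched as below, assuming the usual formula P.E.(r) = 0.6745 (1 - r²) / √n with n pairs of observations. The function names and the sample values are illustrative, and returning `None` for the in-between case is a choice made for this sketch.

```python
import math

def probable_error(r, n):
    """Probable error of a correlation coefficient r based on n pairs."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def is_significant(r, n):
    """True / False by the two rules above; None when neither rule applies."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return False
    if abs(r) > 6 * pe:
        return True
    return None

print(probable_error(0.8, 25))
print(is_significant(0.8, 25))     # True
print(is_significant(0.05, 25))    # False
```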