ECMT 1020: Introduction to Econometrics
Lecture 1
Instructor: Kadir Atalay
Contact: kadir.atalay@sydney.edu.au
School of Economics
The University of Sydney
Contact Information
Unit Coordinator & Instructor, W1-W6: Kadir Atalay
o Email: kadir.atalay@sydney.edu.au
o Office: Room 435, Merewether Building (H04)
o Office Hours: Wednesday, 12.30-14.30, or by appointment
Instructor, W7-W13: Yi Sun
o Email: yi.sun@sydney.edu.au
o Office: Room 488, Merewether Building (H04)
o Office Hours: Tuesday, 15.30-17.30 (tentative)
Tutors: See Blackboard
Some Rules
o You should contact me by email.
o Use your USyd email - identify yourself with your name and SID
o Any questions regarding the tutorial program including administrative matters
regarding tutorial allocation should be directed to your tutor
Outline of Lecture
Course Outline
o Textbook
o Assessment
o Tutorials
o Unit Schedule
Analysis of Economic Data
o Types of Data
Univariate Data Summary
o Summary Statistics for Numerical Data
Course Website
We will have a course website on Blackboard:
o http://elearning.sydney.edu.au
Special Announcements: It is essential that you log in at least twice per week to
keep abreast of unit-wide announcements and use the resources to supplement
your learning.
The UoS outline, online quizzes, practice questions, data files, lecture slides and
tutorial questions will be posted there.
Lecture slides will be posted, typically about 1 or 2 days before the lecture.
Please treat lecture slides as an outline to read before the lecture and fill in the
gaps during or after class.
Textbook
The required text is
o “Analysis of Economics Data: An Introduction to Econometrics”
by A. Colin Cameron
This is a draft of a book that will be published in late 2018. This version is
particularly tailored for ECMT 1020. We will cover the first 17 chapters (out of 20).
It will be available as a course reader from the University Copy Centre (by the 28th).
o The University Copy Centre is located on the ground floor of the University
of Sydney Sports and Aquatic Centre.
There will be a copy on reserve in the library.
Additional texts for reference (all available in the library):
o J.M. Wooldridge, Introductory Econometrics: A Modern Approach, 5th Edition
(used in ECMT 2150)
o D.N. Gujarati, Basic Econometrics, McGraw-Hill
Assessment
• Your final grade for this unit will be based on six items: four online quizzes, a
mid-semester exam, and a final exam. All items are to be completed individually.
ASSESSMENT TASKS AND DUE DATES
Assessment Name      Weight   Due Time             Due Date
Online Quiz 1        5%       noon                 21-Aug-2017
Online Quiz 2        5%       20:00                8-Sept-2017
Mid-Semester Exam    30%      18.00 (tentative)    12-Sept-2017
Online Quiz 3        5%       noon                 16-Oct-2017
Online Quiz 4        5%       noon                 3-Nov-2017
Final Exam           50%      Final Exam Period    Final Exam Period
Mid-Session Examination
o A 75-minute exam will be held during Week 7 (tentatively Tuesday, 12
September 2017, 18.00). The exact time and date will be announced soon.
o Lecture material for weeks 1-6 will be examined
Final Exam
o The final exam will be cumulative but will place greater emphasis on new topics
(we will go over what that means closer to the exam)
Lecture Topics
• First three weeks: univariate data. This is partly a recap of ECMT1010: How can we
summarize and visualize data? What can a sample tell us about the population, and
how can we express our uncertainty about such inference? What changes if we
transform our data? We will particularly focus on those aspects that are relevant for
economic analysis
• Second three weeks: bivariate data. How does one economic variable influence
another one? And again, how certain are we about our inference? We study both the
necessary theory and many economic examples
• Last six weeks: multivariate data. Here, we extend our results to cases where there is
not just one, but several explanatory variables. Finally, we also look at what to do if
the statistical model we estimate is not a good representation of economic reality: How
can we find out? And how much of our results can we salvage?
Tutorials
Tutorials start next week!
o There is one two-hour tutorial session each week. Participation is not
mandatory, but is strongly encouraged.
o Use tutorials to raise any questions you may have about the material.
o Exercises will be set each week. Do try to solve them!
The answers will be posted later, but before the mid-semester or final exam.
In even-numbered weeks, tutorial sessions are held in regular classrooms.
These sessions are intended to make you more familiar with the material
covered in class and to provide exam practice.
In odd-numbered weeks, tutorial sessions are held in computer labs. These
sessions apply the material covered in class to real-world economic problems
and teach the basics of the Stata software package, which is widely used in
later courses here at uni, as well as in many jobs in industry.
Computer Labs
o Week 3/5/7/9/11/13
o Computer Exercises
o Use of an econometrics or statistical package:
STATA
STATA
Throughout this unit you will be required to use a computer and specialised
econometric software. (Computer Labs/Tutorials /Online Quizzes)
The statistics and data analysis program STATA will be taught as part of this unit
– and will be regularly demonstrated during the lectures.
This software is available through the Virtual Desktop, so you can use it in any of
the ICT Access Labs, Learning Hubs or Libraries (see instructions in the UoS
outline). It is also available in Labs 1-5 of the Economics and Business Building (H69).
Some of the learning and access labs are: PNR Learning Hub; Carslaw Learning Hub;
Wentworth Learning Hub; Law Access Lab; Madsen Access Lab; Cumberland Access Lab.
If you wish to buy your own license to use STATA on your computer:
o http://www.survey-design.com.au/buygradplan.html
(Small Stata will be sufficient for this course)
There is a brief introduction to Stata in Appendix A of the textbook. Generally,
we will just introduce new commands as they are needed. Stata's help facilities are
also pretty good.
Mathematics
•I appreciate many of you haven't had a lot of recent maths practice, and I'll try to
make things smooth.
• Calculus is not needed for this course, although it may help guide your intuitive
understanding of some of the material. Later ECMT courses, as well as higher-division
macro and micro units, will require it though.
• Some familiarity with basic algebra, such as working with summations, is assumed
• If you find that the algebra during the lectures or in the tutorials is moving too fast
for you,
1- please take advantage of the university's Maths Learning Centre. They have free
drop-in classes, including some specifically tailored for economics students. Don't
be ashamed or afraid, they're there to help!
2- LET ME KNOW!!!!! Happy to help you!
Chapter 1‐ Analysis of Economic Data
Use of Economic Data
In a nutshell, econometrics is the use of statistical methods to answer
economic questions
Describing the economic “landscape”
o What is the annual growth rate of GDP? Has unemployment risen over the
past year?
o Do people with higher levels of education earn more?
o Descriptive statistics motivate economic theory
Testing or attempting to distinguish between economic theories
o Is it true that stock returns are unpredictable?
Evaluating government and business policy
o Did those incredibly low interest rates in recent years really help stimulate
the economy?
RECAP ECMT 1010
Types of Data
There are a variety of different types of data that you will encounter in economics. The
ways in which we categorize types of data include the following:
Value: numerical data, categorical data
Unit of observation: cross-section data, time series data, panel data
Number of variables: univariate data, bivariate data, multivariate data
Types of Data / Value / Numerical Data (Quantitative)
Numerical data are data that are naturally recorded and interpreted as numbers. They
can be continuous or discrete. Examples of numerical data include:
Annual income (continuous)
Hours worked (discrete)
Annual GDP (continuous)
Number of times a person has visited dentist (discrete)
Discrete numerical data take only integer values.
Types of Data / Value / Categorical Data
Categorical data are data that are recorded as belonging to one or more groups. They
can be recorded as numbers but these numbers have no inherent meaning. Examples of
categorical data include:
Gender; Religion; Birth Place; …
Types of Data / Units of Observation
Economics data are most often observational data, meaning they are based on
observations of actual behavior in an uncontrolled environment.
Types of Data/ Units of Observation / Cross-section data
Cross-section data are data on different entities collected at a common point in
time.
o Sample of individuals, households, firms, countries, other units taken at a
point in time (“snapshot”).
Notation: $x_i,\ i = 1, \dots, n$
o i specifies a particular individual for an observation
o n is the total number of individuals observed (typically called the sample
size)
o x is the value of whatever variable we are observing.
Examples: a single year of census data, unemployment rates by state for a
particular year
Example of a cross-sectional data set:
Data set on hourly wages of individuals in 2014
observation   hourly wage
1             17.15
2             35.54
3             51.05
.             .
.             .
.             .
498           16.87
499           19.00
500           41.35
$x_i,\ i = 1, \dots, 500$, so that for example $x_3 = 51.05$ and $x_{499} = 19.00$
Note that the order of the observations (observation number) is not important.
Types of Data/Units of Observation / Time-series data
Time-series data are data on the same quantity at different points in time.
Notation: $x_t,\ t = 1, \dots, T$
o t specifies time period of an observation
o T is the total number of time periods
o x is the value of whatever variable we are observing.
Examples: GDP of a country over time, daily averages of the S&P, monthly
unemployment rate.
Example: data on minimum wages (Australia, 1950 to 1987)
Year   hourly wage
1950   0.20
1951   0.21
1952   0.23
.      .
.      .
1987   3.35
Types of Data/ Units of Observation / Panel data
Panel data are data on different individuals with each individual observed at multiple
points in time.
Notation: $x_{i,t},\ i = 1, \dots, n;\ t = 1, \dots, T$
Panel data are a mixture of cross-section and time-series data
Examples: earnings of USyd graduates over time; life expectancy by country over
time
Data set on hourly wages of individuals in 2013-14
observation   person   year   hourly wage
1             1        2013   16.42
2             1        2014   17.15
3             2        2013   37.41
4             2        2014   35.54
.             .        .      .
.             .        .      .
499           250      2013   40.22
500           250      2014   41.35
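The $x_{i,t}$ indexing can be made concrete with a small sketch. The snippet below is written in Python purely for illustration (the unit itself uses Stata), and the names `wages`, `cross_section` and `time_series` are hypothetical. It stores only the rows of the table that are shown in full, keyed by (person, year), and shows how fixing the year gives a cross-section while fixing the person gives a time series.

```python
# Panel data: each observation is indexed by an individual i and a time period t.
# Only the rows of the hourly-wage table that are shown in full are included.
wages = {
    (1, 2013): 16.42,
    (1, 2014): 17.15,
    (2, 2013): 37.41,
    (2, 2014): 35.54,
    (250, 2013): 40.22,
    (250, 2014): 41.35,
}


def cross_section(panel, year):
    """Fix t and vary i: one value per person for a single year."""
    return {person: x for (person, t), x in panel.items() if t == year}


def time_series(panel, person):
    """Fix i and vary t: one value per year for a single person."""
    return {t: x for (i, t), x in panel.items() if i == person}


print(cross_section(wages, 2014))  # the 2014 "snapshot" across people
print(time_series(wages, 2))       # person 2 followed over time
```

The 2014 slice reproduces the wages listed for observations 1, 2 and 500 in the cross-sectional example earlier.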
Types of Data / Number of Variables / Univariate Data
Univariate data is a single data series containing observations of only one variable.
Notation: $x_i$ for cross-section data; $x_t$ for time series data
Examples: Earnings of uni graduates in 2012; inflation rate from 1960 to 2014
Types of Data / Number of Variables / Bivariate Data
Bivariate data is composed of two potentially related data series.
Notation: $(x_i, y_i)$ for cross-section data; $(x_t, y_t)$ for time series data
We are often interested in the relationship between x and y.
Examples: Education and earnings of individuals; inflation and unemployment
rates over time.
Types of Data / Number of Variables / Multivariate Data
Multivariate data is composed of three or more potentially related data series.
Notation: $(x_{1,i}, x_{2,i}, \dots, x_{k,i}, y_i)$ for cross-section data;
$(x_{1,t}, x_{2,t}, \dots, x_{k,t}, y_t)$ for time series data
We are often interested in how $x_1, \dots, x_k$ are related to $y$
Examples: Inputs and outputs and profits for a firm over time;
Education, experience, gender and income for a cross-section of individuals.
What do we do with economic data?
The basic steps of data analysis:
1- Data Summary
2- Statistical Inference
3- Interpretation
Steps of Data Analysis: Data Summary
To summarize data, we typically use a combination of visual representations of
the data and statistics
Visual representations include a variety of graphs and charts (scatterplots,
histograms, maps, etc.)
Statistics can measure characteristics of a single variable (mean, median, variance,
etc.) or relationships between multiple variables (covariance, correlation, linear
regression, etc.)
The choice of summary statistics and graphs depends on both the type of data
available and what the researcher is interested in
Steps of Data Analysis: Statistical Inference
The basic idea of statistical inference is to draw conclusions about a relationship
we cannot observe
We typically cannot reach definitive conclusions because we only get to observe a
sample rather than the population
Statistical inference requires using what we know about the sample and about
probability to reach a conclusion about the probable characteristics of variables
and relationships between them at the population level
RECAP - ECMT 1010
Reminder: Statistics (1)
• Statistics is about using data to figure out as much as we can about a parameter that
we cannot observe
• A statistical model describes a population that we cannot observe, mainly because
observing it would be too much work (the education and salary of every person on
Earth) or because the population has infinitely many points (“assume X follows a
normal distribution…”)
• This model generally has one or a few parameters describing the thing we're
interested in, for example the correlation between education and salary.
• We then assume that our dataset is a sample taken from the population we have
described. From this dataset, we calculate an estimator for the true but unknown
parameter: often something like a sample correlation, or a sample mean
• Standard practice in statistics is to use Greek letters for population quantities
(such as $\mu$, $\sigma$, $\rho$, $\beta$) and Latin letters for sample quantities
(such as $\bar{x}$, $s$, $r$, $b$). The textbook largely follows this rule
• Finally, inference happens. Our estimator is probably not exactly equal to the
parameter, but can we say something about how far off it is likely to be? This is where
confidence intervals show up
• More formalities about sampling and inference later in the course, starting next week.
For today, we focus on the sample itself
Focus of this course: Regression Analysis
ECMT1010 focuses on data on a single variable considered in isolation (such as
a coin toss)
In this class, we start by analyzing univariate data, studying a single data series
(similar to ECMT1010)
Most economic data analysis is focused on measuring the relationship between
two or more variables.
o We want to understand the inter-relationships (and perhaps causality), such
as the effect of minimum wage laws on unemployment
o The main statistical method is called “regression analysis”.
Bivariate data (two related series): Chapters 8 to 12
Multivariate data (three or more related series): Chapters 13 to 17
Chapter 2 - Univariate Data Summary
Univariate data are a single series of data that are observations on one variable.
A numerical data example is annual earnings for each person in a sample of
women.
A categorical data example is expenditures in each of a number of categories.
Our main focus :
(1) Summary Statistics for Numerical Data
(2) Charts for Numerical Data
Summary Statistics for Univariate Data
Graphs are nice for giving people a quick glimpse of data
However, there is a lot of ambiguity about interpreting graphs and comparing one to
another.
Where is the mean? What is a wide distribution and what is a narrow one? Are tails
big or small? Etc.
Summary statistics give us a standardized way of summarizing univariate data
People know what the numbers mean and they can be compared across different
samples
Types of Summary Statistics
We're often interested in describing the following characteristics of the
distribution of a data series:
o Central tendency – where is the center of the distribution of the data?
What is a typical Australian employee's salary, whatever “typical” means?
o Dispersion – how spread out is the data?
How much inequality is there in our income distribution?
o Skewness (asymmetry) – how symmetric (or asymmetric) is the distribution?
How many millionaires are there, compared to minimum-wage workers?
o Kurtosis (peakedness) – how fat are the tails, how tall is the peak?
How rare are minimum-wage workers and millionaires, compared to typical
earners?
A little Math Review:
If X takes n values, $x_1, x_2, \dots, x_{n-1}, x_n$, their sum is
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n$
If g(x) is a function of x, then
$\sum_{i=1}^{n} g(x_i) = g(x_1) + g(x_2) + g(x_3) + \cdots + g(x_n)$
If “a” and “b” are constants, then
o $\sum_{i=1}^{n} a = n \cdot a$
o $\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$
o $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$
o $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
o $\sum_{i=1}^{n} (x_i \cdot y_i) \neq \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i$
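The summation rules in this review can be checked numerically. The following is a minimal Python sketch (Python is used for illustration only; the unit's software is Stata). The data `x`, `y` and the constants `a`, `b` are made-up values chosen so that the arithmetic is easy to follow by hand.

```python
import math

# Hypothetical data and constants, chosen only to illustrate the rules.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 0.0, 5.0, 1.0]
a, b = 3.0, 0.5
n = len(x)

sum_x = sum(x)
sum_y = sum(y)

# Summing a constant n times gives n * a.
lhs_const = sum(a for _ in range(n))
# A constant factor can be taken outside the sum.
lhs_scale = sum(a * xi for xi in x)
# Sum of a linear function: n*a + b * sum(x).
lhs_linear = sum(a + b * xi for xi in x)
# The sum of a sum is the sum of the sums.
lhs_add = sum(xi + yi for xi, yi in zip(x, y))
# Warning: the sum of products is NOT the product of the sums.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))
product_of_sums = sum_x * sum_y

print(math.isclose(lhs_const, n * a))
print(math.isclose(lhs_scale, a * sum_x))
print(math.isclose(lhs_linear, n * a + b * sum_x))
print(math.isclose(lhs_add, sum_x + sum_y))
print(sum_of_products, product_of_sums)
```

The last line prints two different numbers, which is the point of the final rule: multiplication does not pass through the summation sign.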
Types of Summary Statistics – Empirical Example
To go over these different types of summary statistics, we will use the following
example:
This is the distribution of annual earnings of a sample of 171 women who are 30 years
of age in 2010. The data are in “EARNINGS.dta” on Blackboard.
[Figure: histogram of Earnings (horizontal axis, 0 to 200,000) against Frequency (vertical axis)]
Measures of Central Tendency
A measure of central tendency / central location describes the center of the
distribution in the data
Tells us whether center of distribution is
Answers the question, “What is a typical value in this sample?”
Several measures
o Sample mean
o Sample median
o Sample midrange
o Sample mode
The Sample Mean
Most common way to measure central tendency
It is also called the sample average
Definition:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Weights all observations equally!
STATA commands:
o mean variable_name
o sum variable_name
o tabstat variable_name, stat(mean)
The Sample Median
Value that divides the sample into two halves (50% of observations are above the
value and 50% are below)
Order the data from lowest to highest value. The median is the value that divides
the ordered data into two halves (the one that ends up in the middle).
When n (the number of observations) is odd, the median is the middle value;
when n is even, use the average of the two middle observations.
Less sensitive to outliers than the sample average
(An outlying observation, or outlier, is an observation that is unusually large or
small)
Other quantiles can be used
STATA commands:
o sum variable_name, detail
o tabstat variable_name, stat(median)
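To see why the median is less sensitive to outliers than the mean, here is a small Python sketch (illustrative only; in this unit the same numbers would come from the Stata commands above). The function names and the salary figures are hypothetical. `sample_median` follows the odd/even rule stated above.

```python
import statistics


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical salaries in thousands of dollars.
salaries = [40, 45, 50, 55, 60]
with_outlier = salaries + [1000]  # add one unusually large observation

print(sample_mean(salaries), sample_median(salaries))
print(sample_mean(with_outlier), sample_median(with_outlier))
# Cross-check against the standard library implementation.
print(statistics.median(with_outlier))
```

Adding the single large value moves the mean a long way while the median barely changes, which is the same pattern as the gap between average and median wages discussed next.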
The Mean Vs. The Median
What is the typical Australian worker's wage?
Among full-time workers in 2011, the average wage is $72,000 per year
and the median wage is $57,400 per year.
Note that the mean is over 25% larger than the median.
Why is there such a big difference? Which of these numbers is more relevant?
The Sample Midrange
The sample midrange is the average of the smallest and largest observations.
Not a very commonly used measure
Extremely sensitive to outliers
STATA command: sum variable_name, detail (see 2nd column)
The Sample Mode
The most frequently occurring value in the sample
Useful with discrete data and cases where particular values are meaningful (4
years of high school, 40 hours of work each week, ...).
STATA command: tab
Quartiles, Deciles and Percentiles
The median is the point that divides an ordered sample into two equal halves.
The lower quartile is the point where ¼ of the sample lies below and ¾ above.
The upper quartile is the point where ¾ of the sample lies below and ¼ above.
STATA command: sum variable_name, detail (see 2nd column)
Finer divisions:
The p-th percentile is the value for which p percent of the observed values are equal to
or less than the value.
Median: 50th percentile; upper quartile: 75th percentile; lower quartile: 25th percentile.
Deciles split the ordered sample into tenths.
A quantile is a percentile reported as a fraction of one rather than a percentage
(0.56 quantile = 56th percentile).
STATA command: tabstat variable_name, stat(p1 p5 ..)
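The percentile definition above (the value for which p percent of observations are equal to or less than it) can be sketched as a "nearest rank" rule. This Python snippet is an illustration under that assumption; the function name is hypothetical, and statistical packages often interpolate between neighbouring observations instead, so their answers can differ slightly (for instance, the median rule given earlier averages the two middle values when n is even).

```python
import math


def percentile_nearest_rank(values, p):
    """Smallest observation such that at least p percent of the data are <= it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Rank of the observation that first reaches p percent of the sample.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


data = list(range(1, 21))  # hypothetical sample: the integers 1 to 20

lower_quartile = percentile_nearest_rank(data, 25)
median_rank = percentile_nearest_rank(data, 50)
upper_quartile = percentile_nearest_rank(data, 75)
print(lower_quartile, median_rank, upper_quartile)
```

With 20 observations, exactly 5 of them (25 percent) are less than or equal to the reported lower quartile, which matches the definition.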
These four measures of central tendency can give very different answers to the
question, what is a typical salary? Which one to use depends on which question you
are trying to answer.
Measures of Dispersion
Characterize the spread or width of the distribution: How far away do observations
tend to be from the mean?
Different measures:
o Sample variance
o Sample standard deviation
o Sample coefficient of variation
o Sample range and inter-quartile range
Like measures of central tendency, the different measures have different benefits
and drawbacks
STATA commands:
o sum variable_name, detail (see 2nd column)
o tabstat variable_name, stat(…)
Sample Variance
How far away do observations tend to be from the mean?
Simply calculating $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$ is not useful: positive and
negative differences cancel out and the result is always zero
So we work with squared deviations instead. The sample variance is defined as
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
The division by n - 1 rather than n is a “degrees of freedom” correction, which is
necessary because we are using the sample mean $\bar{x}$ rather than the population
mean $\mu$
When we start working with multivariate data, you'll often see n – k popping up
for much the same reason. This is worth remembering: in general,
“degrees of freedom = observations - estimated parameters”
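Both points above (deviations from the mean sum to zero, and the variance divides by n - 1) can be verified on a tiny example. The Python sketch below is illustrative only; the function names and the five data values are hypothetical, and the standard library's `statistics.variance` is used as a cross-check because it also divides by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance; same units as the data."""
    return math.sqrt(sample_variance(values))


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical observations
mean = sum(data) / len(data)

# Raw deviations cancel out, so their average carries no information.
deviations = [x - mean for x in data]
print(sum(deviations))

print(sample_variance(data), sample_sd(data))
print(statistics.variance(data))  # library version, also uses n - 1
```

Here one degree of freedom is used up by estimating the mean, so the divisor is 5 - 1 = 4.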
Approximately equal to the average squared deviation from the mean:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
As the sample variance increases, the spread of the data gets wider
STATA commands:
o sum variable_name, detail (see 3rd column)
o tabstat variable_name, stat(variance)
One problem with variances is that they're hard to interpret. If x is measured in
dollars, $s^2$ is in squared dollars, whatever that means
Sample Standard Deviation
Standard deviation is just the square root of the variance:
$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Roughly the average deviation of the data from its mean.
It has the same units as the data (not the case for the variance)
If one sample has a larger sample standard deviation than another, then we view
that sample as having greater variability.
STATA commands:
o sum variable_name (see 3rd column)
o tabstat variable_name, stat(sd)
Interpretation of the Standard Deviation
A useful way to interpret the standard deviation is to use results for the normal
distribution (see ECMT 1010).
For a normal distribution, the probability of being within one and two standard
deviations of the mean is 0.68 and 0.95, respectively
For other distributions we know that at least ¾ of the observations lie within
two standard deviations of the mean (Chebychev’s inequality)
Recap:
Many things are approximately normally distributed. For normal distributions, we can
interpret the standard deviation as follows:
68% of the observations will be less than one standard deviation away from the
mean
95%, less than two standard deviations
Almost 100%, less than three standard deviations
Even if the distribution is not normal, we still have some bounds.
At least 75% within two sd, at least 88.89% within three sd
In general, at least a fraction $1 - 1/c^2$ within c sd. This result is called
Chebychev's inequality (NO NEED TO MEMORIZE)
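The bound $1 - 1/c^2$ can be checked on data that are clearly not normal. The Python sketch below is an illustration with a made-up, strongly right-skewed sample; `share_within` and `chebyshev_bound` are hypothetical helper names. It uses the sample standard deviation (n - 1 divisor), as in the definitions above.

```python
import statistics


def share_within(values, c):
    """Fraction of observations less than c sample SDs away from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) < c * sd]
    return len(inside) / len(values)


def chebyshev_bound(c):
    """Minimum fraction guaranteed within c standard deviations."""
    return 1 - 1 / c ** 2


# Hypothetical right-skewed sample with one very large value.
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 30]

for c in (2, 3):
    print(c, share_within(data, c), chebyshev_bound(c))
```

In this sample the observed shares are well above the guaranteed minimum: the inequality is a worst-case bound, not a prediction.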
Sample Coefficient of Variation
Sample standard deviation relative to sample mean
$CV = \frac{s}{\bar{x}}$
Standardized measure: no units, can be compared across series.
STATA commands:
o sum variable_name, detail
(use the info in the second and third columns)
o tabstat variable_name, stat(cv)
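The claim that the coefficient of variation has no units can be illustrated by changing the units of measurement. In this Python sketch (illustrative only; the earnings figures and names are hypothetical), the same earnings are expressed in dollars and in thousands of dollars: the standard deviation changes by a factor of 1,000, while the CV stays the same.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical earnings, first in dollars and then in thousands of dollars.
earnings_dollars = [30000.0, 42000.0, 51000.0, 77000.0]
earnings_thousands = [x / 1000 for x in earnings_dollars]

sd_dollars = statistics.stdev(earnings_dollars)
sd_thousands = statistics.stdev(earnings_thousands)
cv_dollars = coefficient_of_variation(earnings_dollars)
cv_thousands = coefficient_of_variation(earnings_thousands)

print(sd_dollars, sd_thousands)   # differ by the change of units
print(cv_dollars, cv_thousands)   # agree up to rounding error
```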
Sample Range
Difference between the largest and smallest values in the sample
Simplest measure of dispersion but also the least interesting
Very sensitive to outliers
STATA commands:
o sum variable_name (last two columns)
o tabstat variable_name, stat(range)
Sample Inter-Quartile Range
Variation on sample range that is less sensitive to outliers
Equal to the difference between the 75th and 25th percentiles of the distribution
STATA command: tabstat variable_name, stat(iqr)
Average Absolute Deviation
Another measure that is more resistant to outliers
$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$
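The three measures above can be compared on a sample with and without one extreme value. This is a Python sketch for illustration; the function names and data are hypothetical. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which is just one of several conventions, so other software may report slightly different quartiles.

```python
import statistics


def sample_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def interquartile_range(values):
    """75th percentile minus 25th percentile (inclusive quartile convention)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def average_absolute_deviation(values):
    """Mean of the absolute deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # hypothetical sample
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # largest value replaced by an outlier

for sample in (data, with_outlier):
    print(sample_range(sample),
          interquartile_range(sample),
          average_absolute_deviation(sample))
```

The range jumps when the outlier is introduced, while the inter-quartile range is unaffected because it ignores the top and bottom quarters of the data.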
Symmetry
A distribution is symmetric if its shape is the same when reflected around the
median. A common example is the normal distribution
Measuring Symmetry (or Asymmetry)
Typically use skewness to measure symmetry
Right- skewed: Distribution has a long right tail and data are concentrated to the
left
Left-skewed: Distribution has a long left tail and data are concentrated to the right
Where are the mean and the median in each case?
[Figure: three histograms with Frequency on the vertical axis: x (Symmetric), y (Right-skewed), z (Left-skewed)]
One way to check whether data are right- or left-skewed is to compare the median to the mean.
Symmetric: $\bar{x} = \text{median}$
Right-skewed: $\bar{x} > \text{median}$
Left-skewed: $\bar{x} < \text{median}$
A formal measure of asymmetry is the skewness statistic:
$\text{Skew} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{3/2}}$
Interpretation of the statistic: symmetric = 0; right-skewed > 0; left-skewed < 0.
STATA command: tabstat variable_name, stat(skewness)
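The skewness statistic above is the third central moment divided by the second central moment raised to the power 3/2. The Python sketch below implements exactly that ratio for illustration; the function names and the three small samples are hypothetical, and software packages may apply small-sample adjustments that this sketch does not.

```python
def central_moment(values, k):
    """Average of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Third central moment scaled by the 3/2 power of the second central moment."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]               # long right tail
left_skewed = [-x for x in right_skewed]      # mirror image: long left tail

print(skewness(symmetric))
print(skewness(right_skewed))
print(skewness(left_skewed))
```

The signs follow the interpretation given above: zero for the symmetric sample, positive for the long right tail, negative for the long left tail.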
Distribution of arrival delays for United Airlines flights into San Francisco
International Airport, January 2014
Mean = 11.39; Median = 0; Skewness = 5.66
[Figure: histogram of Arrival Delay (minutes) against Frequency]
Distribution of 500 fastest 100m times as of December 2014
Mean = 9.90; Median = 9.92; Skewness = -1.52
[Figure: histogram of the 100m times (horizontal axis, 9.58 to 9.98) against Frequency]
Kurtosis
Measures the relative importance of the observations in the tail of the distribution.
(How fat the tails of distribution are.)
Simplest measure is:
$\text{Kurt} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{2}}$
Note: different computer programs can use slightly different formulae.
STATA command: tabstat variable_name, stat(kurt)
How to interpret:
The normal distribution, with Kurtosis = 3, is the benchmark.
Excess kurtosis measures kurtosis relative to the normal distribution:
$\text{ExcessKurt} \cong \text{Kurt} - 3$
If excess kurtosis is equal to 0, the distribution has tails similar to those of the normal distribution.
If excess kurtosis is positive, the distribution has fat tails: greater area in the tails than
for the normal distribution with the same mean and variance.
If excess kurtosis is negative, the distribution has skinny tails.
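The kurtosis ratio (fourth central moment over the squared second central moment) and the excess-kurtosis benchmark of 3 can be sketched as follows. This is a Python illustration with hypothetical names and made-up samples; as noted above, programs differ slightly in the exact formula, so this follows the simple version only.

```python
def central_moment(values, k):
    """Average of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def excess_kurtosis(values):
    """Kurtosis relative to the normal-distribution benchmark of 3."""
    return kurtosis(values) - 3


# No tails at all: every observation is the same distance from the mean.
two_point = [-1, 1, -1, 1]
# Mostly identical values plus two extreme observations: fat tails.
heavy_tails = [0] * 18 + [-10, 10]

print(kurtosis(two_point), excess_kurtosis(two_point))
print(kurtosis(heavy_tails), excess_kurtosis(heavy_tails))
```

The first sample has negative excess kurtosis (skinny tails) and the second has positive excess kurtosis (fat tails).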
How to present key summary statistics for the data?
Tables (see for example, Table 2.1 in your book)
Annual earnings of 30 year old female full time workers in 2010
Box and Whisker Plots (Box Plot)
All box-and-whisker plots give the lower quartile, median and upper quartile;
these form the “box.”
Simple box-and-whisker plots additionally give the minimum and maximum;
these form the “whiskers.”
More complicated box-and-whisker plots additionally plot outlying values.
In more complicated box-and-whisker plots, the whiskers are data-determined lower and
upper bounds. Outlying observations are the values that lie beyond these bounds.
[Figure: box plot of this more complicated form, with the following annotations]
o The upper bar equals the upper quartile plus 1.5 times the inter-quartile range.
o The six dots are the outliers.
o The lower bar is the minimum sample value: no values are lower than
25k - 1.5*(50k - 25k) = -12.5k, so there are no outliers below the lower bound.
o The data are right-skewed.
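The bound calculation in the box plot annotations can be written out as a short sketch. The Python snippet below is illustrative; the function names are hypothetical, the quartiles (25k and 50k), minimum (1,050) and maximum (172,000) are the figures quoted in these slides, and the other two values in the example list are made up. The 1.5 multiplier is the one used above.

```python
def whisker_bounds(lower_quartile, upper_quartile, multiplier=1.5):
    """Lower and upper bounds: quartiles extended by multiplier * inter-quartile range."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - multiplier * iqr, upper_quartile + multiplier * iqr


def outliers(values, lower_quartile, upper_quartile):
    """Observations that fall outside the whisker bounds."""
    low, high = whisker_bounds(lower_quartile, upper_quartile)
    return [x for x in values if x < low or x > high]


low, high = whisker_bounds(25_000, 50_000)
print(low, high)

# 1,050 and 172,000 are the sample minimum and maximum; 30,000 and 90,000 are hypothetical.
print(outliers([1_050, 30_000, 90_000, 172_000], 25_000, 50_000))
```

Because earnings cannot be negative, nothing can fall below the lower bound of -12.5k, which is why all the outliers sit in the right tail.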
Graphical Representations of Univariate Data
With univariate data, we have a few different options for graphing the data. The most
common are:
Histograms - graphs showing the frequency of occurrence of different values
Line charts - plots of the variable value against the observation number
Pie charts, bar charts, column charts - various ways to present observations
that are measured in different categories
A Histogram example using absolute frequencies
o Absolute frequency - just the number of times a particular value is observed in the
data. This can be problematic if n is large (the “y” axis becomes hard to read)
STATA command: histogram variable_name, frequency
[Figure: histogram of earnings (0 to 200,000) with Frequency on the vertical axis]
A Histogram example using relative frequencies
o Relative frequency - the number of times a value is observed as a percentage of all
observations
STATA command: histogram variable_name, percent
[Figure: histogram of earnings (0 to 200,000) with Percent on the vertical axis]
Histograms
There are a few choices to make when constructing a histogram.
Whether to use absolute frequency or relative frequency for the vertical axis
o Absolute frequency - just the number of times a particular value is observed
in the data
o Relative frequency - the number of times a value is observed as a percentage
of all observations
o Either choice will lead to the same shape for the histogram
How large to make the bin sizes
o If the data take on many different values, you'll want to group data into bins
o In general, the more observations you have, the more bins you use.
o A common default choice is $\sqrt{n}$ bins
Number of bins:
Too few bins: not enough information. Too many bins: hard to read.
The rule of thumb is $\sqrt{n}$ bins; in our example, $\sqrt{171} \approx 13$ bins
o The width of each bin is (172,000 - 1,050)/13 = 13,150.
Stem and Leaf display (a variation of the histogram)
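The rule-of-thumb arithmetic for the earnings example can be reproduced in a few lines. This Python sketch is illustrative; the function names are hypothetical, while n = 171 and the minimum and maximum earnings (1,050 and 172,000) are the values used in the calculation above.

```python
import math


def sqrt_rule_bins(n):
    """Rule-of-thumb number of bins: square root of the sample size, rounded."""
    return round(math.sqrt(n))


def bin_width(minimum, maximum, bins):
    """Equal-width bins covering the range of the data."""
    return (maximum - minimum) / bins


def bin_index(x, minimum, width, bins):
    """Zero-based bin number for x; the sample maximum is placed in the last bin."""
    return min(int((x - minimum) // width), bins - 1)


n = 171
minimum, maximum = 1_050, 172_000

bins = sqrt_rule_bins(n)
width = bin_width(minimum, maximum, bins)
print(bins, width)

# Which bin does a given earnings value fall into?
print(bin_index(1_050, minimum, width, bins))
print(bin_index(172_000, minimum, width, bins))
```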
Smoothed Histograms
Data that take many different values, such as earnings data, have an underlying
continuous probability density function rather than a discrete probability mass
function. (We are going to talk more about these in the coming weeks.)
This form of data can be better presented by a smooth graph than by discrete bins.
A smoothed histogram smooths the histogram in two ways.
o First, it uses rolling bins (or windows) that are overlapping rather than distinct.
o Second, in counting the fraction of the sample within each bin it gives more
weight to observations that are closest to the center of the window and less to
those near the ends of the window.
A well-known example is a kernel density estimate (the choice of window width is
similar to the choice of bin size).
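The two ideas behind a smoothed histogram (overlapping windows, and more weight near the center of the window) can be sketched with a kernel density estimate. The Python snippet below is a minimal illustration that assumes the textbook Epanechnikov kernel, $K(u) = 0.75(1 - u^2)$ for $|u| < 1$; Stata's own kernel scaling and default window width rule may differ from this. The function names and the five earnings values are hypothetical.

```python
def epanechnikov(u):
    """Textbook Epanechnikov kernel: weight falls to zero at the edge of the window."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_density(x, sample, bandwidth):
    """Density estimate at x: kernel-weighted share of the sample near x."""
    n = len(sample)
    weights = (epanechnikov((x - xi) / bandwidth) for xi in sample)
    return sum(weights) / (n * bandwidth)


# Hypothetical earnings observations.
sample = [20_000, 25_000, 30_000, 32_000, 60_000]

# The same point evaluated with a narrow and a wide window.
narrow = kernel_density(30_000, sample, 5_000)
wide = kernel_density(30_000, sample, 10_000)
print(narrow, wide)
```

With the narrow window only the observations at 30,000 and 32,000 contribute to the estimate at 30,000; the wider window also picks up 25,000, which is how a larger window width produces a smoother curve.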
Kernel Density – Example of Earnings data
[Figure, left panel: kdensity earnings (default window width; kernel = epanechnikov, bandwidth = 5.0e+03)]
[Figure, right panel: kdensity earnings, bwidth(10000) (wider window width; kernel = epanechnikov, bandwidth = 1.0e+04)]
Both panels plot Density (vertical axis) against earnings (horizontal axis, 0 to 200,000).
Line Charts
When the observations in a univariate dataset have a natural order, it often makes sense
to use a line chart
A line chart plots successive values of the data against the successive index values
This offers an easy way to visualize whether values are getting larger or smaller
Line charts are most common with time series data
STATA command: tsline variable_name
The example data are real GSP per capita in the US
Categorical Data – Pie and Bar Charts
Histograms are good for representing numerical univariate data. For categorical
univariate data, we typically use pie charts or bar/column charts.
o Pie charts are perhaps the easiest way for people to visualize percentages
o Bar/column charts have the advantage of being able to show both relative and
absolute frequencies
o Bar/column charts will become more useful as we start adding more variables
For more on STATA graph commands:
(a) use drop-down “graphics” menu on the top-right corner
(b) type help graph in command window.
(c) Or just google…
Some other examples of Visual Presentation of Data
Google Trends data (http://www.google.com.au/trends/) for the word “cricket” (blue
line) and the word “football” (red line), for Australia only.
Wordle generated from Obama's 2009 State of the Union address (after start of
recession)