ECMT 1020: Introduction to Econometrics

Lecture 1

Instructor: Kadir Atalay

Contact: kadir.atalay@sydney.edu.au

School of Economics

The University of Sydney

Contact Information

 Unit Coordinator & Instructor W1-W6: Kadir Atalay

o Email: kadir.atalay@sydney.edu.au

o Office: Room 435, Merewether Building (H04)

o Office Hours: Wednesday, 12.30-14.30, or by appointment

 Instructor W7-W13: Yi Sun

o Email: yi.sun@sydney.edu.au

o Office: Room 488, Merewether Building (H04)

o Office Hours: Tuesday, 15.30-17.30 (tentative)

 Tutors: See Blackboard

 Some Rules

o You should contact me by email.

o Use your USyd email and identify yourself with your name and SID.

o Any questions regarding the tutorial program, including administrative matters regarding tutorial allocation, should be directed to your tutor.

Outline of Lecture

 Course Outline

o Textbook

o Assessment

o Tutorials

o Unit Schedule

 Analysis of Economic Data

o Types of Data

 Univariate Data Summary

o Summary Statistics for Numerical Data

Course Website

 We will have a course website on Blackboard:

o http://elearning.sydney.edu.au

 Special Announcements: It is essential that you log in at least twice per week to

keep abreast of unit-wide announcements and use the resources to supplement

your learning.

 The UoS outline, online quizzes, practice questions, data files, lecture slides and tutorial questions will be posted there.

 Lecture slides will be posted, typically about 1 or 2 days before lecture.

 Please treat lecture slides as an outline to read before the lecture and fill in the

gaps during or after class.

Textbook

 The required text is

o “ ANALYSIS OF ECONOMICS DATA: AN INTRODUCTION TO

ECONOMETRICS” by A. Colin Cameron

 This is a draft of a book that will be published in late 2018. This version is particularly tailored for ECMT 1020. We will cover the first 17 chapters (out of 20).

 It will be available as a course reader from the University Copy Centre (by the 28th).

o The University Copy Centre is located on the ground floor of the University

of Sydney Sports and Aquatic Centre.

 There will be a copy on reserve in the library.

 Additional texts for reference – all available in the library:

J.M. Wooldridge, Introductory Econometrics: A Modern Approach, 5th Edition (used in ECMT 2150); Gujarati, D.N., Basic Econometrics, McGraw-Hill,

Assessment

• Your final grade for this unit will be based on six items: four online quizzes, a mid-semester exam, and a final exam. All items are to be completed individually.

ASSESSMENT TASKS AND DUE DATES

Assessment Name      Weight   Due Time              Due Date
Online Quiz 1        5%       noon                  21-Aug-2017
Online Quiz 2        5%       20:00                 8-Sept-2017
Mid-Semester Exam    30%      18.00 (tentatively)   12-Sept-2017
Online Quiz 3        5%       noon                  16-Oct-2017
Online Quiz 4        5%       noon                  3-Nov-2017
Final Exam           50%      Final Exam Period     Final Exam Period

 Mid-Session Examination

o A 75-minute exam will be held during Week 7 (tentatively Tuesday, 12 September 2017, at 18.00). The exact time and date will be announced soon.

o Lecture Material for weeks 1-6 will be examined

 Final Exam

o Final will be cumulative but will place greater emphasis on new topics (we

will go over what that means closer to the exam)

Lecture Topics

• First three weeks: univariate data. This is partly a recap of ECMT1010: How can we

summarize and visualize data? What can a sample tell us about the population, and

how can we express our uncertainty about such inference? What changes if we

transform our data? We will particularly focus on those aspects that are relevant for

economic analysis

• Second three weeks: bivariate data. How does one economic variable influence

another one? And again, how certain are we about our inference? We study both the

necessary theory and many economic examples

• Last six weeks: multivariate data. Here, we extend our results to cases where there is

not just one, but several explanatory variables. Finally, we also look at what to do if

the statistical model we estimate is not a good representation of economic reality: How

can we find out? And how much of our results can we salvage?

Tutorials

 Tutorials start next week!

o There is one two-hour tutorial session each week, starting next week.

Participation is not mandatory, but is strongly encouraged. Tutorials are a

good opportunity to raise any questions you may have

o Use tutorials to raise questions about the material

o Exercises will be set each week. Do try to solve them!

 The answers will be posted later, but before the mid-semester or final exam

 In even-numbered weeks, tutorial sessions are held in regular classrooms. These sessions are intended to help you become more familiar with the material covered in class and to provide exam practice.

 In odd-numbered weeks, tutorial sessions are held in computer labs. These sessions apply the material covered in class to real-world economic problems and teach the basics of the Stata software package, which is widely used in later courses at the university as well as in many jobs in industry.

 Computer Labs

o Week 3/5/7/9/11/13

o Computer Exercises

o Use of an econometrics or statistical package:

 STATA

STATA

 Throughout this unit you will be required to use a computer and specialised

econometric software. (Computer Labs/Tutorials /Online Quizzes)

 The statistics and data analysis program STATA will be taught as part of this unit

– and will be regularly demonstrated during the lectures.

 This software is available through the Virtual Desktop so you can use it in any of

the ICT Access Labs, Learning Hubs or Libraries. (see instructions in the UoS

outline). Also available in Labs 1-5 of Economics and Business Building (H69)

 Some of the learning and access labs are: PNR Learning Hub; Carslaw Learning Hub; Wentworth Learning Hub; Law Access Lab; Madsen Access Lab; Cumberland Access Lab.

 If you wish to buy your own license to use STATA on your computer

o http://www.survey-design.com.au/buygradplan.html

(Small Stata will be sufficient for this course)

 There is a brief introduction to Stata in Appendix A of the textbook. Generally,

we will just introduce new commands as they are needed. Stata's help facilities are

also pretty good.

Mathematics

•I appreciate many of you haven't had a lot of recent maths practice, and I'll try to

make things smooth.

• Calculus is not needed for this course, although it may help guide your intuitive

understanding of some of the material. Later ECMT courses, as well as higher-division

macro and micro units, will require it though.

• Some familiarity with basic algebra, such as working with summations, is assumed

• If you find that the algebra during the lectures or in the tutorials is moving too fast

for you,

1- please take advantage of the university's Maths Learning Centre. They have free

drop-in classes, including some specifically tailored for economics students. Don't

be ashamed or afraid, they're there to help!

2- LET ME KNOW!!!!! Happy to help you!

Chapter 1‐ Analysis of Economic Data

Use of Economic Data

 In a nutshell, econometrics is the use of statistical methods to answer

economic questions

 Describing the economic “landscape”

o What is the annual growth rate of GDP ? Has unemployment risen over

past year?

o Do people with higher levels of education earn more?

o Descriptive statistics motivate economic theory

 Testing or attempting to distinguish between economic theories

o Is it true that stock returns are unpredictable?

 Evaluating government and business policy

o Did those incredibly low interest rates in recent years really help stimulate

the economy?

Types of Data

There are a variety of different types of data that you will encounter in economics. The

ways in which we categorize types of data include the following:

 Value: numerical data, categorical data

 Unit of observation: cross-section data, time series data, panel data

 Number of variables: univariate data, bivariate data, multivariate data

Types of Data / Value / Numerical Data (Quantitative)

Numerical data are data that are naturally recorded and interpreted as numbers. They

can be continuous or discrete. Examples of numerical data include:

 Annual income (continuous)

 Hours worked (discrete)

 Annual GDP (continuous)

 Number of times a person has visited dentist (discrete)

Discrete numerical data take only integer values.

Types of Data / Value / Categorical Data

Categorical data are data that are recorded as belonging to one or more groups. They

can be recorded as numbers but these numbers have no inherent meaning. Examples of

categorical data include:

 Gender ; Religion; Birth Place …

Types of Data / Units of Observation

Economics data are most often observational data, meaning they are based on

observations of actual behavior in an uncontrolled environment.

Types of Data/ Units of Observation / Cross-section data

 Cross-section data are data on different entities collected at a common point in

time.

o Sample of individuals, households, firms, countries, other units taken at a

point in time (“snapshot”).

 Notation: $x_i$, $i = 1, \dots, n$

o i specifies a particular individual for an observation

o n is the total number of individuals observed ( typically called the sample

size)

o x is the value of whatever variable we are observing.

 Examples: a single year of census data, unemployment rates by state for a

particular year

Examples of a cross-sectional data set:

Data set on hourly wages of individuals in 2014

observation   hourly wage
1             17.15
2             35.54
3             51.05
.             .
498           16.87
499           19.00
500           41.35

$x_i$, $i = 1, \dots, 500$ → $x_3 = 51.05$; $x_{499} = 19.00$

Note that the order of the observations (observation number) is not important.

Types of Data/Units of Observation / Time-series data

 Time-series data are data on the same quantity at different points in time.

 Notation: $x_t$, $t = 1, \dots, T$

o t specifies time period of an observation

o T is the total number of time periods

o x is the value of whatever variable we are observing.

 Examples: GDP of a country over time, daily averages of the S&P, monthly unemployment rate.

Example: data on minimum wages (Australia, 1950 to 1987)

Year   hourly wage
1950   0.20
1951   0.21
1952   0.23
.      .
1987   3.35

Types of Data/ Units of Observation / Panel data

Panel data are data on different individuals with each individual observed at multiple

points in time.

 Notation: $x_{i,t}$, $i = 1, \dots, n$; $t = 1, \dots, T$

 Panel data is a mixture of cross-section and time series data

 Examples: earnings of USyd graduates over time; life expectancy by country over

time

 Data set on hourly wages of individuals in 2013-14

observation   person   year   hourly wage
1             1        2013   16.42
2             1        2014   17.15
3             2        2013   37.41
4             2        2014   35.54
.             .        .      .
499           250      2013   40.22
500           250      2014   41.35
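The relationship between panel, cross-section and time-series data can be sketched in a few lines of Python. This is only an illustration, not part of the Stata workflow used in this unit; the dictionary layout and variable names are assumptions, and only the three people shown in the table are included.

```python
# Panel data: hourly wage x[i, t] indexed by person i and year t.
# Values are copied from the table above (three of the 250 people).
panel = {
    (1, 2013): 16.42, (1, 2014): 17.15,
    (2, 2013): 37.41, (2, 2014): 35.54,
    (250, 2013): 40.22, (250, 2014): 41.35,
}

# Cross-section: fix the year, let the person vary.
cross_section_2014 = {i: x for (i, t), x in panel.items() if t == 2014}

# Time series: fix the person, let the year vary.
time_series_person_1 = {t: x for (i, t), x in panel.items() if i == 1}

print(cross_section_2014)
print(time_series_person_1)
```

Fixing one index of the panel recovers the other two data types, which is why panel data is described as a mixture of cross-section and time-series data.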

Types of Data / Number of Variables / Univariate Data

Univariate data is a single data series containing observations of only one variable.

 Notation: $x_i$ for cross-section data; $x_t$ for time series data

 Examples: Earnings of university graduates in 2012; inflation rate from 1960 to 2014

Types of Data / Number of Variables / Bivariate Data

Bivariate data is composed of two potentially related data series.

 Notation: $(x_i, y_i)$ for cross-section data; $(x_t, y_t)$ for time series data

 We are often interested in the relationship between x and y.

 Examples: Education and earnings of individuals; inflation and unemployment

rates over time.

Types of Data / Number of Variables / Multivariate Data

Multivariate data is composed of three or more potentially related data series.

 Notation: $(x_{1,i}, x_{2,i}, \dots, x_{k,i}, y_i)$ for cross-section data;

$(x_{1,t}, x_{2,t}, \dots, x_{k,t}, y_t)$ for time series data

 We are often interested in how $x_1, \dots, x_k$ are related to $y$

 Examples: Inputs and outputs and profits for a firm over time;

Education, experience, gender and income for a cross-section of individuals.

What do we do with economic data?

The basic steps of data analysis:

1- Data Summary

2- Statistical Inference

3- Interpretation

Steps of Data Analysis: Data Summary

 To summarize data, we typically use a combination of visual representations of

the data and statistics

 Visual representations include a variety of graphs and charts (scatterplots,

histograms, maps, etc.)

 Statistics can measure characteristics of a single variable (mean, median, variance,

etc.) or relationships between multiple variables (covariance, correlation, linear

regression, etc.)

 The choice of summary statistics and graphs depends on both the type of data

available and what the researcher is interested in

Steps of Data Analysis: Statistical Inference

 The basic idea of statistical inference is to draw conclusions about a relationship

we cannot observe

 We typically cannot reach definitive conclusions because we only get to observe a

sample rather than the population

 Statistical inference requires using what we know about the sample and about

probability to reach a conclusion about the probable characteristics of variables

and relationships between them at the population level

RECAP - ECMT 1010

Reminder: Statistics (1)

• Statistics is using data to figure out as much as we can about a parameter that we cannot

observe

• A statistical model describes a population that we cannot observe, either because observing it would be too much work (the education and salary of every person on Earth) or because the population has infinitely many points (“assume X follows a normal distribution…”).

• This model generally has one or a few parameters, describing the thing we're interested

in: the correlation between education and salary.

• We then assume that our dataset is a sample taken from the population we have

described. From this dataset, we calculate an estimator for the true but unknown

parameter: often something like a sample correlation, or a sample mean

• Standard practice in statistics is to use Greek letters for population quantities (such as $\mu$, $\sigma$, $\rho$, $\beta$) and Latin letters for sample quantities (such as $\bar{x}$, $s$, $r$, $b$). The textbook largely follows this rule.

• Finally, inference happens. Our estimator is probably not exactly equal to the

parameter, but can we say something about how far off it is likely to be? This is where

confidence intervals show up

• More formalities about sampling and inference later in the course, starting next week.

For today, we focus on the sample itself

Focus of this course: Regression Analysis

 ECMT1010 focuses on data on a single variable considered in isolation (such as

coin toss)

 In this class, we start analyzing univariate data – studying a single data series

(similar to ECMT1010)

 Most economic data analysis is focused on measuring the relationship between

two or more variables.

o We want to understand the inter-relationships (and perhaps causality), such as the effect of minimum wage laws on unemployment.

o The main statistical method is called “regression analysis”.

 Bivariate data (two related series): Chapters 8 to 12

 Multivariate data (three or more related series): Chapters 13 to 17

Chapter 2‐ Univariate Data Summary

Chapter 2 - Univariate Data Summary

 Univariate data are a single series of data that are observations on one variable.

 A numerical data example is annual earnings for each person in a sample of

women.

 A categorical data example is expenditures in each of a number of categories.

Our main focus :

 (1) Summary Statistics for Numerical Data

 (2) Charts for Numerical Data

Summary Statistics for Univariate Data

 Graphs are nice for giving people a quick glimpse of data

 However, there is a lot of ambiguity about interpreting graphs and comparing one to

another.

 Where is the mean? What is a wide distribution and what is a narrow one? Are tails

big or small? Etc.

 Summary statistics give us a standardized way of summarizing univariate data

 People know what the numbers mean and they can be compared across different

samples

Types of Summary Statistics

 We're often interested in describing the following characteristics of the

distribution of a data series:

o Central tendency – where is the center of the distribution of the data?

What is a typical Australian employee's salary, whatever “typical” means?

o Dispersion –how spread out is the data?

How much inequality is there in our income distribution?

o Skewness (asymmetry) – how symmetric (or asymmetric) is the distribution?

How many millionaires are there, compared to minimum-wage workers?

o Kurtosis (Peakedness) –how fat are the tails, how tall is the peak ?

How rare are minimum-wage workers and millionaires, compared to typical

earners?

A little Math Review:

If X takes n values, $x_1, x_2, \dots, x_{n-1}, x_n$, their sum is

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n$$

 If g(x) is a function of x, then

$$\sum_{i=1}^{n} g(x_i) = g(x_1) + g(x_2) + g(x_3) + \cdots + g(x_n)$$

 If “a” and “b” are constants, then

o $\sum_{i=1}^{n} a = n \cdot a$

o $\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$

o $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$

o $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

o $\sum_{i=1}^{n} (x_i \cdot y_i) \neq \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i$
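The summation rules in this review can be checked numerically. The short Python sketch below uses made-up values for x, y, a and b (all hypothetical) and is meant only as a sanity check, not as a proof.

```python
import math

x = [2.0, 5.0, 1.0, 4.0]   # hypothetical values x_1, ..., x_n
y = [3.0, 1.0, 2.0, 6.0]   # hypothetical values y_1, ..., y_n
a, b = 10.0, 0.5           # hypothetical constants
n = len(x)

# Sum of a constant: sum(a) = n * a
rule_constant = math.isclose(sum(a for _ in x), n * a)

# A constant factor moves outside the sum: sum(a * x_i) = a * sum(x_i)
rule_scale = math.isclose(sum(a * xi for xi in x), a * sum(x))

# Linear function: sum(a + b * x_i) = n * a + b * sum(x_i)
rule_linear = math.isclose(sum(a + b * xi for xi in x), n * a + b * sum(x))

# Sum of sums: sum(x_i + y_i) = sum(x_i) + sum(y_i)
rule_add = math.isclose(sum(xi + yi for xi, yi in zip(x, y)), sum(x) + sum(y))

# Products do NOT split: sum(x_i * y_i) generally differs from sum(x_i) * sum(y_i)
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))
product_of_sums = sum(x) * sum(y)

print(rule_constant, rule_scale, rule_linear, rule_add)
print(sum_of_products, product_of_sums)
```

The last two numbers differ, which is the point of the final rule: a sum of products is generally not the product of the sums.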

Types of Summary Statistics – Empirical Example

To go over these different types of summary statistics, we will use the following

example:

This is the distribution of annual earnings of a sample of 171 women who are 30 years of age in 2010. The data are in “EARNINGS.dta” on Blackboard.

[Figure: histogram of earnings; vertical axis: Frequency; horizontal axis: Earnings]

Measures of Central Tendency

A measure of central tendency / central location describes the center of the

distribution in the data

 Tells us whether center of distribution is

 Answer the question, “What is a typical value in this sample?”

 Several measures

o Sample mean

o Sample median

o Sample midrange

o Sample mode

The Sample Mean

 Most common way to measure central tendency

 It is also called the sample average

 Definition:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

 Weights all observations equally!

 STATA commands:

mean variable_name

sum variable_name

tabstat variable_name, stat(mean)

The Sample Median

 Value that divides the sample into two halves (50% of observations are above

value and 50% are below)

 Order the data from lowest to highest value. The median is the value that divides the ordered data into two halves (the one that ends up in the middle).

 When n (the number of observations) is odd, the median is the middle value; when n is even, use the average of the two middle observations.

 Less sensitive to outliers than the sample average

(An outlying observation, or outlier is an observation that is unusually large or

small)

 Other quantiles can be used

 STATA command sum variable_name , detail

tabstat variable_name, stat(median)
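To see why the median is less sensitive to outliers than the mean, here is a small Python sketch (an illustration only; in this unit the same numbers would come from the Stata commands above). The six wages are taken from the cross-section example earlier in these notes; the extra observation of 500 is a made-up outlier.

```python
import statistics

# Six hourly wages from the cross-section example table.
wages = [17.15, 35.54, 51.05, 16.87, 19.00, 41.35]

mean_wage = statistics.mean(wages)
median_wage = statistics.median(wages)   # n is even: average of the two middle values

# Hypothetical outlier: one very high hourly wage.
with_outlier = wages + [500.0]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)   # n is odd: the middle value

print(f"mean:   {mean_wage:.2f} -> {mean_with_outlier:.2f}")
print(f"median: {median_wage:.2f} -> {median_with_outlier:.2f}")
```

One extreme observation more than triples the mean, while the median only moves to the next value in the ordered sample.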

The Mean Vs. The Median

What is the typical Australian worker's wage?

Among full-time workers in 2011, the average wage was $72,000 per year and the median wage was $57,400 per year.

 Note that the mean is over 25% larger than the median.

 Why is there such a big difference? Which of these numbers is more relevant?

The Sample Midrange

 The sample midrange is the average of the smallest and largest observations.

 Not a very commonly used measure

 Extremely sensitive to outliers

 STATA command sum variable_name , detail (see 2 nd column)

The Sample Mode

 The most frequently occurring value in sample

 Useful with discrete data and cases where particular values are meaningful (4

years of high school,40 hours of work each week, ...).

 STATA command tab

Quartiles , Deciles and Percentiles

 Median is the point that equally divides an ordered sample.

 Lower Quartile is that point where ¼ (¾) of sample lies below (above)

 Upper Quartile is that point where ¾ (¼) of sample lies below (above)

 STATA command sum variable_name , detail (see 2 nd column)

Finer divisions:

 p th percentile is the value for which p percent of the observed values are equal to

or less than the value.

 Median – 50 th ; Upper Quartile- 75 th ; Lower Quartile- 25 th percentiles.

 Deciles split the ordered sample into tenths.

 Quantile is a percentile reported as a fraction of one rather than percentage.

(0.56 quantile =56 th percentile)

STATA command tabstat variable_name , stat(p1 p5 ..)

These four measures of central tendency can give very different answers to the

question, what is a typical salary? Which one to use depends on which question you

are trying to answer.

Measures of Dispersion

 Characterize the spread or width of the distribution: How far away do observations

tend to be from the mean?

 Different measures:

o Sample variance

o Sample standard deviation

o Sample coefficient of variation

o Sample range and inter-quartile range

 Like measures of central tendency, the different measures have different benefits

and drawbacks

 STATA command sum variable_name , detail (see 2 nd column)

tabstat variable_name , stat (… )

Sample Variance

How far away do observations tend to be from the mean?

Simply calculating $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$ is not useful: positive and negative differences cancel out and the result is always zero.

So we work with squared deviations instead. The sample variance is defined as

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

 The division by n - 1 rather than n is a “degrees of freedom” correction, which is necessary because we are using the sample mean $\bar{x}$ rather than the population mean $\mu$.

 When we start working with multivariate data, you'll often see n – k popping up

for much the same reason. This is worth remembering: in general,

“degrees of freedom = observations - estimated parameters”
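Both points above (deviations from the mean sum to zero, and the variance divides by n - 1) can be verified on a small sample. The Python sketch below uses hypothetical data and compares a hand-written formula against the standard library's `statistics.variance`, which also uses the n - 1 divisor.

```python
import math
import statistics

# Hypothetical sample of eight observations.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)
x_bar = sum(x) / n

# Deviations from the sample mean always sum to zero.
deviations = [xi - x_bar for xi in x]
sum_of_deviations = sum(deviations)

# Sample variance: squared deviations, divided by n - 1 (degrees of freedom).
s2 = sum(d ** 2 for d in deviations) / (n - 1)

print(sum_of_deviations)
print(s2, statistics.variance(x))
```

Dividing by n instead would give a slightly smaller number; the n - 1 version is what `statistics.variance` (and the sample variance in these notes) uses.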

 Approximately equal to the average squared deviation from mean:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

 As the sample variance increases, the spread of the data gets wider

 STATA commands:

sum variable_name, detail (see 3rd column)

tabstat variable_name, stat(variance)

 One problem with variances is that they're hard to interpret. If x is measured in dollars, $s^2$ is in squared dollars, whatever that means.

Sample Standard Deviation

 Standard deviation is just the square root of the variance:

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

 Roughly the average deviation of the data from its mean.

 It has the same units as the data ( not the case in variance)

 If one sample has a larger sample standard deviation than another, then we view that sample as having greater variability.

 STATA commands:

sum variable_name (see 3rd column)

tabstat variable_name, stat(sd)

Interpretation of the Standard Deviation

A useful way to interpret the standard deviation is to use results for the normal

distribution (see ECMT 1010).

 For a normal distribution, the probability of being within one and two standard deviations of the mean is approximately 0.68 and 0.95, respectively.

 For other distributions, we know that at least ¾ of a random sample is within two standard deviations of the mean (Chebychev’s inequality).

Recap:

Many things are approximately normally distributed. For normal distributions, we can

interpret the standard deviation as follows:

 68% of the observations will be less than one standard deviation away from the

mean

 95%, less than two standard deviations

 Almost 100%, less than three standard deviations

 Even if the distribution is not normal, we still have some bounds .

At least 75% within two sd, at least 88.89% within three sd

 In general, at least a fraction $1 - 1/c^2$ lies within c sd. This result is called Chebychev's inequality (NO NEED TO MEMORIZE)
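The Chebychev bound can be checked on a deliberately non-normal sample. In the Python sketch below the data are hypothetical and strongly right-skewed, and the helper names `chebyshev_bound` and `share_within` are made up for this illustration.

```python
import statistics

def chebyshev_bound(c):
    """Minimum fraction of observations within c standard deviations of the mean."""
    return 1 - 1 / c ** 2

def share_within(data, c):
    """Observed fraction of the sample within c sample standard deviations of the mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(abs(v - m) <= c * s for v in data) / len(data)

# Hypothetical, strongly right-skewed sample (one very large value).
skewed = [1, 1, 1, 2, 2, 3, 3, 4, 5, 50]

for c in (2, 3):
    print(c, share_within(skewed, c), chebyshev_bound(c))
```

Even with one extreme value, the observed shares stay above the guaranteed minimums of 75% and about 88.89%; for roughly normal data the observed shares would be closer to 95% and almost 100%.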

Sample Coefficient of Variation

 Sample standard deviation relative to sample mean

$$CV = \frac{s}{\bar{x}}$$

 Standardized measure: no units, can be compared across series.

 STATA commands:

sum variable_name, detail (use the information in the second and third columns)

tabstat variable_name, stat(cv)
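The claim that the coefficient of variation has no units can be illustrated by measuring the same earnings in dollars and in thousands of dollars. The Python sketch below uses hypothetical earnings figures; the function name is also made up for this example.

```python
import math
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical annual earnings, first in dollars, then in thousands of dollars.
earnings_dollars = [20000, 30000, 40000, 50000, 60000]
earnings_thousands = [v / 1000 for v in earnings_dollars]

cv_dollars = coefficient_of_variation(earnings_dollars)
cv_thousands = coefficient_of_variation(earnings_thousands)

print(cv_dollars, cv_thousands)
```

Changing the unit rescales both the standard deviation and the mean by the same factor, so their ratio is unchanged.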

Sample Range

 Difference between the largest and smallest values in the sample

 Simplest measure of dispersion but also the least interesting

 Very sensitive to outliers

 STATA command sum variable_name (last two columns).

tabstat variable_name , stat(range)

Sample Inter-Quartile Range

 Variation on sample range that is less sensitive to outliers

 Equal to the difference between the 75th and 25th percentiles of the distribution

 STATA command: tabstat variable_name, stat(iqr)

Average Absolute Deviation

 Another measure that is more resistant to outliers

$$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
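The three dispersion measures on this slide can be compared on one small sample. The Python sketch below uses hypothetical data with a single outlier. The quartiles come from `statistics.quantiles` with its default method, which is an assumption here: other packages, including Stata, may use a slightly different quartile rule.

```python
import statistics

# Hypothetical sample with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 100]

# Sample range: largest minus smallest value.
sample_range = max(data) - min(data)

# Inter-quartile range: 75th percentile minus 25th percentile.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Average absolute deviation from the sample mean.
x_bar = statistics.mean(data)
avg_abs_dev = sum(abs(v - x_bar) for v in data) / len(data)

print(sample_range, iqr, avg_abs_dev)
```

The outlier drives the range almost entirely, moves the average absolute deviation noticeably, and leaves the inter-quartile range untouched.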

Symmetry

 A distribution is symmetric if its shape is the same when reflected around the

median. A common example is the normal distribution

Measuring Symmetry (or Asymmetry)

 Typically use skewness to measure symmetry

 Right- skewed: Distribution has a long right tail and data are concentrated to the

left

 Left-skewed: Distribution has a long left tail and data are concentrated to the right

Where are the mean and medians?

[Figure: three histograms (vertical axis: Frequency) with panels titled Symmetric, Right-skewed and Left-skewed]

 One way to check whether data are right- or left-skewed is to compare the median to the mean.

o Symmetric: $\bar{x} = \text{median}$

o Right-skewed: $\bar{x} > \text{median}$

o Left-skewed: $\bar{x} < \text{median}$

 A formal measure of asymmetry is the skewness statistic:

$$\text{Skew} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{3/2}}$$

Interpretation of the statistic: symmetric = 0; right-skewed > 0; left-skewed < 0.

STATA command: tabstat variable_name, stat(skewness)

Distribution of arrival delays for United Airline flights into San Francisco

International Airport, January 2014

Mean = 11.39; Median = 0 ; Skewness: 5.66

[Figure: histogram of arrival delays; vertical axis: Frequency; horizontal axis: Arrival Delay (minutes)]

Distribution of 500 fastest 100m times as of December 2014

Mean = 9.90; Median = 9.92; Skewness: -1.52

[Figure: histogram of the 100m times; vertical axis: Frequency; horizontal axis from 9.58 to 9.98]

Kurtosis

 Measures the relative importance of the observations in the tail of the distribution.

(How fat the tails of distribution are.)

 Simplest measure is:

$$\text{Kurt} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{2}}$$

Note: different computer programs can use slightly different formulae.

STATA command tabstat variable_name, stat(kurt)

How to interpret:

 The normal distribution, with Kurt = 3, is the benchmark.

 Excess kurtosis measures kurtosis relative to the normal distribution:

$$\text{ExcessKurt} \cong \text{Kurt} - 3$$

 If excess kurtosis is equal to 0, the distribution has tails like those of the normal distribution.

 Positive excess kurtosis: the distribution has fat tails, that is, greater area in the tails than for the normal distribution with the same mean and variance.

 Negative excess kurtosis: the distribution has skinny tails.
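Kurtosis and excess kurtosis follow the same pattern as skewness, using the fourth moment. The Python sketch below implements the simple moment-based formula on two hypothetical samples; as the notes warn, different programs can use slightly different formulae, so the numbers need not match Stata exactly.

```python
def central_moment(data, k):
    """Average of (x_i - mean)^k over the sample (divides by n)."""
    m = sum(data) / len(data)
    return sum((v - m) ** k for v in data) / len(data)

def kurtosis(data):
    """Fourth central moment divided by the squared second central moment."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def excess_kurtosis(data):
    """Kurtosis relative to the normal benchmark of 3."""
    return kurtosis(data) - 3

# Hypothetical samples.
skinny_tails = [-1, 1, -1, 1]                    # all values equally far from the mean
fat_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # mostly central values, two extremes

print(kurtosis(skinny_tails), excess_kurtosis(skinny_tails))
print(kurtosis(fat_tails), excess_kurtosis(fat_tails))
```

The first sample has negative excess kurtosis (skinny tails) and the second has positive excess kurtosis (fat tails).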

How to present key summary statistics for the data?

 Tables (see for example, Table 2.1 in your book)

Annual earnings of 30 year old female full time workers in 2010

 Box and Whisker Plots (Box Plot)

 All box-and-whisker plots give the lower quartile, median and upper quartile;

these form the “box.”

 Simple box-and-whisker plots additionally give the minimum and maximum;

these form the “whiskers.”

 More complicated box-and-whisker plots additionally plot outlying values.

 In complicated box and whiskers, whiskers are data-determined lower and upper

bounds. Outlying observations are the values that exceed these bounds.

 This is a more complicated form of box plot.

 In this case, the upper bar equals the upper quartile + 1.5 times the inter-quartile range.

 The six dots are the outliers.

 The lower bound is the minimum sample value.

 There are no outliers below the lower bound: no values are lower than 25k - 1.5*(50k - 25k) = -12.5k.

 Right-skewed data
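The whisker bounds described above follow from the two quartiles. The Python sketch below reproduces the arithmetic using the quartile values of 25k and 50k quoted on the slide; the function name is hypothetical, and the 1.5 multiplier is the one stated above.

```python
def whisker_bounds(lower_quartile, upper_quartile, multiplier=1.5):
    """Data-determined bounds: quartiles extended by a multiple of the inter-quartile range."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - multiplier * iqr, upper_quartile + multiplier * iqr

# Quartiles quoted on the slide: 25k and 50k.
lower_bound, upper_bound = whisker_bounds(25_000, 50_000)

print(lower_bound, upper_bound)
```

Observations outside these bounds are plotted as outlying values. Since earnings cannot be below the negative lower bound, the lower whisker simply stops at the sample minimum.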

Graphical Representations of Univariate Data

With univariate data, we have a few different options for graphing the data. The most

common are:

 Histograms - graphs showing the frequency of occurrence of different values

 Line charts - plots of the variable value against the observation number

 Pie charts, bar charts, column charts - various ways to present observations

that are measured in different categories

A Histogram example using absolute frequencies

o Absolute frequency: the number of times a particular value is observed in the data. This can be problematic if n is large (the “y” axis becomes hard to read).

 STATA command: histogram variable_name, frequency

[Figure: histogram of earnings; vertical axis: Frequency; horizontal axis: earnings]

A Histogram example using relative frequencies

o Relative frequency - the number of times a value is observed as a percentage of all

observations

 STATA command: histogram variable_name, percent

[Figure: histogram of earnings; vertical axis: Percent; horizontal axis: earnings]

Histograms

There are a few choices to make when constructing a histogram.

 Whether to use absolute frequency or relative frequency for the vertical axis

o Absolute frequency - just the number of times a particular value is observed

in the data

o Relative frequency - the number of times a value is observed as a percentage

of all observations

o Either choice will lead to the same shape for the histogram

 How large to make the bin sizes

o If the data take on many different values, you'll want to group data into bins

o In general, the more observations you have, the more bins you use.

o A common default choice is $\sqrt{n}$

Number of bins:

 Too few bins: not enough information. Too many bins: hard to read.

 The rule of thumb is $\sqrt{n}$ bins; in our example, $\sqrt{171} \approx 13$ bins.

o The width of each bin is (172,000 - 1,050)/13 = 13,150.
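The bin arithmetic for the earnings example can be reproduced in a few lines. The Python sketch below uses the sample size (171) and the minimum and maximum earnings (1,050 and 172,000) quoted above; rounding the square root to the nearest integer is an assumption about how the rule of thumb is applied.

```python
import math

n = 171                      # sample size from the earnings example
min_earnings = 1_050         # smallest value, as quoted above
max_earnings = 172_000       # largest value, as quoted above

# Rule of thumb: about sqrt(n) bins.
n_bins = round(math.sqrt(n))

# Equal-width bins covering the whole range of the data.
bin_width = (max_earnings - min_earnings) / n_bins

print(n_bins, bin_width)
```

With more observations the rule gives more bins, in line with the general advice above.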

Stem and Leaf display (A variation of Histogram)

Smoothed Histograms

 Data that take many different values, such as earnings data, have an underlying continuous probability density function rather than a discrete probability mass function. (We will talk more about these in the coming weeks.)

 This form of data can be better presented by a smooth graph than by discrete bins.

 A smoothed histogram smooths the histogram in two ways.

o First, it uses rolling bins (or windows) that are overlapping rather than distinct.

o Second, in counting the fraction of the sample within each bin it gives more

weight to observations that are closest to the center of the window and less to

those near the ends of the window.

 A well-known example is a kernel density estimate. (The choice of window width is similar to the choice of bin size.)
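The two smoothing ideas above (overlapping windows, and more weight near the center of each window) can be sketched as a basic kernel density estimate. The Python code below uses the textbook form of the Epanechnikov kernel and a handful of hypothetical earnings values; it is an illustration of the idea only and is not claimed to reproduce Stata's kdensity output, which may scale the kernel and choose the default bandwidth differently.

```python
def epanechnikov(u):
    """Weight that is largest at the window center (u = 0) and zero outside |u| <= 1."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kernel_density(x, data, h):
    """Density estimate at x: weighted share of the sample in the window [x - h, x + h]."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)

# Hypothetical earnings values (not the EARNINGS.dta sample).
earnings = [20000, 25000, 30000, 45000, 80000]

step = 100
grid = range(0, 100001, step)

# Two window widths: a narrower one and a wider (smoother) one.
narrow = [kernel_density(x, earnings, 5000) for x in grid]
wide = [kernel_density(x, earnings, 10000) for x in grid]

# A density should integrate to roughly one (simple Riemann sum over the grid).
area_narrow = sum(narrow) * step
area_wide = sum(wide) * step

print(area_narrow, area_wide)
```

A wider window averages over more observations at each point, which gives a smoother curve in the same way that wider bins give a coarser histogram.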

Kernel Density – Example of Earnings data

STATA commands:

kdensity earnings

kdensity earnings, bwidth(10000)

[Figure: two kernel density estimates of earnings; vertical axis: Density; horizontal axis: earnings. First panel: default window width (kernel = epanechnikov, bandwidth = 5.0e+03). Second panel: wider window width (kernel = epanechnikov, bandwidth = 1.0e+04).]

Line Charts

When the observations in a univariate dataset have a natural order, it often makes sense

to use a line chart

 A line chart plots successive values of the data against the successive index values

 This offers an easy way to visualize whether values are getting larger or smaller

 Line charts are most common with time series data

 STATA command: tsline variable_name

Data is real GSP per capita in US

Categorical Data – Pie and Bar Charts

 Histograms are good for representing numerical univariate data. For categorical

univariate data, we typically use pie charts or bar/column charts.

o Pie charts are perhaps the easiest way for people to visualize percentages

o Bar/column charts have the advantage of being able to show both relative and

absolute frequencies

o Bar/column charts will become more useful as we start adding more variables

For more on STATA graph commands:

(a) use drop-down “graphics” menu on the top-right corner

(b) type help graph in command window.

Some other examples of Visual Presentation of Data

Google Trends data (http://www.google.com.au/trends/) for the word “ cricket ” (blue

line) and the word “ football ”(red line) – only for Australia.

Wordle generated from Obama's 2009 State of the Union address (after start of

recession)
