    ECMT 1020: Introduction to Econometrics
    Lecture 1
    Instructor: Kadir Atalay
    Contact: kadir.atalay@sydney.edu.au
    School of Economics
    The University of Sydney
    Contact Information
     Unit Coordinator & Instructor W1- W6 : Kadir Atalay
    o Email: kadir.atalay@sydney.edu.au
    o Office: Room 435, Merewether Building ( H04)
    o Office Hours: Wednesday, 12.30 -14.30 or by appointment
     Instructor W7-W13:Yi Sun
    o Email: yi.sun@sydney.edu.au
    o Office: Room 488, Merewether Building ( H04)
    o Office Hours: Tentatively ; Tuesday 15.30 -17.30
     Tutors: See Blackboard
     Some Rules
    o You should contact me by email.
    o Use your USyd email - identify yourself with your name and SID
    o Any questions regarding the tutorial program including administrative matters
    regarding tutorial allocation should be directed to your tutor
    Outline of Lecture
     Course Outline
    o Textbook
    o Assessment
    o Tutorials
    o Unit Schedule
     Analysis of Economic Data
    o Types of Data
     Univariate Data Summary
    o Summary Statistics for Numerical Data
    Course Website
     We will have a course website on Blackboard:
    o http://elearning.sydney.edu.au
     Special Announcements: It is essential that you log in at least twice per week to
    keep abreast of unit-wide announcements and use the resources to supplement
    your learning.
     UoS outline, online quizzes , practice questions, data files and lecture slides,
    tutorial questions will be posted there.
     Lecture slides will be posted, typically about 1 or 2 days before lecture.
     Please treat lecture slides as an outline to read before the lecture and fill in the
    gaps during or after class.
    Textbook
     The required text is
    o  “ ANALYSIS OF ECONOMICS DATA: AN INTRODUCTION TO
    ECONOMETRICS” by A. Colin Cameron
    • This is a draft of a book that will be published in late 2018. This version is
    particularly tailored for ECMT 1020. We will cover the first 17 chapters (out of 20).
    • It will be available as a course reader from the University Copy Centre (by 28th).
    o The University Copy Centre is located on the ground floor of the University
    of Sydney Sports and Aquatic Centre.
     There will be a copy on reserve in the library.
    • Additional texts for reference, all available in the library:
    J.M. Wooldridge, Introductory Econometrics: A Modern Approach, 5th Edition
    (used in ECMT 2150); Gujarati, D.N., Basic Econometrics, McGraw-Hill.
    Assessment
    • Your final grade for this unit will be based on six items: four online quizzes, a
    mid-semester exam, and a final exam. All items are to be completed individually.
    ASSESSMENT TASKS AND DUE DATES
    Assessment Name      Weight   Due Time             Due Date
    Online Quiz 1        5%       noon                 21-Aug-2017
    Online Quiz 2        5%       20:00                8-Sep-2017
    Mid-Semester Exam    30%      18:00 (tentative)    12-Sep-2017
    Online Quiz 3        5%       noon                 16-Oct-2017
    Online Quiz 4        5%       noon                 3-Nov-2017
    Final Exam           50%      Final Exam Period    Final Exam Period
     Mid-Session Examination
    o A 75-minute exam will be held during Week 7 (tentatively Tuesday, 12
    September 2017, at 18:00). The exact time and date will be announced
    soon.
    o Lecture Material for weeks 1-6 will be examined
     Final Exam
    o Final will be cumulative but will place greater emphasis on new topics (we
    will go over what that means closer to the exam)
    Lecture Topics
    • First three weeks: univariate data. This is partly a recap of ECMT1010: How can we
    summarize and visualize data? What can a sample tell us about the population, and
    how can we express our uncertainty about such inference? What changes if we
    transform our data? We will particularly focus on those aspects that are relevant for
    economic analysis
    • Second three weeks: bivariate data. How does one economic variable influence
    another one? And again, how certain are we about our inference? We study both the
    necessary theory and many economic examples
    • Last six weeks: multivariate data. Here, we extend our results to cases where there is
    not just one, but several explanatory variables. Finally, we also look at what to do if
    the statistical model we estimate is not a good representation of economic reality: How
    can we find out? And how much of our results can we salvage?
    Tutorials
     Tutorials start next week!
    o There is one two-hour tutorial session each week, starting next week.
    Participation is not mandatory, but is strongly encouraged. Tutorials are a
    good opportunity to raise any questions you may have
    o Use tutorials to raise questions about the material
    o Exercises will be set each week. Do try to solve them!
     The answers will be posted later, but before the mid-semester or final exam
     In even-numbered weeks, tutorial sessions are held in regular classrooms.
    These sessions are intended to help you become more familiar with the material
    covered in class, as well as to provide exam practice
     In odd-numbered weeks, tutorial sessions are held in computer labs. These
    sessions are intended to apply the material covered in class to real-world
    economic problems, and to teach the basics of the Stata software
    package, which is widely used in later courses at the university, as well as in
    many jobs in industry.
     Computer Labs
    o Week 3/5/7/9/11/13
    o Computer Exercises
    o Use of an econometrics or statistical package:
     STATA
    STATA
     Throughout this unit you will be required to use a computer and specialised
    econometric software. (Computer Labs/Tutorials /Online Quizzes)
     The statistics and data analysis program STATA will be taught as part of this unit
    – and will be regularly demonstrated during the lectures.
     This software is available through the Virtual Desktop so you can use it in any of
    the ICT Access Labs, Learning Hubs or Libraries. (see instructions in the UoS
    outline). Also available in Labs 1-5 of Economics and Business Building (H69)
    • Some of the learning and access labs are listed below: PNR Learning Hub;
    Carslaw Learning Hub; Wentworth Learning Hub; Law Access Lab; Madsen
    Access Lab; Cumberland Access Lab
     If you wish to buy your own license to use STATA on your computer
    o http://www.survey-design.com.au/buygradplan.html
    (Small Stata will be sufficient for this course)
     There is a brief introduction to Stata in Appendix A of the textbook. Generally,
    we will just introduce new commands as they are needed. Stata's help facilities are
    also pretty good.
    Mathematics
    •I appreciate many of you haven't had a lot of recent maths practice, and I'll try to
    make things smooth.
    • Calculus is not needed for this course, although it may help guide your intuitive
    understanding of some of the material. Later ECMT courses, as well as higher-division
    macro and micro units, will require it though.
    • Some familiarity with basic algebra, such as working with summations, is assumed
    • If you find that the algebra during the lectures or in the tutorials is moving too fast
    for you,
    1- please take advantage of the university's Maths Learning Centre. They have free
    drop-in classes, including some specifically tailored for economics students. Don't
    be ashamed or afraid, they're there to help!
    2- LET ME KNOW!!!!! Happy to help you!
    Chapter 1‐ Analysis of Economic Data
    Use of Economic Data
     In a nutshell, econometrics is the use of statistical methods to answer
    economic questions
     Describing the economic “landscape”
    o What is the annual growth rate of GDP ? Has unemployment risen over
    past year?
    o Do people with higher levels of education earn more?
    o Descriptive statistics motivate economic theory
     Testing or attempting to distinguish between economic theories
    o Is it true that stock returns are unpredictable?
     Evaluating government and business policy
    o Did those incredibly low interest rates in recent years really help stimulate
    the economy?
    RECAP ECMT 1010
    Types of Data
    There are a variety of different types of data that you will encounter in economics. The
    ways in which we categorize types of data include the following:
     Value: numerical data, categorical data
     Unit of observation: cross-section data, time series data, panel data
     Number of variables: univariate data, bivariate data, multivariate data
    Types of Data / Value / Numerical Data (Quantitative)
    Numerical data are data that are naturally recorded and interpreted as numbers. They
    can be continuous or discrete. Examples of numerical data include:
     Annual income (continuous)
     Hours worked (discrete)
     Annual GDP (continuous)
     Number of times a person has visited dentist (discrete)
    Discrete numerical data take only integer values.
    Types of Data / Value / Categorical Data
    Categorical data are data that are recorded as belonging to one or more groups. They
    can be recorded as numbers but these numbers have no inherent meaning. Examples of
    categorical data include:
     Gender ; Religion; Birth Place …
    Types of Data / Units of Observation
    Economics data are most often observational data, meaning they are based on
    observations of actual behavior in an uncontrolled environment.
    Types of Data/ Units of Observation / Cross-section data
     Cross-section data are data on different entities collected at a common point in
    time.
    o Sample of individuals, households, firms, countries, other units taken at a
    point in time (“snapshot”).
    • Notation: $x_i,\ i = 1, \ldots, n$
    o i specifies a particular individual for an observation
    o n is the total number of individuals observed ( typically called the sample
    size)
    o x is the value of whatever variable we are observing.
     Examples: a single year of census data, unemployment rates by state for a
    particular year
    Examples of a cross-sectional data set:
    Data set on hourly wages of individuals in 2014
    observation  hourly wage
    1  17.15
    2  35.54
    3  51.05
    498  16.87
    499  19.00
    500  41.35
    $x_i,\ i = 1, \ldots, 500$, so that $x_3 = 51.05$ and $x_{499} = 19.00$
    Note that the order of the observations (observation number) is not important.
    Types of Data/Units of Observation / Time-series data
     Time-series data are data on the same quantity at different points in time.
    • Notation: $x_t,\ t = 1, \ldots, T$
    o t specifies time period of an observation
    o T is the total number of time periods
    o x is the value of whatever variable we are observing.
    • Examples: GDP of a country over time, daily averages of the S&P, monthly
    unemployment rate.
    Example: data on minimum wages (Australia, 1950 to 1987)
    Year  hourly wage
    1950  0.20
    1951  0.21
    1952  0.23
    .  .
    .  .
    1987  3.35
    Types of Data/ Units of Observation / Panel data
    Panel data are data on different individuals with each individual observed at multiple
    points in time.
    • Notation: $x_{i,t},\ i = 1, \ldots, n;\ t = 1, \ldots, T$
     Panel data is a mixture of cross-section and time series data
     Examples: earnings of USyd graduates over time; life expectancy by country over
    time
     Data set on hourly wages of individuals in 2013-14
    observation  person  year  hourly wage
    1  1  2013  16.42
    2  1  2014  17.15
    3  2  2013  37.41
    4  2  2014  35.54
    .  .  .  .
    .  .  .  .
    499  250  2013  40.22
    500  250  2014  41.35
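    Cross-section, time-series and panel data differ only in how the observations are indexed. As a minimal sketch in Python (not part of the unit's Stata material), the rows shown in the panel table can be stored by (person, year) and then sliced either way. The variable names are illustrative assumptions; only the six rows printed in the table are used.

```python
# Rows copied from the panel table: (person i, year t, hourly wage x_it).
panel_rows = [
    (1, 2013, 16.42), (1, 2014, 17.15),
    (2, 2013, 37.41), (2, 2014, 35.54),
    (250, 2013, 40.22), (250, 2014, 41.35),
]

# Panel data: x_{i,t} is indexed by individual i and time period t.
x = {(person, year): wage for person, year, wage in panel_rows}

# Cross-section: fix the year, vary the individual.
cross_section_2014 = {i: w for (i, t), w in x.items() if t == 2014}

# Time series: fix the individual, vary the year.
series_person_1 = [w for (i, t), w in sorted(x.items()) if i == 1]

print(cross_section_2014)  # {1: 17.15, 2: 35.54, 250: 41.35}
print(series_person_1)     # [16.42, 17.15]
```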
    Types of Data / Number of Variables / Univariate Data
    Univariate data is a single data series containing observations of only one variable.
    • Notation: $x_i$ for cross-section data; $x_t$ for time series data
     Examples: Earnings of uni.graduates in 2012; inflation rate from 1960 to 2014
    Types of Data / Number of Variables / Bivariate Data
    Bivariate data is composed of two potentially related data series.
    • Notation: $(x_i, y_i)$ for cross-section data; $(x_t, y_t)$ for time series data
     We are often interested in the relationship between x and y.
     Examples: Education and earnings of individuals; inflation and unemployment
    rates over time.
    Types of Data / Number of Variables / Multivariate Data
    Multivariate data is composed of three or more potentially related data series.
    • Notation: $(x_{1,i}, x_{2,i}, \ldots, x_{k,i}, y_i)$ for cross-section data;
    $(x_{1,t}, x_{2,t}, \ldots, x_{k,t}, y_t)$ for time series data
    • We are often interested in how $x_1, \ldots, x_k$ are related to $y$
     Examples: Inputs and outputs and profits for a firm over time;
    Education, experience, gender and income for a cross-section of individuals.
    What do we do with economic data?
    The basic steps of data analysis:
    1- Data Summary
    2- Statistical Inference
    3- Interpretation
    Steps of Data Analysis: Data Summary
     To summarize data, we typically use a combination of visual representations of
    the data and statistics
     Visual representations include a variety of graphs and charts (scatterplots,
    histograms, maps, etc.)
     Statistics can measure characteristics of a single variable (mean, median, variance,
    etc.) or relationships between multiple variables (covariance, correlation, linear
    regression, etc.)
     The choice of summary statistics and graphs depends on both the type of data
    available and what the researcher is interested in
    Steps of Data Analysis: Statistical Inference
     The basic idea of statistical inference is to draw conclusions about a relationship
    we cannot observe
     We typically cannot reach definitive conclusions because we only get to observe a
    sample rather than the population
     Statistical inference requires using what we know about the sample and about
    probability to reach a conclusion about the probable characteristics of variables
    and relationships between them at the population level
    RECAP - ECMT 1010
    Reminder: Statistics (1)
    • Statistics uses data to figure out as much as we can about a parameter that we cannot
    observe
    • A statistical model describes a population that we cannot observe, mainly because it
    would be too much work (the education and salary of every person on Earth) or because
    the population has infinitely many points (“assume X follows a normal distribution…”)
    • This model generally has one or a few parameters describing the thing we're interested
    in, for example the correlation between education and salary.
    • We then assume that our dataset is a sample taken from the population we have
    described. From this dataset, we calculate an estimator for the true but unknown
    parameter: often something like a sample correlation, or a sample mean
    • Standard practice in statistics is to use Greek letters for population quantities
    ($\mu, \sigma, \rho, \beta$) and Latin letters for sample quantities ($\bar{x}, s, r, b$). The textbook largely
    follows this rule
    • Finally, inference happens. Our estimator is probably not exactly equal to the
    parameter, but can we say something about how far off it is likely to be? This is where
    confidence intervals show up
    • More formalities about sampling and inference later in the course, starting next week.
    For today, we focus on the sample itself
    Focus of this course: Regression Analysis
     ECMT1010 focuses on data on a single variable considered in isolation (such as
    coin toss)
     In this class, we start analyzing univariate data – studying a single data series
    (similar to ECMT1010)
    • Most economic data analysis is focused on measuring the relationship between
    two or more variables.
    o We want to understand the inter-relationships (and perhaps causality), such
    as the effect of minimum wage laws on unemployment.
    o The main statistical method is called “regression analysis”.
    • Bivariate data (two related series): Chapters 8 to 12
    • Multivariate data (three or more related series): Chapters 13 to 17
    Chapter 2 - Univariate Data Summary
     Univariate data are a single series of data that are observations on one variable.
     A numerical data example is annual earnings for each person in a sample of
    women.
     A categorical data example is expenditures in each of a number of categories.
    Our main focus :
     (1) Summary Statistics for Numerical Data
     (2) Charts for Numerical Data
    Summary Statistics for Univariate Data
     Graphs are nice for giving people a quick glimpse of data
     However, there is a lot of ambiguity about interpreting graphs and comparing one to
    another.
     Where is the mean? What is a wide distribution and what is a narrow one? Are tails
    big or small? Etc.
     Summary statistics give us a standardized way of summarizing univariate data
     People know what the numbers mean and they can be compared across different
    samples
    Types of Summary Statistics
     We're often interested in describing the following characteristics of the
    distribution of a data series:
    o Central tendency – where is the center of the distribution of the data?
    What is a typical Australian employee's salary, whatever “typical” means?
    o Dispersion –how spread out is the data?
    How much inequality is there in our income distribution?
    o Skewness (asymmetry) – how symmetric (or asymmetric) is the distribution?
    How many millionaires are there, compared to minimum-wage workers?
    o Kurtosis (Peakedness) –how fat are the tails, how tall is the peak ?
    How rare are minimum-wage workers and millionaires, compared to typical
    earners?
    A little Math Review:
    If X takes n values, $x_1, x_2, \ldots, x_{n-1}, x_n$, their sum is
    $$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_{n-1} + x_n$$
    • If g(x) is a function of x, then
    $$\sum_{i=1}^{n} g(x_i) = g(x_1) + g(x_2) + g(x_3) + \cdots + g(x_n)$$
    • If “a” and “b” are constants, then
    o $\sum_{i=1}^{n} a = n \cdot a$
    o $\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$
    o $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$
    o $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
    o $\sum_{i=1}^{n} (x_i \cdot y_i) \neq \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i$
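    The summation rules can be checked numerically. The following is a small Python sketch on made-up numbers (the values of x, y, a and b are arbitrary assumptions, chosen only so the arithmetic is easy to follow by hand).

```python
# Numerical check of the summation rules on a small made-up sample.
x = [2.0, 4.0, 6.0]   # hypothetical values x_1, ..., x_n
y = [1.0, 3.0, 5.0]   # hypothetical values y_1, ..., y_n
a, b = 10.0, 0.5      # arbitrary constants
n = len(x)

sum_const = sum(a for _ in x)                         # sum of a constant
sum_scaled = sum(a * xi for xi in x)                  # constant times x_i
sum_linear = sum(a + b * xi for xi in x)              # a + b * x_i
sum_added = sum(xi + yi for xi, yi in zip(x, y))      # x_i + y_i
sum_products = sum(xi * yi for xi, yi in zip(x, y))   # x_i * y_i

print(sum_const == n * a)                   # True
print(sum_scaled == a * sum(x))             # True
print(sum_linear == n * a + b * sum(x))     # True
print(sum_added == sum(x) + sum(y))         # True
print(sum_products, sum(x) * sum(y))        # 44.0 108.0 (not equal)
```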
    Types of Summary Statistics – Empirical Example
    To go over these different types of summary statistics, we will use the following
    example:
    This is the distribution of annual earnings of a sample of 171 women who were 30 years
    of age in 2010. The data are in “EARNINGS.dta” on Blackboard.
    [Figure: histogram of Earnings (horizontal axis, 0 to 200,000) by Frequency (vertical axis, 0 to 25)]
    Measures of Central Tendency
    A measure of central tendency / central location describes the center of the
    distribution in the data
     Tells us whether center of distribution is
    • Answers the question, “What is a typical value in this sample?”
     Several measures
    o Sample mean
    o Sample median
    o Sample midrange
    o Sample mode
    The Sample Mean
    • Most common way to measure central tendency
    • Also called the sample average
    • Definition:
    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
    • Weights all observations equally!
    • STATA commands:
    mean variable_name
    sum variable_name
    tabstat variable_name, stat(mean)
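    To see what “weights all observations equally” means, here is a minimal Python sketch of the sample mean. The earnings figures are invented for illustration and are not taken from EARNINGS.dta.

```python
import statistics

# Hypothetical annual earnings for five people (not the course data set).
earnings = [25000.0, 30000.0, 35000.0, 40000.0, 120000.0]
n = len(earnings)

# Definition: add up the observations and divide by n.
x_bar = sum(earnings) / n

# Equivalent view: every observation receives the same weight 1/n.
weighted = sum((1 / n) * value for value in earnings)

print(x_bar)                      # 50000.0
print(statistics.mean(earnings))  # 50000.0
```

    Note that four of the five hypothetical earners sit below the mean: the single large value pulls it upwards.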
    The Sample Median
     Value that divides the sample into two halves (50% of observations are above
    value and 50% are below)
    • Order the data from lowest to highest value. The median is the value that divides the
    ordered data into two halves (the one that ends up in the middle).
    • When n (the number of observations) is odd, the median is the middle value;
    when n is even, use the average of the two middle observations.
     Less sensitive to outliers than the sample average
    (An outlying observation, or outlier is an observation that is unusually large or
    small)
     Other quantiles can be used
     STATA command sum variable_name , detail
    tabstat variable_name, stat(median)
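    The odd/even rule and the robustness to outliers can be sketched in a few lines of Python. The function name `sample_median` and the numbers are illustrative assumptions.

```python
def sample_median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median([3, 1, 2]))     # 2   (n odd: middle value)
print(sample_median([4, 1, 3, 2]))  # 2.5 (n even: average of 2 and 3)

# Making the largest observation ten times larger leaves the median unchanged,
# while the mean moves a lot.
earnings = [25000, 30000, 35000, 40000, 120000]
with_outlier = [25000, 30000, 35000, 40000, 1200000]
print(sample_median(earnings), sample_median(with_outlier))      # 35000 35000
print(sum(earnings) / 5, sum(with_outlier) / 5)                  # 50000.0 266000.0
```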
    The Mean Vs. The Median
    What is the typical Australian worker’s wage?
    Among full-time workers in 2011, the average wage was $72,000 per year
    and the median wage was $57,400 per year.
    • Note that the mean is over 25% larger than the median.
    • Why is there such a big difference? Which of these numbers is more relevant?
    The Sample Midrange
     The sample midrange is the average of the smallest and largest observations.
     Not a very commonly used measure
     Extremely sensitive to outliers
     STATA command sum variable_name , detail (see 2 nd column)
    The Sample Mode
     The most frequently occurring value in sample
     Useful with discrete data and cases where particular values are meaningful (4
    years of high school,40 hours of work each week, ...).
     STATA command tab
    Quartiles , Deciles and Percentiles
     Median is the point that equally divides an ordered sample.
     Lower Quartile is that point where ¼ (¾) of sample lies below (above)
     Upper Quartile is that point where ¾ (¼) of sample lies below (above)
     STATA command sum variable_name , detail (see 2 nd column)
    Finer divisions:
     p th percentile is the value for which p percent of the observed values are equal to
    or less than the value.
     Median – 50 th ; Upper Quartile- 75 th ; Lower Quartile- 25 th percentiles.
     Deciles split the ordered sample into tenths.
     Quantile is a percentile reported as a fraction of one rather than percentage.
    (0.56 quantile =56 th percentile)
    STATA command tabstat variable_name , stat(p1 p5 ..)
    These four measures of central tendency can give very different answers to the
    question, what is a typical salary? Which one to use depends on which question you
    are trying to answer.
    Measures of Dispersion
     Characterize the spread or width of the distribution: How far away do observations
    tend to be from the mean?
     Different measures:
    o Sample variance
    o Sample standard deviation
    o Sample coefficient of variation
    o Sample range and inter-quartile range
     Like measures of central tendency, the different measures have different benefits
    and drawbacks
     STATA command sum variable_name , detail (see 2 nd column)
    tabstat variable_name , stat (… )
    Sample Variance
    How far away do observations tend to be from the mean?
    Simply calculating $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$ is not useful: positive and negative
    differences cancel out and the result is always zero.
    So we work with squared deviations instead. The sample variance is defined as
    $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
    • The division by n - 1 rather than n is a “degrees of freedom” correction, which is
    necessary because we are using the sample mean $\bar{x}$ rather than the population
    mean $\mu$
     When we start working with multivariate data, you'll often see n – k popping up
    for much the same reason. This is worth remembering: in general,
    “degrees of freedom = observations - estimated parameters”
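    Both points (the deviations sum to zero, and the squared deviations are divided by n - 1) can be verified on a tiny made-up sample. This is a Python sketch under those assumptions; `statistics.variance` is used only as a cross-check, since it also divides by n - 1.

```python
import statistics

x = [2.0, 4.0, 6.0, 8.0]   # hypothetical sample
n = len(x)
x_bar = sum(x) / n          # 5.0

# Deviations from the mean cancel out exactly.
deviations = [xi - x_bar for xi in x]
print(sum(deviations))      # 0.0

# Sample variance: squared deviations, divided by n - 1 (one estimated parameter).
s2 = sum(d ** 2 for d in deviations) / (n - 1)
print(round(s2, 3))                      # 6.667
print(round(statistics.variance(x), 3))  # 6.667
```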
    Sample Variance
    • Approximately equal to the average squared deviation from the mean:
    $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
    • As the sample variance increases, the spread of the data gets wider
    • STATA command sum variable_name , detail (see 3rd column)
    tabstat variable_name , stat(variance)
    • One problem with variances is that they're hard to interpret. If x is measured in
    dollars, $s^2$ is in squared dollars, whatever that means
    Sample Standard Deviation
    • Standard deviation is just the square root of the variance:
    $$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
    • Roughly the average deviation of the data from its mean.
    • It has the same units as the data (which is not the case for the variance)
    • If one sample has a larger sample standard deviation than another, then we view
    that sample as having greater variability.
    • STATA command sum variable_name (see 3rd column)
    tabstat variable_name, stat(sd)
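    The point about units can be illustrated by measuring the same hypothetical data in dollars and in cents: the standard deviation scales by 100, like the data, while the variance scales by 100 squared. The helper name `sample_variance` and the numbers are assumptions of this Python sketch.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

dollars = [2.0, 4.0, 6.0, 8.0]       # hypothetical values in dollars
cents = [100 * v for v in dollars]   # the same data expressed in cents

s_dollars = math.sqrt(sample_variance(dollars))
s_cents = math.sqrt(sample_variance(cents))

print(round(s_dollars, 3))                                          # 2.582
print(round(s_cents / s_dollars, 6))                                # 100.0
print(round(sample_variance(cents) / sample_variance(dollars), 6))  # 10000.0
```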
    Interpretation of the Standard Deviation
    A useful way to interpret the standard deviation is to use results for the normal
    distribution (see ECMT 1010).
    • For a normal distribution, the probability of being within one and two standard
    deviations of the mean is about 0.68 and 0.95, respectively
    • For other distributions, we know that at least ¾ of a random sample is within
    two standard deviations of the mean (Chebychev’s inequality)
    Recap:
    Many things are approximately normally distributed. For normal distributions, we can
    interpret the standard deviation as follows:
     68% of the observations will be less than one standard deviation away from the
    mean
     95%, less than two standard deviations
     Almost 100%, less than three standard deviations
    • Even if the distribution is not normal, we still have some bounds:
    at least 75% within two sd, at least 88.89% within three sd
    • In general, at least a fraction $1 - 1/c^2$ is within c sd. This result is called
    Chebychev's inequality (NO NEED TO MEMORIZE)
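    The bound is easy to evaluate, and it can be compared with what actually happens in a clearly non-normal sample. The following Python sketch uses invented, strongly right-skewed data; the function names are illustrative assumptions.

```python
import statistics

def chebychev_bound(c):
    """Minimum fraction of observations within c standard deviations of the mean."""
    return 1 - 1 / c ** 2

def share_within(values, c):
    """Observed fraction of values within c sample sd of the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mean) <= c * sd for v in values) / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 5, 10, 50]   # hypothetical, far from normal

print(chebychev_bound(2))             # 0.75
print(round(chebychev_bound(3), 4))   # 0.8889
print(share_within(skewed, 2))        # 0.9, above the 0.75 lower bound
```

    The bound is a guarantee, not a prediction: actual samples usually have a larger share within two standard deviations than 75%.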
    Sample Coefficient of Variation
    • Sample standard deviation relative to sample mean
    $$CV = \frac{s}{\bar{x}}$$
    • Standardized measure: no units, can be compared across series.
    • STATA command sum variable_name, detail
    (use the info in the second and third columns)
    tabstat variable_name , stat(cv)
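    Because the coefficient of variation has no units, it does not change when the same data are expressed in different units. A short Python sketch with made-up hourly wages (the function name is an assumption):

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

wage_dollars = [20.0, 25.0, 30.0]              # hypothetical hourly wages
wage_cents = [100 * w for w in wage_dollars]   # same wages in cents

print(round(coefficient_of_variation(wage_dollars), 6))  # 0.2
print(round(coefficient_of_variation(wage_cents), 6))    # 0.2
```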
    Sample Range
    • Difference between the largest and smallest values in the sample
    • Simplest measure of dispersion but also the least interesting
    • Very sensitive to outliers
    • STATA command sum variable_name (last two columns).
    tabstat variable_name , stat(range)
    Sample Inter-Quartile Range
    • Variation on sample range that is less sensitive to outliers
    • Equal to the difference between the 75th and 25th percentiles of the distribution
    • STATA command tabstat variable_name , stat(iqr)
    Average Absolute Deviation
    • Another measure that is more resistant to outliers
    $$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
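    The three measures can be compared on a small sample with and without an outlier. This Python sketch uses hypothetical data, and the quartiles rely on the `method="inclusive"` convention of `statistics.quantiles`, which is an assumption (other packages may interpolate differently).

```python
import statistics

def dispersion(values):
    """Return (range, inter-quartile range, average absolute deviation)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = sum(values) / len(values)
    avg_abs_dev = sum(abs(v - mean) for v in values) / len(values)
    return max(values) - min(values), q3 - q1, avg_abs_dev

base = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # largest value replaced by 90

range_b, iqr_b, aad_b = dispersion(base)
range_o, iqr_o, aad_o = dispersion(with_outlier)

print(range_b, iqr_b, round(aad_b, 2))   # 8 4.0 2.22
print(range_o, iqr_o)                    # 89 4.0 (IQR unchanged by the outlier)
```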
    Symmetry
     A distribution is symmetric if its shape is the same when reflected around the
    median. A common example is the normal distribution
    Measuring Symmetry (or Asymmetry)
     Typically use skewness to measure symmetry
     Right- skewed: Distribution has a long right tail and data are concentrated to the
    left
     Left-skewed: Distribution has a long left tail and data are concentrated to the right
    Where are the mean and median?
    [Figure: three histograms with Frequency on the vertical axis: x (Symmetric), y (Right-skewed), z (Left-skewed)]
    • One way to check for right or left skew is to compare the median to the mean.
    Symmetric: $\bar{x} = \text{median}(x)$
    Right-skewed: $\bar{x} > \text{median}(x)$
    Left-skewed: $\bar{x} < \text{median}(x)$
    • A formal measure of asymmetry is the skewness statistic:
    $$\text{Skew} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left[\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{3/2}}$$
    Interpretation of the statistic: symmetric = 0; right-skewed > 0; left-skewed < 0.
    STATA command tabstat variable_name, stat(skewness)
    Distribution of arrival delays for United Airline flights into San Francisco
    International Airport, January 2014
    Mean = 11.39; Median = 0 ; Skewness: 5.66
    [Figure: histogram of Arrival Delay (minutes) (horizontal axis, -25 to 150) by Frequency (vertical axis, 0 to 500)]
    Distribution of 500 fastest 100m times as of December 2014
    Mean = 9.90; Median = 9.92; Skewness: -1.52
    [Figure: histogram of x (horizontal axis, 9.58 to 9.98) by Frequency (vertical axis, 0 to 100)]
    Kurtosis
     Measures the relative importance of the observations in the tail of the distribution.
    (How fat the tails of distribution are.)
    • Simplest measure is:
    $$\text{Kurt} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left[\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{2}}$$
    Note: different computer programs can use slightly different formulae.
    STATA command tabstat variable_name, stat(kurt)
    How to interpret:
     Normal distribution with Kurtosis=3 is the benchmark.
     Excess Kurtosis measures kurtosis relative to the normal distribution
    $$\text{ExcessKurt} \cong \text{Kurt} - 3$$
    If excess kurtosis is equal to 0, the distribution has the shape of a normal distribution.
    With positive excess kurtosis, the distribution has fat tails: greater area in the tails than
    for the normal distribution with the same mean and variance.
    With negative excess kurtosis, the distribution has skinny tails.
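    A minimal Python sketch of kurtosis as the fourth central moment divided by the squared second central moment, with excess kurtosis measured against the normal benchmark of 3. The two samples are invented to give clean numbers; other programs may use slightly different formulae.

```python
def kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

def excess_kurtosis(values):
    """Kurtosis relative to the normal-distribution benchmark of 3."""
    return kurtosis(values) - 3

thin_tails = [-1, 1, -1, 1]                      # all mass at two points, no tails
fat_tails = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]      # mostly at the centre, two extremes

print(kurtosis(thin_tails), excess_kurtosis(thin_tails))  # 1.0 -2.0
print(kurtosis(fat_tails), excess_kurtosis(fat_tails))    # 5.0 2.0
```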
    How to present key summary statistics for the data?
     Tables (see for example, Table 2.1 in your book)
    Annual earnings of 30 year old female full time workers in 2010
     Box and Whisker Plots (Box Plot)
     All box-and-whisker plots give the lower quartile, median and upper quartile;
    these form the “box.”
     Simple box-and-whisker plots additionally give the minimum and maximum;
    these form the “whiskers.”
     More complicated box-and-whisker plots additionally plot outlying values.
     In more complicated box-and-whisker plots, the whiskers are data-determined lower and
    upper bounds. Outlying observations are the values that exceed these bounds.
     This is a complicated form of box plot.
     In this case, the upper bar equals the upper quartile + 1.5 times the interquartile
    range.
     The six dots are the outliers.
     The lower bound is the minimum sample value.
     There are no outliers below the lower bound: no values are lower than
    25k - 1.50*(50k - 25k) = -12.5k.
     The data are right-skewed.
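The bound arithmetic above can be sketched in Python. This is an illustration only: the function names are hypothetical, the quartiles 25k and 50k are the rounded values used in the slide's calculation, and the sample earnings are made up.

```python
def box_plot_bounds(lower_quartile, upper_quartile, k=1.5):
    """Return (lower bound, upper bound) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = upper_quartile - lower_quartile  # interquartile range
    return lower_quartile - k * iqr, upper_quartile + k * iqr


def outlying_values(values, lower_quartile, upper_quartile):
    """Values that fall outside the data-determined bounds."""
    low, high = box_plot_bounds(lower_quartile, upper_quartile)
    return [v for v in values if v < low or v > high]


low, high = box_plot_bounds(25_000, 50_000)
print(low, high)  # -12500.0 87500.0

# Made-up earnings: nothing can fall below a negative bound, so only the
# large values are flagged, as expected for right-skewed data.
print(outlying_values([1_050, 30_000, 60_000, 90_000, 172_000], 25_000, 50_000))
```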
    Graphical Representations of Univariate Data
    With univariate data, we have a few different options for graphing the data. The most
    common are:
     Histograms - graphs showing the frequency of occurrence of different values
     Line charts - plots of the variable value against the observation number
     Pie charts, bar charts, column charts - various ways to present observations
    that are measured in different categories
    A Histogram example using absolute frequencies
    o Absolute frequency - just the number of times a particular value is observed in the
    data → can be problematic if n is large (i.e. the “y” axis becomes hard to read)
     STATA command → histogram variable_name, frequency
    [Figure: histogram of earnings; x-axis: earnings; y-axis: Frequency]
    A Histogram example using relative frequencies
    o Relative frequency - the number of times a value is observed as a percentage of all
    observations
     STATA command → histogram variable_name, percent
    [Figure: histogram of earnings; x-axis: earnings; y-axis: Percent]
    Histograms
    There are a few choices to make when constructing a histogram.
     Whether to use absolute frequency or relative frequency for the vertical axis
    o Absolute frequency - just the number of times a particular value is observed
    in the data

    o Relative frequency - the number of times a value is observed as a percentage
    of all observations
    o Either choice will lead to the same shape for the histogram
     How large to make the bin sizes
    o If the data take on many different values, you'll want to group data into bins
    o In general, the more observations you have, the more bins you use.
    o A common default choice for the number of bins is $\sqrt{n}$, where n is the number of observations
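To see why the choice of vertical axis does not change the shape, here is a minimal Python sketch (not Stata). The function names, bin settings and data values are made up for illustration, and the sketch assumes every value is at least as large as the lower edge of the first bin.

```python
def absolute_frequencies(values, lower, width, bins):
    """Count how many values fall into each of `bins` equal-width bins starting at `lower`."""
    counts = [0] * bins
    for v in values:
        # The top edge of the last bin is included in the last bin.
        index = min(int((v - lower) // width), bins - 1)
        counts[index] += 1
    return counts


def relative_frequencies(counts):
    """Express each bin count as a percentage of all observations."""
    total = sum(counts)
    return [100 * c / total for c in counts]


data = [1, 2, 2, 3, 4, 7, 8, 10]  # hypothetical values
counts = absolute_frequencies(data, lower=0, width=5, bins=2)
print(counts)                        # [5, 3]
print(relative_frequencies(counts))  # [62.5, 37.5]
```

Every bar is divided by the same total, so the relative heights of the bars are unchanged; only the scale of the vertical axis differs.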
    Histograms
    Number of bins:
     Too few bins → not enough information | too many bins → hard to read
     Rule of thumb is $\sqrt{n}$; in our example, $\sqrt{171} \approx 13$ bins
    o The width of each bin is (172,000 - 1,050)/13 = 13,150.
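The rule-of-thumb arithmetic above can be reproduced with a few lines of Python. The function names are hypothetical, and rounding the square root to the nearest whole number is an assumption; the slide only states that the result is about 13.

```python
import math


def sqrt_rule_bins(n):
    """Number of bins suggested by the square-root rule, rounded to a whole number."""
    return round(math.sqrt(n))


def bin_width(minimum, maximum, bins):
    """Width of each bin when the data range is split into equal-width bins."""
    return (maximum - minimum) / bins


bins = sqrt_rule_bins(171)                 # n = 171 observations in the earnings example
width = bin_width(1_050, 172_000, bins)    # smallest and largest earnings in the example
print(bins, width)  # 13 13150.0
```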
    Stem and Leaf display (A variation of Histogram)
    Smoothed Histograms
     Data that take many different values, such as earnings data, have an underlying
    continuous probability density function rather than a discrete probability mass
    function. (We will talk more about these in the coming weeks.)
     This form of data can be better presented by a smooth graph than by discrete bins.
     A smoothed histogram smooths the histogram in two ways.
    o First, it uses rolling bins (or windows) that are overlapping rather than distinct.
    o Second, in counting the fraction of the sample within each bin it gives more
    weight to observations that are closest to the center of the window and less to
    those near the ends of the window.
     A well-known example is a kernel density estimate. (The choice of window width plays a
    role similar to the choice of bin size.)
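The two smoothing ideas above (overlapping windows, and more weight near the center of the window) can be sketched in Python. This is a minimal illustration and not Stata's implementation: it assumes the textbook Epanechnikov weight 0.75(1 - u^2) on [-1, 1], the function names and earnings values are made up, and Stata's exact kernel scaling and default bandwidth may differ.

```python
def epanechnikov(u):
    """Weight that is largest at the window center (u = 0) and zero outside [-1, 1]."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kernel_density(x, values, bandwidth):
    """Density estimate at x: average kernel weight of the observations, scaled by the bandwidth."""
    n = len(values)
    total = sum(epanechnikov((x - v) / bandwidth) for v in values)
    return total / (n * bandwidth)


# Hypothetical earnings; the window is re-centered at every evaluation point,
# so neighboring windows overlap.
earnings = [20_000, 25_000, 31_000, 36_000, 42_000, 90_000]
for point in (30_000, 60_000, 90_000):
    narrow = kernel_density(point, earnings, bandwidth=5_000)
    wide = kernel_density(point, earnings, bandwidth=10_000)
    print(point, narrow, wide)
```

A wider bandwidth spreads each observation's weight over a larger range, which gives a smoother curve, much as wider bins give a coarser histogram.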
    Kernel Density – Example of Earnings data
    [Figure, two panels: kernel density estimates of earnings; x-axis: earnings; y-axis: Density;
    kernel = epanechnikov]
    Default window width: kdensity earnings (bandwidth = 5.0e+03)
    Wider window width: kdensity earnings, bwidth(10000) (bandwidth = 1.0e+04)
    Line Charts
    When the observations in a univariate dataset have a natural order, it often makes sense
    to use a line chart.
      A line chart plots successive values of the data against the successive index values
      This offers an easy way to visualize whether values are getting larger or smaller
      Line charts are most common with time series data
      STATA command  tsline variable_name
    [Figure: line chart. Data: real GSP per capita in the US.]
    Categorical Data – Pie and Bar Charts
     Histograms are good for representing numerical univariate data. For categorical
    univariate data, we typically use pie charts or bar/column charts.
    o Pie charts are perhaps the easiest way for people to visualize percentages
    o Bar/column charts have the advantage of being able to show both relative and
    absolute frequencies
    o Bar/column charts will become more useful as we start adding more variables
    For more on STATA graph commands:
    (a) use the drop-down “Graphics” menu on the top-right corner
    (b) type help graph in the command window
    (c) or just google…
    Some other examples of Visual Presentation of Data
    Google Trends data (http://www.google.com.au/trends/) for the word “cricket” (blue
    line) and the word “football” (red line), for Australia only.
    Some other examples of Visual Presentation of Data
    Wordle generated from Obama's 2009 State of the Union address (after start of
    recession)