Percentiles provide information about how the data are spread over the interval from the smallest value to the largest. They are often used with college admission test scores.

Why would an accountant use statistical data?

Accounting- Used for statistical sampling procedures when conducting audits for clients

Why would an economist use statistical data?

Economics- Economists use statistical information in making forecasts about the future of the economy or some aspect of it

Why would a marketing team use statistical data?

Electronic point of sale scanners at retail check-out points are used to collect data for a variety of marketing research applications

How would statistical data be useful for production?

A variety of statistical quality control charts are used to monitor the output of a production process

How is statistical information useful in finance?

Financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations

What is data?

The facts and figures collected, summarized, analyzed and interpreted

What is a data set?

The data collected in a particular study

What are elements?

The entities on which data are collected.

What is a variable?

A characteristic of interest for the elements

What is an observation?

The set of measurements collected for a particular element

How do you find the total number of data values in a complete data set?

Multiply the number of elements by the number of variables
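As a quick check of this rule, the sketch below counts the data values in a small made-up data set. The student records and variable names are hypothetical, chosen only for illustration.

```python
# Hypothetical data set: each row is one element (a student), each key is a variable.
data_set = [
    {"school": "Business", "class_standing": "Junior", "credit_hours": 15},
    {"school": "Art", "class_standing": "Freshman", "credit_hours": 12},
    {"school": "Science", "class_standing": "Senior", "credit_hours": 18},
    {"school": "Humanities", "class_standing": "Sophomore", "credit_hours": 16},
]

n_elements = len(data_set)       # 4 students
n_variables = len(data_set[0])   # 3 variables recorded for each student
total_values = n_elements * n_variables

# Each row is one observation: the set of measurements for a single element.
print(n_elements, n_variables, total_values)  # 4 3 12
```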

Name four scales of measurement

Nominal, ordinal, interval, ratio

What does the scale determine?

The amount of information contained in the data

What does the scale indicate?

The data summarization and statistical analysis that are most appropriate.

What are the characteristics of nominal data?

The data are labels or names used to identify an attribute. Nonnumeric labels or numeric codes may be used.

Ex. Students being classified by the school they are enrolled in (1-Business, 2-Humanities, 3-Art, 4-Science)

What are the characteristics of ordinal data?

The data has the properties of nominal data, but the order or rank is meaningful. Nonnumeric labels or numeric codes may be used.

Ex. Students being classified by their class standing (1-Freshman, 2-Sophomore, 3-Junior, 4-Senior)

What are the characteristics of interval data?

The data has properties of ordinal data but the interval between observations is expressed in terms of a fixed unit of measure. Data is always numeric.

Ex. Comparing SAT scores

What are the characteristics of ratio data?

The data has all the properties of interval data, but the ratio of two values is also meaningful. Includes variables such as distance, height, weight, and time. The scale must contain a zero value that indicates nothing exists for the variable at the zero point.

Ex. Comparing credit hours

There are more alternatives for statistical analysis when the data is...

Quantitative

What are the characteristics of qualitative data?

-Labels or names are used.

-Also referred to as categorical data.

-Can use either nominal or ordinal scales.

-Can be numeric or nonnumeric.

-Statistical analysis options are rather limited.

What are the characteristics of quantitative data?

-Data indicates how many (discrete) or how much (continuous)

-Always numeric

-Ordinary arithmetic operations are meaningful

In what ways can qualitative data be classified?

Numerical---->Nominal or ordinal

Non-numerical--->Nominal or ordinal

In what ways can quantitative data be classified?

Numerical--->Interval or ratio

What is cross-sectional data?

When data is collected at the same or approximately the same point in time

What is time series data?

When data is collected over several time periods

Identify two potential sources of data.

Existing (within a firm, business database services, government agencies, industry associations, special-interest organizations, internet)

Statistical studies

What are the two types of statistical studies?

Experimental and observational

What are the characteristics of an experimental study?

The variable of interest is first identified. Then one or more variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

What are the characteristics of an observational study?

No attempt is made to control or influence the variable of interest

Ex. survey

Name the three data acquisition considerations

-Time requirement (searching for information can be time-consuming, and the information may no longer be useful by the time it is available)

-Cost of acquisition (organizations often charge to acquire information)

-Data errors (using data that was acquired with little care can be misleading)

What are descriptive statistics?

The tabular (frequency chart), graphical (histogram), and numerical methods used to summarize and present data

What is the most common numerical descriptive statistic?

The average (mean)

What is a population?

The set of all the elements of interest in a particular study

What is a sample?

A subset of the population

What is statistical inference?

The process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population

What is a census?

Collecting data for a population

What is a sample survey?

Collecting data for a sample

What is the typical size of data used for statistical analyses?

Very large, often handled by computers

What are five ways to summarize qualitative data?

Frequency distributions, relative frequency distributions, percent frequency distributions, bar graphs, and pie charts

What is a frequency distribution?

A tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes. The objective is to provide insights about data that cannot be quickly obtained from the original data.

What is a relative frequency and a relative frequency distribution?

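The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary showing the relative frequency for each class.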

What is a percent frequency distribution?

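The percent frequency of a class is its relative frequency multiplied by 100. A percent frequency distribution is a tabular summary showing the percent frequency for each class.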
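The three distributions can be sketched in a few lines. This is a minimal illustration that assumes a made-up list of soft drink purchases; the labels and counts are hypothetical.

```python
from collections import Counter

# Hypothetical qualitative data: the brand chosen in each of 10 purchases.
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

n = len(purchases)
frequency = Counter(purchases)                                   # items per class
relative_frequency = {label: count / n for label, count in frequency.items()}
percent_frequency = {label: 100 * rf for label, rf in relative_frequency.items()}

for label in frequency:
    print(f"{label:8s} {frequency[label]:3d} "
          f"{relative_frequency[label]:6.2f} {percent_frequency[label]:5.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a handy check on any hand-built table.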

What is a bar graph?

A graphical device for depicting qualitative data.

-On one axis labels are specified for each of the classes.

-Frequency, RF, and PF can be used for the other axis

-Bars are of fixed width and separated to show that each bar is its own class

What is a pie chart?

A graphical device for presenting relative frequency distributions (RFD) for qualitative data

What are six ways to summarize quantitative data?

FD, RFD, PFD, histogram, cumulative distribution, and ogive

What are the guidelines for selecting the number of classes?

-Use between 5 and 20

-Data sets with a larger number of elements usually require more classes

-Use enough classes to show variation in data

-Do not use so many classes that some contain only a few data items

What are the guidelines for selecting the width of classes?

-Use classes of equal width

-Approximate class width = (largest data value - smallest data value) / number of classes
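A short sketch of these guidelines, assuming a made-up set of 20 audit times in days. The data, the choice of 5 classes, and the starting point of 10 for the first class are all illustrative assumptions.

```python
import math

# Hypothetical quantitative data: days needed to complete 20 audits.
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
number_of_classes = 5

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(audit_times) - min(audit_times)) / number_of_classes  # 21 / 5
class_width = math.ceil(approx_width)   # round up to a convenient whole width

first_lower = 10   # assumed convenient lower limit at or below the smallest value

# Count the items falling in each equal-width, non-overlapping class.
frequency = [0] * number_of_classes
for value in audit_times:
    frequency[(value - first_lower) // class_width] += 1

for k, count in enumerate(frequency):
    lower = first_lower + k * class_width
    print(f"{lower}-{lower + class_width - 1}: {count}")
```

With these assumptions the classes are 10-14, 15-19, 20-24, 25-29, and 30-34, with frequencies 4, 8, 5, 2, and 1.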

What is a histogram?

-Common graphical presentation

-Variable of interest is placed on the horizontal axis

-Rectangle is drawn above each class interval with its height corresponding to the interval's frequency, RF, or PF.

-Has no natural separation between adjacent rectangles, unlike a bar graph

What's the difference between a symmetric, skewed left, and skewed right histogram?

Left tail is mirror image of right tail when symmetric

Skewed left has a long tail to the left (can be moderately or highly skewed)

Skewed right has a long tail to the right (can also be moderately or highly skewed)

Describe the three types of cumulative distributions

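CFD-Shows the number of items with values less than or equal to the upper limit of each class

CRFD-Shows the proportion of items with values less than or equal to the upper limit of each class

CPFD-Shows the percentage of items with values less than or equal to the upper limit of each class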
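All three cumulative distributions follow from a running total of the class frequencies. The sketch below assumes a hypothetical frequency distribution with five classes; the upper limits and counts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical frequency distribution: upper class limits and class frequencies.
upper_limits = [14, 19, 24, 29, 34]
frequency = [4, 8, 5, 2, 1]

n = sum(frequency)
cumulative_frequency = list(accumulate(frequency))              # CFD
cumulative_relative = [cf / n for cf in cumulative_frequency]   # CRFD
cumulative_percent = [100 * crf for crf in cumulative_relative]  # CPFD

for limit, cf, crf, cpf in zip(upper_limits, cumulative_frequency,
                               cumulative_relative, cumulative_percent):
    print(f"<= {limit}: {cf:2d} {crf:.2f} {cpf:5.1f}%")
```

The last class always has a cumulative frequency of n, a cumulative relative frequency of 1, and a cumulative percent frequency of 100.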

What are two methods for summarizing the data for two variables simultaneously?

Crosstabulation and a scatter diagram

Describe crosstabulation

Can be used when

-One variable is qualitative and the other is quantitative

-Both variables are qualitative or both are quantitative

Left and top margin labels define the classes for two variables
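A crosstabulation can be sketched by counting pairs of values. The example assumes made-up restaurant records with a qualitative variable (quality rating) and a quantitative variable grouped into classes (meal price); every name and value here is hypothetical.

```python
from collections import Counter

# Hypothetical records: (quality rating, meal price class) for 8 restaurants.
records = [
    ("Good", "$10-19"), ("Very Good", "$20-29"), ("Good", "$10-19"),
    ("Excellent", "$30-39"), ("Very Good", "$10-19"), ("Good", "$20-29"),
    ("Excellent", "$20-29"), ("Very Good", "$20-29"),
]
row_labels = ["Good", "Very Good", "Excellent"]      # left margin classes
column_labels = ["$10-19", "$20-29", "$30-39"]       # top margin classes

counts = Counter(records)

# Build the table row by row; a Counter returns 0 for combinations never seen.
crosstab = {r: {c: counts[(r, c)] for c in column_labels} for r in row_labels}

for r in row_labels:
    print(f"{r:10s}", [crosstab[r][c] for c in column_labels])
```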

Describe a scatter diagram

Graphical presentation of relationship between two quantitative variables

-General pattern of points suggests overall relationship between variables

What is a trendline?

The approximation of the relationship between two variables on a scatter diagram. Can be positive or negative (can also have no relationship)

What are sample statistics?

Measures computed for data from a sample

What are population parameters?

Measures computed for data from a population

What is a point estimator?

The sample statistic used to estimate the corresponding population parameter

Name the five measures of location

Mean, median, mode, percentiles, and quartiles

What is the mean?

The average of all data values. The sample mean is the point estimator of the population mean.

What is the median?

The value in the middle when data items are arranged in ascending order. With an even number of items, it is the average of the two middle values. When a data set has extreme values, the median is often the preferred measure of central location. Most often reported for annual income and property value data.

What is the mode?

The value that occurs with the greatest frequency. Can occur at two or more values. Called bimodal when it has exactly two modes and multimodal when more than two
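The three measures can be compared with Python's standard `statistics` module. The starting salaries below are hypothetical, and the second list replaces the largest value with an extreme one to show why the median is often preferred in that situation.

```python
import statistics

# Hypothetical monthly starting salaries for 12 graduates.
salaries = [3450, 3550, 3650, 3480, 3355, 3310,
            3490, 3730, 3540, 3925, 3520, 3480]

mean = statistics.mean(salaries)
median = statistics.median(salaries)     # average of the two middle values (n is even)
modes = statistics.multimode(salaries)   # list of every most-frequent value

# Same data with the largest salary replaced by an extreme value.
with_extreme = [10000 if x == 3925 else x for x in salaries]
mean_extreme = statistics.mean(with_extreme)
median_extreme = statistics.median(with_extreme)

print(mean, median, modes)
print(mean_extreme, median_extreme)
```

For this data the mean is 3540, the median is 3505, and the only mode is 3480. The extreme value pulls the mean up to 4046.25 while the median stays at 3505.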

Why are percentiles used?

How is the percentile defined?

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100-p) percent of the items take on this value or more.

How do you find percentiles?

1. Arrange data in ascending order

2. Compute the index, the position of the pth percentile (i = (p/100)n)

3. If i is not an integer, round up. The pth percentile is the value in that rounded-up position.

4. If i is an integer, the pth percentile is the average of the values in positions i and i+1
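The steps above can be sketched as a small function. This is one possible implementation under stated assumptions: p is a whole number strictly between 0 and 100, and the salary data are hypothetical. Other software (spreadsheets, statistics libraries) may use different interpolation rules and give slightly different answers.

```python
def percentile(data, p):
    """Return the pth percentile (p a whole number, 0 < p < 100) using the index rule."""
    ordered = sorted(data)   # step 1: arrange the data in ascending order
    n = len(ordered)
    # Step 2: i = (p/100) * n, kept as quotient and remainder to avoid float rounding.
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # Step 3: i is not an integer, so round up. Position whole+1 is list index whole.
        return ordered[whole]
    # Step 4: i is an integer, so average the values in positions i and i+1.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Hypothetical monthly starting salaries for 12 graduates.
salaries = [3450, 3550, 3650, 3480, 3355, 3310,
            3490, 3730, 3540, 3925, 3520, 3480]

print(percentile(salaries, 85))   # i = 10.2, so the 11th ordered value
print(percentile(salaries, 50))   # i = 6, so the average of the 6th and 7th values
```

Calling the same function with p = 25, 50, and 75 gives the three quartiles.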

What is a quartile?

A quartile is a specific percentile: the first quartile is the 25th percentile, the second is the 50th (the median), and the third is the 75th

What are the five measures of variability?

Range, interquartile range, variance, standard deviation, coefficient of variation

What is the range?

The difference between the largest and smallest data values. Very sensitive to the smallest and largest values.

What is the simplest measure of variability?

Range

What is the interquartile range?

The difference between the third quartile and the first quartile. This is the range for the middle 50% of the data. Overcomes sensitivity to extreme data values.

What is the variance?

A measure of variability that utilizes all the data. Based on the difference between the value of each observation and the mean. Also the average of the squared differences between each value and the mean (the sample variance divides the sum of squared differences by n-1 rather than n).

What is standard deviation?

The positive square root of the variance. Measured in same units as the data, making it easier to interpret than variance.

What is the coefficient of variation?

Indicates how large the standard deviation is in relation to the mean. Computed as (standard deviation / mean) x 100 and expressed as a percentage.
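A minimal sketch of the range, variance, standard deviation, and coefficient of variation, using Python's standard `statistics` module on hypothetical salary data. Note that `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n-1), while `statistics.pvariance` divides by n.

```python
import statistics

# Hypothetical monthly starting salaries for 12 graduates.
salaries = [3450, 3550, 3650, 3480, 3355, 3310,
            3490, 3730, 3540, 3925, 3520, 3480]

data_range = max(salaries) - min(salaries)           # largest minus smallest value
sample_variance = statistics.variance(salaries)      # sum of squared deviations / (n - 1)
population_variance = statistics.pvariance(salaries) # sum of squared deviations / n
sample_std = statistics.stdev(salaries)              # positive square root of the variance
coefficient_of_variation = sample_std / statistics.mean(salaries) * 100

print(data_range)
print(round(sample_variance, 2), round(population_variance, 2))
print(round(sample_std, 2), round(coefficient_of_variation, 2))
```

For this data the standard deviation is only about 4.7% of the mean, which is what the coefficient of variation reports.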

An important measure of a distribution's shape is the

Skewness.

What is the relationship between the mean and median when a distribution is symmetric?

They are equal

What are the characteristics of a skewed left distribution?

Skewness is negative. Mean is usually less than the median.

What are the characteristics of a skewed right distribution?

Considered to be positive. Mean is usually more than the median. In highly right skewed distributions the skewness is often above 1.0

What is the z-score?

Often called the standardized value. Denotes the number of standard deviations that a data value is from the mean. Measures relative location in the data set.

What would a data value with a z-score less than 0 mean and vice versa?

The data value is less than the sample mean if the z-score is less than 0 and greater than the sample mean if the z-score is greater than 0. If the z-score is 0, then the value is equal to the mean.
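A short sketch of the calculation, z = (value - mean) / standard deviation, assuming a hypothetical sample of five class sizes and the sample standard deviation.

```python
import statistics

# Hypothetical sample: number of students in each of five classes.
class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)   # sample mean
s = statistics.stdev(class_sizes)     # sample standard deviation

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / s for x in class_sizes]

for x, z in zip(class_sizes, z_scores):
    position = "below" if z < 0 else "above" if z > 0 else "equal to"
    print(f"{x}: z = {z:+.2f} ({position} the mean)")
```

Here the mean is 44 and the standard deviation is 8, so the class of 32 students has z = -1.5, meaning it lies 1.5 standard deviations below the mean.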

What does Chebyshev's Theorem say?

At least (1-1/z^{2}) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1. In other words

-At least 75% of data within z=2 SD of mean

-At least 89% of data within z=3 SD of mean

-At least 94% of data within z=4 SD of mean
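The bound is a one-line formula, sketched below. The function name is an illustrative choice, not a standard library call.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of items within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z greater than 1")
    return 1 - 1 / z**2


for z in (2, 3, 4):
    print(f"z = {z}: at least {100 * chebyshev_lower_bound(z):.1f}%")
```

The theorem also works for non-integer z. For example, z = 1.5 gives a lower bound of about 55.6%.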

What is the Empirical Rule?

Approximately 68.27% of values of a normal random variable are within +/-1 SD of the mean.

Approximately 95.45% of values are within +/-2 SD

Approximately 99.73% of values are within +/-3 SD
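These percentages can be checked numerically. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k / sqrt(2)), which Python's `math` module can evaluate. The function name is an illustrative choice, and figures rounded from printed tables may differ from the computed values in the last digit.

```python
import math


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/-{k} SD: {100 * normal_proportion_within(k):.2f}%")
```

Unlike Chebyshev's theorem, which holds for any data set, these proportions apply only when the data are approximately normal (bell-shaped).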

What is an outlier?

An unusually small or large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

What are possible causes of outliers?

Incorrectly recorded data values

Data value that was incorrectly included in set

Correctly recorded value that belongs in set

What are two types of exploratory data analysis?

Five number summary and the box plot

What are the components of the five number summary?

1) smallest value 2)first quartile 3)median 4)third quartile 5)largest value

What is a box plot?

A box drawn with ends located on first and third quartiles with a vertical line drawn at location of the median.

How do we locate the limits on a box plot?

Lower limit is located 1.5(IQR) below Q1. Upper limit is located 1.5(IQR) above Q3.
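A minimal sketch of the limit calculation, assuming hypothetical salary data whose first and third quartiles are taken as given (3465 and 3600). Values falling outside the limits are commonly flagged as outliers.

```python
# Hypothetical monthly starting salaries for 12 graduates.
salaries = [3450, 3550, 3650, 3480, 3355, 3310,
            3490, 3730, 3540, 3925, 3520, 3480]

# Assumed quartiles for this data (25th and 75th percentiles).
q1, q3 = 3465, 3600

iqr = q3 - q1                    # interquartile range
lower_limit = q1 - 1.5 * iqr     # 1.5(IQR) below Q1
upper_limit = q3 + 1.5 * iqr     # 1.5(IQR) above Q3

# Values beyond either limit are candidates for outliers.
outliers = [x for x in salaries if x < lower_limit or x > upper_limit]

print(iqr, lower_limit, upper_limit, outliers)
```

For this data the limits are 3262.5 and 3802.5, so only the salary of 3925 is flagged.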

What are the two measures of association between two variables?

Covariance and correlation coefficient

What is the correlation coefficient?

It is a measure of linear association, not necessarily causation. It can take on values between -1 and +1.

What do the values of the correlation coefficient indicate?

Values near -1 indicate strong negative correlation while values near +1 indicate strong positive correlation.
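Both measures can be computed from their definitions. The sketch below assumes the sample versions (dividing by n-1) and uses made-up data on weekend commercials and sales; the function names and data are illustrative only.

```python
import math


def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))   # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical data: commercials aired and sales (in $100s) over 10 weekends.
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

print(sample_covariance(commercials, sales))
print(round(correlation(commercials, sales), 2))
```

For this data the covariance is 11 and the correlation coefficient is about 0.93, a strong positive linear association. Unlike the covariance, the correlation coefficient does not depend on the units of measurement.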

What three characteristics of a data set do statistical measures describe?

Location (center), variability (spread), and shape

