Accounting - Accountants use statistical sampling procedures when conducting audits for clients.
Economics - Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
The data are labels or names used to identify an attribute. Nonnumeric labels or numeric codes may be used.
Ex. Students being classified by the school they are enrolled in (1-Business, 2-Humanities, 3-Art, 4-Science)
The data have the properties of nominal data, and the order or rank of the data is meaningful. Nonnumeric labels or numeric codes may be used.
Ex. Students being classified by their class standing (1-Freshman, 2-Sophomore, 3-Junior, 4-Senior)
The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.
Ex. Comparing SAT scores
The data have all the properties of interval data, and the ratio of two values is meaningful. Includes variables such as distance, height, weight, and time. The scale must contain a 0 value that indicates nothing exists for the variable at the 0 point.
Ex. Comparing credit hours
-Labels or names are used.
-Also referred to as categorical data.
-Can use either nominal or ordinal scales.
-Can be numeric or nonnumeric.
-Statistical analysis options are rather limited.
-Data indicate how many (discrete) or how much (continuous)
-Ordinary arithmetic operations are meaningful
Numeric ---> Nominal or ordinal
Nonnumeric ---> Nominal or ordinal
Data collected at the same or approximately the same point in time
Existing (within a firm, business database services, government agencies, industry associations, special-interest organizations, internet)
No attempt is made to control or influence the variable of interest
-Time requirement (searching for information can be time-consuming, and by the time it is found the data may no longer be useful)
-Cost of acquisition (organizations often charge to acquire information)
-Data errors (using data that was acquired with little care can be misleading)
The fraction or proportion of the total number of data items belonging to the class.
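As a minimal sketch, the frequency, relative frequency (frequency / n), and percent frequency (relative frequency x 100) of each class can be tabulated in Python. The function name `frequency_tables` and the enrollment sample are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_tables(data):
    """Return frequency, relative frequency, and percent frequency dicts per class."""
    n = len(data)
    freq = Counter(data)                          # items per class
    rel = {c: f / n for c, f in freq.items()}     # fraction of the total
    pct = {c: 100 * r for c, r in rel.items()}    # same fraction as a percentage
    return dict(freq), rel, pct

# Hypothetical sample: the school each student is enrolled in
schools = ["Business", "Science", "Business", "Art", "Humanities",
           "Business", "Science", "Art", "Business", "Science"]
freq, rel, pct = frequency_tables(schools)
for school in sorted(freq):
    print(f"{school:<11} {freq[school]:>2} {rel[school]:.2f} {pct[school]:5.1f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a quick check that no item was missed.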
A graphical device for depicting qualitative data.
-On one axis, labels are specified for each of the classes.
-Frequency, relative frequency (RF), or percent frequency (PF) can be used for the other axis
-Bars have a fixed width and are separated to show that each bar is its own class
-Use between 5 and 20 classes
-Data sets with a larger number of elements usually require more classes
-Use enough classes to show variation in data
-Do not use so many classes that some contain only a few data items
-Use classes of equal width
$$\text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$$
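A short Python sketch of the approximate class width calculation. The audit-time data are hypothetical, and rounding the result up to a convenient whole number is an assumed (though common) practice, not part of the formula itself.

```python
import math

def approximate_class_width(data, num_classes):
    """(largest data value - smallest data value) / number of classes."""
    return (max(data) - min(data)) / num_classes

# Hypothetical data: audit times in days
audit_times = [12, 15, 20, 22, 14, 14, 15, 27, 21, 18,
               19, 18, 22, 33, 16, 18, 17, 23, 28, 13]
width = approximate_class_width(audit_times, 5)
print(width)             # (33 - 12) / 5 = 4.2
print(math.ceil(width))  # rounded up to a convenient width of 5
```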
-Common graphical presentation of quantitative data
-Variable of interest is placed on the horizontal axis
-Rectangle is drawn above each class interval with its height corresponding to the interval's frequency, RF, or PF.
-Adjacent rectangles touch; there is no natural separation between classes as there is in a bar graph
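The rectangle heights in a histogram are just the class frequencies. This minimal Python sketch assumes half-open classes of the form [lower + k*width, lower + (k+1)*width); the function name `class_frequencies` and the data are hypothetical. It prints a crude text histogram in which each `*` is one item.

```python
def class_frequencies(data, lower, width, num_classes):
    """Count items per class, where class k covers [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        k = int((x - lower) // width)
        if 0 <= k < num_classes:  # values outside all classes are ignored
            counts[k] += 1
    return counts

# Hypothetical data: audit times in days
audit_times = [12, 15, 20, 22, 14, 14, 15, 27, 21, 18,
               19, 18, 22, 33, 16, 18, 17, 23, 28, 13]
counts = class_frequencies(audit_times, lower=10, width=5, num_classes=5)

# Text histogram; the "10-14" style labels assume integer data
for k, c in enumerate(counts):
    start = 10 + k * 5
    print(f"{start}-{start + 4}: {'*' * c}")
```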
Symmetric: the left tail is the mirror image of the right tail
Skewed left: there is a long tail to the left (can be moderately or highly skewed)
Skewed right: there is a long tail to the right (can also be moderately or highly skewed)
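CFD-Shows the number of items with values less than or equal to the upper limit of each class
CRFD-Shows the proportion of items with values less than or equal to the upper limit of each class
CPFD-Shows the percentage of items with values less than or equal to the upper limit of each class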
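A minimal Python sketch of all three cumulative distributions built from a set of class frequencies. The frequencies and upper class limits below are hypothetical.

```python
from itertools import accumulate

counts = [4, 8, 5, 2, 1]            # hypothetical class frequencies
upper_limits = [14, 19, 24, 29, 34]  # hypothetical upper class limits
n = sum(counts)

cum_freq = list(accumulate(counts))   # CFD: running total of frequencies
cum_rel = [c / n for c in cum_freq]   # CRFD: running proportion
cum_pct = [100 * r for r in cum_rel]  # CPFD: running percentage

for ul, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {ul}: {cf:>2}  {cr:.2f}  {cp:5.1f}%")
```

The last class always shows the full count n, a proportion of 1, and 100%, since every item is less than or equal to the largest upper limit.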
Can be used when
-One variable is qualitative and the other is quantitative
-Both variables are qualitative, or both are quantitative
Left and top margin labels define the classes for two variables
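A minimal Python sketch of a crosstabulation for two qualitative variables: the left-margin labels are the rows (class standing) and the top-margin labels are the columns (school). The function name `crosstab` and the student records are hypothetical.

```python
from collections import Counter

def crosstab(pairs):
    """Tabulate (row, column) pairs into a nested dict of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

# Hypothetical data: (class standing, school) for each student
students = [("Freshman", "Art"), ("Senior", "Business"), ("Freshman", "Business"),
            ("Junior", "Science"), ("Senior", "Business"), ("Junior", "Art")]
rows, cols, table = crosstab(students)

print(" " * 9 + "".join(f"{c:>10}" for c in cols) + f"{'Total':>7}")
for r in rows:
    cells = "".join(f"{table[r][c]:>10}" for c in cols)
    print(f"{r:<9}" + cells + f"{sum(table[r].values()):>7}")
```

Rows and columns are sorted alphabetically here for simplicity; an ordinal variable such as class standing would normally be listed in its natural order.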
Graphical presentation of the relationship between two quantitative variables
-General pattern of points suggests overall relationship between variables
It provides information about how the data are spread over the interval from the smallest value to the largest value. Often used with admission test scores for colleges.
1. Arrange data in ascending order
2. Compute the index, the position of the pth percentile (i = (p/100)n)
3. If i is not an integer, round up. The pth percentile is the value in the ith position.
4. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
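The four steps can be sketched in Python as follows; the function name and the test scores are hypothetical. Note that other tools (for example, NumPy's `percentile`, which interpolates linearly by default) may return slightly different values for the same data.

```python
import math

def percentile(data, p):
    """pth percentile (0 < p < 100) using the index method i = (p/100)n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)            # Step 1: arrange in ascending order
    n = len(values)
    i = p * n / 100                  # Step 2: compute the index
    if not i.is_integer():           # Step 3: round up; positions are 1-based
        return values[math.ceil(i) - 1]
    i = int(i)                       # Step 4: average positions i and i + 1
    return (values[i - 1] + values[i]) / 2

# Hypothetical test scores (n = 12)
scores = [72, 85, 90, 64, 78, 88, 95, 70, 81, 77, 69, 92]
print(percentile(scores, 85))  # i = 10.2, round up to 11 -> 92
print(percentile(scores, 50))  # i = 6, average of 78 and 81 -> 79.5
```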
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1. For example:
-At least 75% of the data are within z = 2 SD of the mean
-At least 89% of the data are within z = 3 SD of the mean
-At least 94% of the data are within z = 4 SD of the mean
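A minimal Python sketch that evaluates the Chebyshev bound and compares it with the fraction actually observed in a data set. The data are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption made for the comparison.

```python
import statistics

def chebyshev_lower_bound(z):
    """Minimum fraction of items within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

def fraction_within(data, z):
    """Observed fraction of items within z population standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= z * sd for x in data) / len(data)

scores = [72, 85, 90, 64, 78, 88, 95, 70, 81, 77, 69, 92]  # hypothetical data
for z in (2, 3, 4):
    print(f"z = {z}: bound {chebyshev_lower_bound(z):.2%}, "
          f"observed {fraction_within(scores, z):.2%}")
```

The exact bounds are 75%, 88.89%, and 93.75%; the observed fractions can be higher, but never lower.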
68.26% of values of a normal random variable are within +/-1 SD of mean.
95.44% of values are within +/-2 SD
99.72% of values are within +/- 3 SD
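These percentages can be checked with the error function: for a normal random variable, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). A minimal Python sketch (the function name `normal_within` is hypothetical):

```python
import math

def normal_within(k):
    """P(|X - mu| <= k * sigma) for a normal random variable X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/-{k} SD: {normal_within(k):.2%}")
```

This gives 68.27%, 95.45%, and 99.73%. The figures above differ slightly in the last digit because they come from rounded standard normal table entries (for example, 2 x 0.3413 = 0.6826).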
Incorrectly recorded data values
Data value that was incorrectly included in the data set
Correctly recorded data value that belongs in the data set (an unusual but legitimate observation)