Scopes and Methods in Political Science
Final Review
Michael Abbott
Spring 2011

Multiple Choice Questions:

Know what population means
A population is a set of units of analysis, or elements in statistical terms (potentially involving more than one kind of unit). It can refer to anyone or anything, not only people. For example, a population may be all adults living in a geographical area such as a country or state, or working in an organization, or it could be a set of counties, corporations, government agencies, events, magazine articles, or years. Thus, one must carefully define the unit of analysis and the population so that they are relevant to the research question (there must be a clear line from the research question to the data to the inferences).

Know the definition of sample
A sample is any subset of units collected in some manner from a population. Inferences are made from the sample, which should generally reflect the whole population of interest.

Know the definition of sample statistic
Sample statistics (such as a sample mean or percentage) are used to approximate the corresponding population values. In other words, we use sample statistics to estimate population characteristics (parameters).

Know the difference between probability sample and non-probability sample
Probability sample: a sample in which each element in the total population has a known probability of being included in the sample (this is what makes it representative and generalizable). This allows a researcher to calculate how accurately the sample reflects the population from which it is drawn. The match is never 100%, but the likely error can be quantified as a margin of error.
Non-probability sample: a sample in which each element in the population has an unknown probability of being selected. This rules out the use of statistical theory to make inferences and increases the chances that the sample will be unrepresentative of the larger population it was drawn from, which increases the chances of error. See purposive (nonprobability) samples, as well as convenience and snowball samples, to get an idea of this type.

This is why probability samples are much preferred in scientific fields. There are different types of probability samples: simple random samples, systematic samples, stratified samples (both proportionate and disproportionate), cluster samples, and telephone samples.

Know what happens to the standard error and dispersion when you have a smaller sample size (wider, larger, narrower?) and when you have a larger sample size
****Key to remember: The smaller the sample size, the larger the standard error and the wider the dispersion of the distribution. The larger the sample size, the smaller the standard error and the narrower (more clustered) the distribution, meaning the variability or range of sample estimates decreases.

Know the difference between type 1 error and type 2 error
(a) Type I error: rejecting the null hypothesis when it is actually true, that is, the mistaken rejection of a true null hypothesis.
(b) Type II error: failing to reject the null hypothesis when it is actually false, in effect accepting a false null hypothesis because the result does not fall in the critical region.
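The sample-size rule above (smaller sample, larger standard error) can be checked with a short calculation. This is a minimal sketch, assuming the usual formula for the standard error of a mean, SE = s / sqrt(n). The function name standard_error and the standard deviation of 10 are illustrative choices, not values from the course materials.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sample_sd / math.sqrt(n)


# Hypothetical standard deviation, held fixed while the sample size grows.
SAMPLE_SD = 10.0

for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(SAMPLE_SD, n):.2f}")
```

Under this formula, quadrupling the sample size halves the standard error, which is why larger samples give a narrower spread of sample estimates.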
Be familiar with the central limit theorem (know what it means)
Central Limit Theorem: take a very large number of independent random samples of size N (for example N = 10, where N denotes the sample size) from the target population, calculate the proportion or average in each sample, and then compare the resulting long list of sample proportions or averages to the population's proportion or average. The more independent samples taken from the target population, the more closely the sample proportions or averages (means), taken together, center on the corresponding true population parameter, no matter the sample size. The distribution of those sample statistics also approaches a normal (bell-shaped) curve as the sample size grows.

Frequency tables: be familiar with levels of measurement
Know the difference between nominal, ordinal, and interval/ratio levels of measurement
Nominal: variable values are unordered names or labels, like ethnicity, gender (depending on the coding; remember that a dichotomous coding makes it ordinal), country, or origin.
Ordinal: variable values are labels with an implicit but unmeasured order or ranking. Numbers may be assigned to categories to show the ordering, from high/greater/stronger to low/lesser/weaker (example: a scale of ideology).
Interval/Ratio: numbers are assigned to objects such that interval differences are constant across the scale. Ratio scales also have a meaningful zero value, like years of education and income; interval scales have no true or meaningful zero point.

Know dichotomous ordinal
One thing to remember about the variable “Gender”: if it is coded (0) Male; (1) Female, you may assume that it is nominal, because it involves gender and does not seem to take on any comparison attributes.
However, if you are working with dichotomous data (coded 0 and 1), the variable becomes a dichotomous ordinal-level measure, because the 0 and 1 coding involves a comparison in which the former (male, coded 0) is lesser than the latter (female, coded 1). Typical dichotomous responses are defined as (0) no/don’t like/oppose; (1) yes/like/support.

Know the measures of central tendency (how do you locate your mode)
A measure of central tendency locates the middle or center of a distribution. The usual measures are the mean (average), the median, and the mode.

The Mean: This is the most familiar measure of central tendency. The mean or average is the sum of a batch of values of a variable divided by the total number of values. It is appropriate for interval and ratio (quantitative) variables, but it is also sometimes applied to ordinal scales in which the categories are assigned numeric codes. The mean should not be the only statistic that is emphasized, because it can mislead: a few extreme (very large or very small) values can skew the mean and other statistics, so that it overestimates or underestimates what is typical of the sample.

The Median: A measure of central tendency that is fully applicable to ordinal as well as interval and ratio data. The median (frequently denoted M) is the value that divides a distribution in half: half of the observations lie above the median and the other half below it. In other words, the median is found by locating the middle of the distribution.
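The contrast between the mean and the median described above can be sketched with Python's standard statistics module. The income figures are hypothetical and chosen only to show how a single extreme value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars (illustrative, not course data).
typical = [30, 35, 40, 45, 50]
with_outlier = typical + [500]  # the same batch plus one extreme value

print("without outlier: mean =", mean(typical), " median =", median(typical))
print("with outlier:    mean =", round(mean(with_outlier), 1),
      " median =", median(with_outlier))
```

In this made-up batch the mean jumps from 40 to roughly 116.7 once the extreme value is added, while the median only moves from 40 to 42.5.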
To find the middle of an odd number of observations, arrange them in order from lowest to highest and count the same number of observations in from the top and from the bottom. For example, with seven values ordered from lowest to highest, 3, 5, 6, 8, 9, 10, 13, count three from each end. That leaves the fourth value, 8, in the middle, so 8 is the median: three values lie below 8 and three above it, so the median divides the distribution in half.

If you are dealing with a distribution with many observations, an easy way to find the middle is the formula: midobs = (N+1)/2. This formula does not give the median itself, but tells you where to find it. In the example above there are 7 cases, so (7+1)/2 = 4; the median is the fourth value, which is 8.

What about an even number of observations? To illustrate, in Table 11-1 there are 12 European countries, arranged from smallest to largest: 9, 9, 10, 11, 11, 11, 14, 15, 21, 22, 28, and 35. The formula gives (12+1)/2 = 6.5, so we average the sixth and seventh values: M = (11+14)/2 = 12.5. The median is a resistant measure, in that extreme values (outliers) do not overwhelm its computation. Figure 11-1 on page 376 shows the calculation of the median for a hypothetical example.

When working with SPSS output, the median can be obtained through the frequency statistics or, preferably, by looking at the Cumulative Percent column of the frequency distribution and locating the point where it reaches 50% (the 50th percentile). Averages tend to be overestimated or underestimated when there are extreme scores; the median counters that.

The Mode: This is a common measure of central tendency, especially for nominal and categorical ordinal data. The mode, or modal category, is the category with the greatest frequency of observations (the most frequently occurring value). Look at Table 1-4 and pg.
356, which show the distribution of responses to a party identification question from the 2004 NES. The modal (most frequent) answer was “independent-leaning Democratic,” with 208 responses. The mode is helpful in describing the shape of distributions of all kinds of variables. When one category or range of values has many more cases than all the others, we describe the distribution as unimodal: it has a single peak. When there are more than two dominant peaks or spikes in the distribution, we call it a multimodal distribution.

Remember: the mean (average) and median are applicable to ordinal, interval, and ratio variables. The mode is applicable to nominal variables.

*Under what column and what will it represent
*The mode is often found under the Frequency column
*Know the difference between the Percent and the Valid Percent columns

The percentages of valid responses are not the same as the total percentages, because missing data are excluded from the Valid Percent column. Cumulative percentages add up the percentages category by category; for example, 42% of the sample either “agree or neither agree nor disagree.”

Thus, if you are going to exclude the missing data, you must say so. In this case, according to the Valid Percent column, 29.4% of those (1,059) respondents with substantive or valid responses agreed that a working woman can establish “just as warm and secure a relationship” with the family as a stay-at-home mom, whereas the total percent (including the missing data) was 25.7%.
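The difference between the Percent, Valid Percent, and Cumulative Percent columns can be worked through in a few lines. This is a sketch under stated assumptions: the counts below are hypothetical (they are not the survey figures quoted above), the names frequency_table and modal_category are made up for illustration, and the cumulative column is taken as a running total of the valid percentages.

```python
def frequency_table(counts: dict, missing: int) -> list:
    """Build Percent, Valid Percent, and Cumulative Percent rows.

    counts:  valid responses per category, in table order.
    missing: number of cases with no substantive response.
    """
    valid_n = sum(counts.values())
    total_n = valid_n + missing
    rows = []
    cumulative = 0.0
    for category, count in counts.items():
        valid_percent = 100 * count / valid_n
        cumulative += valid_percent
        rows.append({
            "category": category,
            "frequency": count,
            "percent": 100 * count / total_n,   # base includes missing cases
            "valid_percent": valid_percent,     # base excludes missing cases
            "cumulative_percent": cumulative,   # running total of valid percent
        })
    return rows


def modal_category(counts: dict) -> str:
    """The mode is the category with the largest frequency."""
    return max(counts, key=counts.get)


# Hypothetical responses: 100 valid answers plus 25 missing cases.
responses = {"agree": 30, "neither": 20, "disagree": 50}

for row in frequency_table(responses, missing=25):
    print(row)
print("mode:", modal_category(responses))
```

With these made-up counts, "agree" is 24% of all cases but 30% of valid cases, which is the same kind of gap as the 25.7% versus 29.4% example in the notes.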
If given a data number, find that number and identify the column and data set it represents
Given a correlation matrix, identify the independent and the dependent variable based on that correlation

Non Multiple Choice Questions: Essay

Given the same correlation matrix, give the correlation and its strength and direction for the independent/dependent variable
Given a bivariate model summary table, give the adjusted R squared number
Tell whether it is a good model fit or not (it is not if there is a lot of unexplained variance)
Given a bivariate regression table, identify the independent/dependent variable
When identifying which is which, look at the footnote of the table
Frame a research question based on the two previous variables
Once it is framed in the proper format, provide a theory (an explanatory answer to the question you just gave; it should be about a paragraph), then provide a hypothesis (a short directional statement about how x influences y), then give the null hypothesis.
The null hypothesis: “there is no relationship or statistical difference between the variables”
Then give the unit of analysis (population, organization, country, etc.)
Given the same bivariate regression table, find the slope number, then interpret what it means in your own words (the more likely then determine if the relationship
Universal slope sentence: “For every one unit moved, there is an increase or decrease ...”
If it’s a negative slope, use the lowest coding; if a positive slope, use the highest coding
T-Statistic = beyond +/- 1.96
Observed Sig = must be less than .05 to reject the null hypothesis
Confidence interval = must not contain zero between its lower and upper bounds to reject the null hypothesis
*Always state that you are rejecting the null hypothesis at the 95% confidence level and that you are accepting the alternative hypothesis
Or
*By accepting the null hypothesis you are rejecting the alternative hypothesis
(You should never contradict your answers: if you reject or accept the null
in one, you must do it in all)
We will be looking at standardized beta; look at the strength and correlation handout on Blackboard
Independently determine their strength and direction, then compare the two and tell which is stronger
Tell if it’s negative or positive
Anything close to 0 is weak, around 50 is good, and around 100 (like 70+) is strong
Going to give the threshold for statistical significance for two sets of statistics (two observed t statistics, two sigs, two confidence intervals), then tell if the independent and/or dependent variable will be statistically significant
Given MAD frequency charts, like in the homework, find the mean, median, mode, etc.
Calculate the mean and standard deviation
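The three significance thresholds from the essay checklist (t-statistic beyond +/- 1.96, observed Sig below .05, confidence interval not containing zero) can be applied mechanically, which also makes it easy to confirm that the three answers agree. This is a minimal sketch: the function names and the example numbers are hypothetical, not taken from any course table.

```python
def significance_checks(t_stat: float, sig: float,
                        ci_lower: float, ci_upper: float) -> dict:
    """Apply the three 95% rules of thumb from the review notes."""
    return {
        "t beyond +/- 1.96": abs(t_stat) > 1.96,
        "sig below .05": sig < 0.05,
        "CI excludes zero": not (ci_lower <= 0 <= ci_upper),
    }


def decision(checks: dict) -> str:
    """All three checks should point the same way; flag it if they do not."""
    if all(checks.values()):
        return "reject the null hypothesis"
    if not any(checks.values()):
        return "do not reject the null hypothesis"
    return "checks disagree; re-read the table"


# Hypothetical regression output for one independent variable.
checks = significance_checks(t_stat=2.50, sig=0.01, ci_lower=0.20, ci_upper=1.40)
for rule, passed in checks.items():
    print(f"{rule}: {passed}")
print(decision(checks))
```

The "checks disagree" branch mirrors the reminder in the notes that a reject-or-accept answer must be consistent across all three statistics.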