The "Is There a Difference" Portfolio! This portfolio covers the packet, which focused mostly on standard deviation and chi-squared values (I will use shorthand for chi-squared, written as x^2). The problem sets that I used are referenced throughout and will be put up on my DP ASAP.
1). Sample Problem Set: To Market, To Market.
A sample and a population are related, but they are not the same thing. A population is the total set of items that the hypothesis would affect, but it is often too big for it to be reasonable to collect data from every one of the items. For example, in "To Market, To Market", the population is all males and females. This population is too big to collect data from every person, so a sample is pulled from the population to get data that can then be scaled up, so that you can get reasonably accurate results without collecting data from every person. In "To Market, To Market", the sample is 150 people, 90 men and 60 women, pulled from the bigger population of all males and females. A hypothesis is what you are investigating in the problem. In "To Market, To Market", the hypothesis is "There is a difference between the appeal of the new soda in males and females," because the scenario wants to see if more men prefer the new soda compared to women. The hypothesis is what you are collecting data for, either to support or to disprove. If the hypothesis is disproved, or rejected, then we will accept the null hypothesis, which is a statement saying that there is no difference between the two things that are being compared. For the hypothesis in "To Market, To Market", the null hypothesis would be "There is no difference between the appeal of the new soda in men and women."
2). Sample Problems: Bacterial Culture and Mean and Standard Deviation Problem Set
Standard Deviation is a skill that we had to build throughout the unit, as it plays a big role in comparing data sets like the ones in these problems. Standard Deviation measures how spread out a set of numbers is around its mean, and it is mainly used together with the normal distribution, something I will talk about more later. The way Standard Deviation works is by taking the mean of a sample, or set, of numbers, the amount of values in the set, and the numbers in the set, and plugging them into an equation. In the equation, you subtract the mean from each number, square each difference, add the squares together, divide by t, and take the square root. Here t is the amount of numbers in the set, and x(sub i) is, in turn, the first number in the set, the second number, and so on. The x with the bar over it is the mean of the set of numbers. In the Mean and Standard Deviation Problem Set, the first set of numbers is 13, 21, 27, 31, 35, 24, 28, 32, and 17, with a mean of approximately 25.333 (the numbers add up to 228, and 228/9 is about 25.333). x(sub i) would be 13, then 21, then 27, and so on. The x with the bar over it would be 25.333, and t, the amount of numbers in the set, would be 9. After plugging the numbers into the equation, you get the Standard Deviation. For this problem, the Standard Deviation is approximately 6.85. This number is what you use with the normal distribution. The normal distribution is the bell-shaped curve that describes how the numbers in a set are spread around the mean, which is why you need the Standard Deviation. On the graph of a normal distribution, approximately 68% of the area falls within one Standard Deviation of the mean. This means that about 68% of the numbers in the set should be within one Standard Deviation of the mean.
For the Bacterial Culture problem, we had to see if one anti-bacterial spray was more effective than the other. For Bact-Out, the product for Infect-Away to beat, the average amount of bacteria left, or the mean, was 1000, and the variation of the test results, or one Standard Deviation, was 50. Infect-Away's average amount of bacteria left was 825, which is 175 less than 1000, or 3.5 Standard Deviations below Bact-Out's mean. Because that is much farther than one Standard Deviation away, the difference is very unlikely to be ordinary variation in the test results, which means that Infect-Away performs consistently better than Bact-Out.
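The Standard Deviation steps can be sketched in a few lines of Python. This is only an illustration, not the packet's own method: the function names are made up for this example, and the sketch assumes the version of the formula that divides by t (the amount of numbers in the set), which is the version that reproduces the result given for this data set. The last few lines apply the same idea to the Bacterial Culture numbers by counting how many Standard Deviations separate the two means.

```python
import math

def mean(values):
    """Add up the numbers and divide by how many there are."""
    return sum(values) / len(values)

def standard_deviation(values):
    """Square root of the average squared distance from the mean.

    Assumption: divides by t (the amount of numbers), not t - 1.
    """
    m = mean(values)
    t = len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / t)

# First set from the Mean and Standard Deviation Problem Set.
data = [13, 21, 27, 31, 35, 24, 28, 32, 17]
data_mean = mean(data)
data_sd = standard_deviation(data)
print(round(data_mean, 3), round(data_sd, 2))

# Bacterial Culture: how many Standard Deviations below Bact-Out's mean
# is Infect-Away's mean?
bact_out_mean = 1000
bact_out_sd = 50
infect_away_mean = 825
sds_below = (bact_out_mean - infect_away_mean) / bact_out_sd
print(sds_below)
```

For the nine listed numbers this gives a mean of about 25.333 and a Standard Deviation of about 6.85, and it shows Infect-Away's mean sitting 3.5 Standard Deviations below Bact-Out's.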
3). Sample Problems: Measuring Weirdness With x^2
x^2 is another big formula to know for problems from this unit. x^2 measures how far the observed values are from the values you would expect if the null hypothesis were true: for each category, you take the difference between the observed and expected values, square it, and divide by the expected value, then add the results together. This determines how "weird" the difference is between the expected values and the observed values. For example, in the Measuring Weirdness With x^2 problem set, we were given three different results for a coin flip, the first out of 20 flips, the second out of 100, and the third out of 1000. I will refer to these coins as A, B, and C. We were supposed to take Coins A, B, and C and determine which coin was the most "unfair", or the coin that differed most from the expected values, and which coin was the most "fair", or the coin that stayed closest to the expected values. For a coin flip, you would expect a 50/50 split between heads and tails. For Coin A, the expected heads and tails values are both 10, because 20/2=10. Coin B should show heads and tails values of 50, and Coin C should show heads and tails values of 500. However, this was not the case in the problem. Coin A was observed with 14 heads and 6 tails, Coin B with 55 heads and 45 tails, and Coin C with 460 heads and 540 tails. If you plug the expected and observed values into the equation, you actually find that Coin C is the most unfair, while Coin B is the most fair. This is because Coin C had the highest x^2 value (6.4), meaning that its results are the hardest to explain if the null hypothesis is true, and Coin B had the lowest x^2 value (1.0), meaning that its results fit the null hypothesis best. Coin A falls in between, at 3.2. This is relevant because the hypothesis is "There is a difference between the number of heads and tails," which means that the coin is "unfair", while the null hypothesis is "There is no difference between the number of heads and the number of tails," meaning that the coin is "fair". This is why Coin C is the most "unfair", and why Coin B is the most "fair".
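The coin comparison can be checked with a short Python sketch. It assumes the usual x^2 formula, which adds up (observed - expected)^2 / expected over every category (here, heads and tails); the function and variable names are made up for this example.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed [heads, tails] for each coin in the problem set.
coins = {
    "A": [14, 6],
    "B": [55, 45],
    "C": [460, 540],
}

results = {}
for name, observed in coins.items():
    total = sum(observed)
    # A fair coin is expected to split the flips 50/50.
    expected = [total / 2, total / 2]
    results[name] = chi_squared(observed, expected)

for name, value in results.items():
    print(f"Coin {name}: x^2 = {value:.1f}")
```

This gives x^2 values of 3.2 for Coin A, 1.0 for Coin B, and 6.4 for Coin C, which matches the ranking of Coin C as the most "unfair" and Coin B as the most "fair".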
4). Sample Problems: Late in the Day
Something that comes in handy during these problems is being able to figure out the expected values for a scenario. This is a fairly simple process, where you take the total observed value and split it among the categories according to each category's share, as if the null hypothesis were true. This can be better explained with the Late in the Day problem, where we were tasked with figuring out if more accidents happened in the last 2 hours of an 8 hour shift. We were given the total number of accidents, 168, and part of figuring out if a value is "weird" using x^2 is having an expected value to compare it to. So all you have to do is take the total accidents, 168, and figure out how to spread that number evenly over the whole shift. In this case, because the 2 hours in question are 1/4 of the 8 hour shift, I multiplied 168*0.25 and got 42. I then subtracted 42 from 168, getting 126. That gave me the number of accidents that would be expected if accidents were equally likely at any time in the shift: 126 in the first 6 hours, and 42 in the last 2 hours.
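The same expected-value arithmetic, written as a small Python sketch. The function name and variable names are made up for this example, and it assumes accidents are equally likely in every hour of the shift, which is what the null hypothesis says.

```python
def expected_counts(total, proportions):
    """Split a total among categories according to each category's share."""
    return [total * p for p in proportions]

total_accidents = 168
shift_hours = 8
last_hours = 2

# Share of the shift taken up by the first 6 hours and by the last 2 hours.
proportions = [
    (shift_hours - last_hours) / shift_hours,  # 0.75
    last_hours / shift_hours,                  # 0.25
]

expected_first, expected_last = expected_counts(total_accidents, proportions)
print(expected_first, expected_last)
```

The two expected values come out to 126 and 42, and they add back up to the 168 total, which is a quick way to check the split.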
5). Sample Problems: Is it Really Worth It?
x^2 is used in many different scenarios, but the most useful one, in my opinion, is using it to compare two different populations. You do this by getting the x^2 value for each of the two populations and adding them together, then using that total to test the null hypothesis. In the Is It Really Worth It problem set, we were given two different populations of cats: one was given a vaccine, and one was not. We were tasked with seeing if the vaccine was worth giving to cats, the hypothesis being "There is a difference in sickness rates between vaccinated and non-vaccinated cats," and the null hypothesis being "There is no difference in the sickness rates between vaccinated and non-vaccinated cats." After I got the two x^2 values, 0.654 for the non-vaccinated cats and 3.029 for the vaccinated cats, I combined them, getting an x^2 value of 3.6825. I then checked this value in the table that we were given, which shows, for each x^2 value, the probability of getting a value at least that large when the null hypothesis is true. I got about 6%, meaning that if the vaccine really made no difference, a gap this big between the two groups would show up from sample fluctuation alone only about 6 times out of 100.
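The table lookup can be approximated in Python. This sketch rests on an assumption that is not stated in the packet: that the table is for x^2 with one degree of freedom, which fits a comparison of two groups with two outcomes (sick or not sick). Under that assumption, the probability of getting a value at least as large as x^2 is erfc(sqrt(x^2 / 2)). The function and variable names are made up for this example.

```python
import math

def p_value_one_df(chi_sq):
    """Probability of a x^2 value at least this large if the null
    hypothesis is true. Assumption: one degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2))

# x^2 values from the Is It Really Worth It problem set.
chi_sq_unvaccinated = 0.654
chi_sq_vaccinated = 3.029

chi_sq_total = chi_sq_unvaccinated + chi_sq_vaccinated
p = p_value_one_df(chi_sq_total)
print(f"combined x^2 = {chi_sq_total:.3f}, probability = {p:.3f}")
```

Adding the two rounded values gives 3.683, slightly different from 3.6825 because of rounding, and the probability lands between 5% and 6%, in line with the roughly 6% read from the table.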
How would I determine if there is a difference, statistically, between two samples from two populations? Or how do you know that the difference isn't sample fluctuation?
Depending on the sample size, sometimes there isn't a way to tell. In many cases, if the sample is too small, there is nothing you can do until you have a larger sample, because you cannot be sure that sample fluctuation is not a factor. In a case like that, I would say that the question is undetermined until a bigger sample is provided.