=**B2. Compute, use and interpret regression lines. (Residuals, errors in prediction, outliers vs. influential points)**=

__**1. What you need to understand.**__
A regression line (also known as a best fit line) is a straight line drawn through a set of data so that a linear equation can describe the data. The line is placed so that the points are, overall, as close to the line as possible. The vertical distance from an actual point to the regression line is called the residual: the actual //y//-value minus the //y//-value that the line predicts. The residual tells you how far off the line's estimate is for that point (the error in prediction). To find where the line belongs, you take the residual for every point, square each one, and add them all together. This total is the sum of squared errors (SSE). Each time you move the line, the SSE changes, and the line with the smallest SSE is the least squares regression line. The placement of the regression line on a scatterplot, and its equation, can be found with the program CPMP Tools.
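
To see the idea of residuals and SSE outside of CPMP Tools, here is a minimal sketch in Python. The data points are made up for illustration, and the function names (`least_squares_line`, `residuals`, `sse`) are just names chosen for this sketch. It uses the standard slope and intercept formulas for a least squares line, then checks that nudging the slope makes the SSE bigger.

```python
# Made-up data for illustration only (not from the course materials).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Residual = actual y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sse(xs, ys, slope, intercept):
    """Square every residual and add them all together."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))


slope, intercept = least_squares_line(xs, ys)
best_sse = sse(xs, ys, slope, intercept)

# Move the line a little (change the slope) and compare the SSE.
moved_sse = sse(xs, ys, slope + 0.1, intercept)

print("slope:", slope, "intercept:", intercept)
print("SSE of regression line:", best_sse)
print("SSE after moving the line:", moved_sse)
```

Any other line you try on the same data should give an SSE that is at least as large as the regression line's SSE.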



The regression line is specific to the data that is given, so if points are added to or taken away from the data, the slope of the regression line can change. Depending on the location of the added or removed point, the change in slope may be big, or so small that it is hardly noticed. Points that have a big impact on the slope of the regression line are called influential points. Points that are far away from the rest of the data are called outliers, and are either //y// or //x// outliers. An outlier that doesn't change the regression line much is just an outlier, not an influential point.
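
The difference between an outlier and an influential point can be checked by removing one point at a time and comparing slopes. Below is a rough sketch in Python with made-up data: five points that follow a line, one //y// outlier at (3, 9), and one //x// outlier at (15, 2). The helper names `slope_of` and `without` are just names chosen for this sketch.

```python
# Made-up data for illustration only.
points = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6),  # follow a straight line
          (3, 9),    # y outlier: far above the others, but in the middle of the x-values
          (15, 2)]   # x outlier: far to the right of the others and off the pattern


def slope_of(points):
    """Slope of the least squares regression line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def without(points, removed):
    """Copy of the data with one point taken away."""
    remaining = list(points)
    remaining.remove(removed)
    return remaining


slope_all = slope_of(points)
slope_no_y_outlier = slope_of(without(points, (3, 9)))
slope_no_x_outlier = slope_of(without(points, (15, 2)))

print("all points:        ", round(slope_all, 3))
print("without (3, 9):    ", round(slope_no_y_outlier, 3))
print("without (15, 2):   ", round(slope_no_x_outlier, 3))
```

In this made-up data set, taking away (3, 9) barely moves the slope, so it is only an outlier, while taking away (15, 2) changes the slope a lot, so it is an influential point.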

__**2. Example problem:**__
Using CPMP Tools: 1.



The point that I removed in the first box was not an influential point, since it didn't drastically change the slope of the regression line. The second point that I removed, after replacing the other one, was an influential point, because it did change the slope of the line quite drastically.

__**3. Common mistakes or misunderstandings:**__
•An issue that would completely mess up a regression line is inputting incorrect data. Since the computer finds the regression line on a scatterplot for you, if you enter incorrect data into the table, the points will be off, the residuals will be incorrect, and the regression line will not be where it should be. This common mistake can be avoided by checking your data over twice before plotting it.
•Remember that a regression line is a straight line that goes through the data, not a curved line.
•After points are added or removed, you must click the regression line action again, because the line has to be redrawn and may have a different slope.
•This semester, to get an exact regression line with its slope, you use your computer. (So don't panic if you think you have to find it by hand!)

__**4. For more information:**__
-Unit 4 in our math book is where we covered regression lines. You can look back at it as a reference.
-CPMP Tools can give you practice data where you can practice guessing the regression line. You can also add and remove coordinate points, or even start from scratch and add in your own data.