Monday, February 4, 2013

Booting up SAS

Sassy

(*Note:  Though this class is primarily focused on learning and manipulating data using the SAS or JMP statistical packages, I will be programming and posting solutions in R.  I may try to post equivalent solutions in SAS simultaneously for those who are interested in learning both.  R is free and does not require 22 Gazigabytes. )

T-Test:
History for the nerds-
http://en.wikipedia.org/wiki/William_Sealy_Gosset

Basic t-test with calculator-
http://www.stattools.net/tTest_Exp.php

More detailed explanation-
http://simon.cs.vt.edu/SoSci/converted/T-Dist/activity.html
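
Since solutions here will be in R, here is a minimal sketch of a two-sample t-test. The two vectors are made-up numbers purely for illustration:

```r
# Two made-up samples (illustrative only)
a <- c(5.1, 4.9, 5.4, 5.0, 5.3, 4.8)
b <- c(5.6, 5.8, 5.5, 5.9, 5.7, 5.4)

# Welch two-sample t-test (R's default does not assume equal variances)
result <- t.test(a, b)
result$statistic  # the t value
result$p.value    # the p value
```

Add `var.equal = TRUE` to get the classic Student's t-test that the calculator link above performs.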

Regression + ANOVA = ANCOVA

Regression:


covariance = \frac{\sum{(x-\bar{x})(y-\bar{y})}}{n-1}


regression coefficient = \frac{cov(x,y)}{var(x)} = \frac{\sum{(x-\bar{x})(y-\bar{y})}}{\sum{(x-\bar{x})^2}}

(*Note:  The n or n-1 cancels when the cov is divided by the var, so whether the correction is applied is irrelevant — as long as the same choice is made in both the numerator and the denominator)
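
The cancellation in that note is easy to verify in R: cov() and var() both use n-1, so their ratio matches the slope that lm() fits by least squares. The x and y below are made-up numbers:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

b <- cov(x, y) / var(x)   # regression coefficient by hand
fit <- lm(y ~ x)          # R's built-in least-squares fit
coef(fit)["x"]            # same slope as b
```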




Regression explained:
http://www.law.uchicago.edu/files/files/20.Sykes_.Regression.pdf

more simply:
http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm
http://easycalculation.com/statistics/learn-regression.php

And explained well:
http://www.sjsu.edu/faculty/gerstman/StatPrimer/regression.pdf

Goodness of fit explained:
http://www.mathworks.com/help/curvefit/evaluating-goodness-of-fit.html

Regression in SAS:
http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter1/sasreg1.htm
http://www.youtube.com/watch?v=Bzm8TJYFZcs

Regression in R
http://msenux.redwoods.edu/math/R/regression.php

Model I and II regressions:
http://www.mbari.org/staff/etp3/regress/about.htm


WOOOOO!







HW # 1

To those who unfortunately are reading this as opposed to vacationing in Vegas,



Andrew Jones
Biometry
2/3/13



Small Arabinose Negative Lineages vs. Large Arabinose Negative Lineages

Average of Small-

0.765635645
0.890993539
0.860948991
0.886212273
0.859489471
0.934212218
0.945863536
0.999423109
0.899233247
0.787217193
0.938261524
0.984696833
0.83820725
0.827858702

(∑Obs)/n where n = 14.
(1) Mean = .887
(2) Var = (∑(obs-µ)^2)/(n-1)
=(.01473 + .00002 + .00068 + .00000 + .00076 + .00228 + .00346 + .01263 + .00015 + .00996 + .00263 + .00954 + .00238 + .00350)/13
=.00482


Average of Large-

0.887503593
0.907561395
0.914647822
0.877401142
0.920149004
0.907823388
0.880485947
0.896073919
0.88584494
0.954043492
0.852222311
0.9171615
0.861517592
0.942045965
0.916409303

(∑Obs)/n where n = 15.
(1) Mean = .901
(2) Var = (∑(obs-µ)^2)/(n-1)

=(.00016 + .00005 + .00021 + .00053 + .00040 + .00006 + .00040 + .00002 + .00021 + .00289 + .00231 + .00028 + .00151 + .00174)/14
=.00077

(3) Mean of means= (.901 + .887)/2 = 0.894

(4) Variance of mean of means = ((.901-.894)^2 + (.887-.894)^2)/(2-1) = .000085 (computed with unrounded means; ≈ .0001 with the rounded values above)

(5) Grand Mean

0.765635645
0.890993539
0.860948991
0.886212273
0.859489471
0.934212218
0.945863536
0.999423109
0.899233247
0.787217193
0.938261524
0.984696833
0.83820725
0.827858702
0.887503593
0.907561395
0.914647822
0.877401142
0.920149004
0.907823388
0.880485947
0.896073919
0.88584494
0.954043492
0.852222311
0.9171615
0.861517592
0.942045965
0.916409303
(∑ of all 29 observations above)/29

=.8945

(6) Variance

(.0165 + .00001 + .00112 + .00007 + .00122 + .00158 + .00264 + .01102 + .00002 + .01150 + .00192 + .00814 + .00316 + .00443 + .00005 + .00017 + .00041 + .00029 + .00066 + .00018 + .00020 + .00000 + .00007 + .0036 + .00178 + .00052 + .00108 + .00227 + .00048)/28

=.00268

(7) The Weird One
Obs - .8945 (each observation minus the grand mean):

-0.128817625
-0.003459731
-0.033504279
-0.008240997
-0.034963799 
0.039758948 
0.051410266
0.104969839 
0.004779977
-0.107236077 
0.043808254 
0.090243563
-0.056246020
-0.066594568
-0.006949677 
0.013108125 
0.020194552
-0.017052128 
0.025695734 
0.013370118
-0.013967323
0.001620649
-0.008608330 
0.059590222
-0.042230959 
0.022708230
-0.032935678 
0.047592695
0.021956030

(-0.128817625 + -0.003459731 + -0.033504279 + -0.008240997 + -0.034963799 + 0.039758948 + 0.051410266 + 0.104969839 + 0.004779977 + -0.107236077 + 0.043808254 + 0.090243563 + -0.056246020 + -0.066594568 + -0.006949677 + 0.013108125 + 0.020194552 + -0.017052128 + 0.025695734 + 0.013370118 + -0.013967323 + 0.001620649 + -0.008608330 + 0.059590222 + -0.042230959 + 0.022708230 + -0.032935678 + 0.047592695 + 0.021956030)/29

= 1.15 x 10^-17 (effectively zero: deviations from the mean always sum to exactly zero, and the leftover is floating-point rounding)



(8) (-0.128817625- 1.15 x 10^-17)^2 + (-0.003459731- 1.15 x 10^-17) ^2 + (-0.033504279- 1.15 x 10^-17) ^2 + (-0.008240997- 1.15 x 10^-17) ^2 + (-0.034963799- 1.15 x 10^-17) ^2  + (0.039758948- 1.15 x 10^-17) ^2  + (0.051410266- 1.15 x 10^-17) ^2 + (0.104969839- 1.15 x 10^-17) ^2 + (0.004779977- 1.15 x 10^-17) ^2 + (-0.107236077- 1.15 x 10^-17) ^2  + (0.043808254- 1.15 x 10^-17) ^2  + (0.090243563- 1.15 x 10^-17) ^2 + (-0.056246020- 1.15 x 10^-17) ^2 + (-0.066594568- 1.15 x 10^-17) ^2 + (-0.006949677- 1.15 x 10^-17) ^2  + (0.013108125- 1.15 x 10^-17) ^2  + (0.020194552- 1.15 x 10^-17) ^2 + (-0.017052128- 1.15 x 10^-17) ^2  + (0.025695734- 1.15 x 10^-17) ^2  + (0.013370118- 1.15 x 10^-17) ^2 + (0.013967323- 1.15 x 10^-17) ^2 + (0.001620649- 1.15 x 10^-17) ^2 + (-0.008608330- 1.15 x 10^-17) ^2 + (0.059590222- 1.15 x 10^-17) ^2 + (-0.042230959- 1.15 x 10^-17) ^2+ (0.022708230- 1.15 x 10^-17) ^2 + (-0.032935678- 1.15 x 10^-17) ^2 + (0.047592695- 1.15 x 10^-17) ^2 + (0.021956030- 1.15 x 10^-17) ^2

All divided by 29

=0.0026

(Dividing by n = 29 gives the population variance; dividing by n-1 = 28 instead reproduces the .00268 from step 6.)
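
All of the arithmetic above can be checked in a few lines of R, using the same observations (mean() and var() apply the n-1 correction automatically):

```r
small <- c(0.765635645, 0.890993539, 0.860948991, 0.886212273, 0.859489471,
           0.934212218, 0.945863536, 0.999423109, 0.899233247, 0.787217193,
           0.938261524, 0.984696833, 0.838207250, 0.827858702)
large <- c(0.887503593, 0.907561395, 0.914647822, 0.877401142, 0.920149004,
           0.907823388, 0.880485947, 0.896073919, 0.885844940, 0.954043492,
           0.852222311, 0.917161500, 0.861517592, 0.942045965, 0.916409303)

mean(small); var(small)        # ~0.887, ~0.0048
mean(large); var(large)        # ~0.901, ~0.0008

all_obs <- c(small, large)
mean(all_obs)                  # grand mean, ~0.8945
var(all_obs)                   # ~0.0027
sum(all_obs - mean(all_obs))   # ~0: deviations from the mean cancel
```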

magic



Monday, January 28, 2013

Hypothesis Testing

Chapters 1, 2, 4, 5, and 6 covered.  Chapter 3 is 'bonus'.

Bayesian (boo) vs. Frequentists (yea!):

http://oikosjournal.wordpress.com/2011/10/11/frequentist-vs-bayesian-statistics-resources-to-help-you-choose/

Bayes' Theorem explained:

http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/ 

For R programmers check the following link:

http://meandering-through-mathematics.blogspot.com/2011/05/bayesian-probability.html
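
Bayes' theorem itself is one line of arithmetic. A sketch in R with made-up numbers (a disease with 1% prevalence and a test that is 95% sensitive with a 10% false-positive rate):

```r
p_disease <- 0.01
p_pos_given_disease <- 0.95
p_pos_given_healthy <- 0.10

# Total probability of a positive test (law of total probability)
p_pos <- p_pos_given_disease * p_disease +
         p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(disease | positive test)
p_disease_given_pos <- p_pos_given_disease * p_disease / p_pos
p_disease_given_pos   # ~0.088: most positives are false positives
```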

Resource summing up hypothesis testing:
http://www.sjsu.edu/faculty/gerstman/StatPrimer/hyp-test.pdf

Type I and Type II error:
Type I- Rejecting a true null hypothesis.  Mistakenly accepting the significance of our result (a false positive).
Type II- The opposite: failing to reject a false null hypothesis.  Mistakenly dismissing a real effect (a false negative).

For a video on type I error:

http://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing/v/type-1-errors

(Aside ** A link for the Bonferroni correction explained:

http://www.aaos.org/news/aaosnow/apr12/research7.asp  )


The null hypothesis for s's and g's:
http://www.null-hypothesis.co.uk/science//item/what_is_a_null_hypothesis

What is a model anyway?:
http://www.sportsci.org/resource/stats/models.html







Friday, January 25, 2013

Stats 1/25/2013

Big N little n What begins with those?
Nine new neckties and a nightshirt and a nose.



Big N = Population.    Little n = sample.

(n-1) explained:

And if you are really bored at night:

Dividing standard deviation by the mean is the coefficient of variation.  Great for analyzing variation between populations. 

c_v = \frac{\sigma}{\mu}






Standard Error of the Mean (SEM) = \frac{s}{\sqrt{n}}

**(n-1) again for samples.**

Standard error is what is typically used instead of standard deviations.  As such, error bars in graphs are typically calculated using the standard error.  
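
A quick R sketch contrasting the two quantities (the data are made up):

```r
x <- c(4.2, 5.1, 4.8, 5.5, 4.9, 5.2, 4.6, 5.0)

s   <- sd(x)                # sample standard deviation (uses n - 1)
sem <- s / sqrt(length(x))  # standard error of the mean

s
sem                         # always smaller than s for n > 1
```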





Kurtosis:

Kurtosis measures the 'peakedness' of a distribution: how heavy its tails are relative to the normal (leptokurtic = peaked and heavy-tailed; platykurtic = flat and light-tailed).







Next week!!  Hypothesis testing and the assumption of our distributions.










Friday, January 18, 2013

Stats 1/18/2013




How to Look at Graphs: Frequency Distributions



Bin size matters...changing it can turn bins into classes.
A truly random distribution should look somewhat clumped.  Data that appear evenly dispersed may actually be hyper-dispersed, which is a non-random separation of the data. For an example, check out this site:  http://2600hertz.wordpress.com/2010/03/12/how-random-is-random/

Mean, Median, and Mode:
http://www.fgse.nova.edu/edl/secure/stats/lesson1.htm

Geometric Mean:
http://www.cliffsnotes.com/study_guide/Geometric-Mean.topicArticleId-18851,articleId-18817.html


The range shows the distance between the most extreme values.

And the standard (NOT AVERAGE) deviation:

s = \sqrt{\frac{\sum{(x-\bar{x})^2}}{n-1}}

NOTE** the n-1 (vs. n) is used for samples versus the entire population.  See fudge factors next week

Or the variance:

s^2 = \frac{\sum{(x-\bar{x})^2}}{n-1}


To compare deviations of two different populations that may be on different scales, use the coefficient of variation:

CV = \frac{s}{\bar{x}}

To analyze which of two samples from two different populations differs 'more' from the mean, use the standard score:

z = \frac{x - \bar{x}}{s}
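
Both comparisons are one-liners in R. A sketch with made-up samples on very different scales (the `cv` and `z` helpers below are hypothetical names, not base R functions):

```r
# Two made-up samples on very different scales
mouse_mass    <- c(19, 21, 20, 22, 18)             # grams
elephant_mass <- c(4800, 5200, 5000, 5100, 4900)   # kilograms

# Coefficient of variation: sd relative to the mean, so units drop out
cv <- function(x) sd(x) / mean(x)
cv(mouse_mass)      # ~0.079
cv(elephant_mass)   # ~0.032: elephants are relatively less variable

# Standard score: how many sd's one observation sits from its sample mean
z <- function(obs, x) (obs - mean(x)) / sd(x)
z(22, mouse_mass)       # ~1.26
z(5200, elephant_mass)  # ~1.26: equally 'extreme' despite the scales
```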

TYPES OF DISTRIBUTIONS:
Poisson:

P(X = k) = \frac{m^k e^{-m}}{k!}




As the mean m gets large (around m = 8 and up), the Poisson distribution becomes nearly symmetric and closely approximates the normal distribution.
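
That claim is easy to eyeball in R: compare Poisson probabilities at m = 8 with a normal curve sharing the same mean and variance (for a Poisson, both equal m):

```r
m <- 8
k <- 0:20

pois_probs  <- dpois(k, lambda = m)                # Poisson probabilities
norm_approx <- dnorm(k, mean = m, sd = sqrt(m))    # matching normal curve

round(cbind(k, pois_probs, norm_approx), 3)  # the two columns nearly agree
max(abs(pois_probs - norm_approx))           # small worst-case discrepancy
```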

Friendly fudge factor next week!!




Wednesday, January 16, 2013

There are three kinds of lies: lies, damned lies, and statistics



There are three kinds of lies: lies, damned lies, and statistics -Marky Mark Twain



An observation (~individual) defined:
http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/Glossary:Observation_unit

A sample (~population) defined:
http://www.stats.gla.ac.uk/steps/glossary/sampling.html

"PCA principle components analysis is regression in more than two dimensions" - Francisco Moore

Repeated measures will be revisited and can be seen here:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/repmeas.PDF

Types of variables:
http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm

Meristic for fish people:
http://en.wikipedia.org/wiki/Meristics

A paper on error and philosophy:
http://www.ets.org/Media/Research/pdf/PICANG12.pdf


'Never ever ever ever ever use a derived variable in stats.  Unless you have to.  Distributions get wonky.'  paraphrased  -Francisco B.G. Moore, the University of Akron Summit on Statistical Analysis, 1-16-13