Tag Archives: R

Normality Testing: is it normal?


It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.

Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:

  • Hey, can you write a self-contained function that tests the results of software component X and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?

What’s a poor software developer to do?

Well, you cherish the fond memories of your first statistics class that you took more than 20 years ago, and say: “I’ll plot a histogram of the data, and see if it’s normal!”

But of course, in less than a second you realize that manual visual inspection of a plot will not make an automated test, not at all! So, as a brilliant software developer with a math background, you say, “Easy, I’ll just grab my secret weapon, Python and its SciPy library, to smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants and use a well-documented, simple function such as scipy.stats.normaltest.
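To make the colleague's request concrete, here is a minimal sketch of the naive TRUE/FALSE wrapper around scipy.stats.normaltest. The function name looks_normal, the alpha parameter, and the minimum sample size of 20 are assumptions made for illustration; they are not taken from the post or from component X.

```python
import numpy as np
from scipy import stats


def looks_normal(data, alpha=0.05):
    """Return True if normaltest does NOT reject normality at level alpha.

    Hypothetical helper for illustration only: the name, `alpha`, and the
    minimum sample size are assumptions of this sketch.
    """
    sample = np.asarray(data, dtype=float)
    # normaltest combines a skewness test and a kurtosis test; SciPy warns
    # that the kurtosis part is unreliable below 20 observations, so this
    # sketch simply refuses smaller samples.
    if sample.size < 20:
        raise ValueError("need at least 20 observations")
    result = stats.normaltest(sample)
    # A large p-value only means "no evidence against normality".
    return bool(result.pvalue >= alpha)


# Deterministic demo data: evenly spaced quantiles of two distributions,
# used instead of random draws so the outcome does not depend on a seed.
probs = (np.arange(1000) + 0.5) / 1000
normal_shaped = stats.norm.ppf(probs)   # bell-shaped by construction
skewed = stats.expon.ppf(probs)         # strongly right-skewed

print(looks_normal(normal_shaped))
print(looks_normal(skewed))
```

Note that a TRUE from such a wrapper means only that the test failed to reject normality at the chosen significance level, which is weaker than the "is normally distributed" guarantee the request asks for.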

 

Posted on September 11, 2019 in Math, Programming, python, Science

 


Scala versus Python and R: software archaeology in bioinformatics


When one of the scala-user members recently mentioned a bioinformatics package called GATK (Genome Analysis Toolkit) and its use of Scala, I decided to take a closer look. Thanks to the valuable Ohloh service, amateur software archaeology has never been easier! After a brief visit to https://www.ohloh.net/p/gatk I learned that GATK has had 12,871 commits made by 77 contributors within the last 5 years, representing 99,078 lines of code.

I wanted to learn more about its source code languages, and decided to drill down by visiting https://www.ohloh.net/p/gatk/analyses/latest/languages_summary. What I discovered was surprising. Let me share the facts I’ve found so far. The project did not have any Scala code until recently. In July 2009, for example, it had no Scala, whereas it contained 4,410 lines of Python and 56 lines of R code:

[Figure: beforeScala, the language summary before Scala was introduced]


 

Posted on February 16, 2014 in Programming

 


R in Action: if only I had this book when I was doing ANOVA back then…


R in Action


R in Action fills an important gap by introducing the basics of R and statistical data analysis from a very practical and pragmatic point of view. Its coverage is broad: after introducing basic data set manipulation techniques and commands, it goes on to describe many important statistical data analysis techniques, from simple linear regression to more advanced methods such as ANOVA, power analysis, resampling, bootstrapping, generalized linear models, PCA, factor analysis, and handling missing values.

One of the nice features of the book is its description and discussion of many different visualization methods. Using many interesting real-world examples, the author shows how basic and more advanced visualization methods in R can be very helpful in exploring and understanding many different types of data sets.

 

Posted on February 18, 2012 in General, Programming

 
