Manage and Repeat Experiments

One recurring pain when running experiments is that I forget to save the parameters and lose track of them by the time the results come out. OK, the truth is that I’m lazy: I change the parameters directly in the code and hope my brain will remember the difference. Well, it can’t, and when I want to repeat an analysis, it comes back to bite me. So I searched online to see whether there are good strategies for managing experiments, and luckily I found some excellent posts:

http://stackoverflow.com/questions/6437213/strategies-for-repeating-large-chunk-of-analysis/6550914#6550914

http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets

In these two threads, one answerer mentions that he stores the parameters for each experiment in a JSON file, so when he needs to reproduce a run he can simply load them again. Quoting from the answer: “Everything in between is just code that runs with a given parametrization, but the code shouldn’t really change much, should it?”

Since I’ve been using R recently, I wrote a short script that helps a user create a list of parameters and export it to a JSON file. It is kind of raw, but I hope someone will find it useful. It doesn’t have to be R; you can write your own script in whatever language you prefer.

Code on github:

https://github.com/kiribatu/Kiribatu-R-Toolkit/blob/master/docs/parameter_configuration.md
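To give a feel for the idea, here is a minimal sketch of saving and reloading parameters as JSON. It is not the script linked above; it assumes the jsonlite package is installed, and the parameter names and the file name are made up for illustration.

```r
library(jsonlite)  # assumption: jsonlite is installed

# Hypothetical parameters for one experiment
params <- list(
  experiment    = "baseline",
  seed          = 42L,
  learning_rate = 0.01,
  n_iter        = 100L
)

# Save the parameters next to the results (a temp file here, for illustration)
param_file <- file.path(tempdir(), "baseline_params.json")
write_json(params, param_file, auto_unbox = TRUE, pretty = TRUE)

# Later, to repeat the experiment, read the same file back into a list
loaded <- read_json(param_file)
set.seed(loaded$seed)
```

The analysis code then only reads from `loaded`, so repeating an experiment means pointing it at the right JSON file instead of editing the code.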


Object-oriented Programming in R

The following are some excellent posts about OOP in R. Well, OOP using S3. Be sure to read the last post, about UseMethod’s voodoo. Really cool.
I found this series of posts by John Myles White quite useful:

  1. The Most Basic Elements of Object-Oriented Programming in R
  2. Object-Oriented Programming in R: The Setter Methods
  3. Object-Oriented Programming in R: The Setter Methods

Tips for Plot in R (1) — inconsistent type of coordinate parameters

The plot function in R seems really simple, but I ran into the following problem, and it took me some time to figure out.

# suppose you have two vectors v1 and v2
v1 <- c(1,2,3)
v2 <- c(3,4,5)
# we also create a data frame using v1 and v2
df <- data.frame(v1=v1, v2=v2)
# to plot v1 against v2 (1)
plot(v1, v2)
# or we can do
plot(df$v1, df$v2)
# BUT we cannot use plot(v1, df["v2"])
# This throws an error saying that 'x' and 'y' lengths differ,
# i.e. v1 and df["v2"] have different lengths

This error confused me a bit, since I was sure that v1 and df["v2"] both have length 3. Well, it turns out they don’t.

# check the class of v1 and df["v2"]
class(v1)        # "numeric"; length(v1) is 3
class(df["v2"])  # "data.frame"; length(df["v2"]) is 1 (one column)

Oops, we got two different types of objects. We need to convert our “data.frame” into a numeric vector that plot can use.

# instead of using df["v2"], we could use either df$v2 or df[,"v2"].
plot(v1, df[,"v2"])
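The difference comes from how the indexing operators behave on a data frame. As a quick check, the sketch below compares them on the same data; `df[["v2"]]` is one more option that the snippet above does not mention.

```r
v1 <- c(1, 2, 3)
v2 <- c(3, 4, 5)
df <- data.frame(v1 = v1, v2 = v2)

# Single brackets with one index keep the data frame structure:
# the result is a data frame with one column, so its length is 1
sub_df <- df["v2"]
class(sub_df)    # "data.frame"
length(sub_df)   # 1

# $, [[ ]] and [, "name"] all extract the column itself as a vector
col_dollar <- df$v2
col_double <- df[["v2"]]
col_matrix <- df[, "v2"]
class(col_double)    # "numeric"
length(col_double)   # 3
```

Any of the three extracted vectors can be passed to plot as the y coordinate.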

grep in R

Sometimes we need to find the indices of the columns in a data frame whose names match a pattern, and here comes grep:

v1 <- c(1,2,3)
v2 <- c(3,4,5)
tt2 <- c(5,6,7)
tt3 <- c(9,0,8)
df <- data.frame(v1=v1, v2=v2, tt2=tt2, tt3=tt3)
# suppose you want to find the columns with the pattern "tt"
ttIndices <- grep("tt", colnames(df))
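As a follow-up sketch, the indices can be used to subset the data frame, and grep has a couple of close relatives that are handy here. The variable names other than `df` and `ttIndices` are made up for illustration.

```r
df <- data.frame(v1 = c(1, 2, 3), v2 = c(3, 4, 5),
                 tt2 = c(5, 6, 7), tt3 = c(9, 0, 8))

# Indices of the columns whose names contain "tt"
ttIndices <- grep("tt", colnames(df))              # 3 4

# value = TRUE returns the matching names instead of the indices
ttNames <- grep("tt", colnames(df), value = TRUE)  # "tt2" "tt3"

# Use the indices to keep only the matching columns
ttCols <- df[, ttIndices]

# grepl returns a logical vector; "^tt" only matches names starting with "tt"
startsWithTt <- grepl("^tt", colnames(df))         # FALSE FALSE TRUE TRUE
```

Anchoring the pattern with `^` avoids accidentally matching a column such as "butter", where "tt" appears in the middle of the name.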