
A Waage Blog

Ruby, Rails, Life

Archive for the ‘Machine Learning’ tag

How I finally got Vowpal Wabbit 7.0 installed on OSX 10.6 Snow Leopard


I read the other tutorials online, but none of them worked for me. It was more trouble than I expected, but here’s what I had to do:

First step: make sure you have Homebrew installed. This is the package manager I used to install all the other prerequisites.

1. install boost

$ brew install boost
$ brew ln boost

2. install automake / autoconf

# May prompt you to overwrite
$ brew install automake
$ brew ln automake
$ brew install autoconf
$ brew ln autoconf

3. install libtool (Homebrew installs it as glibtool)

$ brew install libtool
$ brew ln libtool

4. symlink glibtoolize as libtoolize

# Homebrew installs GNU libtoolize as glibtoolize (so it doesn't clash with the system libtool), so I just had to create a symlink
$ cd /usr/local/bin
$ ln -s glibtoolize libtoolize
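
If you want to script this step, here is a minimal sketch that only creates the symlink when it is missing. The function name link_libtoolize is made up for this example, and the default directory /usr/local/bin is simply the one used above; adjust it if your Homebrew prefix is different.

```shell
#!/bin/sh
# Sketch: create the libtoolize symlink only when it is missing.
# link_libtoolize is a hypothetical helper name, not part of Homebrew.
# The directory defaults to /usr/local/bin, as in the step above.
link_libtoolize() {
  bin_dir="${1:-/usr/local/bin}"
  if [ -e "$bin_dir/libtoolize" ] || [ -L "$bin_dir/libtoolize" ]; then
    # Something is already there (file or symlink); leave it alone.
    echo "libtoolize already present in $bin_dir"
  elif [ -x "$bin_dir/glibtoolize" ]; then
    # Relative link, same as running `ln -s glibtoolize libtoolize` in bin_dir.
    ln -s glibtoolize "$bin_dir/libtoolize"
    echo "linked libtoolize -> glibtoolize in $bin_dir"
  else
    echo "glibtoolize not found in $bin_dir; install libtool first" >&2
    return 1
  fi
}
```

Calling link_libtoolize with no argument works on /usr/local/bin; pass another directory as the first argument to use that instead. Running it twice is harmless, because the second call just reports that the link already exists.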

Now autogen.sh finally runs successfully, and the remaining steps work as described in the README:

$ git clone git://github.com/JohnLangford/vowpal_wabbit.git
$ cd vowpal_wabbit
# Check out whatever branch you want (without -b, which would create a new branch from the current HEAD instead)
$ git checkout v7.0
$ ./autogen.sh
$ ./configure
$ make
$ make install
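
If autogen.sh or configure still fails, it can help to check that the build tools are actually on your PATH before digging further. This is a rough sketch; missing_tools is a made-up helper name, and the list of tools in the usage line is my assumption based on the steps above, not something taken from the README.

```shell
#!/bin/sh
# Sketch: report which of the given commands are missing from PATH.
# missing_tools is a hypothetical helper, not part of Homebrew or Vowpal Wabbit.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can find the command.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Print the missing names, space-separated, without the leading space.
  echo "${missing# }"
  # Succeed only when nothing is missing.
  [ -z "$missing" ]
}
```

For example, `missing_tools autoconf automake libtoolize make || echo "install the tools listed above first"` prints the names of any tools it cannot find. It only checks commands, so it will not tell you whether boost is installed.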

Written by Andrew Waage

November 8th, 2012 at 5:03 pm

R Dummy Coding for Categorical (Nominal) Data


When I’m pre-processing data as input for a classification or clustering algorithm, one of the most common things I need to do is convert a categorical attribute into a long, sparse binary vector, with one 0/1 column per distinct value. For example, if a variable is named “Color” and the values present in the data are “red”, “blue” and “green”, here is an easy way to create the dummy vector of attributes. It also names the new columns for you, so you get three binary columns called “Color_red”, “Color_blue” and “Color_green”.

# Include the two functions below in your R script or helpers file, then call them like this:
# mydataframe <- replace_col_with_dummy(mydataframe, 'Color')

# create dummy coding for category data
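dummy_cat <- function(column_name, column) {
  # one output column per distinct value, in sorted order
  idx <- sort(unique(column))
  # matrix() stays two-dimensional even with a single distinct value
  # (mat.or.vec() returns a plain vector when the column count is 1)
  dummy <- matrix(0L, nrow = length(column), ncol = length(idx))
  for (j in seq_along(idx)) {
    dummy[, j] <- as.integer(column == idx[j])
  }
  # build names like Color_red, replacing spaces with underscores
  colnames(dummy) <- gsub("[ ]", "_", paste(column_name, idx, sep = "_"))
  return(dummy)
}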

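replace_col_with_dummy <- function(dataframe, column_name) {
  # drop = FALSE keeps the remaining columns as a data frame even if only one is left
  others <- dataframe[, !(names(dataframe) %in% column_name), drop = FALSE]
  dataframe <- cbind(dummy_cat(column_name, dataframe[, column_name]), others)
  return(dataframe)
}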

Written by Andrew Waage

October 25th, 2012 at 10:01 pm