A Waage Blog

Ruby, Rails, Life

How I finally got Vowpal Wabbit 7.0 installed on OSX 10.6 Snow Leopard

I’ve read all the other tutorials online, but none of them really worked. It was more trouble than I expected, but here’s what I had to do:

First step: make sure Homebrew is installed. It is the package manager I used to install all the other prerequisites.
1. install boost

$ brew install boost
$ brew ln boost

2. install automake / autoconf

# May prompt you to overwrite
$ brew install automake
$ brew ln automake
$ brew install autoconf
$ brew ln autoconf

3. install libtool (Homebrew installs it as glibtool)

$ brew install libtool
$ brew ln libtool

4. symlink glibtoolize as libtoolize

# Homebrew installs GNU libtoolize as glibtoolize (g-prefixed to avoid clashing with Apple's own libtool), so create a symlink under the expected name
$ cd /usr/local/bin
$ ln -s glibtoolize libtoolize

Now we can finally run autogen.sh and the remaining build steps as described in the README:

$ git clone git://github.com/JohnLangford/vowpal_wabbit.git
$ cd vowpal_wabbit
# Optional: create a local branch for this build (drop -b to switch to an existing branch or tag instead)
$ git checkout -b v7.0
$ ./autogen.sh
$ ./configure
$ make
$ make install

Written by Andrew Waage

November 8th, 2012 at 5:03 pm

R Dummy Coding for Categorical (Nominal) Data

When I’m pre-processing data as input for a classification or clustering algorithm, one of the most common things I need to do is convert a categorical attribute into a set of sparse binary (dummy) columns. For example, if a variable is named “Color” and the values present in the data are “red”, “blue” and “green”, here is an easy way to create the dummy columns. It also names the new columns for you, so you get 3 binary columns called “Color_red”, “Color_blue”, and “Color_green”.

# Include the two functions below in your R script or helpers file, then call them like this:
#   mydataframe <- replace_col_with_dummy(mydataframe, 'Color')

# Create dummy coding for a categorical column
dummy_cat <- function(column_name, column) {
  idx <- sort(unique(column))
  # matrix() always returns a matrix; mat.or.vec() returns a plain vector when there is only one level
  dummy <- matrix(0L, nrow = length(column), ncol = length(idx))
  for (j in seq_along(idx)) {
    dummy[, j] <- as.integer(column == idx[j])
  }
  colnames(dummy) <- gsub("[ ]", "_", paste(column_name, idx, sep = "_"))
  return(dummy)
}

# Replace the named column of a data frame with its dummy columns
replace_col_with_dummy <- function(dataframe, column_name) {
  # drop = FALSE keeps the result a data frame even if only one other column remains
  others <- dataframe[, !(names(dataframe) %in% column_name), drop = FALSE]
  return(cbind(dummy_cat(column_name, dataframe[, column_name]), others))
}

Written by Andrew Waage

October 25th, 2012 at 10:01 pm

Ruby on Rails Action Named “status” is Reserved

I just spent over an hour debugging a really frustrating problem. Apparently, naming a controller action “status” is no good!

It will not fail with an explicit error, but it will cause all kinds of weird chaos. Please be advised!

Do NOT do this in a controller!

class MyController < ApplicationController
  ## DON'T DO THIS!!!
  def status
  end
end

Save yourself some headache :)
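
A likely explanation, offered as an assumption rather than something confirmed in this post: the controller base class already has its own `status` method for the HTTP response code, and an action with the same name silently replaces it. The plain-Ruby toy model below illustrates that mechanism only. It is not Rails code, and `ToyController`, `dispatch`, `BrokenController`, and `FixedController` are hypothetical names made up for the sketch.

```ruby
# Toy stand-in for a framework base controller (hypothetical, not Rails code).
class ToyController
  def initialize
    @status = 200
  end

  # Framework-internal reader for the HTTP status code.
  def status
    @status
  end

  # Runs the named action, then builds a response from the `status` reader.
  def dispatch(action)
    public_send(action)
    [status, "body of #{action}"]
  end
end

class BrokenController < ToyController
  # An empty action named `status` overrides the reader above and returns nil.
  def status
  end
end

class FixedController < ToyController
  # Any other name leaves the inherited reader intact.
  def current_status
  end
end

p BrokenController.new.dispatch(:status)        # => [nil, "body of status"]
p FixedController.new.dispatch(:current_status) # => [200, "body of current_status"]
```

No exception is raised in the broken case; the response just carries a bad value, which matches the "no explicit error, only weirdness" symptom. Renaming the action is the simplest way out.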

Written by Andrew Waage

October 3rd, 2012 at 12:01 am

Rails testing with Machinist 2, RSpec, and the Database Cleaner Gem

QUICK vent and advice when using Machinist 2 and Database Cleaner to test in Rails:

TURN OFF MACHINIST CACHING!

Add this to your config/environments/test.rb file:

Machinist.configure do |config|
  config.cache_objects = false
end

Machinist caches objects to make your tests run faster, but it doesn’t quite work the way you’d expect. If objects are persisting across tests even though you run DatabaseCleaner after each test, try this. It is also worth a shot if a test passes when run on its own but running “rake spec” produces errors. Don’t let Machinist caching drive you nuts! :)

Sidenote: In my experience, the best way to debug errors that appear when running the entire test suite but not when running individual spec files is to use rspec to run all but one file. Leave out one file at a time and see whether removing it eliminates the errors.
Example:

# If this gives errors:
$ bundle exec rspec ./spec/models/user_spec.rb ./spec/models/account_spec.rb ./spec/models/favorite_spec.rb
# Try removing the first
$ bundle exec rspec ./spec/models/account_spec.rb ./spec/models/favorite_spec.rb
# Try removing the 2nd
$ bundle exec rspec ./spec/models/user_spec.rb ./spec/models/favorite_spec.rb
# Repeat...
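
With more than a handful of spec files, typing these commands by hand gets tedious. Here is a minimal sketch of a helper that generates them; `leave_one_out_commands` is a hypothetical name, not part of RSpec, and the file list simply reuses the three paths from the example above.

```ruby
# Build one rspec command per spec file, each command leaving that file out.
# Assumes every path appears only once in the list.
def leave_one_out_commands(spec_files)
  spec_files.map do |skipped|
    remaining = spec_files - [skipped]
    "bundle exec rspec #{remaining.join(' ')}"
  end
end

specs = %w[
  ./spec/models/user_spec.rb
  ./spec/models/account_spec.rb
  ./spec/models/favorite_spec.rb
]

# Print the commands so they can be run one at a time.
leave_one_out_commands(specs).each { |command| puts command }
```

The helper only prints the commands. Passing each string to Ruby's `system` would run it instead, and the first run that comes back clean points at the file that was left out.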

Written by Andrew Waage

April 11th, 2012 at 6:59 pm

Install Comodo PositiveSSL Certificate with Node.js

So today I tried to install the cheap Comodo PositiveSSL certificate on my Node.js / express.js server. Unfortunately, all the documentation and examples for installing an SSL certificate on a Node.js server mention only two options for the createServer() method:

var https = require('https');
var fs = require("fs");

var https_options = {
  key: fs.readFileSync("/path/to/server.key"),
  cert: fs.readFileSync("/path/to/mydomain.crt")
};
var https_server = https.createServer(https_options);

However, with the PositiveSSL certificate, Comodo will actually send you 3 files:
1) PositiveSSLCA2.crt
2) AddTrustExternalCARoot.crt
3) mydomain.crt

This is quite confusing for someone who doesn’t really understand (or want to understand) all the details of how an SSL certificate works. Which one do I use for the cert: option?

Naturally, I started with the mydomain.crt file. This led to a cryptic web browser error message:
“this certificate was signed by an unknown authority”

So, a bit of googling found that when installing the PositiveSSL cert on Apache servers, you must use a chain file (mod_ssl option: SSLCertificateChainFile). If you check the Apache mod_ssl documentation you will see that this file is a concatenation of certificate files:

“Such a file is simply the concatenation of the various PEM-encoded CA Certificate files, usually in certificate chain order.”

So, what you have to do is the following:
1) Create a “bundle” file by concatenating the PositiveSSLCA2 and AddTrustExternalCARoot certificates

cat PositiveSSLCA2.crt AddTrustExternalCARoot.crt > mydomain.ca-bundle

2) Pass this bundle as the “ca” option when creating your Node.js server:

var https = require('https');
var fs = require("fs");

var https_options = {
  ca: fs.readFileSync("/path/to/mydomain.ca-bundle"),
  key: fs.readFileSync("/path/to/server.key"),
  cert: fs.readFileSync("/path/to/mydomain.crt")
};
var https_server = https.createServer(https_options);

This should properly set up the CA chain so that browsers can verify the SSL certificate.

Written by Andrew Waage

March 4th, 2012 at 5:42 pm