Using serialisation (with pickle) with machine learning training and prediction

I’ve been working through various machine learning algorithms, and I often find that I make a small tweak to the Python program and then wait for ages while the “train” (or fit) step runs before I get to the predict step. A quick way to save that time is to serialise the classifier object after training and load it back the next time the program runs. The pattern is as follows:

#!/usr/local/bin/python
import pdb
import cPickle as pickle

import recsys.algorithm
from recsys.algorithm.factorize import SVD
from recsys.datamodel.data import Data
from recsys.evaluation.prediction import RMSE, MAE
from recsys.utils.svdlibc import SVDLIBC

# NOTE: dat_file was not defined in the original listing; this path is a placeholder for the MovieLens ratings file
dat_file = "ratings.dat"

try:
    # try to read the pickled classifier from a file (here called recsyssvd.p, but it could be called anything)
    with open("recsyssvd.p", "rb") as f:
        clf = pickle.load(f)
except (IOError, EOFError, pickle.UnpicklingError):
    # couldn't load a serialised classifier, so instantiate a new one (SVD is just an example)
    clf = SVD()
    recsys.algorithm.VERBOSE = True

    # load movielens data
    clf.load_data(filename=dat_file, sep='::', format={'col':0, 'row':1, 'value':2, 'ids': int})

    # compute svd
    k = 100
    clf.compute(k=k, min_values=10, pre_normalize=None, mean_center=True,
                post_normalize=True)

    # serialise the classifier for the next run
    with open("recsyssvd.p", "wb") as f:
        pickle.dump(clf, f)


 
# movie ids
ITEMID1 = 1      # toy story
ITEMID2 = 1221   # godfather II

# get movies similar to toy story
print clf.similar(ITEMID1)

# get predicted rating for a given user & movie
MIN_RATING = 0.0
MAX_RATING = 5.0
USERID = 1
ITEMID = 1

pred = clf.predict(ITEMID, USERID, MIN_RATING, MAX_RATING)
actual = clf.get_matrix().value(ITEMID, USERID)
print 'predicted rating = {0}'.format(pred)
print 'actual rating = {0}'.format(actual)

The code above happens to use SVD, but the same pattern works for any other classifier (from a library or hand-rolled). I imported cPickle instead of pickle because it is the C implementation in Python 2 and is meant to be faster.
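Since the pattern does not depend on SVD, it can be pulled out into a small helper. The following is only a sketch, written for Python 3 (where the standard pickle module already uses its C implementation when available); the names load_or_train, slow_train and the model.p file name are made up for illustration and are not part of any library.

```python
import os
import pickle
import tempfile


def load_or_train(cache_path, train_fn):
    """Return the cached model if it can be unpickled, else train and cache it.

    `train_fn` is a hypothetical zero-argument callable that does the slow
    "fit" step and returns the trained object.
    """
    try:
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        # No usable cache file: train from scratch and save the result.
        model = train_fn()
        with open(cache_path, "wb") as f:
            pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)
        return model


# Stand-in for an expensive training step; records how often it runs.
calls = []


def slow_train():
    calls.append(1)
    return {"weights": [0.1, 0.2, 0.3]}


cache_path = os.path.join(tempfile.mkdtemp(), "model.p")
first = load_or_train(cache_path, slow_train)   # trains and writes the cache
second = load_or_train(cache_path, slow_train)  # loads from the cache
```

The second call never reaches slow_train, which is the whole point of the pattern. One thing to keep in mind is that the cache knows nothing about your training code or data: if you change either, delete the pickle file so the model gets retrained.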

– Sarwar Bhuiyan
