Setting up your Python Environment for Machine Learning

Python has become a popular language for both researchers and developers. It is well suited to rapid prototyping of ideas, and it has become a first choice for machine learning and data science. You may be interested in Python for any of these reasons.

In my opinion, the first thing to do before learning a language is to set up its development environment, meaning installing the interpreter, the dependencies and an IDE (Integrated Development Environment). There are many ways to do this, suited to people with different levels of expertise.

Here I am going to discuss setting up a Python environment with minimal effort and complexity, and how to get the packages required for machine learning and data science. As I work on Windows, the following steps are for Windows.

Step 1: Get the Python interpreter and an IDE. Currently there are two major versions of Python: Python 2.7 and Python 3.4. Python 3.4 is not yet fully mature, by which I mean that not all packages are compatible with it. So I suggest you start with Python 2.7.

To ease this process, I would strongly recommend using Canopy Express. It's free, comes pre-built with Python 2.7, and ships with 100+ essential and useful packages that will make your life easy. The installation is just double-click magic 🙂

Step 2: It's highly recommended that you set your environment variable (the Path variable) for Python, so that the python command can be run from any command prompt.
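If you want to confirm that the Path variable is set correctly, a small script can search it for the Python executable. This is only a sketch: the function name find_on_path and the list of file extensions are my own choices for illustration, not something Canopy provides.

```python
import os


def find_on_path(program, path_value=None, extensions=(".exe", ".bat", ".cmd", "")):
    """Return the first full path to `program` found on PATH, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # PATH is a list of folders separated by ';' on Windows (':' elsewhere).
    for folder in path_value.split(os.pathsep):
        folder = folder.strip('"')
        if not folder:
            continue
        for ext in extensions:
            candidate = os.path.join(folder, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("python")
    if location:
        print("python found at: {0}".format(location))
    else:
        print("python is not on PATH; add its install folder to the Path variable")
```

If the script reports that python is not on PATH, add the folder that contains python.exe to the Path variable and open a new command prompt before trying again.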

Step 3: Now start Canopy Express and run a hello world program. Voila! You are done with the primary Python setup.
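A hello world program for this step could look like the sketch below. The greeting function is just an illustrative name. The script is written so that it should run unchanged on both Python 2.7 and Python 3, and it also prints the interpreter version so you can see which Python Canopy is using.

```python
import sys


def greeting(name="world"):
    """Build the greeting text."""
    return "Hello, {0}!".format(name)


if __name__ == "__main__":
    print(greeting())
    # sys.version starts with the version number, e.g. "2.7.x ..."
    print("Running Python {0}".format(sys.version.split()[0]))
```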

Step 4: The next step is to look at the installed Python packages. Click on Package Manager, then Installed Packages. You will find that essential Python packages like numpy, scipy, matplotlib, etc. are already installed. In fact, all the major machine learning libraries are built upon these three core Python packages. Make sure you have the “pip” package installed. Pip is for installing Python packages from an online repository. So you are almost done with installing the dependencies. Life is beautiful, isn’t it?
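You can also check for these packages from Python itself rather than from the Package Manager. The sketch below tries to import each one and reports its version. The helper name check_packages is hypothetical, and I am assuming each package exposes a __version__ attribute, which most do but not all.

```python
import importlib


def check_packages(names):
    """Map each package name to a version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a broken install
            # can raise other errors, so both count as "not usable".
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    packages = ["numpy", "scipy", "matplotlib", "pip"]
    for name, version in sorted(check_packages(packages).items()):
        print("{0:12s} {1}".format(name, version if version else "NOT INSTALLED"))
```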

There is a bit further to go. Look at the available packages. Scroll down, and whichever you find interesting, just install it by clicking right there.

The packages that are most important for machine learning, data mining and data science are as follows:

  1. scikit-learn for machine learning algorithms
  2. pandas for data analysis tasks
  3. Crab for recommender systems
  4. NLTK (Natural Language Toolkit) for natural language processing

Of course, you are not limited to these. There are a lot of other libraries.

The important case is this: if a library is not available under Canopy but is still important to you, how will you get it?

The answer is simple. Start Canopy and go to Tools -> Canopy Command Prompt. A black window like the Windows command prompt will appear. Just type “pip install package-name”. For example, to install NLTK, type “pip install -U nltk”. In most cases you will find this command on the package's web page.
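If you would rather install a package from inside a Python script, one option is to call pip through the interpreter that is currently running. This is a sketch, not the way Canopy itself does it: the helper names pip_install_command and pip_install are mine, and I am assuming pip can be started with "python -m pip" in your installation.

```python
import subprocess
import sys


def pip_install_command(package, upgrade=False):
    """Build the argument list for installing `package` with pip."""
    # Using the running interpreter makes sure the package lands
    # in the same Python that Canopy is using.
    command = [sys.executable or "python", "-m", "pip", "install"]
    if upgrade:
        command.append("-U")
    command.append(package)
    return command


def pip_install(package, upgrade=False):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package, upgrade))


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident.
    print(" ".join(pip_install_command("nltk", upgrade=True)))
```

Calling pip_install("nltk", upgrade=True) would then do the same job as typing “pip install -U nltk” in the Canopy Command Prompt.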

Installing Crab is more difficult. It depends on scikit-learn and Visual C++. To make life easy, just install Visual Studio Express, which is free and can be downloaded directly from Microsoft. Set the path variable appropriately.

After that, run the command: pip install -U crab

If it is still not working, follow these steps, which are a bit more tedious:

  1. Install MinGW, which provides gcc on Windows. I installed codeblocks-13.12mingw-setup.exe, which has mingw32 inside. Set the Path environment variable so that gcc can be found.
  2. Download the Crab code base from GitHub: https://github.com/muricoca/crab
  3. Unzip it and rename the folder to a suitable name, say Crab.
  4. Open the folder and find the setup.cfg file. Change the content of the file as follows:

    [build]
    compiler=mingw32

  5. Open a Windows command prompt in this folder: SHIFT + right-click -> Open command window here.
  6. Type python setup.py install
  7. By now you should have Crab installed and ready to work.
  8. If not, please consult this page for other options: http://muricoca.github.io/crab/install.html#installing-an-official-release
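The setup.cfg change in step 4 can also be made with a few lines of Python instead of a text editor. This is a minimal sketch under some assumptions: the function name set_build_compiler is hypothetical, and rewriting the file this way drops any comments it contained, so keep a backup if that matters to you.

```python
try:
    import configparser                    # Python 3
except ImportError:
    import ConfigParser as configparser    # Python 2.7


def set_build_compiler(cfg_path, compiler="mingw32"):
    """Set `compiler` under [build] in a setup.cfg, keeping the other sections."""
    parser = configparser.RawConfigParser()
    parser.read(cfg_path)                  # a missing file is silently skipped
    if not parser.has_section("build"):
        parser.add_section("build")
    parser.set("build", "compiler", compiler)
    with open(cfg_path, "w") as handle:
        parser.write(handle)
```

For example, set_build_compiler("setup.cfg") run from inside the Crab folder should leave a [build] section with the compiler set to mingw32, after which step 6 can be run as before.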

By now you have your Python environment set up for working with basic machine learning algorithms.
