How to start machine learning using Python

If you are completely new to machine learning, you should probably read my earlier post, "What is machine learning", first.

If you are new to Python programming, I suggest going through our Python coding tutorial first. It is free, and it will help you learn Python syntax and how to work with databases.

Machine learning prerequisites
Here we look at the Python packages used in machine learning projects, how to install them, and how to get familiar with models, datasets and so on before we actually start an ML project.

In this post we learn how to start machine learning using Python, including how to perform a basic price prediction using a Python machine learning API.

Python Machine Learning Libraries

Before you start machine learning in Python, I suggest getting familiar with the following Python libraries, because we will use them extensively. If you already know their syntax, you can focus on the machine learning flow rather than on what the library code looks like.

  • Matplotlib is an object-oriented plotting library. Check this Matplotlib tutorial.
    from matplotlib import pyplot
    
  • Pandas provides data management tools for cleaning, analysing and transforming data. Check this Pandas tutorial.
    import pandas as pd
    
    url ="\\data\\taxi-fare-test.xlsx"
    df = pd.read_excel(url)
    print(df)
    
  • Sklearn

    scikit-learn (imported as sklearn) is a Python module that integrates classical machine learning algorithms with the scientific Python packages (NumPy, SciPy, Matplotlib).

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve
    

    Here is a small example of how to use the sklearn library:

    # generate 2 class dataset
    X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
    
    # split into train/test sets
    trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
    
    # fit a model
    model = LogisticRegression()
    model.fit(trainX, trainy)
    
    # predict probabilities
    probs = model.predict_proba(testX)
    
  • Numpy

    NumPy is the Python array library and is widely used in machine learning projects. If you are not familiar with NumPy, this NumPy tutorial may help.

  • Scipy

    SciPy is a scientific computing package for Python. Here you can learn more about the SciPy library.

  • TensorFlow
    import tensorflow as tf
    

    A tensor is a multi-dimensional array (similar to a NumPy array). TensorFlow is very popular for image classification and processing.

    Learn more about TensorFlow

  • Theano
  • Keras
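The sklearn example above imports roc_curve but stops at predict_proba. As a minimal sketch of how the predicted probabilities could feed into roc_curve, assuming the same synthetic dataset and default LogisticRegression settings (the names fpr, tpr, thresholds and auc are illustrative, and roc_auc_score is an extra import not used in the original example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# generate the same 2-class synthetic dataset as in the example above
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

model = LogisticRegression()
model.fit(trainX, trainy)

# predict_proba returns one column per class; keep the positive class (column 1)
probs = model.predict_proba(testX)[:, 1]

# false positive rate and true positive rate at each decision threshold
fpr, tpr, thresholds = roc_curve(testy, probs)

# area under the ROC curve: 0.5 is chance level, 1.0 is a perfect ranking
auc = roc_auc_score(testy, probs)
print('ROC AUC: {:.3f}'.format(auc))
```

The fpr and tpr arrays can be passed straight to pyplot.plot to draw the curve.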

The example below uses some of the libraries above, so you need to install them in your local project.

What do we do in a first machine learning project?

  1. Setting up environment

    First, we need to set up our Python development environment by installing all the required libraries listed above.

    Here I am using Visual Studio Code for Python development, but you can use any IDE.

  2. Create Dataset

    You need to create your dataset. It can live in any data source, such as an RDBMS, an Excel file or a CSV file.

    You can then create a Pandas data frame that holds the part of the dataset you want to work with.

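    import pandas as pd

    # load the CSV file and keep only the columns we need
    artworks = pd.read_csv('../testdata/Artworks.csv')
    artworkdt = pd.DataFrame(artworks, columns=['Artist', 'Nationality'])

    # boolean mask for one artist; indexing with it keeps only the matching rows
    mask = artworkdt["Artist"] == "Thomas Bewick"
    atr = artworkdt[mask]
    print(atr)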
                    
  3. Loading Dataset

    At this stage, you need to load the dataset into a Python object so you can work with the data. How you load it depends on the data source you are working with. In my example, I load data from an Excel file.

    import pandas as pd
    import os
    from matplotlib import pyplot
    
    # build the path to the Excel file relative to the current working directory
    path = os.getcwd()
    url = os.path.join(path, "data", "taxi-fare-test.xlsx")
    df = pd.read_excel(url)
    print(df)
    
  4. Analyzing Dataset

    You may need to understand the data by reordering it, removing columns, adding columns, grouping rows and so on, to get it ready for training and testing algorithms.

    # check that the data loaded correctly by calling the head function
    # (df is the DataFrame loaded in the previous step)
    _head = df.head(2)
    print(_head)
    

    Ideally, we should analyze and organize the dataset in SQL, which is more convenient. Once the dataset is ready, we bring the data into our Python code for further processing.

  5. Visualizing Dataset.

    Now you may want to see what the data looks like visually by plotting or charting it. You can also save the visual representation in PDF format for future reference or reporting.

    from matplotlib import pyplot
    df.hist() # will show a histogram for each column
    pyplot.show()
    
    # then 
    
    pyplot.scatter(df['unemploymentrate'], df['indexprice'], color='green')
    pyplot.title('Index Price Vs Unemployment Rate', fontsize=14)
    pyplot.xlabel('Unemployment Rate', fontsize=14)
    pyplot.ylabel('Index Price', fontsize=14)
    pyplot.grid(True)
    pyplot.show()
    
  6. Evaluating different algorithms

    Try different algorithms to see which one produces the closest result.

  7. Making predictions

    Finally, make predictions with real data.
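
Steps 6 and 7 can be sketched as follows. This is a minimal illustration, not the final project code: it assumes a synthetic dataset from make_classification in place of real data, and the two candidate models (LogisticRegression and DecisionTreeClassifier) and the 5-fold cross-validation are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in data (assumption: replace with your own dataset)
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)

# candidate algorithms to compare (an arbitrary pair chosen for illustration)
candidates = {
    'logistic_regression': LogisticRegression(max_iter=1000),
    'decision_tree': DecisionTreeClassifier(random_state=1),
}

# step 6: score each algorithm with 5-fold cross-validation on the training set
scores = {}
for name, model in candidates.items():
    cv_scores = cross_val_score(model, trainX, trainy, cv=5, scoring='accuracy')
    scores[name] = cv_scores.mean()
    print('{}: mean accuracy {:.3f}'.format(name, scores[name]))

# step 7: refit the best-scoring algorithm and predict on held-out data
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(trainX, trainy)
predictions = best_model.predict(testX)
print('best: {}, test accuracy {:.3f}'.format(best_name, best_model.score(testX, testy)))
```

Cross-validating on the training set only keeps the held-out test rows untouched until the final prediction.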

What problem do we solve?

In our example, we will predict fruit prices based on the previous year's data.
Note: if you don't have data, you can download the standard Taxi fare data for practice.
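
As a rough sketch of what such a price prediction could look like, the block below fits a linear regression to a few made-up yearly prices. The numbers, the column names (year, price) and the choice of LinearRegression are all assumptions for illustration. They are not taken from the article's dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# hypothetical yearly prices, invented purely for illustration
history = pd.DataFrame({
    'year': [2015, 2016, 2017, 2018, 2019],
    'price': [10.0, 12.0, 14.0, 16.0, 18.0],
})

# features must be 2-D (rows x columns); the target is 1-D
X = history[['year']]
y = history['price']

model = LinearRegression()
model.fit(X, y)

# predict the price for the following year
next_year = pd.DataFrame({'year': [2020]})
predicted = model.predict(next_year)[0]
print('predicted price for 2020: {:.2f}'.format(predicted))
```

Because the made-up prices rise by exactly 2 per year, the fitted line continues that trend for 2020. Real price data will not be this regular.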

Start Python Console App

Start your IDE. We are using Visual Studio 2019 to write a Python console application for this machine learning example. (You can use any Python IDE; the code will remain the same.)

First, we need to make sure that all required libraries are installed correctly, so let’s run the following code in your console.

print("We are learning Machine Learning at WebTrainingRoom")
import sys
print('Python: {}'.format(sys.version))
# check that numpy is installed
import numpy
print('numpy: {}'.format(numpy.__version__))
# check that matplotlib is installed
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# check that pandas is installed
import pandas
print('pandas: {}'.format(pandas.__version__))
# check that scikit-learn is installed (it is imported as sklearn)
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# check that scipy is installed
import scipy
print('scipy: {}'.format(scipy.__version__))

If you are creating the project for the first time, you may need to install the packages one by one. To do that, expand your Solution Explorer, go to the Python environment, right-click to manage packages, then run a pip command such as pip install scikit-learn (the package is imported as sklearn, but its name on pip is scikit-learn).
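
Before installing anything, it can help to see which packages are already available. The following is a small sketch using only the standard library. The helper name find_missing and the REQUIRED mapping of import names to pip package names are introduced here for illustration.

```python
import importlib.util

# import name -> name to pass to pip (they differ for scikit-learn)
REQUIRED = {
    'numpy': 'numpy',
    'matplotlib': 'matplotlib',
    'pandas': 'pandas',
    'sklearn': 'scikit-learn',
    'scipy': 'scipy',
}

def find_missing(required):
    """Return the pip names of packages whose import name cannot be found."""
    missing = []
    for import_name, pip_name in required.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing

missing = find_missing(REQUIRED)
if missing:
    print('Run: pip install ' + ' '.join(missing))
else:
    print('All required packages are installed.')
```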

Once you run the above code, you should see a result like the following on your console screen. Don’t worry if you see different versions; the core concepts remain the same.

We are learning Machine Learning at WebTrainingRoom
Python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
matplotlib: 3.1.3
pandas: 1.0.1
sklearn: 0.22.1
scipy: 1.4.1
numpy: 1.18.1
Press any key to continue . . .

in progress
