Python pandas tutorial with example

Python Pandas tutorial for beginners, learn how to install python pandas in windows and use this tool for analyzing and learning from data.

Install Python Pandas on Windows

You need to install pandas in your local development environment, run command

pip install pandas

If you are using visual studio code as editor, then go to terminal and run the command.

Why to use python pandas?

Python panda is a data-management tool, this toll is for data cleaning, transforming, and analyzing, very similar to SQL SSIS.

We can work with data RDBMS or CSV file using Python Pandas.

There are many in-built functionalities in python pandas, which helps finding average, max, or min of any column, correlation among columns etc.

It also has function like sorting, filtering rows or columns with any specified criteria

Python pandas also helps in visualizing data with help from Matplotlib charting like Plot bars, lines, histograms, bubbles etc.

Load data from excel using pandas python

Loading data from excel file in python is just two lines code, Super easy!

url =path+ "\\testdata\\world_cup_results.xlsx"
df = pd.read_excel(url)

Here is an example of how to load data from excel file and show matrix using python. Pandas tool is really useful for data visualization.

import pandas as pd
import os
from pandas.plotting import scatter_matrix
from matplotlib import pyplot

path=os.getcwd()
url =path+ "\\testdata\\world_cup_results.xlsx"
df = pd.read_excel(url)
print (df)

df.hist()
pyplot.show()

scatter_matrix(df)
pyplot.show()

Here are some graphical representation of excel data, notice, how the method pyplot.show() is used for creating a pop window.

python pandas scatter example

So whenever you think of developing analytics in python application, python pandas can be very useful.

Now let's look at how differently we can load and manipulate data using panda’s functionality.

Loading selective column from csv file or creating dataset with few selective columns, in example below we are using pd.read_csv function just like pd.read_excel. here instead of loading all columns from my csv file, I am loading just two columns by specifying them usecols=[2,5].

import numpy as np
import pandas as pd
artworks1 = pd.read_csv('../Python-VSCode/testdata/Artworks.csv',usecols=[2,5])

Alternately, we can use panda.DataFrame function to select columns we want in dataset.

import pandas as pd
artworks = pd.read_csv('../Python-VSCode/testdata/Artworks.csv')
artworkdt = pd.DataFrame(artworks, columns = ['Artist', 'Nationality'])
print(artworkdt)

We also can write where clause, here adding filter with condition, for example we want to load data where artist name is “Thomas Bewick” .

import pandas as pd

artworks = pd.read_csv('../Python-VSCode/testdata/Artworks.csv')

artworkdt = pd.DataFrame(artworks, columns = ['Artist', 'Nationality'])

filter = artworkdt["Artist"] == "Thomas Bewick"

atr = artworkdt.where(filter)

print(atr)

loading data from mysql into panda dataframe

As per pandas' documentation, we should use SQLAlchemy Or sqlite3 to connect mysql database to load data into panda dataframe.

Here i am using pymysql to connect mysql, fetching data from database, then adding them to a list of dictionary! then loading data from there pd.DataFrame(list_of_dicts)

import pymysql
import os
import json
import pandas as pd

myConnection = pymysql.connect(
          host=apikey['dbhost'],
          user=apikey['dbusername'],
          password=apikey['dbpassword'],
          database=apikey['database'],          
          charset='utf8mb4',
          cursorclass=pymysql.cursors.DictCursor)

mycursor = myConnection.cursor()
mycursor.execute("select * from world.country")
list_of_dicts =[]
myRecordset = mycursor.fetchall()

for r in myRecordset: 
    # convert tuple into dictionary    
    myrow=dict(r)
    list_of_dicts.append(myrow)
mycursor.close()

df = pd.DataFrame(list_of_dicts)
df_records = pd.DataFrame.from_records(list_of_dicts) 

print(df_records)

Notice, while reading from cursor, we need to convert tuple into dictionary myrow=dict(r), then to add them in a dictionary list list_of_dicts.append(myrow).

You may be interested to learn more about data analytics skills, courses etc.

You may be interested in following tutorials

Python Tutorial | Python Course | Python Interview Questions

Python pandas library

Learn python programming with free python coding tutorials.