Python Pandas tutorial for beginners, learn how to install python pandas in windows and use this tool for analyzing and learning from data.
You need to install pandas in your local development environment, run command
pip install pandas
If you are using visual studio code as editor, then go to terminal and run the command.
Python panda is a data-management tool, this toll is for data cleaning, transforming, and analyzing, very similar to SQL SSIS.
We can work with data RDBMS or CSV file using Python Pandas.
There are many in-built functionalities in python pandas, which helps finding average, max, or min of any column, correlation among columns etc.
It also has function like sorting, filtering rows or columns with any specified criteria
Python pandas also helps in visualizing data with help from Matplotlib charting like Plot bars, lines, histograms, bubbles etc.
Loading data from excel file in python is just two lines code, Super easy!
url =path+ "\\testdata\\world_cup_results.xlsx" df = pd.read_excel(url)
Here is an example of how to load data from excel file and show matrix using python. Pandas tool is really useful for data visualization.
import pandas as pd import os from pandas.plotting import scatter_matrix from matplotlib import pyplot path=os.getcwd() url =path+ "\\testdata\\world_cup_results.xlsx" df = pd.read_excel(url) print (df) df.hist() pyplot.show() scatter_matrix(df) pyplot.show()
Here are some graphical representation of excel data, notice, how the method pyplot.show()
is used for creating a pop window.
So whenever you think of developing analytics in python application, python pandas can be very useful.
Now let's look at how differently we can load and manipulate data using panda’s functionality.
Loading selective column from csv file or creating dataset with few selective columns,
in example below we are using pd.read_csv
function just like pd.read_excel.
here instead of loading all columns from my csv file, I am loading just two columns by specifying them usecols=[2,5]
.
import numpy as np import pandas as pd artworks1 = pd.read_csv('../Python-VSCode/testdata/Artworks.csv',usecols=[2,5])
Alternately, we can use panda.DataFrame function to select columns we want in dataset.
import pandas as pd artworks = pd.read_csv('../Python-VSCode/testdata/Artworks.csv') artworkdt = pd.DataFrame(artworks, columns = ['Artist', 'Nationality']) print(artworkdt)
We also can write where clause, here adding filter with condition, for example we want to load data where artist name is “Thomas Bewick” .
import pandas as pd artworks = pd.read_csv('../Python-VSCode/testdata/Artworks.csv') artworkdt = pd.DataFrame(artworks, columns = ['Artist', 'Nationality']) filter = artworkdt["Artist"] == "Thomas Bewick" atr = artworkdt.where(filter) print(atr)
As per pandas' documentation, we should use SQLAlchemy Or sqlite3 to connect mysql database to load data into panda dataframe.
Here i am using pymysql to connect mysql, fetching data from database, then adding them to a list of dictionary!
then loading data from there pd.DataFrame(list_of_dicts)
import pymysql import os import json import pandas as pd myConnection = pymysql.connect( host=apikey['dbhost'], user=apikey['dbusername'], password=apikey['dbpassword'], database=apikey['database'], charset='utf8mb4', cursorclass=pymysql.cursors.DictCursor) mycursor = myConnection.cursor() mycursor.execute("select * from world.country") list_of_dicts =[] myRecordset = mycursor.fetchall() for r in myRecordset: # convert tuple into dictionary myrow=dict(r) list_of_dicts.append(myrow) mycursor.close() df = pd.DataFrame(list_of_dicts) df_records = pd.DataFrame.from_records(list_of_dicts) print(df_records)
Notice, while reading from cursor, we need to convert tuple into dictionary
myrow=dict(r)
,
then to add them in a dictionary list list_of_dicts.append(myrow)
.
You may be interested to learn more about data analytics skills, courses etc.
You may be interested in following tutorials