How to start learning data science using Python

If you are completely new to data science, you may first want to read about what data science is and what a data science career path looks like for beginners.

Before you jump into a data science project, make sure you have a good understanding of Python coding, especially working with databases, Python variables and data types, Python arrays, loops, Python dictionary objects, classes and objects, and NumPy arrays.

I have separated each component and tried to simplify it for beginners who want to learn data science with Python. Each small task will help you understand the process step by step, so you can learn on your own.

Data Science Libraries

Working through the data science lifecycle requires many different modules and libraries. Let us look at the core Python libraries and how to use them.
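As a minimal sketch, assuming the two core libraries this tutorial imports later (NumPy and pandas), the snippet below shows what each one is for. NumPy provides fast arrays with element-wise arithmetic, and pandas builds labelled tables (DataFrames) on top of them. The four prices are taken from the index_price values used in the regression examples.

```python
import numpy as np
import pandas as pd

# NumPy: a typed n-dimensional array with vectorised (loop-free) arithmetic
prices = np.array([1574, 1257, 1432, 1303])
doubled = prices * 2          # applied to every element at once
print(doubled.mean())

# pandas: a labelled table (DataFrame) whose columns are backed by NumPy arrays
df = pd.DataFrame({"month": [12, 11, 10, 9], "index_price": prices})
print(df.describe())          # count, mean, std, min, quartiles, max

# Boolean filtering keeps only the rows where the condition is True
above = df[df["index_price"] > 1300]
print(above)
```

The same filtering pattern (`df[condition]`) is how you will select subsets of rows from the MoMA dataset later on.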

Ready to start with a dataset

If you have completed all the tasks above successfully, you are now familiar with the libraries and objects we work with during a data science project, so let's start with a small dataset exercise. You can download the MoMA data from GitHub.

import numpy as np
import pandas as pd

print("Welcome to the MoMA dataset")

# Adjust this path to wherever you saved artists.csv
artists = pd.read_csv('../Python-VSCode/testdata/artists.csv')
print(artists)

There are two datasets, Artists and Artworks, with around 15k records in each, which makes them good to practise with.
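Once a dataset is loaded, the first step is usually to check its size, column types and value distribution. The sketch below does that on a tiny inline sample so it runs without the download. The column names are an assumption about the layout of artists.csv, and the three rows are invented for illustration; with the real file, replace `sample_csv` with the path passed to `pd.read_csv` above.

```python
import io
import pandas as pd

# Hypothetical stand-in for artists.csv: column names are assumed,
# rows are invented purely for illustration
sample_csv = io.StringIO(
    "ConstituentID,DisplayName,Nationality,Gender,BeginDate,EndDate\n"
    "1,Artist A,American,Male,1900,1980\n"
    "2,Artist B,French,Female,1920,1990\n"
    "3,Artist C,American,Female,1945,2010\n"
)
artists = pd.read_csv(sample_csv)

print(artists.shape)          # (number of rows, number of columns)
print(artists.head())         # first five rows
artists.info()                # prints column dtypes and non-null counts

# How many artists share each nationality
nationality_counts = artists["Nationality"].value_counts()
print(nationality_counts)
```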

To keep things simple, we group our tutorials into three categories: first, reading data from different data sources such as Excel, XML, RDBMS, and JSON; then data analysis; and finally data visualization.
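Reading from different sources usually ends with converting everything into one standard table. A minimal sketch of that idea, assuming pandas plus the standard-library sqlite3 module: JSON text is read with `pd.read_json`, an in-memory SQLite table stands in for an RDBMS, and both are renamed to a shared schema before being combined. All records and column names here are hypothetical.

```python
import io
import sqlite3
import pandas as pd

# Source 1: JSON text (for example from a web API); hypothetical records
json_text = '[{"artist": "Artist A", "year": 1950}, {"artist": "Artist B", "year": 1962}]'
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# Source 2: a relational database; in-memory SQLite stands in for an RDBMS
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE works (artist_name TEXT, created INTEGER)")
con.executemany("INSERT INTO works VALUES (?, ?)", [("Artist C", 1971), ("Artist D", 1985)])
# Alias the columns in SQL so this source matches the JSON schema
from_sql = pd.read_sql_query("SELECT artist_name AS artist, created AS year FROM works", con)
con.close()

# One standard-format DataFrame, ready for analysis
combined = pd.concat([from_json, from_sql], ignore_index=True)
print(combined)
```

Excel and XML follow the same pattern with `pd.read_excel` and `pd.read_xml`, though those may need extra packages (such as openpyxl or lxml) to be installed.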

  • Data Processing

    Data processing is the process of extracting data from various data sources, where the data come in different formats. We write code to extract that data and convert it into our standard format so that it becomes easy to analyse.

  • Data Analysis

    Data analysis means examining data to understand the relationships between different variables in light of usability and business requirements. The analysed data should help stakeholders make business decisions.

    • Measuring Data Variance
      import statistics
      dataset = [17, 19, 11, 21, 23, 46, 29]
      output = statistics.variance(dataset)  # sample variance (divides by n - 1)
      print(output)
      
    • Normal and Binomial Distribution
    • Poisson and Bernoulli Distribution
    • Data Correlation
    • Linear Regression
      import pandas as pd
      from matplotlib import pyplot
      from sklearn.linear_model import LogisticRegression
      from sklearn.linear_model import LinearRegression
      
      data = {'year': [2021,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2023,2023,2023,2023,2023,2023,2023,2023],
              'month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
              'interest_rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
              'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
              'index_price': [1574,1257,1432,1303,1256,1754,1804,1175,1201,1189,1130,1075,1047,915,933,958,971,949,874,882,876,802,804,785]        
              }
      
      df = pd.DataFrame(data) 
      
      • Simple Regression

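        Simple regression predicts the dependent value from a single input (independent) variable, here index_price from unemployment_rate.

        model = LinearRegression()

        # scikit-learn expects a 2D feature array, so reshape each column to (n_samples, 1)
        X = df['unemployment_rate'].values.reshape(-1, 1)
        Y = df['index_price'].values.reshape(-1, 1)
        lr = model.fit(X, Y)           # fit() returns the fitted model
        predictions = lr.predict(X)    # predicted index_price for each row
        print(predictions)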
        
      • Multiple Regression

        Multiple regression is like simple linear regression, but with more than one independent variable, meaning that we try to predict a value based on two or more variables.

        For example, the index price changes with the interest rate and the unemployment rate: index_price is the dependent variable, and the two independent (input) variables are interest_rate and unemployment_rate.

      • Logistic Regression
    • Data P-Value

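      The p-value (probability value) is the probability of getting a result at least as extreme as the one observed if the null hypothesis were true. For a regression slope, the null hypothesis is that the slope is zero, so a small p-value (commonly below 0.05) suggests a genuine linear relationship between the input and the output. If you need a refresher, a free introductory course on probability is a good place to start.

      scikit-learn's LinearRegression does not report p-values, so this example uses scipy.stats.linregress instead:

      from scipy import stats

      # Fit index_price = slope * unemployment_rate + intercept and test H0: slope == 0
      result = stats.linregress(df['unemployment_rate'], df['index_price'])
      print(result.slope, result.intercept)
      print(result.pvalue)  # two-sided p-value for the slope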
      
  • Data Visualization

    In the data visualization process, we create graphical representations of data that are easy to understand during a presentation, using different types of charts, colours, and so on.

    • Chart Properties and Styling

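      The Matplotlib library in Python is used for creating charts and controlling their properties and styling, such as titles, axis labels, font sizes, colours, and grids.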

    • Plot and Scatter Plots

      We can use either of two methods: pyplot.plot(df['interest_rate'], df['index_price']) draws the data as a connected line, while pyplot.scatter(df['unemployment_rate'], df['index_price'], color='green') draws each row as an individual point.

      from matplotlib import pyplot
      
      pyplot.plot(df['interest_rate'], df['index_price'])  # line plot
      # or, for a scatter plot:
      # pyplot.scatter(df['unemployment_rate'], df['index_price'], color='green')
      pyplot.title('Index Price Vs Interest Rate', fontsize=14)
      pyplot.xlabel('Interest Rate', fontsize=14)
      pyplot.ylabel('Index Price', fontsize=14)
      pyplot.grid(True)
      pyplot.show()
      
    • Python Heat Maps
    • Python 3D Charts
    • Geographical Data and Time Series
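To see what statistics.variance computes in the Measuring Data Variance step, here is a minimal sketch that works it out by hand on the same dataset, using only the standard library. It contrasts sample variance (divide by n - 1) with population variance (divide by n).

```python
import statistics

dataset = [17, 19, 11, 21, 23, 46, 29]

n = len(dataset)
mean = sum(dataset) / n
squared_deviations = [(x - mean) ** 2 for x in dataset]

# Sample variance divides by n - 1; this is what statistics.variance returns
sample_var = sum(squared_deviations) / (n - 1)
# Population variance divides by n; this is what statistics.pvariance returns
population_var = sum(squared_deviations) / n

print(sample_var, statistics.variance(dataset))
print(population_var, statistics.pvariance(dataset))
```

Be aware that NumPy's `np.var` divides by n by default (`ddof=0`); pass `ddof=1` to match `statistics.variance`.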
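The distribution topics under Data Analysis (normal, binomial, Poisson, Bernoulli) can be explored by drawing random samples and comparing their mean and variance with the theoretical values. A minimal sketch, assuming NumPy's `default_rng` generator API; the parameters are arbitrary choices for illustration. A Bernoulli trial is drawn as a binomial with n=1.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the results are reproducible
size = 10_000

normal = rng.normal(loc=0.0, scale=1.0, size=size)   # continuous bell curve: mean 0, variance 1
binomial = rng.binomial(n=10, p=0.5, size=size)      # successes in 10 trials: mean 5, variance 2.5
poisson = rng.poisson(lam=3.0, size=size)            # events per interval: mean 3, variance 3
bernoulli = rng.binomial(n=1, p=0.3, size=size)      # single 0/1 trial: mean 0.3, variance 0.21

samples = [("normal", normal), ("binomial", binomial), ("poisson", poisson), ("bernoulli", bernoulli)]
for name, sample in samples:
    print(f"{name:>9}: mean={sample.mean():.3f} var={sample.var():.3f}")
```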
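For the Data Correlation topic, pandas can compute a Pearson correlation matrix directly from the same interest rate, unemployment and index price data used in the regression examples. This is a minimal sketch using `DataFrame.corr`. Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.

```python
import pandas as pd

data = {
    'interest_rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
    'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
    'index_price': [1574,1257,1432,1303,1256,1754,1804,1175,1201,1189,1130,1075,1047,915,933,958,971,949,874,882,876,802,804,785],
}
df = pd.DataFrame(data)

# Pearson correlation for every pair of columns, from -1 to +1
corr = df.corr(method='pearson')
print(corr.round(2))

# Positive: index_price tends to rise with interest_rate
print(corr.loc['interest_rate', 'index_price'])
# Negative: index_price tends to fall as unemployment_rate rises
print(corr.loc['unemployment_rate', 'index_price'])
```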
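The Multiple Regression item describes predicting index_price from interest_rate and unemployment_rate but gives no code. The sketch below shows one way to do it with scikit-learn's LinearRegression on the same data. The only change from simple regression is that X now holds two columns. The prediction point (interest rate 2.0, unemployment 5.8) is a hypothetical scenario chosen for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

data = {
    'interest_rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
    'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
    'index_price': [1574,1257,1432,1303,1256,1754,1804,1175,1201,1189,1130,1075,1047,915,933,958,971,949,874,882,876,802,804,785],
}
df = pd.DataFrame(data)

X = df[['interest_rate', 'unemployment_rate']]  # two independent variables (2D table)
y = df['index_price']                            # dependent variable

model = LinearRegression().fit(X, y)
print('intercept:', model.intercept_)
print('coefficients:', dict(zip(X.columns, model.coef_)))
print('R^2 on the training data:', model.score(X, y))

# Hypothetical scenario: interest rate 2.0 and unemployment rate 5.8
new_point = pd.DataFrame({'interest_rate': [2.0], 'unemployment_rate': [5.8]})
print('predicted index_price:', model.predict(new_point)[0])
```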
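The Logistic Regression item has no example. Logistic regression predicts the probability of a yes/no outcome instead of a number. The dataset above has no binary column, so this sketch makes one up as an assumption for illustration: `high_index` is 1 when index_price is above its median. It then uses scikit-learn's LogisticRegression to estimate that probability from unemployment_rate. The two unemployment values passed to `predict_proba` are also hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = {
    'unemployment_rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
    'index_price': [1574,1257,1432,1303,1256,1754,1804,1175,1201,1189,1130,1075,1047,915,933,958,971,949,874,882,876,802,804,785],
}
df = pd.DataFrame(data)

# Hypothetical binary target, created only for this illustration
df['high_index'] = (df['index_price'] > df['index_price'].median()).astype(int)

X = df[['unemployment_rate']]  # 2D feature table with one column
y = df['high_index']           # 0 or 1

clf = LogisticRegression().fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] per row; keep P(high_index == 1)
new_points = pd.DataFrame({'unemployment_rate': [5.5, 6.1]})
probs = clf.predict_proba(new_points)[:, 1]
print(probs)
print(clf.predict(new_points))  # class with the higher probability
```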

You may be interested to read:

Is SQL required for learning data science?
Knowing SQL is very helpful. I suggest learning an RDBMS first. You will learn table structure, the different types of relationships, data types, the different types of joins, and WHERE clauses, all of which help you build datasets, along with the main types of SQL queries such as SELECT, UPDATE, INSERT, and DELETE. A free SQL tutorial is a good place to start. SQL query knowledge will therefore help you learn data science effectively.
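The SQL ideas mentioned above (table structure, a one-to-many relationship, INSERT, UPDATE, DELETE, a JOIN with a WHERE clause) can be tried without installing a database server. This is a minimal sketch using Python's built-in sqlite3 module, with the final SELECT loaded into a pandas DataFrame for analysis. The tables and rows are hypothetical.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = con.cursor()

# Two related tables: each artwork points to one artist (one-to-many relationship)
cur.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT, nationality TEXT)")
cur.execute("CREATE TABLE artworks (artwork_id INTEGER PRIMARY KEY, title TEXT, "
            "artist_id INTEGER REFERENCES artists(artist_id), year INTEGER)")

# INSERT hypothetical rows
cur.executemany("INSERT INTO artists VALUES (?, ?, ?)",
                [(1, "Artist A", "American"), (2, "Artist B", "French")])
cur.executemany("INSERT INTO artworks VALUES (?, ?, ?, ?)",
                [(10, "Work 1", 1, 1950), (11, "Work 2", 1, 1965), (12, "Work 3", 2, 1972)])

# UPDATE one row, then DELETE rows matching a condition
cur.execute("UPDATE artworks SET year = 1966 WHERE artwork_id = 11")
cur.execute("DELETE FROM artworks WHERE year < 1955")
con.commit()

# SELECT with an INNER JOIN and a WHERE clause builds the dataset
query = """
    SELECT a.name AS artist, a.nationality AS nationality, w.title AS title, w.year AS year
    FROM artworks AS w
    INNER JOIN artists AS a ON a.artist_id = w.artist_id
    WHERE w.year >= 1960
    ORDER BY w.year
"""
dataset = pd.read_sql_query(query, con)
con.close()
print(dataset)
```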