If you are completely new to data science, you may be wondering what data science is and what a data science career path looks like for beginners. If so, read that post first.
Before you jump into a data science project, make sure you have a good understanding of Python coding, especially working with databases, Python variables and data types, arrays, loops, dictionary objects, classes and objects, and NumPy arrays.
I have separated each component and tried to simplify it for beginners who want to learn data science using Python code. Each small task will help you understand the process step by step, so you can learn on your own without help.
Working through the data science lifecycle requires many different modules and libraries. Let us look at the following core Python libraries and how to use them.
If you have completed all the tasks above successfully, you are now familiar with the libraries and objects we work with during a data science project. So let’s start with a small dataset exercise; you can download the MoMA data from GitHub.
import numpy as np
import pandas as pd

print("Welcome to the MoMA dataset")
# Load the artists file; adjust the path to wherever you saved the download
artists = pd.read_csv('../Python-VSCode/testdata/artists.csv')
print(artists)
There are two datasets, Artists and Artworks, each with around 15k records, which makes them good to play with!
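Once a dataset is loaded, a quick first look usually helps before any analysis. The sketch below uses a tiny in-memory stand-in for the artists file so it runs without the download; the column names (DisplayName, Nationality, BeginDate) and the rows are assumptions for illustration, so check them against the file you actually downloaded.

```python
import io

import pandas as pd

# Hypothetical stand-in for artists.csv; the real file may use different columns.
sample_csv = io.StringIO(
    "DisplayName,Nationality,BeginDate\n"
    "Artist A,American,1930\n"
    "Artist B,French,1885\n"
    "Artist C,American,1962\n"
)
artists = pd.read_csv(sample_csv)

# Number of rows and columns
print(artists.shape)
# Column names, useful for spotting typos or unexpected fields
print(artists.columns.tolist())
# First two rows as a preview
print(artists.head(2))
# How often each nationality appears
print(artists["Nationality"].value_counts())
```

With the real file, replace sample_csv with the path passed to pd.read_csv and the same calls should apply.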
To simplify our understanding of data science, we classify our tutorials into three categories: reading data from different data sources such as Excel, XML, RDBMS, and JSON; data analysis; and data visualization.
Data processing is the process of extracting data from various data sources where the data is in different formats. We need to write code to extract that data and fit it into our standard format so that it becomes easy to analyse.
Data analysis means analysing data to understand the relationships among different data, based on usability and business requirements. The analysed data should help stakeholders make business decisions.
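As a rough illustration of fitting different sources into one standard format, the sketch below reads a small CSV source and a small JSON source, renames their columns to a shared layout, and combines them. The sample records, the source column names, and the target columns (name, nationality) are all made up for this example.

```python
import io

import pandas as pd

# Two hypothetical sources describing the same kind of record in different shapes.
csv_source = io.StringIO("Name,Nationality\nAda,British\nLin,Chinese\n")
json_source = io.StringIO('[{"full_name": "Omar", "country": "Egyptian"}]')

# Read each source with the matching pandas reader, then rename to the standard columns.
csv_df = pd.read_csv(csv_source).rename(
    columns={"Name": "name", "Nationality": "nationality"}
)
json_df = pd.read_json(json_source).rename(
    columns={"full_name": "name", "country": "nationality"}
)

# Stack the standardized frames into a single table that is ready for analysis.
combined = pd.concat([csv_df, json_df], ignore_index=True)
print(combined)
```

The same pattern should extend to other sources, for example pd.read_excel for Excel files or pd.read_sql for a relational database, as long as each result is renamed to the same columns before combining.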
import statistics

dataset = [17, 19, 11, 21, 23, 46, 29]
# statistics.variance returns the sample variance of the data
output = statistics.variance(dataset)
print(output)
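Variance is usually read alongside a few other summary figures. As a small extension of the example above, this sketch computes the mean, median, and standard deviation of the same numbers with the standard library; the variable names are only illustrative.

```python
import statistics

dataset = [17, 19, 11, 21, 23, 46, 29]

# Central tendency: the average value and the middle value
mean_value = statistics.mean(dataset)
median_value = statistics.median(dataset)

# Spread: sample variance and its square root, the sample standard deviation
variance_value = statistics.variance(dataset)
stdev_value = statistics.stdev(dataset)

print("mean:", mean_value)
print("median:", median_value)
print("variance:", variance_value)
print("standard deviation:", stdev_value)
```

A mean noticeably above the median, as happens here because of the value 46, hints that a few large values are pulling the average up.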
In the data visualization process, we create graphical representations of the data that are easy to understand during a presentation, using different types of charts, colours, and so on.
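A minimal sketch of such a chart, assuming matplotlib is installed: it draws a bar chart from a small dictionary of counts. The numbers are invented for illustration and do not come from the MoMA data. The chart is rendered to an in-memory PNG so the script also runs on a machine without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt

# Made-up counts, for illustration only
counts = {"American": 5, "French": 3, "German": 2}

fig, ax = plt.subplots()
# One bar per category; colour is one of the presentation choices mentioned above
bars = ax.bar(list(counts.keys()), list(counts.values()), color="steelblue")
ax.set_title("Artists by nationality (sample data)")
ax.set_xlabel("Nationality")
ax.set_ylabel("Number of artists")

# Render the figure as PNG bytes; pass a file name instead to write it to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session you would normally drop the Agg line and call plt.show() to open the chart in a window instead.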