How to learn Data Science?

When I came to Bangalore in 2015, I was looking to launch my own start-up and, boy, once I explored the ecosystem I knew it was going to be tough.

The initial plan was to create a property website where I could sell local apartments in bulk by collecting data on interested buyers. But I was not confident enough to make it my only option, since the figure that 9 out of 10 start-ups fail was haunting me.

So, as plan B, I started learning Data Science (from Andrew Ng, of course). I saw many people refer to his course as the holy grail of data science. But after 4 years of training more than 150 students across 30 different corporates, I realized that it was a drop in the ocean.

In this article, I will share my experience of how to learn data science from scratch and what the right career path for a data scientist looks like.

Choose Python!

The learning path for data science starts with a programming language. A programming language is a medium for transferring human logic to machines. The two preferred languages are R and Python.

I prefer Python over R, as it is easy to learn and has some awesome deep learning libraries whose ecosystem is hard to match in R.

A data science career path can be confusing unless you have a clear roadmap and strategy.

Decide how much time you want to spend on Python and stick to that timeline. You can always learn more of it through real projects while going through our Data Science Course in Bangalore.

I capped my time at 20 hours and picked up new tricks along with the core topics. That gave me a fair understanding of strings, lists, dictionaries, sets, tuples, loops, string formatting, functions, regular expressions, virtual environments and classes.
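To give a feel for how these basics fit together, here is a small sketch that touches most of them in one place. The WordStats class and the sample sentence are made up for illustration; treat it as a practice exercise, not a recommended design.

```python
import re
from collections import Counter


class WordStats:
    """Hypothetical helper that counts the words in a piece of text."""

    def __init__(self, text):
        # Regular expression: pull out runs of letters from the lowercased string.
        self.words = re.findall(r"[a-z]+", text.lower())

    def counts(self):
        # Dictionary-like mapping of word -> number of occurrences.
        return Counter(self.words)

    def unique(self):
        # Set: every distinct word appears exactly once.
        return set(self.words)

    def top(self, n=1):
        # List of (word, count) tuples, most frequent first.
        return self.counts().most_common(n)


stats = WordStats("Practice makes perfect, and practice builds confidence.")

# Loop plus string formatting with an f-string.
for word, count in stats.top(1):
    print(f"{word!r} appears {count} times")
```

Try extending it yourself, for example by adding a method that returns the longest word.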

You can practice small Python programs on HackerRank. The basics you learn there will build your confidence to write complex code!

By this time, you can get started with an IDE. I prefer JupyterLab. Remember, practice is the key to becoming a good coder.

Only coding can be frustrating

Congrats! You know Python, the most popular programming language of this decade. You might be tempted to move straight on to advanced libraries like NumPy, SciPy, and Pandas. My take: doing the same thing over and over gets boring.

No matter how exciting it seems initially, you will end up procrastinating! It's time for a change: time for some math and stats. Prepare a schedule and study statistics for at least an hour a day for the next 2 months.

Start with descriptive statistics (basics like mean and median) and linear algebra basics, and simultaneously try to implement them with the NumPy and SciPy libraries. Learning by coding is a good way to master these libraries.
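As a rough illustration of what "implementing it with NumPy" can look like, the sketch below computes a few descriptive statistics and solves a tiny linear system. The numbers are invented for the example.

```python
import numpy as np

# Hypothetical sample: daily study hours over two weeks.
hours = np.array([1.0, 1.5, 2.0, 1.0, 0.5, 3.0, 2.5,
                  1.0, 1.5, 2.0, 1.0, 0.5, 2.0, 1.5])

mean = hours.mean()
median = np.median(hours)
std = hours.std(ddof=1)                  # ddof=1 gives the sample standard deviation
q1, q3 = np.percentile(hours, [25, 75])  # quartiles, used for the interquartile range

print(f"mean={mean:.2f} median={median:.2f} std={std:.2f} IQR={q3 - q1:.2f}")

# Linear algebra basics: solve the system A @ x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print("solution:", x)
```

Working out the same values by hand first and then checking them against NumPy is a good way to make both the math and the library stick.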

After descriptive statistics, learn the basics of probability distributions and Bayes' theorem. This is a good time to get started with visualization tools like Matplotlib and Seaborn.
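Bayes' theorem is easier to remember once you have coded it. Here is a minimal sketch for a yes/no hypothesis; the function name and the probabilities are assumptions chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) from Bayes' theorem for a binary hypothesis H and evidence E."""
    # Total probability of the evidence:
    # P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Illustrative numbers: the hypothesis is true 1% of the time, a test detects it
# 90% of the time, and it raises a false alarm 5% of the time.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(H | E) = {p:.3f}")
```

Notice how small the posterior stays when the prior is small, even with a fairly accurate test. That intuition comes up again and again in data science.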

Spend more time on the Normal or Gaussian distribution (learn the formulas for both univariate and multivariate), the Binomial distribution, the Multinomial distribution and the Poisson distribution, as these are the ones you will use most of the time.
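If you want to check your hand calculations for these distributions, scipy.stats is one option. The sketch below assumes SciPy is installed, and the scenarios (coin flips, a loaded three-sided die) are made up.

```python
from scipy import stats

# Normal: probability that a standard normal value lies within one standard deviation.
within_one_sd = stats.norm.cdf(1) - stats.norm.cdf(-1)

# Binomial: probability of exactly 3 heads in 10 fair coin flips.
three_heads = stats.binom.pmf(3, n=10, p=0.5)

# Poisson: probability of zero events when the average rate is 2 per interval.
no_events = stats.poisson.pmf(0, mu=2)

# Multinomial: probability of counts [1, 2, 3] in 6 rolls of a hypothetical
# loaded three-sided die with face probabilities 0.2, 0.3 and 0.5.
die_counts = stats.multinomial.pmf([1, 2, 3], n=6, p=[0.2, 0.3, 0.5])

for name, value in [("normal", within_one_sd), ("binomial", three_heads),
                    ("poisson", no_events), ("multinomial", die_counts)]:
    print(f"{name}: {value:.4f}")
```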

Under inferential statistics, focus thoroughly on concepts like hypothesis testing, the t-distribution, confidence intervals and the chi-square distribution. Statistics will take 45 to 60 hours of your time.
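Here is a small sketch of how these inferential ideas might look in code, assuming NumPy and SciPy are available. The two groups are synthetic random data and the contingency table is invented, so the results only demonstrate the mechanics.

```python
import numpy as np
from scipy import stats

# Synthetic data: two groups drawn from normal distributions with different means.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=55.0, scale=5.0, size=40)

# Hypothesis test: Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# 95% confidence interval for the mean of group_a, based on the t-distribution.
mean_a = group_a.mean()
sem_a = stats.sem(group_a)  # standard error of the mean
low, high = stats.t.interval(0.95, df=len(group_a) - 1, loc=mean_a, scale=sem_a)

# Chi-square test of independence on a made-up 2x2 contingency table.
table = np.array([[30, 10],
                  [20, 20]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"t={t_stat:.2f}, p={p_value:.4f}, 95% CI=({low:.2f}, {high:.2f})")
print(f"chi2={chi2:.2f}, p={p_chi2:.4f}, dof={dof}")
```

Before trusting any p-value, make sure you can explain in plain words what the null hypothesis is. The code is the easy part.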

Kaggle time

Now you have small Jupyter notebooks sitting on your local system and you know how to write Python code. It is time to learn Exploratory Data Analysis, widely known as EDA.

To get started with EDA, register on Kaggle and download some simple datasets. These will mostly be .csv files (try to work with datasets that have more likes). The goal is to clean the data, understand it and derive meaningful insights from it.

The main tool you need for EDA is Pandas, which will help you with data wrangling. You will also frequently need visualization tools like Matplotlib, Plotly and Seaborn.
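A first pass at EDA with Pandas could look roughly like the sketch below. The CSV content is a made-up stand-in for a downloaded Kaggle file, so the column names and values are assumptions. With a real dataset you would pass the file path to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for a downloaded .csv file; columns and values are invented.
csv_text = """city,bedrooms,price
Bangalore,2,55.0
Bangalore,3,
Mumbai,2,120.0
Mumbai,2,120.0
Pune,1,30.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Understand the data: size, column types and missing values.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Clean the data: drop exact duplicate rows, then fill missing prices with the median.
clean = df.drop_duplicates()
clean = clean.fillna({"price": clean["price"].median()})

# Derive a simple insight: listing count and average price per city.
summary = clean.groupby("city")["price"].agg(["count", "mean"])
print(summary)
```

Filling with the median is only one possible choice. Part of EDA is deciding, per column, whether to fill, drop or flag missing values.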

Don’t look at others’ solutions until you have given it your best shot. Students often ask how to learn data science effectively, but they mostly skip this important step.

You know data, now it’s time to learn ML

Machine learning is a subfield of data science. There are 5 types of analytics that you will see in the learning path for data science.

Predictive analysis helps you answer questions like "what can happen?" and is the key reason why we use machine learning. ML helps automate the decision-making process and can produce all kinds of forecasts.

In the last 5 years, ML has evolved and is now capable of making predictions even on images (computer vision tasks) and text data (NLP models). But let’s start with the basics.

Start with the basics of regression techniques (7 types of regression) and classification techniques such as Logistic Regression, Support Vector Machines (SVM), Random Forest, XGBoost, etc. You can learn all these techniques with Python code from my favourite book, “Python Data Science Handbook”.
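To show how little code a first classifier needs, here is a hedged sketch that compares two of the techniques named above using Scikit-Learn. The dataset (the breast cancer data bundled with the library), the split and the hyperparameters are my own choices for illustration, not a recommended setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small binary classification dataset that ships with Scikit-Learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    # Logistic regression benefits from scaled features, so it gets a pipeline.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: accuracy={scores[name]:.3f}")
```

Accuracy on one split is only a starting point. Try cross-validation and other metrics before you declare a winner.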

This book will help you on your career path as a data scientist and enable you to crack tough machine learning interviews as well. If you need the mathematical explanation behind machine learning models, you can learn it from the University of British Columbia's videos.

Remember to check the official Scikit-Learn documentation from time to time for updates and changes. Get a good overview of model evaluation and optimization topics such as scoring techniques, gradient descent, the confusion matrix, SMOTE, ROC and AUC.
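Two of these evaluation tools, the confusion matrix and the area under the ROC curve, can be tried on a handful of numbers. The labels and scores below are invented so that you can verify every count by hand.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Made-up ground truth and predicted probabilities for 8 samples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.9, 0.6]

# Turn probabilities into class labels with a 0.5 threshold (an arbitrary choice).
y_pred = [int(s >= 0.5) for s in y_score]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# AUC is computed from the raw scores, so it does not depend on the threshold.
auc = roc_auc_score(y_true, y_score)

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.4f}")
```

Change the threshold and watch precision and recall move in opposite directions while the AUC stays the same.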

Deep Learning is waiting

Companies nowadays are working extensively on image processing and Natural Language Processing (NLP). It is possible that your dream company will ask you for projects in deep learning.

You can learn deep learning from many free sources, but the one course that really helped me get started was Deep Learning with PyTorch by Facebook and Udacity.

You can also master deep learning with TensorFlow and Keras. Deep learning, or neural networks, is no longer an optional subject. In your learning path for data science, add 80 hours for deep learning. Concentrate on concepts like LDA, transfer learning, machine translation with attention and object detection.

Do you need Big Data and DevOps?

Well, it's controversial.

SQL is a must!
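You do not need a database server to start practicing SQL. Python's built-in sqlite3 module is enough for a first sketch like the one below. The table, names and scores are invented for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, company TEXT, score REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Asha", "Acme", 82.0), ("Ravi", "Acme", 74.0), ("Meera", "Globex", 91.0)],
)

# Aggregate query: number of students and average score per company.
rows = conn.execute(
    """
    SELECT company, COUNT(*) AS n, AVG(score) AS avg_score
    FROM students
    GROUP BY company
    ORDER BY avg_score DESC
    """
).fetchall()
conn.close()

for company, n, avg_score in rows:
    print(f"{company}: {n} students, average score {avg_score:.1f}")
```

Once GROUP BY feels natural, move on to joins and window functions, which come up often in interviews.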

Personally, I did spend some time learning big data technologies like HDFS, MongoDB, PySpark, etc. for my products. I would not suggest you learn the coding part of it, but understanding how it works is a must. Spend some time learning about microservices and DevOps. Knowledge of Docker and Kubernetes is a boon these days.

You should also learn about web application architectures and cloud systems if possible. There are lots of free courses on the internet where you can learn how to implement a big data pipeline in AWS or Azure.

Well, that’s it, folks. In case I forgot to mention something about how to start learning data science, write it in the comments. I will update this post with your feedback. The learning path for a data scientist is not easy, but it is very exciting. Best of luck with your data scientist career path, and thanks for reading. And yes, that website never worked 🙂