A data scientist’s advice on how to learn data science from scratch: the important thing is to focus on only one thing at a time. But what should we focus on?
The first thing to do is take a really deep breath! If you’re committed, you can do this. I’m reminded of this old saying:
Just Pick One!
Firstly, you’re on the right track, but don’t make life more difficult than it needs to be. Just learn one thing at a time. Both Python and R are perfectly good choices, but stop trying to learn both. Pick one, use it.
Personally, I like to advocate for learning Python. It’s a more versatile language, so you’ll be able to do more than just data science with it. That said, which one you choose is much less important than picking one and sticking with it (for now). For the rest of this answer, anywhere I say Python you can easily substitute R.
Start from the Basics
First, learn the basics of the language. Once you’ve got them down, start to learn about the basic packages that are used for data analysis in Python. The idea is that each new thing you try should build on the things you’ve learned before, so rather than ‘just’ learning something new, you’re also practicing the things you already know.
What worked for me was studying some basic statistics and probability at the same time. I would use the stats to give me a break from the Python, and vice versa. For that, I would recommend.
Build some Projects
The other thing that is going to help you is finding somewhere to apply what you’ve learned. The most obvious way is to build projects. It’s an approach we use at Dataquest, where I work. We have an online course that teaches data science, and we’ve found that our projects are where our students learn the most.
The projects you build might be as simple as a small calculator in Python, e.g. a function that takes a height and a weight as parameters and calculates the BMI (body mass index).
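To give you an idea of how small a first project can be, here is a minimal sketch of that BMI calculator. The function name `bmi` and the choice of metric units (kilograms and metres) are my own assumptions for illustration; BMI itself is just weight divided by height squared.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index: weight (kg) divided by height (m) squared.

    The name and the metric units are illustrative choices, not a standard API.
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # 70 kg at 1.75 m
    print(f"BMI: {bmi(70, 1.75):.1f}")  # prints "BMI: 22.9"
```

Even something this short makes you practice functions, parameters, arithmetic, and basic input checking all at once.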
If you’re a little further in, you might find an interesting dataset (there are some sources here:) and practice reading in the data and doing some basic cleaning.
As you progress, you might start exploring the data, doing visualizations, and eventually building some end-to-end machine learning projects.
The key here is that you build something of your own using the skills you’ve learned. It does not have to take hours or be the best thing ever; it’s just a way of consolidating skills as you go. You’ll find there are things you want to do that you don’t know how to do. Google them and work out how.
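As a rough sketch of what "reading in the data and doing some basic cleaning" can look like, the example below uses only Python’s standard library. The sample data, the column names, and the helper `load_clean_rows` are all made up for illustration; with a real dataset you would open a file instead, and later on you would probably reach for a data analysis package.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
RAW_CSV = """name,height_m,weight_kg
Ana,1.62,58
 Ben ,1.80,
Cara,,64
Dev,1.75,70
"""


def load_clean_rows(text):
    """Parse CSV text, trim whitespace, and drop rows with missing values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip()
        height = row["height_m"].strip()
        weight = row["weight_kg"].strip()
        # Skip any row where a field is empty after trimming.
        if not (name and height and weight):
            continue
        # Convert the numeric columns from text to floats.
        rows.append(
            {"name": name, "height_m": float(height), "weight_kg": float(weight)}
        )
    return rows


rows = load_clean_rows(RAW_CSV)
print(len(rows), "clean rows out of 4")
```

Dropping incomplete rows is only one possible cleaning choice. Deciding what to do with missing values is part of the practice.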
Find an application that interests you
Pick something that interests you – it might be the stock market, it might be sports statistics, it might be movies or technology. Pick projects that you’re interested in as it will help your motivation.
After you feel comfortable with Python (or R, if you chose it), the next thing you should attack is databases and SQL. Even though it’s been around since the 1970s, it’s still so important for working with data, and if you can’t write a query you’ll run into a lot of roadblocks in the longer term.
Other than that, the other tools will come to you. When you start working with larger datasets, you’ll need something to help you there, so you can explore Spark or something similar, depending on what you’re already working with.
Lastly, you might like to find somewhere that has built a curriculum for you. At Dataquest, our curriculum was built to take the worry out of what to learn. Much like the strategy above, we start by teaching you the basics of Python, then move on to data analysis and visualization libraries, then to other tools and topics like the command line, SQL, APIs, data scraping, probability, and statistics, before rounding out with machine learning and modules on working with larger datasets.
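If you want to practice writing queries without installing a database server, one option is the `sqlite3` module that ships with Python. The sketch below is only an illustration: the `movies` table, its columns, its three rows, and the helper `average_rating_by_genre` are invented for the example.

```python
import sqlite3


def average_rating_by_genre(conn):
    """Return (genre, count, average rating) rows, highest average first."""
    query = """
        SELECT genre, COUNT(*) AS n, AVG(rating) AS avg_rating
        FROM movies
        GROUP BY genre
        ORDER BY avg_rating DESC
    """
    return conn.execute(query).fetchall()


# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("A", "drama", 7.0), ("B", "drama", 8.0), ("C", "comedy", 6.5)],
)

for genre, n, avg_rating in average_rating_by_genre(conn):
    print(genre, n, avg_rating)
```

`SELECT`, `GROUP BY`, and `ORDER BY` carry over to most SQL databases, so the habit you build here transfers.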
One of our core philosophies is learning through doing, so our guided projects along the way help you solidify your skills, as well as start on building a data science portfolio.
I hope this is helpful. If you have any questions, please comment below or send me a message. I would be more than happy to help.