Data science is one of the most buzzed-about fields right now, and with good reason: data scientists work on a wide range of problems. Given all the interesting applications, it makes sense that data science is a very sought-after career.

Data science is applied in many fields, including the development of self-driving cars.


If you’re reading this post, I’m assuming that you’d like to learn how to become a data scientist. If you’ve already done some research, you’ve probably read dozens of guides that start with “learn linear algebra”, and end 5 years later with “learn Spark”. When I was learning, I tried to follow these guides, but I ended up bored, without any actual data science skills to show for my time. The guides were like a teacher at school handing me a bunch of books and telling me to read them all – a learning approach that’s never appealed to me.

The unfortunate part about all the “become a data scientist in 5 easy years” guides is that they’re written by people who’re already expert data scientists. They look at themselves and say “what would someone need to learn to do what I do every day?” They forget what it’s like to struggle to learn something on your own, and what it’s like to need motivation to push you over the next hurdle.

As I learned data science, I realized that I learn most effectively when I’m working on a problem I’m interested in. Instead of learning a checklist of skills, I decided to focus on building projects around real data. Not only did this learning method motivate me, it also closely mirrors the work you’ll do in a data scientist role.

In this post, I’ll share a few steps that will help you in your journey to becoming a data scientist. The journey won’t be easy, but it will be infinitely more motivating than following the conventional wisdom.

1. Question Everything

The appeal of data science is that you get to answer interesting questions using actual data and code. These questions can range from “can I predict whether any flight will be on time?” to “how much does the US spend per student on education?”. To be able to ask and answer these questions, you need to develop an analytical mindset.

The best way to develop this mindset is to practice on news articles. Find data-driven articles and think about:

  • How they reach their conclusions given the data they discuss
  • How you might design a study to investigate further
  • What questions you might want to ask if you had access to the underlying data

Some articles actually make the underlying data available for download. When they do:

  • Download the data, and open it in Excel or an equivalent tool
  • See what patterns you can find in the data by eyeballing it
  • Do you think the data supports the conclusions of the article? Why or why not?
  • What additional questions do you think you can use the data to answer?
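
If you would rather take this first pass in code than in a spreadsheet, the same steps can be sketched with Python's standard library. This is only an illustration: the `airline` and `delay_minutes` columns and the sample rows are invented here, and a real download would have its own column names.

```python
import csv
import io
from collections import defaultdict

# Invented sample standing in for a downloaded file. With a real download you
# would pass open("your_file.csv", newline="") instead of io.StringIO(...).
SAMPLE_CSV = """airline,delay_minutes
A,5
A,20
B,0
B,3
A,12
"""

def summarize(csv_file):
    """Return the number of rows and the average delay per airline."""
    rows = list(csv.DictReader(csv_file))
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["airline"]] += float(row["delay_minutes"])
        counts[row["airline"]] += 1
    averages = {name: totals[name] / counts[name] for name in counts}
    return len(rows), averages

row_count, average_delay = summarize(io.StringIO(SAMPLE_CSV))
print(row_count, average_delay)
```

A grouped average like this is a quick way to check whether the pattern you think you see by eyeballing the rows actually holds.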

Here are some good places to find data-driven articles:


After you’ve read articles for a few weeks, reflect on whether you enjoyed coming up with questions and answering them. Becoming a data scientist is a long road, and you need to be very passionate about the field to make it all the way. Data scientists constantly come up with questions and answer them using mathematical models and data analysis tools.

If you don’t enjoy the process of reasoning about data and asking questions, you should think about trying to find the overlaps between data and things that you do enjoy. For example, maybe you don’t enjoy the process of coming up with questions in the abstract, but maybe you really enjoy analyzing health data or education data. I personally was very interested in stock market data, which motivated me to build a model to predict the market.

Before you move on to the next step, make sure that there’s something about the process of data science that you’re passionate about. I can’t emphasize this point enough. If your goal is to become a data scientist, but you don’t have a specific passion, you’re probably not going to put in the months of hard work that you’ll need to learn.

An infographic from FiveThirtyEight.


2. Learn The Basics

Once you’ve figured out how to come up with questions, you’re ready to start learning the technical skills to start answering them. I’d start by learning the basics of programming in Python. Python is a programming language that has consistent syntax, and is often recommended for beginners. Luckily, it also has the versatility to enable you to do extremely complex data science and machine learning related work, such as deep learning.

A lot of people worry about language choice, but the key points to remember are:


  • Data science is about being able to answer questions and drive business value, not about tools
  • Learning the concepts is more important than learning the syntax
  • Building projects and sharing them is what you’ll do in an actual data science role, and learning this way will give you a head start

As the above points illustrate, the key isn’t to learn all the data science tools. It’s to learn enough of the technical side to start building projects. Some good places to do this are:

  • Dataquest teaches you the fundamentals of Python and data science through analyzing interesting datasets, like data on NBA scoring or CIA covert actions.
  • Codecademy teaches you the basics of Python, and how to build programs.

The key is to learn the basics, and start answering some of the questions you came up with in the past few weeks as you learn. This will help you solidify your learning, and start building a portfolio.

3. Build Projects

As you’re learning the basics of coding, you should start building projects that answer interesting questions and showcase your data science skills. Projects don’t have to be extremely complex. For example, you could analyze a single dataset to find patterns. The key is to find interesting datasets, ask questions about the data, then answer those questions with code. If you need help finding datasets, lists of public data sources are a good place to start.

As you’re building projects, remember that:


  • Most data science work is data cleaning.
  • The most common machine learning technique is linear regression.
  • Everyone starts somewhere. Even if you feel like what you’re doing isn’t impressive, it’s still worth working on.
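
To make the first two points concrete, here is a minimal sketch of what cleaning followed by a linear regression can look like in plain Python. The raw rows are made up for illustration, and `clean` and `fit_line` are hypothetical helper names, not functions from any library; in practice you would more likely reach for a data analysis library.

```python
def clean(raw_rows):
    """Drop rows with missing or non-numeric values; convert the rest to floats."""
    cleaned = []
    for x_text, y_text in raw_rows:
        try:
            cleaned.append((float(x_text), float(y_text)))
        except (TypeError, ValueError):
            continue  # skip rows such as ("", "7") or ("four", "9")
    return cleaned

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up raw data, as it might arrive from a messy file: all text, some gaps.
RAW = [("1", "3"), ("2", "5"), ("", "7"), ("3", "7"), ("four", "9"), ("4", "9")]

points = clean(RAW)
slope, intercept = fit_line(points)
print(len(points), slope, intercept)
```

Notice that even in this toy case the cleaning step takes as much code as the model itself, which is typical of real projects.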

Not only does building projects help you understand real data science work and practice your skills, it also helps you build a portfolio to show to potential employers. Here are some more detailed guides on building projects on your own:

Once you’ve built some smaller projects, it’s good to find one interest area that you can go deep in. For me, this was trying to predict the stock market. The nice thing about predicting the stock market is that you can start with very little knowledge of Python and try to make trades every month or week. As your skills grow, you can make the problem more complicated by adding nuances like minute-by-minute prices and more accurate predictions.

Some other examples of projects that you can develop iteratively are:


  • Health tracking. You can start by manually entering and analyzing your data, and keep adding more correlations and predictive elements as time goes on.
  • Predicting NBA game winners. You can start by manually entering scores and making predictions with a heuristic, but you can keep acquiring more data and making more accurate predictions over time.
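
As a rough idea of what "making predictions with a heuristic" could mean for the game-prediction project, the sketch below picks whichever team has the better past win rate. The teams and scores are invented placeholders, and the heuristic is just one simple possibility, not a recommended model.

```python
from collections import defaultdict

# Invented results: (home_team, away_team, home_points, away_points).
PAST_GAMES = [
    ("A", "B", 101, 95),
    ("B", "C", 88, 102),
    ("C", "A", 110, 99),
    ("A", "B", 97, 90),
]

def win_rates(games):
    """Return each team's share of games won."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_points, away_points in games:
        played[home] += 1
        played[away] += 1
        winner = home if home_points > away_points else away
        wins[winner] += 1
    return {team: wins[team] / played[team] for team in played}

def predict_winner(home, away, rates):
    """Heuristic: pick the team with the higher past win rate; ties go to the home team."""
    return home if rates.get(home, 0.0) >= rates.get(away, 0.0) else away

rates = win_rates(PAST_GAMES)
print(predict_winner("B", "C", rates))
print(predict_winner("A", "B", rates))
```

From here, the iterative path is to replace the manual list with collected data and the heuristic with a model, then compare the two on games the model has not seen.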

An example of a data science project — this map shows racial diversity in the US.


4. Share Your Work

Once you’ve built a few projects, you should share them with others! It’s a good idea to upload them to GitHub, where others can view them. It’s also worth reading up on uploading projects to GitHub and on assembling a portfolio. Uploading projects will:

  • Force you to think about how to best present them, which is what you’d do in a data science role
  • Allow your peers to view your projects and comment
  • Allow employers to view your projects

Along with uploading your work to GitHub, you should also think about publishing a blog. When I was learning data science, writing blog posts helped me:

  • Get inbound interest from recruiters
  • Learn concepts more thoroughly (the process of teaching really helps you learn)
  • Connect with peers

You can read a good guide on how to publish a blog. Some good topics for blog posts are:

  • Explaining data science and programming concepts
  • Discussing your projects and walking through your findings
  • Discussing the process of learning data science, and how you’re doing it

An infographic from [my blog](http://www.vikparuchuri.com/blog/how-do-simpsons-characters-feel-about-each-other/) that shows how much each Simpsons character likes the others.


5. Learn From Others

After you’ve started to build an online presence, it’s a good idea to start engaging with other data scientists. You can do this in person or in online communities. Some good online communities are:

I personally was very active on Quora and Kaggle when I was learning, which helped me immensely. Engaging in online communities is a good way to:

  • Find other people to learn with
  • Enhance your profile, and find opportunities
  • Strengthen your knowledge by learning from others

You can also engage with people in person. In-person engagement can help you meet and learn from more experienced data scientists in your area.

6. Push Your Boundaries

Companies want to hire data scientists who find those critical insights that save them money or make their customers happier. You have to apply the same process to learning – keep searching for new questions to answer, and keep answering harder and more complex questions. If you look back on your projects from a month or two ago, and aren’t embarrassed about something you did, you probably aren’t pushing your boundaries enough. You should be making strong progress every month, and it should be reflected in your work.

Some ways to push your boundaries are:


  • Try working with a larger dataset than you’re comfortable with
  • Start a project that requires knowledge you don’t have
  • Try making your project run faster
  • See if you can teach what you did in a project to someone else
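
For the "run faster" point, one low-effort habit is to time two versions of the same step and confirm they give the same answer. The sketch below is an assumed example, not taken from any particular project: it counts shared values between two lists, first by scanning a list and then by looking values up in a set.

```python
import timeit

def count_shared_slow(left, right):
    """Count items of left that appear in right, scanning the list each time."""
    return sum(1 for item in left if item in right)

def count_shared_fast(left, right):
    """Same count, but build a set once so each lookup avoids a full scan."""
    right_set = set(right)
    return sum(1 for item in left if item in right_set)

# Made-up inputs: even numbers and multiples of three below 2000.
left = list(range(0, 2000, 2))
right = list(range(0, 2000, 3))

# Check correctness first; a faster wrong answer is not an improvement.
assert count_shared_slow(left, right) == count_shared_fast(left, right)

slow_seconds = timeit.timeit(lambda: count_shared_slow(left, right), number=5)
fast_seconds = timeit.timeit(lambda: count_shared_fast(left, right), number=5)
print(f"list scan: {slow_seconds:.4f}s, set lookup: {fast_seconds:.4f}s")
```

The exact timings depend on your machine, so the useful part is the comparison between the two numbers rather than either value alone.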

You’ve Got This

Learning data science isn’t easy, but the key is to stay motivated and enjoy what you’re doing. If you’re consistently building projects and sharing them, you’ll build your expertise, and get the data scientist job that you want.

I haven’t given you an exact roadmap to learning data science, but if you follow this process, you’ll get farther than you imagined you could. Anyone, including you and me, can become a data scientist with enough motivation.




