Data science is one of the most buzzed-about fields right now, and with good reason. Given all the interesting applications, it makes sense that data science is a very sought-after career.
Data science is applied in many fields, including the development of self-driving cars.
If you’re reading this post, I’m assuming that you’d like to learn how to become a data scientist. If you’ve already done some research, you’ve probably read dozens of guides that start with “learn linear algebra”, and end 5 years later with “learn Spark”. When I was learning, I tried to follow these guides, but I ended up bored, without any actual data science skills to show for my time. The guides were like a teacher at school handing me a bunch of books and telling me to read them all – a learning approach that’s never appealed to me.
The unfortunate part about all the “become a data scientist in 5 easy years” guides is that they’re written by people who’re already expert data scientists. They look at themselves and say “what would someone need to learn to do what I do every day?” They forget what it’s like to struggle to learn something on your own, and what it’s like to need motivation to push you over the next hurdle.
As I learned data science, I realized that I learn most effectively when I’m working on a problem I’m interested in. Instead of learning a checklist of skills, I decided to focus on building projects around real data. Not only did this learning method motivate me, it also closely mirrors the work you’ll do in a data scientist role.
In this post, I’ll share a few steps that will help you in your journey to becoming a data scientist. The journey won’t be easy, but it will be infinitely more motivating than following the conventional wisdom.
The appeal of data science is that you get to answer interesting questions using actual data and code. These questions can range from “can I predict whether any flight will be on time?” to “how much does the US spend per student on education?”. To be able to ask and answer these questions, you need to develop an analytical mindset.
The best way to develop this mindset is to practice on news articles. Find data-driven articles and think about:
Some articles actually make the underlying data available for download. When they do, you can:
Here are some good places to find data-driven articles:
After you’ve read articles for a few weeks, reflect on whether you enjoyed coming up with questions and answering them. Becoming a data scientist is a long road, and you need to be very passionate about the field to make it all the way. Data scientists constantly come up with questions and answer them using mathematical models and data analysis tools.
If you don’t enjoy the process of reasoning about data and asking questions, you should think about trying to find the overlaps between data and things that you do enjoy. For example, maybe you don’t enjoy the process of coming up with questions in the abstract, but maybe you really enjoy analyzing health data or education data. I personally was very interested in stock market data, which motivated me to build a model to predict the market.
Before you move on to the next step, make sure that there’s something about the process of data science that you’re passionate about. I can’t emphasize this point enough. If your goal is to become a data scientist, but you don’t have a specific passion, you’re probably not going to put in the months of hard work that you’ll need to learn.
An infographic from FiveThirtyEight.
Once you’ve figured out how to come up with questions, you’re ready to start learning the technical skills to start answering them. I’d start by learning the basics of programming in Python. Python is a programming language that has consistent syntax, and is often recommended for beginners. Luckily, it also has the versatility to enable you to do extremely complex data science and machine learning related work, such as deep learning.
A lot of people worry about language choice, but the key points to remember are:
As the above points illustrate, the key isn’t to learn all the data science tools. It’s to learn enough of the technical side to start building projects. Some good places to do this are:
The key is to learn the basics, and start answering some of the questions you came up with in the past few weeks as you learn. This will help you solidify your learning, and start building a portfolio.
As you’re learning the basics of coding, you should start building projects that answer interesting questions and showcase your data science skills. Projects don’t have to be extremely complex. For example, you could analyze a dataset you find interesting to look for patterns. The key is to find interesting datasets, ask questions about the data, then answer those questions with code. If you need help finding datasets, curated lists of public data sources are a good place to start.
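As a rough illustration of the "find a dataset, ask a question, answer it with code" loop, here is a minimal sketch that uses only the Python standard library. The data, the column names (`airline`, `delay_minutes`), and the 15-minute on-time cutoff are all assumptions made up for this example; in a real project you would load a dataset you actually downloaded. The question it asks is: which airline is on time most often?

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
# In a real project you would read a CSV file you downloaded instead.
SAMPLE_CSV = """airline,delay_minutes
A,0
A,25
A,5
B,40
B,0
B,60
C,10
C,0
"""

ON_TIME_THRESHOLD = 15  # assumed cutoff: under 15 minutes late counts as on time


def on_time_rates(csv_text, threshold=ON_TIME_THRESHOLD):
    """Return {airline: share of flights delayed less than `threshold` minutes}."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        airline = row["airline"]
        totals[airline] += 1
        if int(row["delay_minutes"]) < threshold:
            on_time[airline] += 1
    return {airline: on_time[airline] / totals[airline] for airline in totals}


rates = on_time_rates(SAMPLE_CSV)
# Print airlines from most to least punctual.
for airline, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{airline}: {rate:.0%} on time")
```

Even a project this small follows the full cycle: load data, pose a question, compute an answer, and present it. Swapping in a real dataset and a question you care about is what turns it into a portfolio piece.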
As you’re building projects, remember that:
Not only does building projects help you understand real data science work and practice your skills, it also helps you build a portfolio to show to potential employers. Here are some more detailed guides on building projects on your own:
Once you’ve built some smaller projects, it’s good to find one interest area that you can go deep on. For me, this was trying to predict the stock market. The nice thing about predicting the stock market is that you can start with very little knowledge of Python and try to make trades every month or week. As your skills grow, you can make the problem more complicated by adding nuances like minute-by-minute prices and more accurate predictions.
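To make "start simple, then iterate" concrete, here is one possible first version of such a project. This is a sketch under stated assumptions, not the model I actually built: the weekly prices are invented, and the moving-average forecast is just a common beginner baseline. It compares that forecast against an even simpler rule (next week's price equals this week's) using mean absolute error.

```python
from statistics import mean

# Hypothetical weekly closing prices, invented for illustration only.
PRICES = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 108.0]


def moving_average_forecasts(prices, window):
    """Forecast each price as the mean of the `window` prices before it."""
    return [mean(prices[i - window:i]) for i in range(window, len(prices))]


def mean_absolute_error(actual, predicted):
    """Average size of the forecast errors, in the same units as the prices."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


WINDOW = 3  # assumed look-back length; try other values as an experiment
forecasts = moving_average_forecasts(PRICES, WINDOW)
actual = PRICES[WINDOW:]
# Naive baseline: assume next week's price equals this week's price.
naive = PRICES[WINDOW - 1:-1]

print("moving-average MAE:", round(mean_absolute_error(actual, forecasts), 2))
print("naive baseline MAE:", round(mean_absolute_error(actual, naive), 2))
```

Checking a model against a trivial baseline is a useful habit: if the fancier approach cannot beat "tomorrow looks like today", that tells you where to focus the next iteration.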
Some other examples of projects that you can develop iteratively are:
An example of a data science project: this map shows racial diversity in the US.
Once you’ve built a few projects, you should share them with others! It’s a good idea to upload them to GitHub, where others can view them. You can read a good post on uploading projects to GitHub, and more about assembling a portfolio. Uploading projects will:
Along with uploading your work to GitHub, you should also think about publishing a blog. When I was learning data science, writing blog posts helped me:
You can read a good guide on how to publish a blog. Some good topics for blog posts are:
An infographic from [my blog](http://www.vikparuchuri.com/blog/how-do-simpsons-characters-feel-about-each-other/) that shows how much each Simpsons character likes the others.
After you’ve started to build an online presence, it’s a good idea to start engaging with other data scientists. You can do this in person or in online communities. Some good online communities are:
I personally was very active on Quora and Kaggle when I was learning, which helped me immensely. Engaging in online communities is a good way to:
You can also engage with people in person. In-person engagement can help you meet and learn from more experienced data scientists in your area.
Companies want to hire data scientists who find those critical insights that save them money or make their customers happier. You have to apply the same process to learning – keep searching for new questions to answer, and keep answering harder and more complex questions. If you look back on your projects from a month or two ago, and aren’t embarrassed about something you did, you probably aren’t pushing your boundaries enough. You should be making strong progress every month, and it should be reflected in your work.
Some ways to push your boundaries are:
Learning data science isn’t easy, but the key is to stay motivated and enjoy what you’re doing. If you’re consistently building projects and sharing them, you’ll build your expertise, and get the data scientist job that you want.
I haven’t given you an exact roadmap to learning data science, but if you follow this process, you’ll get farther than you imagined you could. Anyone, including you and me, can become a data scientist with enough motivation.