Skills Required | Data Scientists



What Are Employers Looking For?

Data scientists are expected to know a lot: machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. Within those areas there are dozens of languages, frameworks, and technologies data scientists could learn. How should data scientists who want to be in demand with employers spend their learning budget?

I scoured job listing websites to find which skills are most in-demand for data scientists. I looked at general data science skills and at specific languages and tools separately.
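To make the idea of counting skill mentions concrete, here is a minimal sketch of how such a tally could be done. It is not the method used for this analysis: the listing texts, the skill list, and the function name `count_skill_mentions` are all hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical listing snippets, made up for illustration (not real data).
listings = [
    "Data scientist with Python, SQL and machine learning experience",
    "Seeking analyst skilled in R, SQL and Tableau",
    "Machine learning engineer: Python, Spark, deep learning",
]

# Hypothetical set of search terms, all lower case.
skills = ["python", "r", "sql", "tableau", "spark",
          "machine learning", "deep learning"]


def count_skill_mentions(listings, skills):
    """Count how many listings mention each skill at least once."""
    counts = Counter()
    for text in listings:
        lowered = text.lower()
        for skill in skills:
            # Require non-word characters around the term so that a short
            # term such as "r" does not match inside "learning".
            pattern = r"(?<![\w+#])" + re.escape(skill) + r"(?![\w+#])"
            if re.search(pattern, lowered):
                counts[skill] += 1
    return counts


counts = count_skill_mentions(listings, skills)
for skill, n in counts.most_common():
    print(f"{skill}: {n} of {len(listings)} listings")
```

Counting each listing at most once per skill keeps a single keyword-stuffed posting from inflating the totals.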

I searched job listings on LinkedIn, Indeed, SimplyHired, Monster, and AngelList on June 10, 2022. Here’s a chart showing how many data scientist jobs each website listed.


General Skills

Here’s the chart of the most frequent general data scientist skills sought by employers. 




The results show that analysis and machine learning are at the heart of data scientist jobs. Gleaning insights from data is a primary function of data science. Machine learning is about building systems that learn from data to make predictions, and it is in very high demand.

AI and deep learning don’t show up as frequently as some other terms. However, deep learning is a subset of machine learning, which is itself a subset of AI, so the terms overlap heavily. Deep learning is being used for more and more of the machine learning tasks that other algorithms were used for previously. For example, the best machine learning algorithms for most natural language processing problems are now deep learning algorithms. I expect deep learning skills will be sought more explicitly in the future and that machine learning will become more synonymous with deep learning.

Technology Skills

Below are the top 20 specific languages, libraries, and tech tools employers are looking for data scientists to have experience with.


Let’s briefly look at the most common tech skills.



Python
Python is the most in-demand language. The popularity of this open-source language has been widely observed. It’s beginner friendly, with many support resources. The vast majority of new data science tools are compatible with it. Python is the primary language for data scientists.




R

R is not far behind Python. It once was the primary language for data science. I was surprised to see how in-demand it still is. The roots of this open source language are in statistics, and it’s still very popular with statisticians.


SQL

SQL is also in high demand. SQL stands for Structured Query Language and is the primary way to interact with relational databases. SQL is sometimes overlooked in the data science world, but it’s a skill worth demonstrating mastery of if you’re planning to hit the job market.
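As a small illustration of what interacting with a relational database looks like, here is a sketch that runs a SQL aggregate query from Python using the standard library's `sqlite3` module. The `listings` table, its columns, and its rows are made up for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (site TEXT, skill TEXT)")

# Hypothetical rows: one row per skill mentioned in a listing.
rows = [
    ("site_a", "Python"),
    ("site_a", "SQL"),
    ("site_b", "Python"),
    ("site_b", "R"),
    ("site_b", "SQL"),
    ("site_a", "Python"),
]
conn.executemany("INSERT INTO listings VALUES (?, ?)", rows)

# Count mentions per skill, most frequent first.
query = """
    SELECT skill, COUNT(*) AS mentions
    FROM listings
    GROUP BY skill
    ORDER BY mentions DESC, skill ASC
"""
result = conn.execute(query).fetchall()
for skill, mentions in result:
    print(skill, mentions)

conn.close()
```

The same `SELECT ... GROUP BY ... ORDER BY` pattern carries over to other relational databases, although connection details differ.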

Next come Hadoop and Spark, both open source tools from Apache for big data.


Hadoop

Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.


Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.

I expect far fewer job candidates have these skills than have Python, R, and SQL. If you have or can gain experience with Hadoop and Spark, it should give you a leg up on the competition.





Tableau

Tableau is the next most in-demand tool. This analytics platform and visualization tool is powerful, easy to use, and growing in popularity. It has a free public version, but will cost you money if you want to keep your data private.

If you aren’t familiar with Tableau, it’s definitely worth taking a quick class such as Tableau on Tutorialspoint. I don’t get a commission for the suggestion; I just took the class and found it to be a great value.

The chart below shows an even bigger list of the most in-demand languages, frameworks, and other data science software tools.


Historical Comparison

Glassdoor did an analysis of the 10 most common software skills for data scientists from January 2017 through July 2017 on their site. Here’s a comparison of how frequently the terms appeared on their site compared to the average on LinkedIn, Indeed, SimplyHired, and Monster in October 2018.




Both Glassdoor’s analysis and mine found Python, R, and SQL to be the most in-demand skills. We also found the same top nine technology skills, albeit in slightly different orders.

The results suggest that compared to the first half of 2017, R, Hadoop, Java, SAS, and MATLAB are now less in demand and Tableau is more in demand. This is what I would expect given the complementary results from sources such as the KDnuggets developer survey. There, R, Hadoop, Java, and SAS all show clear multi-year downward usage trends and Tableau shows a clear upward trend.

Recommendations

Based on the results of these analyses, here are some general recommendations for current and aspiring data scientists concerned with making themselves widely marketable.

  • Demonstrate you can do data analysis and focus on becoming really skilled at machine learning.
  • Invest in your communication skills. I recommend reading the book Made to Stick to help your ideas have more impact.
  • Master a deep learning framework. Being proficient with a deep learning framework is a larger and larger part of being proficient with machine learning.
  • If you are choosing between learning Python and R, choose Python. If you have Python down cold, consider learning R. You’ll definitely be more marketable if you also know R.
  • When an employer is looking for a data scientist with Python skills, they are also likely to expect candidates to know the common Python data science libraries: NumPy, pandas, scikit-learn, and Matplotlib.



Some resources for learning these tools:

  • DataCamp and DataQuest: both are reasonably priced online SaaS data science education products where you learn as you code. They both teach a number of technology tools.
  • Data School has a variety of resources including a nice set of YouTube videos explaining data science concepts.
  • Python for Data Analysis by McKinney. This book by the primary author of the pandas library focuses on pandas and also discusses basic Python, NumPy, and scikit-learn functionality for data science.
  • Introduction to Machine Learning with Python by Müller & Guido. Müller is a primary maintainer of scikit-learn. It’s an excellent book for learning machine learning with scikit-learn.

If you are looking to jump into deep learning, I suggest starting with Keras or FastAI before moving on to TensorFlow or PyTorch. Chollet’s Deep Learning with Python is a great resource for learning Keras.

Beyond these recommendations, I suggest you learn what interests you, although there are obviously many considerations when deciding how to allocate your learning time.



If you’re looking for a data scientist job through online portals, I suggest you start with LinkedIn — it consistently has the most results.

If you are looking for a job or posting positions on job sites, keywords matter. “data science” returns nearly 3x the number of results that “data scientist” does on each site. But if you are looking strictly for a data scientist job, you’re probably better off searching for “data scientist”.

Regardless of where you’re looking, I suggest you make an online blog or portfolio page that demonstrates your proficiency in as many in-demand skill areas as possible. I learnt to make a portfolio and a blog site here.

π•Ώπ–π–†π–“π–π–˜ & π•½π–Šπ–Œπ–†π–—π–‰π–˜...


About Freaky Analyst

A passionate data analyst who works with large amounts of data to turn that data into information, information into insight, and insight into valuable decisions. I also have a keen interest in data analysis and data visualization, and am fascinated by the power to compress complex datasets into approachable and appealing graphics.