The top skills of data scientists are Python, R, SQL, Tableau, and Spark, while the top skills asked of data engineers are SQL, Python, AWS, and Spark. We scraped over 700 data engineering job postings and 500 data science postings to find out.
Data engineers collect data from various sources that exist in a company and move that data into an analytics database by building data pipelines. Data scientists feed that data into machine learning algorithms to predict future outcomes, like whether a lead will convert to a customer or how well various products will sell.
In recent years, building data pipelines to collect, organize, and validate data has come to be seen as a prerequisite to making predictions with data science, and so employer demand for data engineering has increased.
We scraped over 1000 job listings from Indeed.com in April 2022 to help you find the top skills, most requested tools, average salaries, and amount of experience required for each position. So let's get to the numbers.
We scraped seven hundred data engineering job descriptions to determine the top tools and skills used by data engineers.
Here are the top tools requested, scraped from 700+ data engineering job postings on Indeed.com in April 2022.
For data engineers, SQL and Python dominate. A bit further down are cloud computing providers like Amazon (AWS), followed by big data tools and data warehouses like Apache Spark, Hadoop (which Spark often runs on top of), Snowflake, and Redshift. Backend programming languages like Java and Scala are also mentioned.
One item that is further down than we might expect is NoSQL. NoSQL is listed in 14% of job listings, and specific NoSQL databases appear less often still: Cassandra and MongoDB were listed in 5% and 4% of the positions, respectively.
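Percentages like these come from counting how many postings mention each tool at least once. The following is a minimal sketch of that counting step, not the code used for this post: the function name `mention_share` and the sample postings are hypothetical. It matches whole terms only, so "SQL" is not counted inside "NoSQL" and "Java" is not counted inside "JavaScript".

```python
import re

def mention_share(postings, tools):
    """Return the fraction of postings that mention each tool at least once."""
    shares = {}
    for tool in tools:
        # Lookarounds require a non-alphanumeric boundary on both sides,
        # so "SQL" does not match inside "NoSQL".
        pattern = re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(tool) + r"(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        hits = sum(1 for text in postings if pattern.search(text))
        shares[tool] = hits / len(postings) if postings else 0.0
    return shares

# Hypothetical sample postings, not data from the scrape.
sample_postings = [
    "Build pipelines with Python, SQL and AWS.",
    "Experience with NoSQL stores such as MongoDB; strong SQL.",
    "Java or Scala, Spark, and SQL required.",
    "JavaScript front-end role.",
]

shares = mention_share(sample_postings, ["SQL", "NoSQL", "MongoDB", "Java"])
for tool, share in shares.items():
    print(f"{tool}: {share:.0%}")
```

Counting each posting once per tool, rather than counting every occurrence, keeps a single keyword-heavy listing from inflating a tool's share.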
We also found the top skills requested in those 700+ data engineering job postings. Here they are.
Among data engineering skills, it’s no surprise that Engineering and Analytics top the list. Much data engineering work exists to support data analytics, and data engineers are often tasked with writing the calculations behind it.
A little further down we see ETL. ETL stands for Extract, Transform, and Load, and it’s the main task of data engineers. It involves pulling data from one location (extracting), changing that data to clean or better organize it (transforming), and loading that data into a data warehouse.
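To make the three steps concrete, here is a small illustrative sketch using only the Python standard library. Everything in it is an assumption made for the example: the CSV text stands in for a source system, the in-memory SQLite database stands in for a data warehouse, and the `leads` table and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """lead_id,email,signup_date
1, Ana@Example.com ,2022-04-01
2,,2022-04-02
3,bo@example.com,2022-04-03
"""

def extract(raw_text):
    """Extract: read rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: normalize emails and drop rows that have none."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if email:
            cleaned.append((int(row["lead_id"]), email, row["signup_date"]))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a table in the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leads "
        "(lead_id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT lead_id, email FROM leads ORDER BY lead_id"
).fetchall()
print(loaded)
```

Keeping each step in its own function mirrors how real pipelines are usually organized: the extract and load steps change when the source or warehouse changes, while the transform logic can be tested on its own.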
We can also see that Big data and Data warehouses received high rankings, which makes sense given the demand for big data tools like Apache Spark and the mentions of data warehouses like Snowflake and Redshift.
We scraped more than five hundred data scientist job descriptions to help determine the top tools used by data scientists.
We can see that the top skills are Python and R, which are frequently used with statistical and machine learning libraries, followed by SQL. Then a bit further down is Tableau, a data visualization tool, followed by big data and cloud computing tools like Spark and AWS.
Also listed is the artificial intelligence framework TensorFlow, which has both Python and R APIs. Surprisingly, tools like Pandas and Scikit-learn, popular among machine learning engineers, do not appear until later on the list. In fact, SAS (a statistical analysis tool) is listed almost twice as often as these libraries.
Perhaps one explanation for this is that the data science position is becoming distinct from the machine learning engineer position, which is more Python based, and where we would expect to see Python libraries for building machine learning models like Scikit-learn. Data science positions may have been more influenced by academia, where tools like R and SAS are more popular.
For data scientist roles, we can place the skill sets into three main buckets (roughly in the following order).
So we can see that data science involves a combination of hard skills like software engineering and statistics, as well as soft skills under business intelligence. Skills like data visualization, which did not show up in data engineering, are also emphasized.
In general, by looking at the top skills from each field, we can see that data engineers are involved in setting up and using the data infrastructure. That work involves performing ETL to store data in a data warehouse, with Python and SQL used to carry out these steps. To query big data, tools like Snowflake and Spark are used.
Data scientists are more often asked to query and use data infrastructure that is already set up. This is reflected in skills like Python, SQL, and Spark, all of which can be used for data exploration and data processing. And of course, we see the presence of libraries used for machine learning and data processing, like TensorFlow and pandas.
While soft skills are needed in both professions, data science teams are also tasked with talking with stakeholders and visualizing data.
One of the nice things about data engineering is that it provides a career path for beginner developers. Of the seven hundred jobs that we scraped, we found experience requirements for 612 positions. The average minimum experience required was 2.65 years, and 34% of data engineering positions were junior-level positions, asking for a minimum of 0 or 1 year of experience.
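Figures like the average minimum experience and the junior share can be derived by pulling the year counts out of each posting's text. The sketch below shows one way this could be done. It is an assumption about the approach, not the code behind the numbers above, and the pattern, function name, and sample postings are all hypothetical.

```python
import re
from statistics import mean

# Matches phrases such as "3+ years", "2-4 years", and "0 to 1 year",
# capturing the first (lowest) number in each phrase.
YEARS_PATTERN = re.compile(
    r"(\d+)\s*(?:\+|(?:-|to)\s*\d+)?\s*years?", re.IGNORECASE
)

def min_years(text):
    """Return the smallest year count mentioned in a posting, or None."""
    found = [int(m.group(1)) for m in YEARS_PATTERN.finditer(text)]
    return min(found) if found else None

# Hypothetical sample postings, not data from the scrape.
sample_postings = [
    "Requires 3+ years of experience with SQL.",
    "0 to 1 year of experience; new grads welcome.",
    "2-4 years building pipelines.",
    "Strong communication skills.",
]

# Keep only postings that state a requirement at all.
known = [y for y in map(min_years, sample_postings) if y is not None]
average_min = mean(known)
junior_share = sum(1 for y in known if y <= 1) / len(known)

print(f"{len(known)} of {len(sample_postings)} postings state a requirement")
print(f"average minimum: {average_min:.2f} years, junior share: {junior_share:.0%}")
```

A pattern this simple will also pick up phrases that are not experience requirements, such as how many years a company has been in business, so real scraped text would need extra filtering.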
Only 2% of data engineering positions mentioned a master’s degree.
For data science job listings we see the breakdown more skewed toward experienced professionals, with 20% of positions being open to those with 0 - 1 years of experience, almost half requesting 2 - 3 years of experience, and one third asking for over four years of experience.
For data science job listings, 4% mentioned a master’s degree and 14% of positions mentioned a PhD.
The average listed salary of data engineers is $121k.
Breaking this down by years of experience, we see that the average salary for junior engineers is $99k - $136k and the average salary for positions requesting 3 - 5 years of experience is $119k - $149k.
The data science positions we scraped had broader salary ranges. The average salary of data scientists is $125k.
Breaking it down by years of experience, we see that the average salary is between $94k and $137k for those with 0 to 1 years of experience, between $107k and $138k for those with 2 to 3 years of experience, and between $115k and $147k for those with four or more years of experience.
So looking at the salaries by years of experience, the salary ranges are pretty comparable. The salary ranges are within $5k of each other for each level of experience.
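The comparison can be checked with a few lines of arithmetic. The sketch below uses the salary ranges quoted in this post. One assumption to flag: the experience brackets differ between the two roles, so it pairs the data engineering range for 3 - 5 years with the data science range for four or more years, which is our pairing for illustration only.

```python
# Salary ranges in $k as (low, high), taken from the figures quoted in this post.
# Assumption: "experienced" pairs data engineers at 3 - 5 years with
# data scientists at four or more years, since the brackets do not line up exactly.
data_engineer = {"junior": (99, 136), "experienced": (119, 149)}
data_scientist = {"junior": (94, 137), "experienced": (115, 147)}

def range_gap(a, b):
    """Largest difference between two ranges, comparing low ends and high ends."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

gaps = {
    level: range_gap(data_engineer[level], data_scientist[level])
    for level in data_engineer
}

for level, gap in gaps.items():
    print(f"{level}: ranges differ by at most ${gap}k")
```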
In this blog post, we compared data engineers with data scientists. We saw that for data engineers, the skills required are principally engineering, analytics, and ETL, and that the top tools used are SQL, Python, and AWS. Data scientists, by contrast, have analytics closer to the top of the list, along with statistics and machine learning. For tooling, Python, R, and SQL rank at the top, followed by a steep drop-off to Tableau, a data visualization tool.
Data scientists edge out data engineers among those with over four years of experience, and the salary ranges are comparable for all other levels.