Get these Python questions right to ace your data science job interview

Six must-know technical concepts and two types of questions to test them

Story by

.cult

.cult by Honeypot is a Berlin-based community platform for developers. We write about all things career-related, make original documentaries and share heaps of other untold developer stories from around the world.

If you want to have a career in data science, knowing Python is a must. Python is the most popular programming language in data science, especially when it comes to machine learning and artificial intelligence.

To help you in your data science career, I’ve put together the main Python concepts tested in data science interviews. Later on, I will discuss the two main interview question types that cover the concepts you’re required to know as a data scientist. I’ll also show you several example questions and give you solutions to push you in the right direction.

Technical Concepts of Python Interview Questions

This guide is not company-specific. So if you have some data science interviews lined up, I strongly advise you to use this guide as a starting point for what might come up in the interview. You should also try to find some company-specific questions and solve them too. Knowing general concepts and practicing them on real-life questions is a winning combination.

I won’t bother you with theoretical questions. They can come up in the interview, but they cover the same technical concepts found in the coding questions. After all, if you know how to use the concepts I’ll be talking about, you probably know how to explain them too.

The technical Python concepts tested in data science job interviews are:

Data types

Built-in data structures

User-defined data structures

Built-in functions

Loops and conditionals

External libraries (Pandas)

1. Data Types

Data types are a concept you should be familiar with. This means you should know the most commonly used data types in Python, the differences between them, and when and how to use them. These are data types such as integers (int), floats (float), complex numbers (complex), strings (str), booleans (bool), and the null value (None).
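
As a quick refresher, here is a minimal sketch of those types in action. The variable names and values are made up for illustration. It shows how to inspect a type, how the two division operators differ, and how to convert strings to numbers.

```python
# One sample value per data type mentioned above (values are arbitrary)
values = [42, 3.14, 2 + 3j, "data", True, None]

# type() reports the type of each value
type_names = [type(v).__name__ for v in values]

# True division always returns a float; floor division of two ints returns an int
ratio = 7 / 2
floor = 7 // 2

# bool is a subclass of int, which is why True behaves like 1 in arithmetic
is_bool_int = isinstance(True, int)

# Strings must be converted explicitly before doing arithmetic with them
parsed = int("42") + float("0.5")
```

A common interview follow-up is why `7 / 2` and `7 // 2` give different results, so it pays to know which operators change the type of the result.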

2. Built-in Data Structures

These are lists, dictionaries, tuples, and sets. Knowing these four built-in data structures will help you organize and store data in a way that allows easier access and modification.
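
The short sketch below, with invented sample data, touches each of the four structures once. It is only meant to illustrate the main difference between them: lists and dictionaries are mutable, tuples are not, and sets discard duplicates.

```python
# list: ordered and mutable
scores = [88, 92, 75]
scores.append(60)

# tuple: ordered but immutable, so item assignment raises TypeError
point = (3, 4)
try:
    point[0] = 10
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# dict: maps keys to values; get() supplies a default for missing keys
counts = {"a": 1}
counts["b"] = counts.get("b", 0) + 1

# set: keeps only unique elements
unique = set([1, 2, 2, 3])
```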

3. User-defined Data Structures

On top of using the built-in data structures, you should also be able to define and use some user-defined data structures. These include arrays, stacks, queues, trees, linked lists, graphs, and hash maps.
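
To give a feel for what "user-defined" means here, below is a minimal sketch of a stack and a queue. The class and method names (`Stack`, `Queue`, `push`, `enqueue`, and so on) are my own choices for illustration, not a standard API.

```python
from collections import deque


class Stack:
    """Last in, first out, backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Removes and returns the most recently pushed item
        return self._items.pop()

    def __len__(self):
        return len(self._items)


class Queue:
    """First in, first out, backed by a deque for fast removal from the left."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        # Removes and returns the oldest item
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


stack = Stack()
queue = Queue()
for item in ["a", "b", "c"]:
    stack.push(item)
    queue.enqueue(item)

last_in = stack.pop()      # the stack hands back the newest item
first_in = queue.dequeue() # the queue hands back the oldest item
```

The queue uses `collections.deque` because removing the first element of a plain list shifts every remaining element, which gets slow for long queues.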

4. Built-in Functions

Python has over 60 built-in functions. You don’t need to know them all, although of course it’s better to know as many as possible. The built-in functions you can’t avoid are abs(), isinstance(), len(), list(), min(), max(), pow(), range(), round(), sorted(), and type(). You should also know split(), which is strictly speaking a string method rather than a built-in function.
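
Here is a small sketch that calls each of the functions listed above once, using arbitrary sample values, so you can see what each one returns.

```python
numbers = list(range(1, 6))            # range() is lazy; list() materializes it
smallest, largest = min(numbers), max(numbers)

distance = abs(-7.5)                   # absolute value
squared = pow(3, 2)                    # same as 3 ** 2
rounded = round(3.14159, 2)            # round to two decimal places

ordered = sorted([3, 1, 2], reverse=True)  # returns a new sorted list

words = "ace the interview".split()    # str method: splits on whitespace
word_count = len(words)

is_text = isinstance(words[0], str)    # checks a value against a type
kind = type(rounded).__name__          # name of the value's type
```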

5. Loops and Conditionals

Loops handle repetitive tasks by running the same piece of code over and over again. They keep going until a conditional (a true/false test) tells them to stop.
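
The sketch below shows both loop types together with the conditionals that control them. The data and the cutoff values are invented for the example.

```python
readings = [3, 8, -1, 12, 5]
total = 0
evens = []

# for loop: visit each reading in turn
for value in readings:
    if value < 0:
        continue          # skip invalid readings
    if value > 10:
        break             # stop the loop entirely at the first large reading
    total += value
    if value % 2 == 0:
        evens.append(value)

# while loop: repeat until the condition becomes false
n = 27
steps = 0
while n > 1:
    n //= 2
    steps += 1
```

Note that `break` ends the loop before the final reading is ever looked at, while `continue` only skips the current one.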

6. External Libraries (Pandas)

While there are several external libraries used, Pandas is probably the most popular. It is designed for practical data analysis in finance, social sciences, statistics, and engineering.
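
If you have not used Pandas before, the following minimal sketch shows the two operations that come up most often in interview questions: filtering rows and aggregating by group. The `employees` table and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical employee data, invented for illustration
employees = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "department": ["engineering", "marketing", "engineering", "marketing"],
    "salary": [120, 90, 100, 95],
})

# Filter rows with a boolean mask
high_paid = employees[employees["salary"] >= 100]

# Aggregate: highest salary per department
max_by_dept = employees.groupby("department")["salary"].max()
```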

Types of Python Interview Questions

All six technical concepts are mainly tested through just two types of interview questions. Those are:

Data manipulation and analysis

Algorithms

Let’s have a closer look at each of them.

1. Data Manipulation and Analysis

These questions are designed to test the above technical concepts by having you solve ETL (extract, transform, load) problems and perform some data analysis.

Here’s one such example from Facebook:

QUESTION: Facebook sends SMS texts when users attempt to 2FA (2-factor authenticate) into the platform to log in. In order to successfully 2FA they must confirm they received the SMS text message. Confirmation texts are only valid on the date they were sent. Unfortunately, there was an ETL problem with the database where friend requests and invalid confirmation records were inserted into the logs, which are stored in the ‘fb_sms_sends’ table. These message types should not be in the table. Fortunately, the ‘fb_confirmers’ table contains valid confirmation records so you can use this table to identify SMS text messages that were confirmed by the user.

Calculate the percentage of confirmed SMS texts for August 4, 2020.

ANSWER:

import pandas as pd

# Drop the rows that should not be in the log: friend requests and invalid confirmations
df = fb_sms_sends[["ds", "type", "phone_number"]]
df1 = df[~df["type"].isin(["confirmation", "friend_request"])]
# Count all valid SMS texts sent on each date
df1_grouped = df1.groupby("ds")["phone_number"].count().reset_index(name="count")
df1_grouped_0804 = df1_grouped[df1_grouped["ds"] == "08-04-2020"]
# Match each sent text to a confirmation for the same phone number and date
df2 = fb_confirmers[["date", "phone_number"]]
df3 = pd.merge(df1, df2, how="left", left_on=["phone_number", "ds"], right_on=["phone_number", "date"])
# Unmatched rows have a missing 'date', so grouping by it counts only confirmed texts
df3_grouped = df3.groupby("date")["phone_number"].count().reset_index(name="confirmed_count")
df3_grouped_0804 = df3_grouped[df3_grouped["date"] == "08-04-2020"]
# Percentage of confirmed texts for August 4, 2020
result = (df3_grouped_0804["confirmed_count"].iloc[0] / df1_grouped_0804["count"]) * 100
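
To try the idea out without access to the real tables, here is a self-contained sketch on toy data. Everything in it is an assumption made for illustration: the rows, the 'message' type label, and the helper name `confirmed_percentage`. It also takes a slightly different route than the answer above, using the `indicator` option of `merge` to mark which sent texts found a matching confirmation.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical rows)
fb_sms_sends = pd.DataFrame({
    "ds": ["08-04-2020"] * 5 + ["08-05-2020"],
    "type": ["message", "message", "message", "message",
             "friend_request", "message"],
    "phone_number": [1001, 1002, 1003, 1004, 1005, 1001],
})
fb_confirmers = pd.DataFrame({
    "date": ["08-04-2020", "08-04-2020", "08-05-2020"],
    "phone_number": [1001, 1003, 1002],
})


def confirmed_percentage(sends, confirmers, day):
    # Keep genuine SMS texts sent on the requested day
    wrong_types = sends["type"].isin(["confirmation", "friend_request"])
    valid = sends[~wrong_types & (sends["ds"] == day)]
    if valid.empty:
        return 0.0
    # Left merge keeps every sent text; '_merge' is 'both' when a
    # confirmation exists for the same phone number and date
    merged = valid.merge(
        confirmers,
        how="left",
        left_on=["phone_number", "ds"],
        right_on=["phone_number", "date"],
        indicator=True,
    )
    return float((merged["_merge"] == "both").mean() * 100)


pct = confirmed_percentage(fb_sms_sends, fb_confirmers, "08-04-2020")
```

In the toy data, four valid texts were sent on August 4 and two of them were confirmed that day, so `pct` comes out as 50.0. The confirmation from phone 1002 does not count because it is dated a day later.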

One of the questions asked to test your data analysis skills is this one from Dropbox:

QUESTION: Write a query that calculates the difference between the highest salaries found in the marketing and engineering departments. Output just the difference in salaries.

ANSWER:

import pandas as pd

# Attach the department name to every employee
df = pd.merge(db_employee, db_dept, how="left", left_on=["department_id"], right_on=["id"])
# Highest salary in engineering
df1 = df[df["department"] == "engineering"]
df_eng = df1.groupby("department")["salary"].max().reset_index(name="eng_salary")
# Highest salary in marketing
df2 = df[df["department"] == "marketing"]
df_mkt = df2.groupby("department")["salary"].max().reset_index(name="mkt_salary")
# Both frames hold a single row with index 0, so the subtraction lines up
result = pd.DataFrame(df_mkt["mkt_salary"] - df_eng["eng_salary"])
result.columns = ["salary_difference"]
result

2. Algorithms

Python algorithm interview questions test how well you solve problems using algorithms. Since algorithms are not limited to one programming language, these questions test your logic and thinking as well as your coding in Python.

For example, you could get this question:

QUESTION: Given a string containing digits from 2-9 inclusive, return all possible letter combinations that the number could represent. Return the answer in any order.

A mapping of digit to letters (just like on the telephone buttons) is given below. Note that 1 does not map to any letters.

ANSWER:

from typing import List


class Solution:
    def letterCombinations(self, digits: str) -> List[str]:
        # If the input is empty, immediately return an empty answer array
        if len(digits) == 0:
            return []

        # Map all the digits to their corresponding letters
        letters = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
                   "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

        def backtrack(index, path):
            # If the path is the same length as digits, we have a complete combination
            if len(path) == len(digits):
                combinations.append("".join(path))
                return  # Backtrack
            # Get the letters that the current digit maps to, and loop through them
            possible_letters = letters[digits[index]]
            for letter in possible_letters:
                # Add the letter to our current path
                path.append(letter)
                # Move on to the next digit
                backtrack(index + 1, path)
                # Backtrack by removing the letter before moving onto the next
                path.pop()

        # Initiate backtracking with an empty path and starting index of 0
        combinations = []
        backtrack(0, [])
        return combinations
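
If the interviewer allows the standard library, the same result can be sketched without writing the recursion by hand. This is an alternative, not the approach shown above: it builds the Cartesian product of the letter groups with `itertools.product`. The function name `letter_combinations` and the constant `LETTERS` are my own naming.

```python
from itertools import product
from typing import List

# Same digit-to-letter mapping as on a telephone keypad
LETTERS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
           "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def letter_combinations(digits: str) -> List[str]:
    # An empty input has no combinations
    if not digits:
        return []
    # One group of candidate letters per digit, in input order
    groups = [LETTERS[d] for d in digits]
    # product() yields one tuple per way of picking a letter from each group
    return ["".join(combo) for combo in product(*groups)]


combos = letter_combinations("23")
```

For the input "23" this produces the nine strings from "ad" through "cf". Be ready to explain the backtracking version anyway, since interviewers often ask for it explicitly.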

Or it could get even more difficult with the following question:

QUESTION: “Write a program to solve a Sudoku puzzle by filling the empty cells. A sudoku solution must satisfy all of the following rules:

Each of the digits 1-9 must occur exactly once in each row.

Each of the digits 1-9 must occur exactly once in each column.

Each of the digits 1-9 must occur exactly once in each of the 9 3×3 sub-boxes of the grid.

The ‘.’ character indicates empty cells.”

ANSWER:

from collections import defaultdict


class Solution:
    def solveSudoku(self, board):
        """
        :type board: List[List[str]]
        :rtype: void Do not return anything, modify board in-place instead.
        """
        def could_place(d, row, col):
            """
            Check if one could place a number d in (row, col) cell
            """
            return not (d in rows[row] or d in columns[col] or
                        d in boxes[box_index(row, col)])

        def place_number(d, row, col):
            """
            Place a number d in (row, col) cell
            """
            rows[row][d] += 1
            columns[col][d] += 1
            boxes[box_index(row, col)][d] += 1
            board[row][col] = str(d)

        def remove_number(d, row, col):
            """
            Remove a number which didn't lead
            to a solution
            """
            del rows[row][d]
            del columns[col][d]
            del boxes[box_index(row, col)][d]
            board[row][col] = '.'

        def place_next_numbers(row, col):
            """
            Call backtrack function in recursion
            to continue to place numbers
            till the moment we have a solution
            """
            # if we're in the last cell
            # that means we have the solution
            if col == N - 1 and row == N - 1:
                nonlocal sudoku_solved
                sudoku_solved = True
            # if not yet
            else:
                # if we're in the end of the row
                # go to the next row
                if col == N - 1:
                    backtrack(row + 1, 0)
                # go to the next column
                else:
                    backtrack(row, col + 1)

        def backtrack(row=0, col=0):
            """
            Backtracking
            """
            # if the cell is empty
            if board[row][col] == '.':
                # iterate over all numbers from 1 to 9
                for d in range(1, 10):
                    if could_place(d, row, col):
                        place_number(d, row, col)
                        place_next_numbers(row, col)
                        # if sudoku is solved, there is no need to backtrack
                        # since the single unique solution is promised
                        if not sudoku_solved:
                            remove_number(d, row, col)
            else:
                place_next_numbers(row, col)

        # box size
        n = 3
        # row size
        N = n * n
        # lambda function to compute box index
        box_index = lambda row, col: (row // n) * n + col // n

        # init rows, columns and boxes
        rows = [defaultdict(int) for i in range(N)]
        columns = [defaultdict(int) for i in range(N)]
        boxes = [defaultdict(int) for i in range(N)]
        for i in range(N):
            for j in range(N):
                if board[i][j] != '.':
                    d = int(board[i][j])
                    place_number(d, i, j)

        sudoku_solved = False
        backtrack()

 

This is quite a complex algorithm, so good for you if you know how to solve it!
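
Whatever solver you end up writing, it helps to check its output against the three rules from the question. Below is a minimal sketch of such a checker. The function name `is_valid_solution` is my own, and the sample board is generated with a simple shifting pattern purely so there is something to test with.

```python
def is_valid_solution(board):
    """Return True if every row, column and 3x3 box holds the digits 1-9 once."""
    digits = set("123456789")
    for i in range(9):
        row = set(board[i])
        col = {board[r][i] for r in range(9)}
        # Top-left corner of the i-th box, counting boxes left to right
        top, left = 3 * (i // 3), 3 * (i % 3)
        box = {board[r][c]
               for r in range(top, top + 3)
               for c in range(left, left + 3)}
        # A set of nine cells equals the full digit set only without repeats
        if row != digits or col != digits or box != digits:
            return False
    return True


# A filled grid built by shifting each row; used here only as test input
solved = [[str((3 * (r % 3) + r // 3 + c) % 9 + 1) for c in range(9)]
          for r in range(9)]

# Swapping two cells in one row keeps that row valid but breaks two columns
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

solved_ok = is_valid_solution(solved)
broken_ok = is_valid_solution(broken)
```

Comparing sets works because each row, column and box has exactly nine cells, so any repeated digit forces another digit to be missing.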

Conclusion

For a data science interview, the six technical concepts I’ve mentioned are a must. Of course, it’s recommended you dive even deeper into Python and broaden your knowledge, not only in theory but also in practice, by solving as many data manipulation and analysis questions and algorithm questions as possible.

For the former, there are plenty of examples on StrataScratch. You could probably find questions there from the company where you applied for a job. And LeetCode is a good choice when you decide to practice writing Python algorithms before your interviews.
