{"id":25654,"date":"2024-11-06T16:07:08","date_gmt":"2024-11-06T10:37:08","guid":{"rendered":"https:\/\/internshala.com\/blog\/?p=25654"},"modified":"2024-11-06T16:07:10","modified_gmt":"2024-11-06T10:37:10","slug":"data-science-coding-interview-questions","status":"publish","type":"post","link":"https:\/\/internshala.com\/blog\/data-science-coding-interview-questions\/","title":{"rendered":"Top 36 Data Science Coding Interview Questions and Answers"},"content":{"rendered":"\n<p>Data science has become one of the most revolutionizing fields that help companies make more informed and profitable business decisions. As a result, almost every tech and non-tech companies hire data scientists to examine and draw insights from large data sets. Since the job market is quite competitive, it is important to prepare well to secure a job in the company of your choice. This is why we curated a range of commonly asked data science coding job interview questions. The guide covers everything from basic principles to advanced concepts. 
With the help of these coding interview questions, you can highlight your technical expertise and make a lasting impression on your employers.<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_76 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title ez-toc-toggle\" style=\"cursor:pointer\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 eztoc-toggle-hide-by-default' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/internshala.com\/blog\/data-science-coding-interview-questions\/#Data_Science_Coding_Interview_Questions_for_Beginners\" >Data Science Coding Interview Questions for Beginners<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-2\" href=\"https:\/\/internshala.com\/blog\/data-science-coding-interview-questions\/#Data_Science_Coding_Interview_Questions_for_Mid-level_Professionals\" >Data Science Coding Interview Questions for Mid-level Professionals<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/internshala.com\/blog\/data-science-coding-interview-questions\/#Data_Science_Coding_Interview_Questions_for_Experienced_Candidates\" >Data Science Coding Interview Questions for Experienced Candidates<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/internshala.com\/blog\/data-science-coding-interview-questions\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Science_Coding_Interview_Questions_for_Beginners\"><\/span>Data Science Coding Interview Questions for Beginners<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Data science coding job interview questions for beginners primarily focus on academic projects and fundamental concepts. Whichever company you apply to, research online resources to learn about the commonly asked <a href=\"https:\/\/trainings.internshala.com\/blog\/data-science-interview-questions\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science interview questions<\/a>. Here are some of the entry-level coding questions that can efficiently highlight your technical skills.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q1. How can you create a function in Python to reverse a string?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>You can reverse a string in <a href=\"https:\/\/trainings.internshala.com\/blog\/how-to-learn-python-language\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python <\/a>using the slicing feature. 
The slicing notation s[::-1] starts from the end of the string and moves to the beginning, effectively reversing it. This method is both concise and efficient. Here\u2019s an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def reverse_string(s):\r\n    return s&#91;::-1]\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q2. What distinguishes a list from a tuple in Python?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>The primary distinction between a list and a tuple in Python is mutability. 
A list is mutable, meaning its contents can be changed after creation. For instance:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>my_list = &#91;1, 2, 3]\r\nmy_list.append(4)  # Now my_list is &#91;1, 2, 3, 4]\r<\/code><\/pre>\n\n\n\n<p>Conversely, a tuple is immutable. Once created, its contents cannot be altered. Tuples are defined using parentheses:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>my_tuple = (1, 2, 3)\r\n# Attempting to modify it like my_tuple.append(4) would raise an error.\r<\/code><\/pre>\n\n\n\n<p>Choosing between them depends on whether you need to modify the data. Tuples can also be slightly faster and are often used when data should remain constant.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q3. Can you write a function to determine if a number is prime?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>To check if a number is prime, you need to verify that it is only divisible by 1 and itself. Here\u2019s a simple function for this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def is_prime(n):\r\n    if n &lt;= 1:\r\n        return False\r\n    for i in range(2, int(n**0.5) + 1):\r\n        if n % i == 0:\r\n            return False\r\n    return True\r<\/code><\/pre>\n\n\n\n<p>This function first checks if the number is less than or equal to 1 (which is not prime). It then tests divisibility from 2 up to the square root of the number.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q4. What is the difference between == and is in Python?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>In Python, == checks for value equality, meaning it verifies if two variables hold the same value. 
For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>a = &#91;1, 2, 3]\r\nb = &#91;1, 2, 3]\r\nprint(a == b)  # True, because their values are identical.\r<\/code><\/pre>\n\n\n\n<p>On the other hand, is checks for identity, meaning it determines if two variables point to the same object in memory:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(a is b)  # False, because they are different objects.\r\nc = a\r\nprint(a is c)  # True, because c refers to the same object as a.\r<\/code><\/pre>\n\n\n\n<p>This distinction is crucial when working with mutable objects like lists.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q5. How would you implement a function to calculate the factorial of a number?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>You can calculate the factorial of a number using either iteration or recursion. Here\u2019s an example using iteration:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def factorial(n):\r\n    if n &lt; 0:\r\n        return \"Invalid input\"\r\n    result = 1\r\n    for i in range(1, n + 1):\r\n        result *= i\r\n    return result\r<\/code><\/pre>\n\n\n\n<p>This function initializes the result to 1 and multiplies it by each integer up to n. It\u2019s straightforward and avoids potential stack overflow issues with large numbers that recursion might encounter.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q6. What are generators in Python? Provide an example.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Generators are a special type of iterator in Python that allow for lazy iteration over sequences of values. They generate values on-the-fly and use less memory. You create a generator using a function along with the yield keyword. 
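<\/p>

<p>As a rough illustration of the memory savings (a supplementary sketch, not part of the sample answer itself), you can compare the size of a fully materialized list with that of an equivalent generator object using sys.getsizeof:<\/p>

```python
import sys

# A list comprehension materializes every value up front.
squares_list = [n * n for n in range(1_000_000)]

# A generator expression produces values lazily, one at a time.
squares_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a small, constant-size object
```

<p>Note that sys.getsizeof reports only the container\u2019s own size, but the point stands: the generator\u2019s footprint stays tiny no matter how many values it will yield. 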
Here\u2019s an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def my_generator():\r\n    for i in range(1, 4):\r\n        yield i\r\n\r\ngen = my_generator()\r\nprint(next(gen))  # Output: 1\r\nprint(next(gen))  # Output: 2\r\nprint(next(gen))  # Output: 3\r<\/code><\/pre>\n\n\n\n<p>Using yield instead of return enables the function to produce a series of values over time while pausing and resuming as needed. This feature is particularly useful for processing large datasets or streams of data.&nbsp;<\/p>\n\n\n\n<p><strong>Pro Tip:<\/strong> As you read our blog further, you will come across several data science coding job interview questions. The purpose of our blog is to help you ace the interview. But before you explore more data science-related coding topics and questions, read our guide on <a href=\"https:\/\/internshala.com\/blog\/how-to-get-a-data-science-job\/\" target=\"_blank\" rel=\"noreferrer noopener\">how to get a data science job<\/a> and explore the best opportunities for your career.\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q7. Can you explain the differences between the map and filter functions in Python?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Both map and filter are built-in functions used for functional programming in Python but serve different purposes. The map function applies a specified function to every item in an iterable and returns a new iterable with the results. 
For instance:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def square(x):\r\n    return x * x\r\n\r\nnumbers = &#91;1, 2, 3, 4]\r\nsquared = map(square, numbers)\r\nprint(list(squared))  # Output: &#91;1, 4, 9, 16]\r<\/code><\/pre>\n\n\n\n<p>Conversely, the filter function applies a specified function to all items in an iterable and returns only those items for which the function returns True.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def is_even(x):\r\n    return x % 2 == 0\r\n\r\nnumbers = &#91;1, 2, 3, 4]\r\nevens = filter(is_even, numbers)\r\nprint(list(evens))  # Output: &#91;2, 4]\r<\/code><\/pre>\n\n\n\n<p>Thus, map transforms each item, while filter selects the items that satisfy a condition. Both are powerful tools for efficient data processing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q8. How would you implement binary search in Python?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Binary search is an efficient algorithm used to find an item in a sorted list by repeatedly halving the search interval. If the target is less than the middle item of the interval, narrow the search to the lower half; otherwise, narrow it to the upper half. Here\u2019s how you can implement it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def binary_search(arr, target):\r\n    left, right = 0, len(arr) - 1\r\n    while left &lt;= right:\r\n        mid = (left + right) \/\/ 2\r\n        if arr&#91;mid] == target:\r\n            return mid\r\n        elif arr&#91;mid] &lt; target:\r\n            left = mid + 1\r\n        else:\r\n            right = mid - 1\r\n    return -1  # Target not found\r<\/code><\/pre>\n\n\n\n<p>In this function, we initialize two pointers (left and right) at the start and end of the list, respectively, and repeatedly check the middle element while adjusting the pointers based on comparisons with the target value.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q9. Can you explain how a hash table operates? 
Provide an example.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>A hash table is a data structure that stores key-value pairs using a hash function to compute an index into an array of buckets or slots from which desired values can be retrieved efficiently. The main advantage of hash tables lies in their average-case constant-time complexity (O(1)) for lookups, insertions, and deletions.&nbsp;<\/p>\n\n\n\n<p>Here\u2019s a simple example using Python&#8217;s dictionary (which functions as a hash table):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Creating a hash table (dictionary)\r\nhash_table = {}\r\n\r\n# Adding key-value pairs\r\nhash_table&#91;\"name\"] = \"Alice\"\r\nhash_table&#91;\"age\"] = 25\r\nhash_table&#91;\"city\"] = \"New York\"\r\n\r\n# Retrieving values\r\nprint(hash_table&#91;\"name\"])   # Output: Alice\r\nprint(hash_table&#91;\"age\"])    # Output: 25\r\nprint(hash_table&#91;\"city\"])   # Output: New York\r<\/code><\/pre>\n\n\n\n<p>In this example, Python&#8217;s dictionary implementation implicitly handles hashing. Keys are hashed to produce unique indices where corresponding values are stored.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q10. How would you implement bubble sort in Python?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Bubble sort is a straightforward sorting algorithm that repeatedly steps through the list comparing adjacent elements and swapping them if they\u2019re out of order. 
This process continues until no swaps are needed (the list is sorted).&nbsp;<\/p>\n\n\n\n<p>Here\u2019s how you can implement it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def bubble_sort(arr):\r\n    n = len(arr)\r\n    for i in range(n):\r\n        for j in range(0, n - i - 1):\r\n            if arr&#91;j] > arr&#91;j + 1]:\r\n                arr&#91;j], arr&#91;j + 1] = arr&#91;j + 1], arr&#91;j]\r\n\r\n# Example usage\r\narr = &#91;64, 34, 25, 12, 22, 11, 90]\r\nbubble_sort(arr)\r\nprint(\"Sorted array:\", arr)\r<\/code><\/pre>\n\n\n\n<p>In this implementation, we use two nested loops. The inner loop performs comparisons and swaps while the outer loop ensures that this process repeats until the entire list is sorted.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q11. Explain and demonstrate the difference between list and dictionary comprehension with an example of converting a list of temperatures from Celsius to Fahrenheit.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>List comprehension allows you to create a new list by applying an expression to each item in an existing iterable (like a list). It\u2019s a concise way to generate lists. On the other hand, dictionary comprehension is similar to list comprehension but creates a dictionary instead. 
It allows you to create key-value pairs from an existing iterable.<\/p>\n\n\n\n<p>Given a list of Celsius temperatures, we can use both types of comprehension to create a new list of temperatures in Fahrenheit and a dictionary that maps each Celsius temperature to its corresponding Fahrenheit value.&nbsp;<\/p>\n\n\n\n<p>Here\u2019s an example of converting Celsius to Fahrenheit:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># List of Celsius temperatures\r\ncelsius = &#91;0, 10, 20, 30, 40]\r\n\r\n# List comprehension\r\nfahrenheit_list = &#91;((9\/5) * temp + 32) for temp in celsius]\r\n\r\n# Dictionary comprehension (celsius as key, fahrenheit as value)\r\nfahrenheit_dict = {temp: ((9\/5) * temp + 32) for temp in celsius}\r\n\r\nprint(fahrenheit_list)  # &#91;32.0, 50.0, 68.0, 86.0, 104.0]\r\nprint(fahrenheit_dict)  # {0: 32.0, 10: 50.0, 20: 68.0, 30: 86.0, 40: 104.0}\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q12. Create a generator function that yields prime numbers up to a given limit. Explain why generators are memory efficient.<\/h3>\n\n\n\n<p><strong>Sample Answer:<\/strong> Generators in Python are a convenient way to create iterators using the \u2018yield\u2019 keyword. 
They generate values one at a time and only when requested, which makes them memory efficient compared to lists or other data structures that store all values at once.&nbsp;<\/p>\n\n\n\n<p>Here\u2019s a code to create a generator function that yields prime numbers up to a given limit:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def prime_generator(limit):\r\n    def is_prime(n):\r\n        \"\"\"Check if a number is prime.\"\"\"\r\n        if n &lt; 2:\r\n            return False\r\n        for i in range(2, int(n ** 0.5) + 1):\r\n            if n % i == 0:\r\n                return False\r\n        return True\r\n    \r\n    n = 2  # Start from the first prime number\r\n    while n &lt; limit:\r\n        if is_prime(n):\r\n            yield n  # Yield the prime number\r\n        n += 1\r\n\r\n# Example usage\r\nprimes = prime_generator(20)\r\nprint(list(primes))  # Output: &#91;2, 3, 5, 7, 11, 13, 17, 19]\r<\/code><\/pre>\n\n\n\n<p><strong>Pro Tip: <\/strong>Are you a recent computer science, mathematics, or statistics graduate and want to explore the data science field? You can start your professional career by applying for <a href=\"https:\/\/internshala.com\/internships\/data-science-internship\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science internships<\/a>. Read our guide to <a href=\"https:\/\/internshala.com\/blog\/data-science-internship-interview-questions\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science internship interview questions<\/a> to prepare well for the interview process and get an idea of what questions are asked.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Science_Coding_Interview_Questions_for_Mid-level_Professionals\"><\/span>Data Science Coding Interview Questions for Mid-level Professionals<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>As you grow in your career, you will naturally aim to acquire better knowledge and skills in your professional field. 
This section covers data science coding job interview questions for mid-level professionals to assess your understanding of foundational principles and problem-solving skills.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q13. Given a DataFrame with daily stock prices, calculate the 7-day moving average and the daily percentage change.<\/h3>\n\n\n\n<p>\u00a0<strong>Sample Answer: <\/strong>This involves using DataFrame methods like rolling() for moving averages and pct_change() for percentage changes. We&#8217;ll also handle missing values appropriately.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\r\nimport numpy as np\r\n\r\ndef analyze_stock_prices(df):\r\n    \"\"\"\r\n    df should have columns: 'date' and 'price'\r\n    \"\"\"\r\n    # Create copy to avoid modifying original\r\n    analysis_df = df.copy()\r\n    \r\n    # Calculate 7-day moving average\r\n    analysis_df&#91;'7day_ma'] = df&#91;'price'].rolling(window=7, min_periods=1).mean()\r\n    \r\n    # Calculate daily percentage change\r\n    analysis_df&#91;'daily_return'] = df&#91;'price'].pct_change() * 100\r\n    \r\n    return analysis_df\r\n\r\n# Example usage\r\ndates = pd.date_range(start='2023-01-01', periods=10)\r\nprices = &#91;100, 102, 101, 103, 104, 103, 105, 106, 107, 108]\r\ndf = pd.DataFrame({'date': dates, 'price': prices})\r\nresult = analyze_stock_prices(df)\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q14. Write a function that handles missing values in a DataFrame based on the data type of each column.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Different data types require different strategies for handling missing values. 
Numerical data might use mean\/median, while categorical data might use mode or a special category.<\/p>\n\n\n\n<p>The following implementation fills missing values in numerical columns with the median, in categorical columns with the mode, and uses forward fill for datetime columns.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def handle_missing_values(df):\r\n    \"\"\"\r\n    Handle missing values based on column type:\r\n    - Numeric: Fill with median\r\n    - Categorical: Fill with mode\r\n    - Datetime: Forward fill\r\n    \"\"\"\r\n    df_cleaned = df.copy()\r\n    \r\n    for column in df_cleaned.columns:\r\n        # Get column type\r\n        dtype = df_cleaned&#91;column].dtype\r\n        \r\n        # Handle numeric columns\r\n        if np.issubdtype(dtype, np.number):\r\n            median_value = df_cleaned&#91;column].median()\r\n            df_cleaned&#91;column].fillna(median_value, inplace=True)\r\n            \r\n        # Handle categorical columns\r\n        elif dtype == 'object' or dtype.name == 'category':\r\n            mode_value = df_cleaned&#91;column].mode()&#91;0]\r\n            df_cleaned&#91;column].fillna(mode_value, inplace=True)\r\n            \r\n        # Handle datetime\r\n        elif np.issubdtype(dtype, np.datetime64):\r\n            df_cleaned&#91;column].fillna(method='ffill', inplace=True)\r\n            \r\n    return df_cleaned\r\n\r\n# Example usage\r\ndata = {\r\n    'numeric': &#91;1, 2, np.nan, 4],\r\n    'categorical': &#91;'A', 'B', np.nan, 'B'],\r\n    'datetime': pd.date_range('2023-01-01', periods=4)\r\n}\r\ndf = pd.DataFrame(data)\r\ndf.loc&#91;2, 'datetime'] = pd.NaT\r\ncleaned_df = handle_missing_values(df)\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q15. 
Write a function to calculate the confidence interval for a mean, handling both normal and t-distributions based on sample size.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Confidence intervals provide a range of values that likely contain the population mean. We use the normal distribution for large samples (n\u226530). For smaller samples, we use the t-distribution. The function should handle both cases automatically.<\/p>\n\n\n\n<p>The following code uses the normal distribution for larger samples and the t-distribution for smaller samples.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from scipy import stats\r\nimport numpy as np\r\n\r\ndef calculate_confidence_interval(data, confidence=0.95):\r\n    \"\"\"\r\n    Calculate confidence interval for the mean.\r\n    Automatically uses t-distribution for n&lt;30, normal distribution for n\u226530\r\n    \"\"\"\r\n    n = len(data)  # Sample size\r\n    mean = np.mean(data)  # Sample mean\r\n    std_error = stats.sem(data)  # Standard error of the mean\r\n    \r\n    # Choose distribution based on sample size\r\n    if n &lt; 30:\r\n        # Use t-distribution\r\n        t_value = stats.t.ppf((1 + confidence) \/ 2, df=n-1)  # Critical t-value\r\n        margin_error = t_value * std_error  # Margin of error\r\n    else:\r\n        # Use normal distribution\r\n        z_value = stats.norm.ppf((1 + confidence) \/ 2)  # Critical z-value\r\n        margin_error = z_value * std_error  # Margin of error\r\n    \r\n    # Return the confidence interval\r\n    return (mean - margin_error, mean + margin_error)\r\n\r\n# Example usage\r\ndata = np.random.normal(loc=100, scale=15, size=25)  # Sample data\r\nci_lower, ci_upper = calculate_confidence_interval(data)  # Calculate CI\r\nprint(f\"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})\")\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q16. 
Implement a function that performs k-fold cross-validation using scikit-learn&#8217;s Pipeline, including preprocessing steps.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>K-fold cross-validation is essential for evaluating the performance of a machine learning model while ensuring that the model is robust and not overfitting. By using a pipeline, we can integrate preprocessing steps directly into the cross-validation process, which helps prevent data leakage.<\/p>\n\n\n\n<p>Here&#8217;s how to set it up:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.pipeline import Pipeline\r\nfrom sklearn.preprocessing import StandardScaler\r\nfrom sklearn.model_selection import cross_val_score\r\nfrom sklearn.linear_model import LogisticRegression\r\n\r\ndef create_model_pipeline(X, y, k_folds=5):\r\n    \"\"\"\r\n    Create and evaluate a pipeline with preprocessing and model training\r\n    \"\"\"\r\n    # Create a pipeline\r\n    pipeline = Pipeline(&#91;\r\n        ('scaler', StandardScaler()),  # Step to standardize features\r\n        ('classifier', LogisticRegression())  # Step for the classifier\r\n    ])\r\n    \r\n    # Perform k-fold cross-validation\r\n    scores = cross_val_score(pipeline, X, y, cv=k_folds, scoring='accuracy')\r\n    \r\n    # Calculate statistics\r\n    mean_score = scores.mean()\r\n    std_score = scores.std()\r\n    \r\n    return {\r\n        'pipeline': pipeline,\r\n        'cv_scores': scores,\r\n        'mean_score': mean_score,\r\n        'std_score': std_score\r\n    }\r\n\r\n# Example usage\r\nfrom sklearn.datasets import make_classification\r\n\r\n# Create a synthetic dataset for demonstration\r\nX, y = make_classification(n_samples=1000, random_state=42)\r\n\r\n# Create the model pipeline and evaluate it\r\nresults = create_model_pipeline(X, y)\r\n\r\n# Display the results\r\nprint(f\"Cross-validation accuracy: {results&#91;'mean_score']:.3f} \u00b1 
{results&#91;'std_score']:.3f}\")\r<\/code><\/pre>\n\n\n\n<p><strong>Pro Tip:<\/strong> While preparing for data science coding job interview questions is important, you should explore companies that offer lucrative jobs in this field. Read our guide on the <a href=\"https:\/\/internshala.com\/blog\/highest-paying-companies-for-data-scientist\/\" target=\"_blank\" rel=\"noreferrer noopener\">highest-paying data science companies<\/a> and research their interview process to prepare well for the selection process.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q17. Write a SQL query to find the top 3 departments with the highest average salary, but only include departments with at least 5 employees.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>To find the top 3 departments with the highest average salary, while including only those departments that have at least 5 employees, we will follow these steps:<\/p>\n\n\n\n<ul>\n<li>Group the data by department to calculate the average salary and count of employees.<\/li>\n\n\n\n<li>Filter out departments with fewer than 5 employees using the HAVING clause.<\/li>\n\n\n\n<li>Rank the departments based on their average salary.<\/li>\n\n\n\n<li>Retrieve the top 3 departments with the highest average salary.<\/li>\n<\/ul>\n\n\n\n<p>Here\u2019s how the SQL query looks:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>WITH dept_stats AS (\r\n    SELECT \r\n        d.department_id,\r\n        d.department_name,\r\n        COUNT(e.employee_id) AS emp_count,\r\n        AVG(e.salary) AS avg_salary\r\n    FROM employees e\r\n    JOIN departments d ON e.department_id = d.department_id\r\n    GROUP BY d.department_id, d.department_name\r\n    HAVING COUNT(e.employee_id) >= 5\r\n),\r\nranked_depts AS (\r\n    SELECT \r\n        department_name,\r\n        emp_count,\r\n        avg_salary,\r\n        RANK() OVER (ORDER BY avg_salary DESC) AS salary_rank\r\n    FROM dept_stats\r\n)\r\nSELECT \r\n    department_name,\r\n    emp_count,\r\n    
ROUND(avg_salary, 2) AS average_salary\r\nFROM ranked_depts\r\nWHERE salary_rank &lt;= 3\r\nORDER BY avg_salary DESC;\r<\/code><\/pre>\n\n\n\n<p>This query ranks the departments with at least 5 employees and outputs the top 3 based on average salary.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q18. Create a custom iterator class that generates Fibonacci numbers up to a specified limit.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>In Python, custom iterators need to implement __iter__ and __next__ methods. The iterator should maintain its state and know when to stop. Here\u2019s the code for creating a custom iterator class that generates Fibonacci numbers up to a specified limit:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>class FibonacciIterator:\r\n    def __init__(self, limit):\r\n        self.limit = limit\r\n        self.previous = 0\r\n        self.current = 1\r\n        \r\n    def __iter__(self):\r\n        return self\r\n        \r\n    def __next__(self):\r\n        if self.previous > self.limit:\r\n            raise StopIteration\r\n            \r\n        result = self.previous\r\n        self.previous, self.current = (\r\n            self.current,\r\n            self.previous + self.current\r\n        )\r\n        return result\r\n\r\n# Example usage\r\nfib = FibonacciIterator(100)\r\nprint(list(fib))  # &#91;0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q19. Write a decorator that measures and logs the execution time of any function it decorates.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>In Python, decorators allow us to modify or enhance the behavior of functions or methods. The decorator measures and logs the execution time of any function it decorates, which is useful for performance monitoring. 
The following example uses Python&#8217;s time module.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import time\r\nfrom functools import wraps\r\n\r\ndef measure_time(func):\r\n    \"\"\"\r\n    Decorator to measure the execution time of a function.\r\n\r\n    Parameters:\r\n    - func: The function to be wrapped.\r\n\r\n    Returns:\r\n    - wrapper: The wrapped function that logs execution time.\r\n    \"\"\"\r\n    @wraps(func)\r\n    def wrapper(*args, **kwargs):\r\n        start_time = time.time()  # Record the start time\r\n        result = func(*args, **kwargs)  # Call the original function\r\n        end_time = time.time()  # Record the end time\r\n        execution_time = end_time - start_time  # Calculate execution time\r\n        print(f\"{func.__name__} took {execution_time:.4f} seconds\")  # Log execution time\r\n        return result  # Return the result of the original function\r\n    return wrapper\r\n\r\n# Example usage\r\n@measure_time\r\ndef slow_function():\r\n    \"\"\"A sample function that simulates a delay.\"\"\"\r\n    time.sleep(1)  # Simulate a slow operation\r\n    return \"Done\"\r\n\r\n# Call the decorated function\r\nresult = slow_function()\r\nprint(result)  # Output the result\r<\/code><\/pre>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\"><div class=\"wp-block-group__inner-container\">\n<h3 class=\"wp-block-heading\">Q20. Write a function that resamples daily data to monthly averages and handles missing values appropriately.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>To analyze time series data effectively, resampling daily data to monthly averages can help identify trends and patterns. 
The code will utilize the Pandas library&#8217;s \u2018datetime\u2019 functionality and the resample() method, ensuring that missing values are handled appropriately during the calculations.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\r\nimport numpy as np\r\n\r\ndef resample_to_monthly(df, date_column, value_column):\r\n    \"\"\"\r\n    Resample daily data to monthly averages, handling missing values appropriately.\r\n\r\n    Parameters:\r\n    - df: DataFrame containing the daily data.\r\n    - date_column: The name of the column that contains date values.\r\n    - value_column: The name of the column that contains the values to be averaged.\r\n\r\n    Returns:\r\n    - DataFrame containing monthly averages, counts, minimum, and maximum values.\r\n    \"\"\"\r\n    # Ensure the date column is in datetime format\r\n    df&#91;date_column] = pd.to_datetime(df&#91;date_column])\r\n    \r\n    # Set the date column as the index for resampling\r\n    df_indexed = df.set_index(date_column)\r\n    \r\n    # Resample the data to monthly frequency and calculate statistics\r\n    monthly_stats = df_indexed&#91;value_column].resample('M').agg(\r\n        average='mean',    # Calculate average for the month\r\n        count='count',     # Count non-null values\r\n        min='min',         # Minimum value for the month\r\n        max='max'          # Maximum value for the month\r\n    )\r\n    \r\n    # Reset index to return date as a column\r\n    monthly_stats = monthly_stats.reset_index()\r\n    \r\n    return monthly_stats\r\n\r\n# Example usage\r\ndates = pd.date_range('2023-01-01', '2023-12-31', freq='D')\r\nvalues = np.random.normal(100, 10, len(dates))\r\n\r\n# Introduce some random missing values\r\nmask = np.random.choice(&#91;True, False], size=values.shape, p=&#91;0.1, 0.9])  # 10% missing values\r\nvalues&#91;mask] = np.nan\r\n\r\ndf = pd.DataFrame({'date': dates, 'value': values})\r\n\r\n# Resample the daily data to monthly 
averages\r\nmonthly_data = resample_to_monthly(df, 'date', 'value')\r\n\r\n# Display the resulting monthly statistics\r\nprint(monthly_data)\r<\/code><\/pre>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Q21. Create a function that generates interaction features between numerical columns and creates dummy variables for categorical columns.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Here\u2019s a function that creates interaction features between numerical columns and generates dummy variables for categorical columns:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def engineer_features(df, numerical_cols, categorical_cols):\r\n    \"\"\"\r\n    Generate interaction features and dummy variables\r\n    \"\"\"\r\n    result_df = df.copy()\r\n    \r\n    # Create interaction features\r\n    if len(numerical_cols) >= 2:\r\n        for i in range(len(numerical_cols)):\r\n            for j in range(i+1, len(numerical_cols)):\r\n                col1, col2 = numerical_cols&#91;i], numerical_cols&#91;j]\r\n                result_df&#91;f'{col1}_{col2}_interaction'] = (\r\n                    result_df&#91;col1] * result_df&#91;col2]\r\n                )\r\n    \r\n    # Create dummy variables\r\n    for col in categorical_cols:\r\n        dummies = pd.get_dummies(\r\n            result_df&#91;col], \r\n            prefix=col, \r\n            drop_first=True\r\n        )\r\n        result_df = pd.concat(&#91;result_df, dummies], axis=1)\r\n        result_df.drop(col, axis=1, inplace=True)\r\n    \r\n    return result_df\r\n\r\n# Example usage\r\ndata = {\r\n    'age': &#91;25, 30, 35],\r\n    'income': &#91;50000, 60000, 70000],\r\n    'education': &#91;'HS', 'BS', 'MS'],\r\n    'location': &#91;'urban', 'rural', 'urban']\r\n}\r\ndf = pd.DataFrame(data)\r\nengineered_df = engineer_features(\r\n    df, \r\n    numerical_cols=&#91;'age', 'income'],\r\n    categorical_cols=&#91;'education', 'location']\r\n)\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q22. 
Implement a function to analyze A\/B test results, including calculating p-values and confidence intervals for the conversion rates difference.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Analyzing A\/B test results is crucial for understanding the impact of changes on conversion rates. Here\u2019s code that calculates conversion rates, performs a z-test for proportions, and computes confidence intervals for the difference in conversion rates.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\r\nimport scipy.stats as stats\r\n\r\ndef analyze_ab_test(control_conversions, control_size,\r\n                    treatment_conversions, treatment_size,\r\n                    confidence_level=0.95):\r\n    \"\"\"\r\n    Analyze A\/B test results, including conversion rates, p-values, and confidence intervals.\r\n    \r\n    Parameters:\r\n    - control_conversions: Number of conversions in the control group.\r\n    - control_size: Total size of the control group.\r\n    - treatment_conversions: Number of conversions in the treatment group.\r\n    - treatment_size: Total size of the treatment group.\r\n    - confidence_level: Confidence level for the confidence interval.\r\n    \r\n    Returns:\r\n    - Dictionary containing conversion rates, rate difference, p-value, and confidence interval.\r\n    \"\"\"\r\n    # Calculate conversion rates\r\n    control_rate = control_conversions \/ control_size\r\n    treatment_rate = treatment_conversions \/ treatment_size\r\n    \r\n    # Calculate standard errors\r\n    control_se = np.sqrt(control_rate * (1 - control_rate) \/ control_size)\r\n    treatment_se = np.sqrt(treatment_rate * (1 - treatment_rate) \/ treatment_size)\r\n    \r\n    # Calculate difference and combined standard error\r\n    rate_diff = treatment_rate - control_rate\r\n    combined_se = np.sqrt(control_se**2 + treatment_se**2)\r\n    \r\n    # Calculate z-score and p-value for the difference in conversion rates\r\n    z_score = rate_diff 
\/ combined_se\r\n    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))  # Two-tailed p-value\r\n    \r\n    # Calculate critical z-value for the confidence interval\r\n    z_critical = stats.norm.ppf((1 + confidence_level) \/ 2)\r\n    ci_lower = rate_diff - z_critical * combined_se\r\n    ci_upper = rate_diff + z_critical * combined_se\r\n    \r\n    return {\r\n        'control_rate': control_rate,\r\n        'treatment_rate': treatment_rate,\r\n        'rate_difference': rate_diff,\r\n        'p_value': p_value,\r\n        'confidence_interval': (ci_lower, ci_upper)\r\n    }\r\n\r\n# Example usage\r\nresults = analyze_ab_test(\r\n    control_conversions=100,\r\n    control_size=1000,\r\n    treatment_conversions=120,\r\n    treatment_size=1000\r\n)\r\n\r\n# Display results\r\nprint(\"Control Conversion Rate:\", results&#91;'control_rate'])\r\nprint(\"Treatment Conversion Rate:\", results&#91;'treatment_rate'])\r\nprint(\"Rate Difference:\", results&#91;'rate_difference'])\r\nprint(\"P-Value:\", results&#91;'p_value'])\r\nprint(\"Confidence Interval:\", results&#91;'confidence_interval'])\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q23. Implement a function that performs time series forecasting using SARIMA (Seasonal ARIMA) and evaluates the model&#8217;s performance.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>The Seasonal ARIMA (SARIMA) model is effective for time series data exhibiting seasonality. 
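<\/p>\n\n\n\n<p>Before fitting, it is worth sanity-checking the seasonal period implied by seasonal_order (12 here, i.e., monthly data with a yearly cycle). One rough check is the autocorrelation at the seasonal lag, which should stand out for genuinely seasonal data. A minimal sketch on a hypothetical synthetic series (the autocorr helper is illustrative, not part of any library):<\/p>

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Hypothetical monthly series: a 12-month cycle plus noise
rng = np.random.default_rng(0)
t = np.arange(120)
series = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

print(autocorr(series, 12))  # strongly positive: yearly seasonality
print(autocorr(series, 6))   # strongly negative: half-period opposition
```

<p>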
This function fits a SARIMA model to the data, generates forecasts, and evaluates model performance using error metrics and model quality indicators.<\/p>\n\n\n\n<p>Here\u2019s the code for the same:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from statsmodels.tsa.statespace.sarimax import SARIMAX\r\nfrom sklearn.metrics import mean_absolute_error, mean_squared_error\r\n\r\ndef sarima_forecast(data, order=(1,1,1), seasonal_order=(1,1,1,12)):\r\n    \"\"\"\r\n    Fit SARIMA model and generate forecasts\r\n    \"\"\"\r\n    # Fit model\r\n    model = SARIMAX(\r\n        data,\r\n        order=order,\r\n        seasonal_order=seasonal_order,\r\n        enforce_stationarity=False\r\n    )\r\n    results = model.fit()\r\n    \r\n    # Generate forecasts\r\n    forecast = results.get_forecast(steps=12)\r\n    forecast_mean = forecast.predicted_mean\r\n    forecast_ci = forecast.conf_int()\r\n    \r\n    # Calculate metrics\r\n    predictions = results.get_prediction(start=len(data)-12)\r\n    pred_mean = predictions.predicted_mean\r\n    \r\n    metrics = {\r\n        'mae': mean_absolute_error(data&#91;-12:], pred_mean&#91;-12:]),\r\n        'rmse': np.sqrt(mean_squared_error(data&#91;-12:], pred_mean&#91;-12:])),\r\n        'aic': results.aic\r\n    }\r\n    \r\n    return {\r\n        'model': results,\r\n        'forecast': forecast_mean,\r\n        'confidence_intervals': forecast_ci,\r\n        'metrics': metrics\r\n    }\r\n\r\n# Example usage\r\nimport numpy as np\r\nnp.random.seed(42)\r\ndates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='M')\r\ndata = pd.Series(np.random.normal(0, 1, len(dates)), index=dates)\r\nforecast_results = sarima_forecast(data)\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q24. 
Create a function that performs feature selection using mutual information.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Mutual information is an effective method for feature selection because it measures the statistical dependence between each feature and the target, capturing both linear and non-linear relationships. This function uses mutual information to rank and select the most important features and generates a bar plot of their importance scores.&nbsp;<\/p>\n\n\n\n<p>Here\u2019s how you can create a function that performs feature selection using mutual information:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.feature_selection import mutual_info_regression\r\nimport matplotlib.pyplot as plt\r\n\r\ndef select_features_mutual_info(X, y, n_features=5):\r\n    \"\"\"\r\n    Select features using mutual information\r\n    \"\"\"\r\n    # Calculate mutual information scores\r\n    mi_scores = mutual_info_regression(X, y)\r\n    \r\n    # Create DataFrame with scores\r\n    feature_scores = pd.DataFrame({\r\n        'feature': X.columns,\r\n        'mi_score': mi_scores\r\n    })\r\n    \r\n    # Sort features by importance\r\n    feature_scores = feature_scores.sort_values('mi_score', ascending=False)\r\n    \r\n    # Plot feature importance\r\n    plt.figure(figsize=(10, 6))\r\n    plt.bar(range(len(mi_scores)), feature_scores&#91;'mi_score'])\r\n    plt.xticks(range(len(mi_scores)), feature_scores&#91;'feature'], rotation=45)\r\n    plt.title('Feature Importance (Mutual Information)')\r\n    plt.xlabel('Features')\r\n    plt.ylabel('Mutual Information Score')\r\n    plt.tight_layout()\r\n    \r\n    # Select top features\r\n    selected_features = feature_scores.head(n_features)&#91;'feature'].tolist()\r\n    \r\n    return {\r\n        'selected_features': selected_features,\r\n        'feature_scores': feature_scores,\r\n        'plot': plt.gcf()\r\n    }\r\n\r\n# Example usage\r\nfrom sklearn.datasets import make_regression\r\nX, y = 
make_regression(n_samples=100, n_features=10, random_state=42)\r\nX_df = pd.DataFrame(X, columns=&#91;f'feature_{i}' for i in range(X.shape&#91;1])])\r\nresults = select_features_mutual_info(X_df, y)\r<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Data_Science_Coding_Interview_Questions_for_Experienced_Candidates\"><\/span>Data Science Coding Interview Questions for Experienced Candidates<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>For more experienced roles, employers often look for candidates with an advanced understanding of technical principles and complex problem-solving skills. These advanced coding interview questions for data scientists assess analytical and decision-making skills.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q25. Create a comprehensive text preprocessing pipeline for NLP tasks.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Text preprocessing prepares raw text for NLP tasks, improving data consistency and quality. 
The following pipeline covers essential steps, including cleaning, tokenization, stopword removal, lemmatization, and vectorization, making it ready for model training and analysis.\u00a0<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from nltk.tokenize import word_tokenize\r\nfrom nltk.corpus import stopwords\r\nfrom nltk.stem import WordNetLemmatizer\r\nfrom sklearn.feature_extraction.text import TfidfVectorizer\r\nimport re\r\n\r\nclass TextPreprocessor:\r\n    def __init__(self, remove_stopwords=True, lemmatize=True):\r\n        self.remove_stopwords = remove_stopwords\r\n        self.lemmatize = lemmatize\r\n        self.lemmatizer = WordNetLemmatizer()\r\n        self.stop_words = set(stopwords.words('english'))\r\n        self.vectorizer = TfidfVectorizer()\r\n        \r\n    def clean_text(self, text):\r\n        \"\"\"Basic text cleaning\"\"\"\r\n        # Convert to lowercase\r\n        text = text.lower()\r\n        # Remove special characters\r\n        text = re.sub(r'&#91;^a-zA-Z\\s]', '', text)\r\n        # Remove extra whitespace\r\n        text = ' '.join(text.split())\r\n        return text\r\n        \r\n    def process_text(self, text):\r\n        \"\"\"Full text processing pipeline\"\"\"\r\n        # Clean text\r\n        text = self.clean_text(text)\r\n        \r\n        # Tokenize\r\n        tokens = word_tokenize(text)\r\n        \r\n        # Remove stopwords\r\n        if self.remove_stopwords:\r\n            tokens = &#91;t for t in tokens if t not in self.stop_words]\r\n            \r\n        # Lemmatize\r\n        if self.lemmatize:\r\n            tokens = &#91;self.lemmatizer.lemmatize(t) for t in tokens]\r\n            \r\n        return ' '.join(tokens)\r\n        \r\n    def fit_transform(self, texts):\r\n        \"\"\"Process texts and convert to TF-IDF vectors\"\"\"\r\n        processed_texts = &#91;self.process_text(text) for text in texts]\r\n        return self.vectorizer.fit_transform(processed_texts)\r\n        \r\n    def 
transform(self, texts):\r\n        \"\"\"Transform new texts using fitted vectorizer\"\"\"\r\n        processed_texts = &#91;self.process_text(text) for text in texts]\r\n        return self.vectorizer.transform(processed_texts)\r\n\r\n# Example usage\r\ntexts = &#91;\r\n    \"This is a sample text with some numbers 123!\",\r\n    \"Another example of text preprocessing in NLP tasks.\"\r\n]\r\npreprocessor = TextPreprocessor()\r\nvectors = preprocessor.fit_transform(texts)\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q26. Implement an anomaly detection system using Isolation Forest and evaluate its performance.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Isolation Forest is a robust algorithm for anomaly detection. It isolates observations by randomly selecting features and then splitting them, making it efficient and effective for detecting anomalies in large datasets.<\/p>\n\n\n\n<p>Here\u2019s how to implement an anomaly detection system with Isolation Forest and evaluate its performance:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.ensemble import IsolationForest\r\n\r\nfrom sklearn.metrics import precision_score, recall_score\r\nimport numpy as np\r\n\r\nclass AnomalyDetector:\r\n    def __init__(self, contamination=0.1, random_state=42):\r\n        \"\"\"\r\n        Initialize Isolation Forest model.\r\n        - contamination: expected proportion of anomalies in the data.\r\n        - random_state: seed for reproducibility.\r\n        \"\"\"\r\n        self.model = IsolationForest(\r\n            contamination=contamination,\r\n            random_state=random_state\r\n        )\r\n        \r\n    def fit_predict(self, X):\r\n        \"\"\"\r\n        Fit the model to the data and predict anomalies.\r\n        Returns predictions (1 for normal, -1 for anomaly), anomaly scores,\r\n        and indices of detected anomalies.\r\n        \"\"\"\r\n        predictions = self.model.fit_predict(X)\r\n        scores = 
self.model.score_samples(X)\r\n        \r\n        return {\r\n            'predictions': predictions,\r\n            'scores': scores,\r\n            'anomaly_indices': np.where(predictions == -1)&#91;0]\r\n        }\r\n        \r\n    def evaluate(self, y_true, y_pred):\r\n        \"\"\"\r\n        Evaluate the model\u2019s performance using precision and recall.\r\n        Convert Isolation Forest predictions to binary labels (0: normal, 1: anomaly).\r\n        \"\"\"\r\n        y_pred_binary = np.where(y_pred == -1, 1, 0)\r\n        \r\n        return {\r\n            'precision': precision_score(y_true, y_pred_binary),\r\n            'recall': recall_score(y_true, y_pred_binary)\r\n        }\r\n\r\n# Example usage\r\n# Generate synthetic data with anomalies\r\nnp.random.seed(42)\r\nnormal_points = np.random.normal(0, 1, (100, 2))\r\nanomaly_points = np.random.normal(5, 1, (10, 2))\r\nX = np.vstack(&#91;normal_points, anomaly_points])\r\n\r\n# Create true labels (0: normal, 1: anomaly)\r\ny_true = np.zeros(110)\r\ny_true&#91;100:] = 1\r\n\r\ndetector = AnomalyDetector()\r\nresults = detector.fit_predict(X)\r\nmetrics = detector.evaluate(y_true, results&#91;'predictions'])\r\n\r\nprint(\"Anomaly Detection Metrics:\")\r\nprint(\"Precision:\", metrics&#91;'precision'])\r\nprint(\"Recall:\", metrics&#91;'recall'])\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q27. Can you write a recursive function to find the nth Fibonacci number?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>The Fibonacci sequence consists of numbers where each number is the sum of the two preceding ones, typically starting with 0 and 1. 
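<\/p>\n\n\n\n<p>The plain recursion in the implementation that follows recomputes the same subproblems and runs in exponential time, which interviewers often probe as a follow-up. A minimal memoized sketch using functools.lru_cache, keeping the same 1-indexed convention (fibonacci(1) is 0) and raising an error for invalid input instead of returning a string:<\/p>

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    """Return the nth Fibonacci number (1-indexed), caching results."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if n <= 2:
        return n - 1  # fibonacci(1) == 0, fibonacci(2) == 1
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # 34, computed in O(n) instead of O(2**n)
```

<p>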
Here is how to implement a recursive function to find the nth Fibonacci number:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def fibonacci(n):\r\n    if n &lt;= 0:\r\n        return \"Invalid input\"\r\n    elif n == 1:\r\n        return 0\r\n    elif n == 2:\r\n        return 1\r\n    else:\r\n        return fibonacci(n - 1) + fibonacci(n - 2)\r\n\r\n# Example usage\r\nprint(fibonacci(10))  # Output: 34\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q28. Could you explain time complexity and space complexity?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Time complexity and space complexity are fundamental metrics for assessing an algorithm&#8217;s efficiency, especially as input sizes grow.<\/p>\n\n\n\n<ul>\n<li><strong>Time Complexity:<\/strong> Time complexity indicates how the runtime of an algorithm increases as the input size (n) grows. It\u2019s expressed in Big O notation, which gives an upper bound on the time an algorithm might take to execute. For instance, a linear search has a time complexity of O(n), meaning its running time scales linearly with the input size.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Example of O(n) time complexity\r\ndef linear_search(arr, target):\r\n    for i in range(len(arr)):\r\n        if arr&#91;i] == target:\r\n            return i\r\n    return -1\r<\/code><\/pre>\n\n\n\n<ul>\n<li><strong>Space Complexity: <\/strong>This measures how much memory an algorithm uses as a function of input size, also expressed in Big O notation. For example, an algorithm that uses a constant amount of extra memory has a space complexity of O(1).<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Example of O(1) space complexity\r\ndef example_function(arr):\r\n    total = 0\r\n    for i in arr:\r\n        total += i\r\n    return total\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q29. 
How would you read a CSV file into a DataFrame using Pandas?<\/h3>\n\n\n\n<p><strong>Sample Answer:<\/strong> Reading a CSV file into a Pandas DataFrame is simple and efficient. The read_csv() function loads the CSV data into a structured format ideal for data manipulation and analysis.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\r\n\r\n# Reading a CSV file into a DataFrame\r\ndf = pd.read_csv('path_to_file.csv')\r\n\r\n# Displaying the first few rows of the DataFrame\r\nprint(df.head())\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q30. What distinguishes loc from iloc in Pandas?<\/h3>\n\n\n\n<p><strong>Sample Answer:<\/strong> The key difference between loc and iloc lies in how they access and select data within a DataFrame:<\/p>\n\n\n\n<ul>\n<li><strong>loc (Label-based Indexing): <\/strong>loc allows you to select data using labels or boolean arrays. This means you reference the labels of rows and columns (which could be strings, numbers, or other types), making it ideal when you know the specific labels you want to select.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Select rows from index 0 to 5 and specific columns by label \r\n\r\ndf.loc&#91;0:5, &#91;'column1', 'column2']]\r<\/code><\/pre>\n\n\n\n<ul>\n<li><strong>iloc (Position-based Indexing):<\/strong> iloc selects data based on integer positions (row and column indices). It\u2019s helpful when you are not concerned with labels but instead want to access data based on numerical positions (like slicing an array).<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Select rows and columns by integer position\r\n\r\ndf.iloc&#91;0:5, &#91;0, 1]]\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q31. How do you manage missing values in a DataFrame?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Handling missing values is crucial for effective data analysis. 
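<\/p>\n\n\n\n<p>As a compact end-to-end illustration before the individual methods, the sketch below builds a small hypothetical frame and applies detection, dropping, and imputation in turn (note numeric_only=True, which keeps df.mean() from failing on non-numeric columns in recent pandas):<\/p>

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [np.nan, 5.0, 6.0],
})

print(df.isnull().sum())   # missing values per column
print(df.dropna())         # keeps only the fully populated row
print(df.fillna(df.mean(numeric_only=True)))  # impute column means
```

<p>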
Pandas offers several methods to deal with missing data:<\/p>\n\n\n\n<ul>\n<li><strong>Detecting Missing Values: <\/strong>df.isnull() identifies missing values by returning a DataFrame of Boolean values, where True indicates missing data.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Detect missing values\r\nmissing_values = df.isnull()\r<\/code><\/pre>\n\n\n\n<ul>\n<li><strong>Dropping Missing Values:<\/strong> df.dropna() removes rows or columns with missing values, depending on the specified axis (axis=0 for rows and axis=1 for columns).<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Drop rows with missing values\r\ndf_cleaned = df.dropna()\r\n\r\n# Drop columns with missing values\r\ndf_cleaned = df.dropna(axis=1)\r<\/code><\/pre>\n\n\n\n<ul>\n<li><strong>Filling Missing Values: <\/strong>df.fillna(0) replaces missing values with a specified value (e.g., 0). df.fillna(df.mean()) fills missing values with the mean of the respective column.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Fill missing values with a specific value\r\ndf_filled = df.fillna(0)\r\n\r\n# Fill missing values with the mean of the column\r\ndf_filled = df.fillna(df.mean())\r<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Q32. How would you merge two DataFrames in Pandas?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>To merge two DataFrames, I would use the merge function, which operates similarly to <a href=\"https:\/\/trainings.internshala.com\/blog\/sql-roadmap\/\">SQL<\/a> joins. 
Here is how to merge two DataFrames in Pandas:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Creating two DataFrames\r\ndf1 = pd.DataFrame({'key': &#91;'A', 'B', 'C'], 'value1': &#91;1, 2, 3]})\r\ndf2 = pd.DataFrame({'key': &#91;'A', 'B', 'D'], 'value2': &#91;4, 5, 6]})\r\n\r\n# Merging DataFrames on the 'key' column\r\nmerged_df = pd.merge(df1, df2, on='key', how='inner')\r\n\r\n# Displaying the merged DataFrame\r\nprint(merged_df)\r<\/code><\/pre>\n\n\n\n<p>In this case, using how=&#8217;inner&#8217; specifies that I want an inner join. Other options include \u2018left\u2019, \u2018right\u2019, or \u2018outer\u2019 for different types of joins.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q33. How do you create a NumPy array?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Creating a NumPy array is quite simple. You can utilize the array function from the NumPy library. Here\u2019s how:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\r\n\r\n# Creating a NumPy array from a list\r\nmy_array = np.array(&#91;1, 2, 3, 4, 5])\r\n\r\n# Displaying the array\r\nprint(my_array)\r<\/code><\/pre>\n\n\n\n<p>This code converts a Python list into a NumPy array. Additionally, you can create arrays with specific shapes and values using functions like np.zeros, np.ones, and np.arange.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q34. Can you explain broadcasting in NumPy? Provide an example.<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Broadcasting is a powerful feature in NumPy that allows operations on arrays of different shapes without needing to create copies of data. NumPy automatically expands smaller arrays to match larger ones during operations. 
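<\/p>\n\n\n\n<p>The rule behind this expansion: NumPy compares shapes from the trailing dimension backwards, and each pair of dimensions must either be equal or have one of them be 1 (or be missing). A minimal sketch of the shape arithmetic:<\/p>

```python
import numpy as np

row = np.array([1, 2, 3])   # shape (3,)
col = np.zeros((4, 1))      # shape (4, 1)

result = row + col          # (4, 1) and (3,) broadcast to (4, 3)
print(result.shape)         # (4, 3)

# Mismatched trailing dimensions cannot broadcast:
try:
    np.zeros((4, 2)) + row  # 2 vs 3 -> ValueError
except ValueError as exc:
    print("cannot broadcast:", exc)
```

<p>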
Here\u2019s an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\r\n\r\n# Creating a 1D array\r\narr1 = np.array(&#91;10, 20, 30])\r\n\r\n# Creating a 2D array\r\narr2 = np.array(&#91;&#91;1], &#91;2], &#91;3]])\r\n\r\n# Broadcasting arr1 across arr2\r\noutput = arr1 * arr2\r\n\r\n# Displaying the output\r\nprint(output)\r<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;&#91; 10  20  30]\r\n &#91; 20  40  60]\r\n &#91; 30  60  90]]\r<\/code><\/pre>\n\n\n\n<p>In this case, arr1 is broadcasted to match the shape of arr2, and the multiplication is performed element-wise. Broadcasting eliminates the need for reshaping or looping, making code cleaner and more efficient.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q35. How do you transpose a NumPy array?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Transposing an array involves swapping its rows and columns. You can achieve this using the transpose method or the \u2018.T attribute\u2019. Here\u2019s how:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\r\n\r\n# Creating a 2D array\r\narray = np.array(&#91;&#91;1, 2, 3], &#91;4, 5, 6]])\r\n\r\n# Transposing the array\r\ntransposed_array = array.T\r\n\r\n# Displaying the transposed array\r\nprint(transposed_array)\r<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;&#91;1 4]\r\n &#91;2 5]\r\n &#91;3 6]]\r<\/code><\/pre>\n\n\n\n<p>This operation is especially useful in linear algebra and data manipulation contexts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Q36. How do you perform matrix multiplication using NumPy?<\/h3>\n\n\n\n<p><strong>Sample Answer: <\/strong>Matrix multiplication in NumPy can be achieved either by using the dot() function or the @ operator, both of which are suitable for this operation. 
Here is how you can perform matrix multiplication using NumPy:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\r\n\r\n# Creating two matrices\r\nmat1 = np.array(&#91;&#91;2, 3], &#91;4, 5]])\r\nmat2 = np.array(&#91;&#91;6, 7], &#91;8, 9]])\r\n\r\n# Matrix multiplication using the dot function\r\nresult = np.dot(mat1, mat2)\r\n\r\n# Alternatively, using the @ operator\r\nresult_alt = mat1 @ mat2\r\n\r\n# Displaying the result\r\nprint(result)\r<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;&#91;36 41]\r\n &#91;64 73]]\r<\/code><\/pre>\n\n\n\n<p>In this case, matrix multiplication is performed by combining rows from mat1 with columns from mat2. This operation is fundamental in areas such as linear algebra and <a href=\"https:\/\/trainings.internshala.com\/blog\/what-is-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">machine learning<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large desktop-image\"><a href=\"https:\/\/internshala.com\/jobs\/?utm_source=is_blog&amp;utm_medium=data-science-coding-interview-questions&amp;utm_campaign=candidate-web-banner\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"203\" src=\"https:\/\/internshala.com\/blog\/wp-content\/uploads\/2024\/01\/Find-and-Apply-Banner-1024x203.jpg\" alt=\"Find and Apply Banner\" class=\"wp-image-21795\" srcset=\"https:\/\/internshala.com\/blog\/wp-content\/uploads\/2024\/01\/Find-and-Apply-Banner-1024x203.jpg 1024w, https:\/\/internshala.com\/blog\/wp-content\/uploads\/2024\/01\/Find-and-Apply-Banner-672x133.jpg 672w, https:\/\/internshala.com\/blog\/wp-content\/uploads\/2024\/01\/Find-and-Apply-Banner-1536x305.jpg 1536w, https:\/\/internshala.com\/blog\/wp-content\/uploads\/2024\/01\/Find-and-Apply-Banner-2048x406.jpg 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full mobile-image\"><a 
href=\"https:\/\/internshala.com\/jobs\/?utm_source=is_blog&amp;utm_medium=data-science-coding-interview-questions&amp;utm_campaign=candidate-mobile-banner\"><img loading=\"lazy\" decoding=\"async\" width=\"356\" height=\"256\" src=\"https:\/\/internshala.com\/blog\/wp-content\/uploads\/2024\/01\/Job-Banner-for-candidates.jpg\" alt=\"Job Banner for candidates\" class=\"wp-image-21794\"\/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>You must be well aware of both fundamental principles and advanced concepts to ace your data science interview. Practice the data science coding job interview questions to highlight your technical understanding, work experience, and relevant skills. Moreover, if you are a skilled data science professional, check out our guide on the <a href=\"https:\/\/internshala.com\/blog\/highest-paying-data-science-jobs\/\" target=\"_blank\" rel=\"noreferrer noopener\">highest-paying data science jobs in India<\/a> to find out the top salaries you can expect and the relevant skills required for the job. Additionally, explore our data science course assistance. 
The course offers industry-relevant job training and the curriculum includes modules that will help you acquire the technical skills.<\/p>","protected":false},"excerpt":{"rendered":"<p>Data science has become one of the most revolutionizing fields that help companies make more informed and profitable business decisions. As a result, almost every tech and non-tech companies hire<\/p>\n","protected":false},"author":6475,"featured_media":25655,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":true,"footnotes":""},"categories":[4316],"tags":[8877,8878],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Top 36 Data Science Coding Interview Questions and Answers<\/title>\n<meta name=\"description\" content=\"Practice top data science coding interview questions with our guide. 