Learn Data Science Course in Kolkata | IIAS FuturEd

Top 20 Interview Questions on Data Science

  1. What do you know about Data Science?

Data science is a multidisciplinary field that uses technology, statistics, and algorithms to extract knowledge from structured and unstructured data. Data science skills are in demand worldwide and are applicable in almost every organization.

  1. Python or R – Which one would you prefer for analysing a text?

Python is preferable for analysing text because its pandas library provides easy-to-use data structures and data analysis tools.

  1. What are Interpolation and Extrapolation?

Estimating a value that lies between two known values is called interpolation. Estimating a value that lies beyond the range of the known values is called extrapolation. Data scientists use both techniques to estimate missing or future values.
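
The difference can be sketched with a straight line through two known points. This is a minimal illustration that assumes a linear relationship; the function name `linear_estimate` and the sample values are hypothetical.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Two known (hypothetical) observations: 10 units on day 1 and 30 units on day 3.
known = (1.0, 10.0, 3.0, 30.0)

# Interpolation: day 2 lies between the two known days.
inside = linear_estimate(*known, 2.0)

# Extrapolation: day 5 lies beyond the known range, so the estimate is less reliable.
outside = linear_estimate(*known, 5.0)

print(inside, outside)
```

The same formula serves both cases; only the position of `x` relative to the known range changes.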

  1. What is Long and Wide Format Data?

In long format, each row holds one time point per subject, so a subject with repeated measurements appears in several rows. In wide format, a subject’s repeated responses are in a single row, and each response is in a separate column.
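
The two layouts can be sketched in plain Python as follows. The column names (`subject`, `time`, `score`) and the helper functions are hypothetical, chosen only for illustration; in practice pandas offers `melt` and `pivot` for this kind of reshaping.

```python
# Hypothetical long-format data: one row per subject per time point.
long_rows = [
    {"subject": "A", "time": 1, "score": 5},
    {"subject": "A", "time": 2, "score": 7},
    {"subject": "B", "time": 1, "score": 6},
    {"subject": "B", "time": 2, "score": 9},
]

def long_to_wide(rows):
    """Collect each subject's repeated responses into a single row."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row["subject"], {"subject": row["subject"]})
        record[f"score_t{row['time']}"] = row["score"]
    return list(wide.values())

def wide_to_long(rows):
    """Expand each score_t<N> column back into its own row."""
    prefix = "score_t"
    result = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                result.append({
                    "subject": row["subject"],
                    "time": int(key[len(prefix):]),
                    "score": value,
                })
    return result

wide_rows = long_to_wide(long_rows)
print(wide_rows)
```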

  1. What can be the apt qualification to be a Data Scientist?

A data scientist should ideally have a science background, because the job is to extract insights from Big Data using computation and technology, skills that can be learned in Data Science Courses online or offline. A data scientist should also have sound knowledge of computing, statistics, and technology.

  1. What are Overfitting and Underfitting?
  • OVERFITTING: a statistical model describes random error or noise instead of the underlying relationship
  • UNDERFITTING: a statistical model cannot capture the underlying trend of the data
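
Both failure modes can be sketched on a small hypothetical dataset. The data, the three models, and their names are assumptions made for illustration: a constant predictor stands in for underfitting, a model that memorizes the training points stands in for overfitting, and a least-squares line is the reasonable fit.

```python
def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data: the true relationship is y = 2x, plus noise of +1 or -1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
# Noise-free test points from the same relationship y = 2x.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def underfit(x):
    # Too simple: ignores x and always predicts the training mean.
    return mean_y

def overfit(x):
    # Too flexible: memorizes the training set and returns the y of the
    # nearest training x, noise included.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Least-squares straight line, which matches the form of the true relationship.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    / sum((x - mean_x) ** 2 for x in train_x)
)
intercept = mean_y - slope * mean_x

def good_fit(x):
    return intercept + slope * x

for name, model in [("underfit", underfit), ("overfit", overfit), ("good fit", good_fit)]:
    train_error = mean_squared_error(model, train_x, train_y)
    test_error = mean_squared_error(model, test_x, test_y)
    print(f"{name}: train={train_error:.3f} test={test_error:.3f}")
```

The memorizing model has no error on the training data but does worse than the straight line on new data, while the constant model does poorly on both.
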
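  1. What kind of Sampling is needed in Big Data and Data Science?

Cluster sampling is useful in Big Data and Data Science when the target population is spread across a wide area and is difficult to study directly. The population is divided into clusters, and a random sample of clusters is selected for analysis.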
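
A minimal sketch of one-stage cluster sampling, assuming the population is already grouped into natural clusters. The city names, member IDs, and the `cluster_sample` helper are hypothetical.

```python
import random

# Hypothetical population, already grouped into clusters (customers by city).
clusters = {
    "Kolkata": ["k1", "k2", "k3"],
    "Delhi": ["d1", "d2"],
    "Mumbai": ["m1", "m2", "m3", "m4"],
    "Chennai": ["c1", "c2"],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

sample = cluster_sample(clusters, n_clusters=2, seed=42)
print(sample)
```

Only the clusters are chosen at random; every member of a chosen cluster is included in the sample.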

  1. What is Cross-Validation? 

It is a model validation technique used to evaluate how the outcomes of a statistical analysis will generalize to an independent dataset.
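
One common form is k-fold cross-validation, sketched below under simple assumptions: the "model" is just the mean of the training values, the error measure is mean squared error, and the data and helper name `k_fold_splits` are hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical observations.
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

fold_errors = []
for train, test in k_fold_splits(len(values), k=3):
    # "Fit" on the training folds: the model here is simply the training mean.
    prediction = sum(values[i] for i in train) / len(train)
    # Evaluate on the held-out fold, which the model has not seen.
    error = sum((values[i] - prediction) ** 2 for i in test) / len(test)
    fold_errors.append(error)

average_error = sum(fold_errors) / len(fold_errors)
print(fold_errors, average_error)
```

Each observation is held out exactly once, so the averaged error estimates how the model behaves on data it was not fitted to.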

  1. What is Machine Learning?

It is the scientific study of algorithms and statistical models that computer systems use to perform a specific task. It is a field that enables computers to act without being explicitly programmed.

  1. What is Deep Learning?

It is a branch of machine learning that you study during your Data Science Certification Courses. It is a learning method based on artificial neural networks.

  1. What is an Artificial Neural Network?

It is a set of algorithms used in Big Data and Data Science that has revolutionized machine learning. A neural network can adapt to changing input so that it generates the best possible result.

  1. What are the Frameworks of Deep Learning?
  • TensorFlow
  • PyTorch
  • Keras
  1. What is Selection Bias?

Selection bias occurs when the sample obtained is not representative of the population which is intended to be analysed.
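
The effect can be sketched with a small hypothetical population. The numbers and field names below are invented for illustration: the goal is to estimate average weekly exercise hours, but the survey reaches only gym members.

```python
# Hypothetical population: weekly exercise hours, flagged by gym membership.
population = [
    {"hours": 1, "gym_member": False},
    {"hours": 2, "gym_member": False},
    {"hours": 1, "gym_member": False},
    {"hours": 0, "gym_member": False},
    {"hours": 2, "gym_member": False},
    {"hours": 1, "gym_member": False},
    {"hours": 5, "gym_member": True},
    {"hours": 6, "gym_member": True},
    {"hours": 7, "gym_member": True},
    {"hours": 6, "gym_member": True},
]

def mean_hours(people):
    return sum(person["hours"] for person in people) / len(people)

population_mean = mean_hours(population)

# Biased sample: only gym members were surveyed, so the sample
# does not represent the population it is meant to describe.
gym_only = [person for person in population if person["gym_member"]]
biased_mean = mean_hours(gym_only)

print(population_mean, biased_mean)
```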

  1. What are the Kernel Functions in SVM?
  • Linear kernel
  • Polynomial kernel
  • Radial basis function (RBF) kernel
  • Sigmoid kernel
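
Four kernels commonly named for SVMs (linear, polynomial, RBF, and sigmoid) can be sketched in plain Python as follows. The parameter names and default values (`degree`, `gamma`, `coef0`) are illustrative assumptions, not tuned settings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    return (dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    # Depends only on the squared distance between the two points.
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(u, v, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

# Two hypothetical feature vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(u, v))
```
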
  1. What is Entropy in the field of Big Data and Data Science?

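Entropy measures the impurity, or randomness, of a set of examples. It is the core measure used when building a decision tree with an algorithm such as ID3. A decision tree is built top-down from a root node, and it involves partitioning the data into subsets that are as homogeneous as possible. A completely homogeneous subset has an entropy of zero.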
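
A minimal sketch of how entropy can be computed for a set of class labels, and how it can score a candidate split by the resulting reduction in entropy (information gain). The labels and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    weighted = sum(len(subset) / len(parent) * entropy(subset) for subset in subsets)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly yields homogeneous subsets.
clean_split = [["yes", "yes"], ["no", "no"]]
# A split that leaves both subsets as mixed as the parent gains nothing.
mixed_split = [["yes", "no"], ["yes", "no"]]

print(entropy(parent))
print(information_gain(parent, clean_split))
print(information_gain(parent, mixed_split))
```

A tree builder would prefer the split with the higher information gain, because it produces the more homogeneous groups.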

  1. What is meant by Pruning in a Decision Tree?

The removal of the sub-nodes of a decision node is called pruning. It is the opposite of splitting.

  1. What are the skills that a Data Scientist should have?
  • A data scientist should come from a science background and know algorithms, statistics, and computer programming well. A data scientist’s skills are built on that foundation.
  • A data scientist should know Python, R, Java, and SQL well
  • A Data Science Master’s program helps a data scientist gain deeper knowledge of data science
  1. What is the best way to use Hadoop and R?

Big Data Hadoop and R complement each other. Together they are used to visualize and analyse Big Data.

  1. What does Union do?

In SQL, Union combines the results of two queries and removes duplicate records. Records are duplicates when all of their columns have the same values.
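
The behaviour can be checked with an in-memory SQLite database from Python. This is a sketch that assumes the question refers to SQL's `UNION`; the table names and rows are hypothetical, and `UNION ALL` is shown only for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO b VALUES (2, 'Ravi'), (3, 'Meera');
""")

# UNION: the row (2, 'Ravi') appears in both tables but is returned once.
union_rows = conn.execute(
    "SELECT id, name FROM a UNION SELECT id, name FROM b ORDER BY id"
).fetchall()

# UNION ALL: duplicates are kept.
union_all_rows = conn.execute(
    "SELECT id, name FROM a UNION ALL SELECT id, name FROM b ORDER BY id"
).fetchall()

conn.close()
print(union_rows)
print(union_all_rows)
```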

  1. What is Logistic Regression?

It is also known as the logit model. It is a technique for predicting a binary outcome from a linear combination of predictor variables.
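
The prediction step can be sketched as follows: the linear combination of predictors is passed through the logistic (sigmoid) function to give a probability, which is then thresholded. The weights, bias, and the 0.5 threshold are hypothetical values chosen for illustration, not fitted parameters.

```python
import math

def predict_probability(weights, bias, features):
    """Probability of the positive class under a logistic regression model."""
    # Linear combination of the predictor variables.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The logistic function maps any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(weights, bias, features, threshold=0.5):
    return 1 if predict_probability(weights, bias, features) >= threshold else 0

# Hypothetical model with two predictors.
weights = [0.8, -0.4]
bias = 0.0

print(predict_probability(weights, bias, [1.0, 2.0]))
print(predict_class(weights, bias, [3.0, 1.0]))
```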

