Here's a list of the most popular data science interview questions on technical concepts that you can expect to face, and how to frame your answers.

1. What are the differences between supervised and unsupervised learning?

- Supervised learning has a feedback mechanism; unsupervised learning has no feedback mechanism.
- The most commonly used supervised learning algorithms are decision trees, logistic regression, and support vector machines.
- The most commonly used unsupervised learning algorithms are k-means clustering, hierarchical clustering, and the apriori algorithm.

2. How does logistic regression work?

Logistic regression measures the relationship between the dependent variable (our label, what we want to predict) and one or more independent variables (our features) by estimating probabilities with its underlying logistic function, the sigmoid:

σ(x) = 1 / (1 + e^(-x))

The sigmoid maps any real-valued input to a value between 0 and 1, which can be read as a probability.

3. Explain the steps in making a decision tree.

- Calculate the entropy of the target variable, as well as of the predictor attributes.
- Calculate the information gain of all attributes (how much we gain by sorting objects on each attribute).
- Choose the attribute with the highest information gain as the root node.
- Repeat the same procedure on every branch until the decision node of each branch is finalized.

For example, let's say you want to build a decision tree to decide whether you should accept or decline a job offer. Reading the finished tree from the root down makes it clear which combinations of conditions lead to accepting the offer.

4. How do you build a random forest model?

A random forest is built up of a number of decision trees. If you split the data into different packages and build a decision tree for each group of data, the random forest brings all those trees together.

- Randomly select 'k' features from the total of 'm' features, where k << m.
- Among the 'k' features, calculate the node D using the best split point.
- Split the node into daughter nodes using the best split.
- Repeat steps two and three until the leaf nodes are finalized.
- Build the forest by repeating steps one to four 'n' times to create 'n' trees.

5. How can you avoid overfitting your model?

Overfitting refers to a model that fits a very small amount of data too closely and ignores the bigger picture. There are three main methods to avoid overfitting:

- Keep the model simple: take fewer variables into account, thereby removing some of the noise in the training data.
- Use cross-validation techniques, such as k-folds cross-validation.
- Use regularization techniques, such as LASSO, that penalize model parameters that are likely to cause overfitting.

6. Differentiate between univariate, bivariate, and multivariate analysis.

Univariate data contains only one variable. The purpose of univariate analysis is to describe the data and find patterns that exist within it; the patterns can be studied by drawing conclusions using the mean, median, mode, dispersion or range, minimum, maximum, and so on. Bivariate data involves two different variables, and bivariate analysis studies the relationship between them. Multivariate data involves three or more variables.
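To make the supervised-versus-unsupervised contrast concrete, here is a minimal sketch of k-means clustering, one of the unsupervised algorithms named above, on one-dimensional data. The data, the choice of k = 2, and the min/max initialization are made up for illustration; note there are no labels and no feedback, the algorithm discovers the groups on its own.

```python
def kmeans_1d(points, iters=10):
    """Minimal 1-D k-means with k = 2: alternate between assigning each point
    to its nearest center and moving each center to its cluster's mean."""
    centers = [min(points), max(points)]  # crude initialization for k = 2
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Assumes no cluster ends up empty, which holds for this toy data.
        centers = [sum(c) / len(c) for c in clusters]
    return centers

# Two obvious groups, near 1 and near 9; k-means recovers both centers.
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.3, 8.7]))
```

A supervised method would instead be given a label for each point and would measure its error against those labels; here the only signal is the distances between the points themselves.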
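The sigmoid underlying logistic regression is straightforward to sketch in plain Python; the sample inputs below are arbitrary, chosen only to show the function's range and symmetry.

```python
import math

def sigmoid(x):
    """Logistic (sigmoid) function: maps any real x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))          # 0.5, the midpoint of the curve
print(sigmoid(6) > 0.99)   # large positive inputs saturate toward 1
print(sigmoid(-6) < 0.01)  # large negative inputs saturate toward 0
```

In logistic regression, x is a weighted sum of the features, and the sigmoid output is interpreted as the estimated probability of the positive class.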
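The entropy and information-gain calculations used to grow a decision tree can be sketched in a few lines of plain Python. The job-offer labels and the two candidate splits below are hypothetical, chosen to show the extreme cases.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, groups):
    """Parent entropy minus the weighted entropy of the daughter groups
    produced by splitting on an attribute."""
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)

# Hypothetical job-offer labels: 'yes' = accept, 'no' = decline.
parent = ['yes', 'yes', 'yes', 'no', 'no', 'no']
# An attribute that separates the classes perfectly gains a full bit...
perfect = information_gain(parent, [['yes'] * 3, ['no'] * 3])
# ...while one that leaves every group as mixed as before gains nothing.
useless = information_gain(parent, [['yes', 'no'], ['yes', 'no'], ['yes', 'no']])
print(perfect, useless)  # -> 1.0 0.0
```

The tree-building procedure picks the attribute with the highest such gain at each node, which is why the first split tends to be the most informative one.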
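The random forest steps can be sketched as a toy forest of depth-1 trees in plain Python. The dataset, the misclassification-count split criterion, and the defaults for n_trees and k are illustrative assumptions, not a production implementation; real forests also bootstrap-sample the rows and grow deeper trees.

```python
import random
from collections import Counter

def misclassified(part):
    """Labels in a group that differ from the group's majority label."""
    return len(part) - Counter(part).most_common(1)[0][1] if part else 0

def best_split(rows, labels, feature_ids):
    """Among the candidate features, find the (feature, threshold) pair whose
    split leaves the fewest misclassified labels in the daughter nodes."""
    best = None
    for f in feature_ids:
        for t in {r[f] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            err = misclassified(left) + misclassified(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    return best[1], best[2]

def build_stump(rows, labels, k):
    """One depth-1 tree: step 1 picks k of the m features at random; steps
    2-3 split the node into daughter nodes using the best split point."""
    feats = random.sample(range(len(rows[0])), k)
    f, t = best_split(rows, labels, feats)
    # Assumes the chosen split leaves both daughter nodes non-empty.
    left = Counter(l for r, l in zip(rows, labels) if r[f] <= t).most_common(1)[0][0]
    right = Counter(l for r, l in zip(rows, labels) if r[f] > t).most_common(1)[0][0]
    return lambda row: left if row[f] <= t else right

def random_forest(rows, labels, n_trees=25, k=1):
    """Step 5: repeat the tree-building n times; predict by majority vote."""
    trees = [build_stump(rows, labels, k) for _ in range(n_trees)]
    return lambda row: Counter(t(row) for t in trees).most_common(1)[0][0]

random.seed(0)
# Toy data (made up for illustration): the label follows the first feature.
X = [[0, 5], [1, 3], [0, 4], [1, 6], [0, 7], [1, 2]]
y = ['no', 'yes', 'no', 'yes', 'no', 'yes']
predict = random_forest(X, y)
print(predict([0, 9]), predict([1, 1]))  # -> no yes
```

The key design points the sketch preserves are the two sources of randomness per tree (feature subsampling here; row bootstrapping in a full implementation) and the final majority vote that aggregates the individual trees.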
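A minimal sketch of the k-folds cross-validation idea mentioned among the overfitting remedies, assuming a plain index-based split with no external libraries: each fold serves exactly once as the validation set while the remaining folds are used for training.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k-folds cross-validation.
    Indices are dealt round-robin into k roughly equal folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

# With 6 samples and 3 folds, every sample is validated exactly once.
for train, val in k_fold_indices(6, 3):
    print(sorted(train), val)
```

Averaging the model's score across the k validation folds gives a less optimistic estimate of generalization than scoring on the training data itself, which is what makes this a check against overfitting.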
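The univariate summary statistics listed above (mean, median, mode, range, minimum, maximum) can all be computed with Python's standard statistics module; the sales figures below are made up for illustration.

```python
import statistics

# Univariate analysis of a single hypothetical variable: daily sales counts.
sales = [12, 15, 11, 15, 20, 15, 9]

print("mean:   ", statistics.mean(sales))
print("median: ", statistics.median(sales))
print("mode:   ", statistics.mode(sales))
print("range:  ", max(sales) - min(sales))
print("min/max:", min(sales), max(sales))
```

Bivariate analysis would instead pair this variable with a second one (for example, sales against temperature) and look at the relationship between the two, e.g. via a scatter plot or a correlation coefficient.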