This set of Machine Learning (ML) Multiple Choice Questions & Answers (MCQs) focuses on Machine Learning Set 3.

Q1 | In multiclass classification, the number of classes must be
  • less than two
  • equal to two
  • greater than two
  • option 1 and option 2
Q2 | Which of the following can only be used when training data are linearly separable?
  • linear hard-margin svm
  • linear logistic regression
  • linear soft margin svm
  • the centroid method
Q3 | What is the impact of high variance on the training set?
  • overfitting
  • underfitting
  • both underfitting & overfitting
  • depends upon the dataset
Q4 | What do you mean by a hard margin?
  • the svm allows very low error in classification
  • the svm allows high amount of error in classification
  • both 1 & 2
  • none of the above
Q5 | The effectiveness of an SVM depends upon:
  • selection of kernel
  • kernel parameters
  • soft margin parameter c
  • all of the above
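
To make Q5 concrete, here is a minimal scikit-learn sketch (the toy dataset and the particular values of kernel, gamma, and C are illustrative assumptions, not part of the question) exposing the three choices the answer refers to:

```python
# The three knobs from Q5: kernel choice, kernel parameter, soft-margin C.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf",   # selection of kernel
          gamma=1.0,      # kernel parameter
          C=1.0)          # soft margin parameter C
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```
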
Q6 | What are support vectors?
  • all the examples that have a non-zero weight α_k in an svm
  • the only examples necessary to compute f(x) in an svm.
  • all of the above
  • none of the above
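
As a companion to Q6, a minimal sketch (assuming scikit-learn; the blob dataset is an illustrative choice) showing that a fitted SVC keeps exactly the examples with non-zero dual weights, and that these suffice to evaluate f(x):

```python
# Inspecting the support vectors of a fitted SVC.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only these examples carry non-zero weights alpha_k, and they are
# the only ones needed to evaluate the decision function f(x).
print("support vectors per class:", clf.n_support_)
print("their coordinates:\n", clf.support_vectors_)
print("signed weights y_k * alpha_k:", clf.dual_coef_)
```
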
Q7 | A perceptron adds up all the weighted inputs it receives, and if the sum exceeds a certain value, it outputs a 1; otherwise it outputs a 0.
  • true
  • false
  • sometimes – it can also output intermediate values
  • can’t say
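
A minimal NumPy sketch of the threshold unit described in Q7 (the AND-gate weights and bias are an illustrative assumption):

```python
import numpy as np

def perceptron_output(weights, bias, x):
    """Weighted sum of inputs, thresholded at zero: 1 if it fires, else 0."""
    return 1 if np.dot(weights, x) + bias > 0 else 0

# Example: a 2-input unit that fires only when both inputs are on (AND gate).
w, b = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron_output(w, b, np.array(x)))
```
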
Q8 | What is the purpose of the Kernel Trick?
  • to transform the data from nonlinearly separable to linearly separable
  • to transform the problem from regression to classification
  • to transform the problem from supervised to unsupervised learning.
  • all of the above
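
To illustrate Q8's first option, a short sketch (assuming scikit-learn; the concentric-circles dataset is an illustrative choice): a linear kernel fails on data that is not linearly separable, while an RBF kernel separates it without constructing the high-dimensional features explicitly.

```python
# Concentric circles are not linearly separable, but an RBF kernel
# separates them without an explicit feature mapping.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print("linear kernel accuracy:", SVC(kernel="linear").fit(X, y).score(X, y))
print("rbf kernel accuracy:   ", SVC(kernel="rbf").fit(X, y).score(X, y))
```
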
Q9 | Which of the following can only be used when training data are linearly separable?
  • linear hard-margin svm
  • linear logistic regression
  • linear soft margin svm
  • parzen windows
Q10 | The firing rate of a neuron
  • determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons
  • is more analogous to the output of a unit in a neural net than the output voltage of the neuron
  • only changes very slowly, taking a period of several seconds to make large adjustments
  • can sometimes exceed 30,000 action potentials per second
Q11 | Which of the following evaluation metrics cannot be applied to a logistic regression output compared with the target?
  • auc-roc
  • accuracy
  • logloss
  • mean-squared-error
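
For Q11, a small sketch (assuming scikit-learn; the labels and probabilities are made-up illustrative values) computing the three metrics that do apply to probabilistic classifier output:

```python
# AUC-ROC, accuracy, and log loss on a toy binary classification output.
from sklearn.metrics import roc_auc_score, accuracy_score, log_loss

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]            # predicted P(y=1)
y_pred = [int(p >= 0.5) for p in y_prob]  # thresholded labels

print("auc-roc :", roc_auc_score(y_true, y_prob))
print("accuracy:", accuracy_score(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
```
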
Q12 | The cost parameter in the SVM means:
  • the number of cross-validations to be made
  • the kernel to be used
  • the tradeoff between misclassification and simplicity of the model
  • none of the above
Q13 | The kernel trick
  • can be applied to every classification algorithm
  • is commonly used for dimensionality reduction
  • changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points with d features
  • exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points
Q14 | How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?
  • ridge has larger bias, larger variance
  • ridge has smaller bias, larger variance
  • ridge has larger bias, smaller variance
  • ridge has smaller bias, smaller variance
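
A rough sketch of Q14's point (assuming scikit-learn and NumPy; the synthetic data and alpha grid are illustrative): increasing the ridge penalty shrinks the coefficients, adding bias while reducing variance relative to ordinary least squares, approximated here by a near-zero alpha.

```python
# Larger ridge penalties shrink coefficients: more bias, less variance.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=50)

for alpha in (0.001, 1.0, 100.0):  # alpha near 0 approximates OLS
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:7.3f}  ||coef|| = {np.linalg.norm(coef):.3f}")
```
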
Q15 | Which of the following are real world applications of the SVM?
  • text and hypertext categorization
  • image classification
  • clustering of news articles
  • all of the above
Q16 | How can SVM be classified?
  • it is a model trained using unsupervised learning. it can be used for classification and regression.
  • it is a model trained using unsupervised learning. it can be used for classification but not for regression.
  • it is a model trained using supervised learning. it can be used for classification and regression.
  • it is a model trained using supervised learning. it can be used for classification but not for regression.
Q17 | Which of the following can help to reduce overfitting in an SVM classifier?
  • use of slack variables
  • high-degree polynomial features
  • normalizing the data
  • setting a very low learning rate
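
To illustrate the first answer in Q17, a sketch (assuming scikit-learn; the noise level and C grid are illustrative, and the best C depends on the data): lowering C permits more slack, softening the margin and typically curbing overfitting on noisy data.

```python
# A smaller C allows more slack (softer margin), which can reduce
# overfitting on noisy data; compare cross-validated accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
for C in (1000.0, 1.0, 0.1):
    score = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5).mean()
    print(f"C={C:7.1f}  cv accuracy={score:.3f}")
```
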
Q18 | Suppose you have trained an SVM with a linear decision boundary. After training, you correctly infer that your SVM model is underfitting. Which of the following options would you most likely consider for the next SVM iteration?
  • you want to increase your data points
  • you want to decrease your data points
  • you will try to calculate more variables
  • you will try to reduce the features
Q19 | What is/are true about kernels in SVM? 1. A kernel function maps low-dimensional data to a high-dimensional space. 2. It is a similarity function.
  • 1
  • 2
  • 1 and 2
  • none of these
Q20 | You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following is false?
  • this is an instance of overfitting
  • this is an instance of underfitting
  • the training was not well regularized
  • the training and testing examples are sampled from different distributions
Q21 | Suppose your model is demonstrating high variance across the different training sets. Which of the following is NOT a valid way to try to reduce the variance?
  • increase the amount of training data in each training set
  • improve the optimization algorithm being used for error minimization.
  • decrease the model complexity
  • reduce the noise in the training data
Q22 | Suppose you are using an RBF kernel in an SVM with a high gamma value. What does this signify?
  • the model would consider even far away points from hyperplane for modeling
  • the model would consider only the points close to the hyperplane for modeling
  • the model would not be affected by distance of points from hyperplane for modeling
  • none of the above
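
A sketch of Q22's effect (assuming scikit-learn; the dataset and gamma grid are illustrative): as gamma grows, each point's influence becomes more local, so training accuracy rises while test accuracy can fall.

```python
# With a high gamma, each training point's influence decays quickly,
# so only points close to the hyperplane shape the model; the fit
# becomes very local and can overfit.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma:6.1f}  train={clf.score(X_tr, y_tr):.3f}"
          f"  test={clf.score(X_te, y_te):.3f}")
```
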
Q23 | We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization? 1. We do feature normalization so that new features dominate the others. 2. Sometimes, feature normalization is not feasible in the case of categorical variables. 3. Feature normalization always helps when we use a Gaussian kernel in SVM.
  • 1
  • 1 and 2
  • 1 and 3
  • 2 and 3
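
To accompany Q23, a sketch (assuming scikit-learn; the breast-cancer dataset is an illustrative choice, and scaling is not guaranteed to help on every dataset) comparing an RBF SVM with and without standardization:

```python
# Standardizing features before an RBF (Gaussian) kernel so that no
# single feature dominates the distance computation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
raw = SVC(kernel="rbf")
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("raw   :", cross_val_score(raw, X, y, cv=5).mean())
print("scaled:", cross_val_score(scaled, X, y, cv=5).mean())
```
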
Q24 | Wrapper methods are hyper-parameter selection methods that
  • should be used whenever possible because they are computationally efficient
  • should be avoided unless there are no other options because they are always prone to overfitting.
  • are useful mainly when the learning machines are “black boxes”
  • should be avoided altogether.
Q25 | Which of the following methods cannot achieve zero training error on any linearly separable dataset?
  • decision tree
  • 15-nearest neighbors
  • hard-margin svm
  • perceptron