This set of Machine Learning (ML) Multiple Choice Questions & Answers (MCQs) focuses on Machine Learning Set 3.

Q1 | In multiclass classification, the number of classes must be
  • less than two
  • equal to two
  • greater than two
  • option 1 and option 2
Q2 | Which of the following can only be used when training data are linearly separable?
  • linear hard-margin svm
  • linear logistic regression
  • linear soft margin svm
  • the centroid method
Q3 | What is the impact of high variance on the training set?
  • overfitting
  • underfitting
  • both underfitting & overfitting
  • depends upon the dataset
Q4 | What do you mean by a hard margin?
  • the svm allows very low error in classification
  • the svm allows high amount of error in classification
  • both 1 & 2
  • none of the above
Q5 | The effectiveness of an SVM depends upon:
  • selection of kernel
  • kernel parameters
  • soft margin parameter c
  • all of the above
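
To make Q5 concrete, here is a minimal scikit-learn sketch (the toy dataset and the particular values of kernel, gamma, and C are illustrative assumptions, not part of the question) exposing the three choices the answer refers to:

```python
# The three knobs from Q5: kernel choice, kernel parameter, soft-margin C.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf",   # selection of kernel
          gamma=1.0,      # kernel parameter
          C=1.0)          # soft margin parameter C
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```
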
Q6 | What are support vectors?
  • all the examples that have a non-zero weight α_k in an svm
  • the only examples necessary to compute f(x) in an svm.
  • all of the above
  • none of the above
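
As a companion to Q6, a minimal sketch (assuming scikit-learn; the blob dataset is an illustrative choice) showing that a fitted SVC keeps exactly the examples with non-zero dual weights, and that these suffice to evaluate f(x):

```python
# Inspecting the support vectors of a fitted SVC.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only these examples carry non-zero weights alpha_k, and they are
# the only ones needed to evaluate the decision function f(x).
print("support vectors per class:", clf.n_support_)
print("their coordinates:\n", clf.support_vectors_)
print("signed weights y_k * alpha_k:", clf.dual_coef_)
```
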
Q7 | A perceptron adds up all the weighted inputs it receives, and if the sum exceeds a certain value, it outputs a 1; otherwise it outputs a 0.
  • true
  • false
  • sometimes – it can also output intermediate values
  • can’t say
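
A minimal NumPy sketch of the threshold unit described in Q7 (the AND-gate weights and bias are an illustrative assumption):

```python
import numpy as np

def perceptron_output(weights, bias, x):
    """Weighted sum of inputs, thresholded at zero: 1 if it fires, else 0."""
    return 1 if np.dot(weights, x) + bias > 0 else 0

# Example: a 2-input unit that fires only when both inputs are on (AND gate).
w, b = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron_output(w, b, np.array(x)))
```
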
Q8 | What is the purpose of the Kernel Trick?
  • to transform the data from nonlinearly separable to linearly separable
  • to transform the problem from regression to classification
  • to transform the problem from supervised to unsupervised learning.
  • all of the above
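
To illustrate Q8's first option, a short sketch (assuming scikit-learn; the concentric-circles dataset is an illustrative choice): a linear kernel fails on data that is not linearly separable, while an RBF kernel separates it without constructing the high-dimensional features explicitly.

```python
# Concentric circles are not linearly separable, but an RBF kernel
# separates them without an explicit feature mapping.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print("linear kernel accuracy:", SVC(kernel="linear").fit(X, y).score(X, y))
print("rbf kernel accuracy:   ", SVC(kernel="rbf").fit(X, y).score(X, y))
```
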
Q9 | Which of the following can only be used when training data are linearly separable?
  • linear hard-margin svm
  • linear logistic regression
  • linear soft margin svm
  • parzen windows
Q10 | The firing rate of a neuron
  • determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons
  • is more analogous to the output of a unit in a neural net than the output voltage of the neuron
  • only changes very slowly, taking a period of several seconds to make large adjustments
  • can sometimes exceed 30,000 action potentials per second
Q11 | Which of the following evaluation metrics cannot be applied to a logistic regression output compared with the target?
  • auc-roc
  • accuracy
  • logloss
  • mean-squared-error
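
For Q11, a small sketch (assuming scikit-learn; the labels and probabilities are made-up illustrative values) computing the three metrics that do apply to probabilistic classifier output:

```python
# AUC-ROC, accuracy, and log loss on a toy binary classification output.
from sklearn.metrics import roc_auc_score, accuracy_score, log_loss

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]            # predicted P(y=1)
y_pred = [int(p >= 0.5) for p in y_prob]  # thresholded labels

print("auc-roc :", roc_auc_score(y_true, y_prob))
print("accuracy:", accuracy_score(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
```
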
Q12 | The cost parameter in the SVM means:
  • the number of cross-validations to be made
  • the kernel to be used
  • the tradeoff between misclassification and simplicity of the model
  • none of the above
Q13 | The kernel trick
  • can be applied to every classification algorithm
  • is commonly used for dimensionality reduction
  • changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points with d features
  • exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points
Q14 | How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?
  • ridge has larger bias, larger variance
  • ridge has smaller bias, larger variance
  • ridge has larger bias, smaller variance
  • ridge has smaller bias, smaller variance
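
A rough sketch of Q14's point (assuming scikit-learn and NumPy; the synthetic data and alpha grid are illustrative): increasing the ridge penalty shrinks the coefficients, adding bias while reducing variance relative to ordinary least squares, approximated here by a near-zero alpha.

```python
# Larger ridge penalties shrink coefficients: more bias, less variance.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=50)

for alpha in (0.001, 1.0, 100.0):  # alpha near 0 approximates OLS
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:7.3f}  ||coef|| = {np.linalg.norm(coef):.3f}")
```
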
Q15 | Which of the following are real world applications of the SVM?
  • text and hypertext categorization
  • image classification
  • clustering of news articles
  • all of the above
Q16 | How can SVM be classified?
  • it is a model trained using unsupervised learning. it can be used for classification and regression.
  • it is a model trained using unsupervised learning. it can be used for classification but not for regression.
  • it is a model trained using supervised learning. it can be used for classification and regression.
  • it is a model trained using supervised learning. it can be used for classification but not for regression.
Q17 | Which of the following can help to reduce overfitting in an SVM classifier?
  • use of slack variables
  • high-degree polynomial features
  • normalizing the data
  • setting a very low learning rate
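
To illustrate the first answer in Q17, a sketch (assuming scikit-learn; the noise level and C grid are illustrative, and the best C depends on the data): lowering C permits more slack, softening the margin and typically curbing overfitting on noisy data.

```python
# A smaller C allows more slack (softer margin), which can reduce
# overfitting on noisy data; compare cross-validated accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
for C in (1000.0, 1.0, 0.1):
    score = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5).mean()
    print(f"C={C:7.1f}  cv accuracy={score:.3f}")
```
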
Q18 | Suppose you have trained an SVM with a linear decision boundary. After training, you correctly infer that your SVM model is underfitting. Which of the following options would you most likely consider for the next SVM iteration?
  • you want to increase your data points
  • you want to decrease your data points
  • you will try to calculate more variables
  • you will try to reduce the features
Q19 | What is/are true about kernels in SVM? 1. A kernel function maps low-dimensional data to a high-dimensional space. 2. It is a similarity function.
  • 1
  • 2
  • 1 and 2
  • none of these
Q20 | You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following is false?
  • this is an instance of overfitting
  • this is an instance of underfitting
  • the training was not well regularized
  • the training and testing examples are sampled from different distributions
Q21 | Suppose your model is demonstrating high variance across the different training sets. Which of the following is NOT a valid way to try to reduce the variance?
  • increase the amount of training data in each training set
  • improve the optimization algorithm being used for error minimization.
  • decrease the model complexity
  • reduce the noise in the training data
Q22 | Suppose you are using an RBF kernel in an SVM with a high gamma value. What does this signify?
  • the model would consider even far away points from hyperplane for modeling
  • the model would consider only the points close to the hyperplane for modeling
  • the model would not be affected by distance of points from hyperplane for modeling
  • none of the above
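
A sketch of Q22's effect (assuming scikit-learn; the dataset and gamma grid are illustrative): as gamma grows, each point's influence becomes more local, so training accuracy rises while test accuracy can fall.

```python
# With a high gamma, each training point's influence decays quickly,
# so only points close to the hyperplane shape the model; the fit
# becomes very local and can overfit.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma:6.1f}  train={clf.score(X_tr, y_tr):.3f}"
          f"  test={clf.score(X_te, y_te):.3f}")
```
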
Q23 | We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization? 1. We do feature normalization so that new features dominate the others. 2. Sometimes, feature normalization is not feasible in the case of categorical variables. 3. Feature normalization always helps when we use a Gaussian kernel in SVM.
  • 1
  • 1 and 2
  • 1 and 3
  • 2 and 3
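
To accompany Q23, a sketch (assuming scikit-learn; the breast-cancer dataset is an illustrative choice, and scaling is not guaranteed to help on every dataset) comparing an RBF SVM with and without standardization:

```python
# Standardizing features before an RBF (Gaussian) kernel so that no
# single feature dominates the distance computation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
raw = SVC(kernel="rbf")
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("raw   :", cross_val_score(raw, X, y, cv=5).mean())
print("scaled:", cross_val_score(scaled, X, y, cv=5).mean())
```
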
Q24 | Wrapper methods are hyper-parameter selection methods that
  • should be used whenever possible because they are computationally efficient
  • should be avoided unless there are no other options because they are always prone to overfitting.
  • are useful mainly when the learning machines are “black boxes”
  • should be avoided altogether.
Q25 | Which of the following methods cannot achieve zero training error on any linearly separable dataset?
  • decision tree
  • 15-nearest neighbors
  • hard-margin svm
  • perceptron