Free Certified Machine Learning Associate Sample Questions

Free Certified Machine Learning Associate sample questions for the Certified Machine Learning Associate exam. No account required: study at your own pace.

Question 1

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

  • A. MLflow Experiment Tracking
  • B. Spark ML
  • C. Autoscaling clusters
  • D. Hyperopt
  • E. Delta Lake

Correct Answer: D. Hyperopt
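
As a minimal sketch of why Hyperopt fits here: its `SparkTrials` class sends each trial to a Spark worker, where the single-node model is trained and scored independently. The example below assumes `hyperopt` and `pyspark` are installed and a Spark session is available; the `objective` function, the `alpha` hyperparameter, and the `tune_on_spark` helper are hypothetical stand-ins, and the quadratic loss replaces a real model fit.

```python
def objective(params):
    """Toy stand-in for one single-node training run; loss is lowest at alpha = 0.3."""
    loss = (params["alpha"] - 0.3) ** 2
    # "ok" is the value of hyperopt.STATUS_OK
    return {"loss": loss, "status": "ok"}


def tune_on_spark(max_evals=32, parallelism=4):
    """Search the space with trials evaluated in parallel on Spark workers."""
    # Imported inside the function so the objective can be tried without a cluster.
    from hyperopt import SparkTrials, fmin, hp, tpe

    space = {"alpha": hp.uniform("alpha", 0.0, 1.0)}
    spark_trials = SparkTrials(parallelism=parallelism)  # up to 4 concurrent trials
    return fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=spark_trials,
    )
```

Calling `tune_on_spark()` on a cluster would return a dictionary with the best value found for `alpha`. Swapping `SparkTrials` for Hyperopt's default `Trials` runs the same search sequentially on the driver.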

Question 2

An organization is developing a feature repository and plans to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository. Which of the following explanations justifies this suggestion?

  • A. One-hot encoding is not supported by most machine learning libraries
  • B. One-hot encoding is dependent on the target variable’s values, which differ for each application
  • C. One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems
  • D. One-hot encoding is not a common strategy for representing categorical feature variables numerically
  • E. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms

Correct Answer: E. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms
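
One way to see why one-hot encoding can be a poor fit for some algorithms (tree-based models are a common example) is to look at how wide and sparse the encoded data becomes for a high-cardinality feature. The sketch below is plain Python with a hypothetical `one_hot` helper and made-up ZIP-code-like labels; it is an illustration, not how a feature store or ML library encodes data.

```python
def one_hot(values):
    """One-hot encode a list of category labels into rows of 0/1 flags."""
    categories = sorted(set(values))  # one output column per distinct label
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one flag is set per row
        rows.append(row)
    return categories, rows


# Hypothetical feature: 1,000 rows, each with a distinct label.
labels = [f"zip_{i:05d}" for i in range(1000)]
categories, encoded = one_hot(labels)

n_cells = len(encoded) * len(categories)
n_nonzero = sum(sum(row) for row in encoded)
sparsity = 1 - n_nonzero / n_cells  # fraction of cells that are zero
```

A single column turns into 1,000 columns that are almost entirely zeros. Keeping the raw categorical values in the feature repository leaves each downstream model free to pick an encoding that suits its algorithm.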

Question 3

A data scientist wants to explore summary statistics for the Spark DataFrame spark_df. The data scientist wants to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature. Which of the following lines of code can the data scientist run to accomplish the task?

  • A. spark_df.summary()
  • B. spark_df.stats()
  • C. spark_df.describe().head()
  • D. spark_df.printSchema()
  • E. spark_df.toPandas()

Correct Answer: A. spark_df.summary()
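
By default, `summary()` reports the 25%, 50%, and 75% percentiles alongside count, mean, stddev, min, and max, whereas `describe()` omits the percentiles. The IQR itself is not printed; it is the 75% value minus the 25% value. The following sketch shows one way to derive it, assuming a live Spark session for `column_iqr`. Both helper names are hypothetical, and the values are parsed with `float` because `summary()` returns its statistics as strings.

```python
def iqr_from_summary_rows(rows):
    """Compute the IQR from (statistic, value) pairs such as collected summary() rows."""
    stats = {row[0]: float(row[1]) for row in rows}  # summary() values are strings
    return stats["75%"] - stats["25%"]


def column_iqr(spark_df, column):
    """Ask Spark for only the two quartiles of one column, then subtract them."""
    rows = spark_df.select(column).summary("25%", "75%").collect()
    return iqr_from_summary_rows(rows)
```

For example, `column_iqr(spark_df, "price")` would return the IQR of a hypothetical numeric column named `price`.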

Question 4

A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process. Which of the following feature engineering tasks will be the least efficient to distribute?

  • A. One-hot encoding categorical features
  • B. Target encoding categorical features
  • C. Imputing missing feature values with the mean
  • D. Imputing missing feature values with the true median
  • E. Creating binary indicator features for missing values

Correct Answer: D. Imputing missing feature values with the true median
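
The reason the true median distributes poorly can be sketched in plain Python: a mean can be rebuilt exactly from small per-partition `(sum, count)` pairs, but an exact median cannot be rebuilt from per-partition medians and generally requires sorting or shuffling the full column. The `partitions` data below is made up for illustration and stands in for rows spread across three workers.

```python
from statistics import median

# Hypothetical column values split across three workers.
partitions = [[1, 2, 3], [4, 5, 100], [6]]
all_values = [x for part in partitions for x in part]

# Mean: each worker returns a (sum, count) pair, and the driver combines them.
partials = [(sum(part), len(part)) for part in partitions]
total, count = map(sum, zip(*partials))
distributed_mean = total / count
true_mean = sum(all_values) / len(all_values)

# Median: combining per-partition medians does not give the true median.
median_of_medians = median(median(part) for part in partitions)
true_median = median(all_values)
```

Here the combined mean matches the true mean exactly, while the median of the per-partition medians is 5 and the true median is 4.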

Question 5

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

  • A. pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata
  • B. pandas API on Spark DataFrames are more performant than Spark DataFrames
  • C. pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata
  • D. pandas API on Spark DataFrames are less mutable versions of Spark DataFrames
  • E. pandas API on Spark DataFrames are unrelated to Spark DataFrames

Correct Answer: C. pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata
