Question 1
A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data. The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data. Which solution will meet these requirements with the LEAST amount of compute resources?
A. Import the data by using the None option
B. Import the data by using the Stratified option
C. Import the data by using the First K option. Infer the value of K from domain knowledge
D. Import the data by using the Randomized option. Infer the random size from domain knowledge
Show Answer
Correct Answer:
C. Import the data by using the First K option. Infer the value of K from domain knowledge
Question 2
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS. Which approach should the Specialist use for training a model using that data?
A. Write a direct connection to the SQL database within the notebook and pull data in
B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook
C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in
D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access
Show Answer
Correct Answer:
B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook
Question 3
An ecommerce company sends a weekly email newsletter to all of its customers. Management has hired a team of writers to create additional targeted content. A data scientist needs to identify five customer segments based on age, income, and location. The customers' current segmentation is unknown. The data scientist previously built an XGBoost model to predict the likelihood of a customer responding to an email based on age, income, and location. Why does the XGBoost model NOT meet the current requirements, and how can this be fixed?
A. The XGBoost model provides a true/false binary output. Apply principal component analysis (PCA) with five feature dimensions to predict a segment
B. The XGBoost model provides a true/false binary output. Increase the number of classes the XGBoost model predicts to five classes to predict a segment
C. The XGBoost model is a supervised machine learning algorithm. Train a k-Nearest-Neighbors (kNN) model with K = 5 on the same dataset to predict a segment
D. The XGBoost model is a supervised machine learning algorithm. Train a k-means model with K = 5 on the same dataset to predict a segment
Show Answer
Correct Answer:
D. The XGBoost model is a supervised machine learning algorithm. Train a k-means model with K = 5 on the same dataset to predict a segment
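The unsupervised segmentation the answer describes can be sketched in a few lines. Below is a minimal pure-Python version of Lloyd's k-means algorithm on made-up 2-D points (standing in for age/income features); the question calls for K = 5, but K = 2 on two obvious groups keeps the demo readable. In practice you would use the SageMaker built-in k-means algorithm or scikit-learn rather than hand-rolling this.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm: assign each point to its nearest
    centroid, then recompute centroids, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
              for p in points]
    return labels, centroids

# Two well-separated toy groups standing in for customer segments.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
```

Unlike XGBoost, no target labels are needed: the segments fall out of the geometry of the features alone.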
Question 4
A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values. A data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamples the data daily and exports the data for further modeling. Which solution will meet these requirements with the LEAST implementation effort?
A. Use Amazon EMR Serverless with PySpark
B. Use AWS Glue DataBrew
C. Use Amazon SageMaker Studio Data Wrangler
D. Use Amazon SageMaker Studio Notebook with Pandas
Show Answer
Correct Answer:
C. Use Amazon SageMaker Studio Data Wrangler
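Data Wrangler exposes this as a built-in transform; conceptually, daily resampling with forward fill looks like the sketch below (pure Python on made-up prices; a pandas `resample("D").ffill()` would be the one-liner equivalent).

```python
from datetime import date, timedelta

def resample_daily_ffill(observations):
    """Resample irregular (date, price) points to a daily series,
    forward-filling days that have no observation with the last known value."""
    obs = dict(sorted(observations))  # sort by date; latest value wins per date
    days = sorted(obs)
    out = []
    current = days[0]
    last_value = obs[current]
    while current <= days[-1]:
        if current in obs:
            last_value = obs[current]
        out.append((current, last_value))
        current += timedelta(days=1)
    return out

# Irregular timestamps with a gap on Jan 3, as in the question's price history.
raw = [(date(2024, 1, 1), 10.0), (date(2024, 1, 4), 12.0), (date(2024, 1, 2), 11.0)]
daily = resample_daily_ffill(raw)
```

Forward fill is only one imputation choice; interpolation or a model-based fill may suit prices better depending on the gap sizes.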
Question 5
The chief editor for a product catalog wants the research and development team to build a machine learning system that can be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has a set of training data. Which machine learning algorithm should the researchers use that BEST meets their requirements?
A. Latent Dirichlet Allocation (LDA)
B. Recurrent neural network (RNN)
C. K-means
D. Convolutional neural network (CNN)
Show Answer
Correct Answer:
D. Convolutional neural network (CNN)
Question 6
A machine learning (ML) specialist collected daily product usage data for a group of customers. The ML specialist appended customer metadata such as age and gender from an external data source. The ML specialist wants to understand product usage patterns for each day of the week for customers in specific age groups. The ML specialist creates two categorical features named day_of_week and binned_age. Which approach should the ML specialist use to discover the relationship between the two new categorical features?
A. Create a scatterplot for day_of_week and binned_age
B. Create crosstabs for day_of_week and binned_age
C. Create word clouds for day_of_week and binned_age
D. Create a boxplot for day_of_week and binned_age
Show Answer
Correct Answer:
B. Create crosstabs for day_of_week and binned_age
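A crosstab is just a contingency table of co-occurrence counts for the two categorical features. A minimal stdlib sketch on made-up records (pandas' `pd.crosstab` does the same in one call):

```python
from collections import Counter

# Toy records standing in for the joined usage data: (day_of_week, binned_age).
records = [
    ("Mon", "18-25"), ("Mon", "18-25"), ("Mon", "26-35"),
    ("Sat", "18-25"), ("Sat", "26-35"), ("Sat", "26-35"),
]

counts = Counter(records)  # co-occurrence count per (day, age-bin) pair
days = sorted({d for d, _ in records})
ages = sorted({a for _, a in records})

# Render the contingency table: one row per day, one column per age bin.
header = "\t".join([""] + ages)
rows = ["\t".join([d] + [str(counts[(d, a)]) for a in ages]) for d in days]
table = "\n".join([header] + rows)
```

Each cell answers exactly the question asked: how often a given age group uses the product on a given day of the week.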
Question 7
A tourism company uses a machine learning (ML) model to make recommendations to customers. The company uses an Amazon SageMaker environment and sets the hyperparameter tuning completion criterion to MaxNumberOfTrainingJobs. An ML specialist wants to change the hyperparameter tuning completion criteria. The ML specialist wants to stop tuning immediately after an internal algorithm determines that the tuning job is unlikely to improve more than 1% over the objective metric from the best training job. Which completion criteria will meet this requirement?
A. MaxRuntimeInSeconds
B. TargetObjectiveMetricValue
C. CompleteOnConvergence
D. MaxNumberOfTrainingJobsNotImproving
Show Answer
Correct Answer:
C. CompleteOnConvergence
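For context, this criterion lives inside the `HyperParameterTuningJobConfig` of a `CreateHyperParameterTuningJob` request. The fragment below shows where it plugs in, using boto3 parameter names as I understand them; verify the exact field names against the current SageMaker API reference before relying on them.

```python
# Sketch of a CreateHyperParameterTuningJob config fragment (field names
# are my best understanding of the API -- check the SageMaker docs).
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",  # hypothetical metric name
    },
    "ResourceLimits": {"MaxNumberOfTrainingJobs": 50, "MaxParallelTrainingJobs": 5},
    # Stop as soon as SageMaker's internal convergence detection decides
    # that further training jobs are unlikely to improve the objective.
    "TuningJobCompletionCriteria": {
        "ConvergenceDetected": {"CompleteOnConvergence": "Enabled"},
    },
}
```

With this set, the tuning job can end well before MaxNumberOfTrainingJobs is exhausted.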
Question 8
A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50th percentile. Which feature engineering strategy should the ML specialist use with Amazon SageMaker?
A. Apply dimensionality reduction by using the principal component analysis (PCA) algorithm
B. Drop the features with low correlation scores by using a Jupyter notebook
C. Apply anomaly detection by using the Random Cut Forest (RCF) algorithm
D. Concatenate the features with high correlation scores by using a Jupyter notebook
Show Answer
Correct Answer:
A. Apply dimensionality reduction by using the principal component analysis (PCA) algorithm
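PCA collapses those 200 highly correlated feature pairs into a smaller set of uncorrelated components. The core idea can be shown without any ML library: the first principal component is the dominant eigenvector of the covariance matrix, recoverable by power iteration. A pure-Python sketch on a made-up, strongly correlated feature pair (the SageMaker built-in PCA algorithm or scikit-learn would be used in practice):

```python
import math

def first_principal_component(data, iters=200):
    """Power iteration on the covariance matrix to find the dominant
    principal component (the direction of maximum variance)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two highly correlated features: the second is roughly 2x the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.9], [5.0, 10.1]]
pc1 = first_principal_component(data)
ratio = pc1[1] / pc1[0]  # should be close to the underlying slope of 2
```

Because the two features move together, one component captures nearly all their joint variance, which is exactly why PCA shrinks a redundant 1,020-feature space effectively.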
Question 9
A Machine Learning team uses Amazon SageMaker to train an Apache MXNet handwritten digit classifier model using a research dataset. The team wants to receive a notification when the model is overfitting. Auditors want to view the Amazon SageMaker log activity report to ensure there are no unauthorized API calls. What should the Machine Learning team do to address the requirements with the least amount of code and fewest steps?
A. Implement an AWS Lambda function to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting
B. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting
C. Implement an AWS Lambda function to log Amazon SageMaker API calls to AWS CloudTrail. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting
D. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Set up Amazon SNS to receive a notification when the model is overfitting
Show Answer
Correct Answer:
B. Use AWS CloudTrail to log Amazon SageMaker API calls to Amazon S3. Add code to push a custom metric to Amazon CloudWatch. Create an alarm in CloudWatch with Amazon SNS to receive a notification when the model is overfitting
Question 10
A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model did not generalize well on the validation dataset. The engineer realizes that the original dataset was imbalanced. What should the engineer do to improve the validation accuracy of the model?
A. Perform stratified sampling on the original dataset
B. Acquire additional data about the majority classes in the original dataset
C. Use a smaller, randomly sampled version of the training dataset
D. Perform systematic sampling on the original dataset
Show Answer
Correct Answer:
A. Perform stratified sampling on the original dataset
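Stratified sampling splits each class separately so rare classes keep the same proportion in both splits instead of being randomly starved out. A stdlib sketch on a made-up imbalanced label set (scikit-learn's `train_test_split(..., stratify=y)` does the same):

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, train_frac=0.8, seed=42):
    """Return train/validation index lists in which every class keeps
    roughly the same proportion as in the full dataset."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_frac)
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return train_idx, val_idx

# Imbalanced toy labels: 8 majority-class birds, 2 rare ones.
labels = ["sparrow"] * 8 + ["kiwi"] * 2
train_idx, val_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
```

A plain random 80/20 split could easily put both rare birds in one side; stratifying guarantees each split sees every class.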
Question 11
A large company has developed a BI application that generates reports and dashboards using data collected from various operational metrics. The company wants to provide executives with an enhanced experience so they can use natural language to get data from the reports. The company wants the executives to be able to ask questions using written and spoken interfaces. Which combination of services can be used to build this conversational interface? (Choose three.)
A. Alexa for Business
B. Amazon Connect
C. Amazon Lex
D. Amazon Polly
E. Amazon Comprehend
F. Amazon Transcribe
Show Answer
Correct Answer:
C. Amazon Lex
D. Amazon Polly
F. Amazon Transcribe
Question 12
A machine learning specialist is developing a proof of concept for government users whose primary concern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model for a photo classifier application. The specialist wants to protect the data so that it cannot be accessed and transferred to a remote host by malicious code accidentally installed on the training container. Which action will provide the MOST secure protection?
A. Remove Amazon S3 access permissions from the SageMaker execution role
B. Encrypt the weights of the CNN model
C. Encrypt the training and validation dataset
D. Enable network isolation for training jobs
Show Answer
Correct Answer:
D. Enable network isolation for training jobs
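Network isolation is a single flag on the training job. The fragment below shows where it sits in a `CreateTrainingJob` request, using the boto3 parameter name as I understand it; the job name is hypothetical and the other required request fields are omitted for brevity.

```python
# Sketch of a CreateTrainingJob request fragment (parameter name is my
# best understanding of the API -- check the SageMaker docs).
training_job_request = {
    "TrainingJobName": "cnn-photo-classifier",  # hypothetical name
    # With network isolation enabled, the training container gets no
    # outbound network access, so malicious code inside the container
    # cannot exfiltrate the training data to a remote host. SageMaker
    # still stages the S3 input/output on the container's behalf.
    "EnableNetworkIsolation": True,
}
```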
Question 13
A data scientist is reviewing customer comments about a company's products. The data scientist needs to present an initial exploratory analysis by using charts and a word cloud. The data scientist must use feature engineering techniques to prepare this analysis before starting a natural language processing (NLP) model. Which combination of feature engineering techniques should the data scientist use to meet these requirements? (Choose two.)
A. Named entity recognition
B. Coreference
C. Stemming
D. Term frequency-inverse document frequency (TF-IDF)
E. Sentiment analysis
Show Answer
Correct Answer:
C. Stemming
D. Term frequency-inverse document frequency (TF-IDF)
Question 14
A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products. Which solution will meet these requirements with the MOST operational efficiency?
A. Build a custom clustering model. Create a Dockerfile and build a Docker image. Register the Docker image in Amazon Elastic Container Registry (Amazon ECR). Use the custom image in Amazon SageMaker to generate a trained model
B. Tokenize the data and transform the data into tabular data. Train an Amazon SageMaker k-means model to generate the product categories
C. Train an Amazon SageMaker Neural Topic Model (NTM) model to generate the product categories
D. Train an Amazon SageMaker BlazingText model to generate the product categories
Show Answer
Correct Answer:
C. Train an Amazon SageMaker Neural Topic Model (NTM) model to generate the product categories
Question 15
A company wants to conduct targeted marketing to sell solar panels to homeowners. The company wants to use machine learning (ML) technologies to identify which houses already have solar panels. The company has collected 8,000 satellite images as training data and will use Amazon SageMaker Ground Truth to label the data. The company has a small internal team that is working on the project. The internal team has no ML expertise and no ML experience. Which solution will meet these requirements with the LEAST amount of effort from the internal team?
A. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use Amazon Rekognition Custom Labels for model training and hosting
B. Set up a private workforce that consists of the internal team. Use the private workforce to label the data. Use Amazon Rekognition Custom Labels for model training and hosting
C. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference
D. Set up a public workforce. Use the public workforce to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference
Show Answer
Correct Answer:
A. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use Amazon Rekognition Custom Labels for model training and hosting
Question 16
A media company wants to create a solution that identifies celebrities in pictures that users upload. The company also wants to identify the IP address and the timestamp details from the users so the company can prevent users from uploading pictures from unauthorized locations. Which solution will meet these requirements with the LEAST development effort?
A. Use AWS Panorama to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details
B. Use AWS Panorama to identify celebrities in the pictures. Make calls to the AWS Panorama Device SDK to capture IP address and timestamp details
C. Use Amazon Rekognition to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details
D. Use Amazon Rekognition to identify celebrities in the pictures. Use the text detection feature to capture IP address and timestamp details
Show Answer
Correct Answer:
C. Use Amazon Rekognition to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details
Question 17
A medical imaging company wants to train a computer vision model to detect areas of concern on patients' CT scans. The company has a large collection of unlabeled CT scans that are linked to each patient and stored in an Amazon S3 bucket. The scans must be accessible to authorized users only. A machine learning engineer needs to build a labeling pipeline. Which set of steps should the engineer take to build the labeling pipeline with the LEAST effort?
A. Create a workforce with AWS Identity and Access Management (IAM). Build a labeling tool on Amazon EC2. Queue images for labeling by using Amazon Simple Queue Service (Amazon SQS). Write the labeling instructions
B. Create an Amazon Mechanical Turk workforce and manifest file. Create a labeling job by using the built-in image classification task type in Amazon SageMaker Ground Truth. Write the labeling instructions
C. Create a private workforce and manifest file. Create a labeling job by using the built-in bounding box task type in Amazon SageMaker Ground Truth. Write the labeling instructions
D. Create a workforce with Amazon Cognito. Build a labeling web application with AWS Amplify. Build a labeling workflow backend using AWS Lambda. Write the labeling instructions
Show Answer
Correct Answer:
C. Create a private workforce and manifest file. Create a labeling job by using the built-in bounding box task type in Amazon SageMaker Ground Truth. Write the labeling instructions
Question 18
A company distributes an online multiple-choice survey to several thousand people. Respondents to the survey can select multiple options for each question. A machine learning (ML) engineer needs to comprehensively represent every response from all respondents in a dataset. The ML engineer will use the dataset to train a logistic regression model. Which solution will meet these requirements?
A. Perform one-hot encoding on every possible option for each question of the survey
B. Perform binning on all the answers each respondent selected for each question
C. Use Amazon Mechanical Turk to create categorical labels for each set of possible responses
D. Use Amazon Textract to create numeric features for each set of possible responses
Show Answer
Correct Answer:
A. Perform one-hot encoding on every possible option for each question of the survey
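For a multi-select survey, one-hot encoding produces one binary column per (question, option) pair, so every combination of selections is representable. A stdlib sketch on made-up responses (pandas' `get_dummies` or scikit-learn's `MultiLabelBinarizer` are the usual tools):

```python
# Each respondent's answers: question -> set of selected options (toy data).
responses = [
    {"q1": {"a", "b"}, "q2": {"x"}},
    {"q1": {"b"}, "q2": {"x", "y"}},
]

# One binary column per (question, option) pair seen anywhere in the survey.
columns = sorted({(q, opt) for r in responses for q, opts in r.items() for opt in opts})

def one_hot(response):
    """1 if the respondent selected that option for that question, else 0."""
    return [1 if opt in response.get(q, set()) else 0 for q, opt in columns]

matrix = [one_hot(r) for r in responses]
```

The resulting 0/1 matrix is exactly the numeric input a logistic regression model expects.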
Question 19
A finance company needs to forecast the price of a commodity. The company has compiled a dataset of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset and must validate the efficacy of those models on the remaining 20% of the dataset. How should the data scientist split the dataset into a training dataset and a validation dataset to compare model performance?
A. Pick a date so that 80% of the data points precede the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset
B. Pick a date so that 80% of the data points occur after the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset
C. Starting from the earliest date in the dataset, pick eight data points for the training dataset and two data points for the validation dataset. Repeat this stratified sampling until no data points remain
D. Sample data points randomly without replacement so that 80% of the data points are in the training dataset. Assign all the remaining data points to the validation dataset
Show Answer
Correct Answer:
A. Pick a date so that 80% of the data points precede the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset
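For time series, the split must respect chronology so the model never trains on information from the future of its validation window. A minimal sketch on made-up daily prices:

```python
from datetime import date

def chronological_split(series, train_frac=0.8):
    """Time-ordered split: the earliest train_frac of the points train,
    the remaining (later) points validate."""
    ordered = sorted(series, key=lambda point: point[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Ten toy daily prices; the first eight train, the last two validate.
prices = [(date(2024, 1, d), 100.0 + d) for d in range(1, 11)]
train, val = chronological_split(prices)
```

A random split (option D) would leak future prices into training, inflating validation scores for a forecasting task.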
Question 20
A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable. What should be done to reduce the impact of having such a large number of features?
A. Perform one-hot encoding on highly correlated features
B. Use matrix multiplication on highly correlated features
C. Create a new feature space using principal component analysis (PCA)
D. Apply the Pearson correlation coefficient
Show Answer
Correct Answer:
C. Create a new feature space using principal component analysis (PCA)