Question 1
An analyst is examining data from an array of temperature sensors and sees that one sensor consistently returns values that are much higher than the values from the other sensors. Which of the following terms best describes this type of error?
A. Synthetic
B. Systematic
C. Heteroskedastic
D. Idiosyncratic
Correct Answer:
B. Systematic
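For Question 1, a minimal sketch of how a systematic error might be spotted. The sensor names, readings, and the 1.0-degree threshold are all made up for illustration. The idea is that a biased sensor shows a roughly constant offset from the rest of the array, not random scatter.

```python
from statistics import mean, median

# Hypothetical readings (degrees C) from four sensors at the same five times.
readings = {
    "s1": [20.1, 20.4, 19.9, 20.2, 20.0],
    "s2": [19.8, 20.3, 20.1, 20.0, 20.2],
    "s3": [20.2, 20.5, 20.0, 20.1, 19.9],
    "s4": [25.0, 25.4, 24.9, 25.1, 25.1],  # reads consistently high
}

def sensor_offsets(data):
    """Mean difference between each sensor and the per-time median of all sensors."""
    n = len(next(iter(data.values())))
    baseline = [median(values[t] for values in data.values()) for t in range(n)]
    return {name: mean(v - b for v, b in zip(values, baseline))
            for name, values in data.items()}

offsets = sensor_offsets(readings)
# 1.0 is an arbitrary threshold chosen for this toy data.
biased = [name for name, off in offsets.items() if abs(off) > 1.0]
print(biased)
```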
Question 2
A data scientist wants to predict a person's travel destination. The options are:
- Branson, Missouri, United States
- Mount Kilimanjaro, Tanzania
- Disneyland Paris, Paris, France
- Sydney Opera House, Sydney, Australia
Which of the following models would best fit this use case?
A. Linear discriminant analysis
B. k-means modeling
C. Latent semantic analysis
D. Principal component analysis
Correct Answer:
A. Linear discriminant analysis
Question 3
Given these business requirements:
- Needs to most efficiently move 3,000 boxes across a river
- Has one boat that holds eight boxes, travels at ten nautical miles per hour, and has a fuel economy of six nautical miles per gallon
- Has another boat that holds two boxes, travels at 50 nautical miles per hour, and has a fuel economy of 18 nautical miles per gallon
- The river is one nautical mile wide
- The data scientist only has access to 125 gallons of fuel
Which of the following is the most likely optimization technique a data scientist would apply?
A. Constrained
B. Unconstrained
C. Non-iterative
D. Iterative
Correct Answer:
A. Constrained
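For Question 3, a rough sketch of the problem written as a constrained optimization and solved by brute force. Several things here are assumptions the question does not state: every trip is a round trip of two nautical miles, the boats run one after another, and "most efficiently" means least total time. The function name `best_plan` is hypothetical.

```python
# Assumptions (not in the question): round trips of 2 nautical miles,
# boats run one after another, objective is least total time.
BOXES, FUEL_GALLONS = 3000, 125

def best_plan():
    best = None
    for big in range(0, BOXES // 8 + 1):        # round trips by the 8-box boat
        for small in range(0, BOXES // 2 + 1):  # round trips by the 2-box boat
            if 8 * big + 2 * small < BOXES:
                continue  # constraint: all boxes must be moved
            # Fuel per round trip is 2/6 gal (big) and 2/18 gal (small).
            # Multiplying by 9 keeps the check in integers: 3*big + small <= 9*125.
            if 3 * big + small > 9 * FUEL_GALLONS:
                continue  # constraint: fuel budget
            hours = big * 2 / 10 + small * 2 / 50  # objective: total time
            if best is None or hours < best[0]:
                best = (hours, big, small)
    return best

hours, big, small = best_plan()
print(big, small, hours)
```

Under these assumptions the fuel limit, not the speed of the boats, decides the plan, which is why the problem has to be treated as constrained.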
Question 4
A data analyst is analyzing data and would like to build conceptual associations. Which of the following is the best way to accomplish this task?
A. n-grams
B. NER
C. TF-IDF
D. POS
Correct Answer:
A. n-grams
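For Question 4, a small sketch of what n-grams capture: which tokens tend to appear next to each other. The sample sentence and the helper name `ngrams` are invented for illustration, and tokenizing with `split()` is a simplification.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every window of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up text; whitespace tokenization is a simplification.
text = "the data scientist cleans the data and the data scientist models the data"
tokens = text.split()

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))
```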
Question 5
Which of the following types of layers is used to downsample feature detection when using a convolutional neural network?
A. Pooling
B. Input
C. Output
D. Hidden
Correct Answer:
A. Pooling
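For Question 5, a minimal sketch of 2x2 max pooling on a plain nested list, to show how a pooling layer halves each dimension of a feature map. The function name and the sample values are illustrative; real frameworks also support other window sizes, strides, and average pooling.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D list by taking the max of each non-overlapping 2x2 window.

    A trailing odd row or column is dropped in this simplified version.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

fm = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2x2(fm)
print(pooled)  # [[6, 2], [2, 9]]
```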
Question 6
During EDA, a data scientist wants to look for patterns, such as linearity, in the data. Which of the following plots should the data scientist use?
A. Violin
B. Box-and-whisker
C. Scatter
D. Q-Q
Correct Answer:
C. Scatter
Question 7
A data scientist is merging two tables. Table 1 contains employee IDs and roles. Table 2 contains employee IDs and team assignments. Which of the following is the best technique to combine these data sets?
A. INNER JOIN between Table 1 and Table 2
B. LEFT JOIN on Table 1 with Table 2
C. RIGHT JOIN on Table 1 with Table 2
D. OUTER JOIN between Table 1 and Table 2
Correct Answer:
B. LEFT JOIN on Table 1 with Table 2
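For Question 7, a short sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are hypothetical. It shows that a LEFT JOIN keeps every employee from Table 1, including one with no team assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data: employee 3 has a role but no team.
conn.executescript("""
    CREATE TABLE roles (employee_id INTEGER, role TEXT);
    CREATE TABLE teams (employee_id INTEGER, team TEXT);
    INSERT INTO roles VALUES (1, 'Analyst'), (2, 'Engineer'), (3, 'Manager');
    INSERT INTO teams VALUES (1, 'Blue'), (2, 'Red');
""")

left_rows = conn.execute("""
    SELECT r.employee_id, r.role, t.team
    FROM roles AS r
    LEFT JOIN teams AS t ON t.employee_id = r.employee_id
    ORDER BY r.employee_id
""").fetchall()
print(left_rows)
```

An INNER JOIN on the same data would drop employee 3, which is the trade-off the question is probing.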
Question 8
The term "greedy algorithms" refers to machine-learning algorithms that:
A. update priors as more data is seen
B. examine every node of a tree before making a decision
C. apply a theoretical model to the distribution of the data
D. make the locally optimal decision
Correct Answer:
D. make the locally optimal decision
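For Question 8, a classic toy example (coin change, not a machine-learning model) of what "locally optimal" means and why it is not always globally optimal. The coin set `[1, 3, 4]` is chosen on purpose so that the greedy choice loses; both function names are illustrative.

```python
def greedy_coins(amount, coins):
    """Always take the largest coin that still fits (the locally optimal step)."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    return picked

def fewest_coins(amount, coins):
    """Dynamic programming over every smaller amount, for comparison."""
    best = [0] + [None] * amount
    for total in range(1, amount + 1):
        options = [best[total - c] for c in coins
                   if c <= total and best[total - c] is not None]
        best[total] = min(options) + 1 if options else None
    return best[amount]

print(greedy_coins(6, [1, 3, 4]))  # [4, 1, 1]
print(fewest_coins(6, [1, 3, 4]))  # 2
```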
Question 9
A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses. Which of the following is the most efficient way to identify the chemical businesses' observations?
A. Ingest the data from all of the hard drives and perform exploratory data analysis to identify which business is responsible for chemical operations
B. Perform analysis on all of the data and create a summary report on the results relevant to chemical operations
C. Consult with the business team to identify which sites are responsible for chemical operations and ingest only the relevant data for analysis
D. Ingest data from the hard drive containing the most data and present sample results on the chemical operations
Correct Answer:
C. Consult with the business team to identify which sites are responsible for chemical operations and ingest only the relevant data for analysis
Question 10
Which of the following describes the appropriate use case for PCA?
A. Dimensionality reduction
B. Classification
C. Regression
D. Recommendation
Correct Answer:
A. Dimensionality reduction
Question 11
Which of the following JOINS would generate the largest amount of data?
A. RIGHT JOIN
B. LEFT JOIN
C. CROSS JOIN
D. INNER JOIN
Correct Answer:
C. CROSS JOIN
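For Question 11, a quick row-count comparison using the built-in `sqlite3` module with two made-up tables of three and four rows. RIGHT JOIN is left out only because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: a has 3 rows, b has 4 rows, and two ids overlap.
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4), (5);
""")

def count(join_clause):
    """Number of rows produced by joining a to b with the given clause."""
    return conn.execute(f"SELECT COUNT(*) FROM a {join_clause}").fetchone()[0]

counts = {
    "inner": count("INNER JOIN b ON a.id = b.id"),
    "left": count("LEFT JOIN b ON a.id = b.id"),
    "cross": count("CROSS JOIN b"),
}
print(counts)
```

The cross join pairs every row of one table with every row of the other, so its size is the product of the two row counts.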
Question 12
Which of the following problem-solving approaches is a set of guidelines to handle highly variable and not fully apparent situations?
A. Schedule
B. Plan
C. Heuristic
D. Algorithm
Correct Answer:
C. Heuristic
Question 13
A data scientist is clustering a data set but does not want to specify the number of clusters present. Which of the following algorithms should the data scientist use?
A. DBSCAN
B. k-nearest neighbors
C. k-means
D. Logistic regression
Correct Answer:
A. DBSCAN
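For Question 13, a compact, unoptimized sketch of the DBSCAN idea in plain Python. The points, `eps`, and `min_pts` values are invented. Note that the caller supplies a neighborhood radius and a density threshold but never a cluster count. A library implementation would normally be used in practice.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Made-up data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```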
Question 14
A data scientist needs to:
- Build a predictive model that gives the likelihood that a car will get a flat tire.
- Provide a data set of cars that had flat tires and cars that did not.
All the cars in the data set had sensors taking weekly measurements of tire pressure similar to the sensors that will be installed in the cars consumers drive. Which of the following is the most immediate data concern?
A. Granularity misalignment
B. Multivariate outliers
C. Insufficient domain expertise
D. Lagged observations
Correct Answer:
A. Granularity misalignment
Question 15
A data scientist is standardizing a large data set that contains website addresses. A specific string inside some of the web addresses needs to be extracted. Which of the following is the best method for extracting the desired string from the text data?
A. Regular expressions
B. Named-entity recognition
C. Large language model
D. Find and replace
Correct Answer:
A. Regular expressions
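For Question 15, a brief sketch with Python's `re` module. The URLs and the `/product/<id>` pattern are hypothetical stand-ins for whatever string actually needs extracting.

```python
import re

# Hypothetical goal: pull the id that follows "/product/" out of each URL.
PRODUCT_ID = re.compile(r"/product/([A-Za-z0-9-]+)")

urls = [
    "https://example.com/product/AB-1234?ref=home",
    "https://example.com/about",
    "http://shop.example.org/en/product/zx9/reviews",
]

# None marks URLs that do not contain the pattern.
extracted = [m.group(1) if (m := PRODUCT_ID.search(u)) else None for u in urls]
print(extracted)  # ['AB-1234', None, 'zx9']
```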
Question 16
A computer vision model is trained to identify cats on a training set that is composed of both cat and dog images. The model predicts a picture of a cat is a dog. Which of the following describes this error?
A. Error due to reality
B. False positive error
C. Sampling error
D. Type II error
Correct Answer:
D. Type II error
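For Question 16, a small sketch that tallies a confusion matrix with "cat" as the positive class, since the model is trained to identify cats. The labels are made up. A cat predicted as a dog lands in the false-negative cell, which is the Type II error.

```python
def confusion_counts(actual, predicted, positive="cat"):
    """Count true/false positives and negatives for one positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1  # FN = Type II error
        else:
            counts["FP" if p == positive else "TN"] += 1  # FP = Type I error
    return counts

# Made-up labels; the second image is a cat predicted as a dog.
actual = ["cat", "cat", "dog", "dog", "cat"]
predicted = ["cat", "dog", "dog", "cat", "cat"]
counts = confusion_counts(actual, predicted)
print(counts)
```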
Question 17
Which of the following modeling tools is appropriate for solving a scheduling problem?
A. One-armed bandit
B. Constrained optimization
C. Decision tree
D. Gradient descent
Correct Answer:
B. Constrained optimization
Question 18
Which of the following belong in a presentation to the senior management team and/or C-suite executives? (Choose two.)
A. Full literature reviews
B. Code snippets
C. Final recommendations
D. High-level results
E. Detailed explanations of statistical tests
F. Security keys and login information
Correct Answer:
C. Final recommendations
D. High-level results
Question 19
A data scientist is building a proof of concept for a commercialized machine-learning model. Which of the following is the best starting point?
A. Literature review
B. Model performance evaluation
C. Hyperparameter tuning
D. Model selection
Correct Answer:
A. Literature review
Question 20
Which of the following is the naive assumption in Bayes' rule?
A. Normal distribution
B. Independence
C. Uniform distribution
D. Homoskedasticity
Correct Answer:
B. Independence
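For Question 20, a toy sketch of where the independence assumption enters a naive Bayes classifier. All probabilities and class names below are invented. The "naive" step is multiplying per-feature likelihoods as if the features were conditionally independent given the class.

```python
from math import prod

# Hypothetical probabilities for a two-class filter with two word features.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def posterior(words):
    """Posterior over classes via Bayes' rule with the naive assumption."""
    # Naive step: P(words | class) is taken to be the product of P(word | class).
    scores = {c: prior[c] * prod(likelihood[c][w] for w in words) for c in prior}
    total = sum(scores.values())  # normalizing constant P(words)
    return {c: s / total for c, s in scores.items()}

post = posterior(["free", "meeting"])
print(post)
```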