Question 1
A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name. They have the following incomplete code block:
____(f"SELECT customer_id, spend FROM {table_name}")
What can be used to fill in the blank to successfully complete the task?
A. spark.delta.sql
B. spark.sql
C. spark.table
D. dbutils.sql
Correct Answer:
B. spark.sql
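A minimal sketch of the pattern in Question 1: Python renders the f-string first, and spark.sql then receives ordinary SQL text. The helper name select_customer_spend, the RecordingSession class, and the table name sales.customers are hypothetical and exist only so the example runs without a cluster; in a Databricks notebook the predefined spark session would be passed instead.

```python
def select_customer_spend(spark, table_name):
    """Run the query from the question against the table named by table_name."""
    # The f-string is rendered by Python before spark.sql ever sees it.
    query = f"SELECT customer_id, spend FROM {table_name}"
    return spark.sql(query)


class RecordingSession:
    """Hypothetical stand-in for a SparkSession; it only records the SQL text."""

    def __init__(self):
        self.queries = []

    def sql(self, query):
        self.queries.append(query)
        return query  # a real session would return a DataFrame here


session = RecordingSession()
result = select_customer_spend(session, "sales.customers")
print(result)
```

Because the table name is interpolated as raw text, this approach is only appropriate when table_name comes from a trusted source.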
Question 2
A data engineer has created a new database using the following command:
CREATE DATABASE IF NOT EXISTS customer360;
In which location will the customer360 database be located?
A. dbfs:/user/hive/database/customer360
B. dbfs:/user/hive/warehouse
C. dbfs:/user/hive/customer360
D. dbfs:/user/hive/database
Correct Answer:
B. dbfs:/user/hive/warehouse
Question 3
A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary. Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?
A. They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints
B. They can set up the dashboard’s SQL endpoint to be serverless
C. They can turn on the Auto Stop feature for the SQL endpoint
D. They can ensure the dashboard’s SQL endpoint is not one of the included query’s SQL endpoint
Correct Answer:
C. They can turn on the Auto Stop feature for the SQL endpoint
Question 4
Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?
A. Silver tables contain a less refined, less clean view of data than Bronze data
B. Silver tables contain aggregates while Bronze data is unaggregated
C. Silver tables contain more data than Bronze tables
D. Silver tables contain a more refined and cleaner view of data than Bronze tables
E. Silver tables contain less data than Bronze tables
Correct Answer:
D. Silver tables contain a more refined and cleaner view of data than Bronze tables
Question 5
A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells. Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?
A. It is not possible to use SQL in a Python notebook
B. They can attach the cell to a SQL endpoint rather than a Databricks cluster
C. They can simply write SQL syntax in the cell
D. They can add %sql to the first line of the cell
E. They can change the default language of the notebook to SQL
Correct Answer:
D. They can add %sql to the first line of the cell
Question 6
A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values. Why has Auto Loader inferred all of the columns to be of the string type?
A. Auto Loader cannot infer the schema of ingested data
B. JSON data is a text-based format
C. Auto Loader only works with string data
D. All of the fields had at least one null value
Correct Answer:
B. JSON data is a text-based format
Question 7
Which statement regarding the relationship between Silver tables and Bronze tables is always true?
A. Silver tables contain a less refined, less clean view of data than Bronze data
B. Silver tables contain aggregates while Bronze data is unaggregated
C. Silver tables contain more data than Bronze tables
D. Silver tables contain less data than Bronze tables
Correct Answer:
D. Silver tables contain less data than Bronze tables
Question 8
A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level. Which of the following tools can the data engineer use to solve this problem?
A. Unity Catalog
B. Data Explorer
C. Delta Lake
D. Delta Live Tables
E. Auto Loader
Correct Answer:
D. Delta Live Tables
Question 9
What describes the relationship between Gold tables and Silver tables?
A. Gold tables are more likely to contain aggregations than Silver tables
B. Gold tables are more likely to contain valuable data than Silver tables
C. Gold tables are more likely to contain a less refined view of data than Silver tables
D. Gold tables are more likely to contain truthful data than Silver tables
Correct Answer:
A. Gold tables are more likely to contain aggregations than Silver tables
Question 10
A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE. The pipeline is configured to run in Production mode using the Continuous Pipeline Mode. What is the expected outcome after clicking Start to update the pipeline, assuming previously unprocessed data exists and all definitions are valid?
A. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing
B. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing
C. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped
D. All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated
Correct Answer:
C. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped
Question 11
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW
What is the expected behavior when a batch of data containing data that violates these constraints is processed?
A. Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table
B. Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset
C. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log
D. Records that violate the expectation are added to the target dataset and recorded as invalid in the event log
E. Records that violate the expectation cause the job to fail
Correct Answer:
C. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log
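The behavior in Question 11 can be modeled in plain Python. This is only an illustrative sketch of the semantics, not the Delta Live Tables API: the function apply_drop_row, the metrics dictionary, and the sample batch are all hypothetical, and the dictionary merely stands in for the counts a real pipeline would record in its event log.

```python
from datetime import datetime

CUTOFF = datetime(2020, 1, 1)  # mirrors the literal in the expectation


def apply_drop_row(records):
    """Model of ON VIOLATION DROP ROW: keep passing rows, count the failures."""
    kept = [r for r in records if r["timestamp"] > CUTOFF]
    # Hypothetical stand-in for what the pipeline event log would record.
    metrics = {
        "expectation": "valid_timestamp",
        "dropped_records": len(records) - len(kept),
    }
    return kept, metrics


batch = [
    {"id": 1, "timestamp": datetime(2019, 6, 1)},   # violates the expectation
    {"id": 2, "timestamp": datetime(2021, 3, 15)},
    {"id": 3, "timestamp": datetime(2022, 1, 10)},
]
target_rows, event_metrics = apply_drop_row(batch)
print([r["id"] for r in target_rows], event_metrics)
```

The violating row never reaches the target and is not copied anywhere else; only the count of dropped records survives, which is what separates option C from the quarantine-table behavior in option A.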
Question 12
A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance. Which of the following keywords can be used to compact the small files?
A. REDUCE
B. OPTIMIZE
C. COMPACTION
D. REPARTITION
E. VACUUM
Correct Answer:
B. OPTIMIZE
Question 13
Which Git operation must be performed outside of Databricks Repos?
A. Commit
B. Pull
C. Merge
D. Clone
Correct Answer:
C. Merge
Question 14
What can be used to simplify and unify siloed data architectures that are specialized for specific use cases?
A. Delta Lake
B. Data lake
C. Data warehouse
D. Data lakehouse
Correct Answer:
D. Data lakehouse
Question 15
A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data. Which of the following relational objects should the data engineer create?
A. Spark SQL Table
B. View
C. Delta Table
D. Temporary view
Correct Answer:
D. Temporary view
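The two properties that make a temporary view the right choice in Question 15 (no physical copy of the data, and no visibility outside the creating session) can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a stand-in for Spark SQL, with a connection playing the role of a session; the table names, column names, and sample rows are hypothetical.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two separate connections share the same tables.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

first = sqlite3.connect(db_path)
first.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (100, 1);
    INSERT INTO customers VALUES (1, 'Ada');
    -- The view stores only the query definition, not a copy of the rows.
    CREATE TEMP VIEW order_names AS
        SELECT o.order_id, c.name
        FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")
first.commit()
rows = first.execute("SELECT * FROM order_names").fetchall()

# A second connection sees the base tables but not the temporary view.
second = sqlite3.connect(db_path)
try:
    second.execute("SELECT * FROM order_names")
    visible_elsewhere = True
except sqlite3.OperationalError:
    visible_elsewhere = False

print(rows, visible_elsewhere)
first.close()
second.close()
```

In Spark SQL the corresponding statement is CREATE TEMPORARY VIEW, which is likewise scoped to the session that creates it.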
Question 16
Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?
A. The ability to manipulate the same data using a variety of languages
B. The ability to collaborate in real time on a single notebook
C. The ability to set up alerts for query failures
D. The ability to support batch and streaming workloads
E. The ability to distribute complex data operations
Correct Answer:
D. The ability to support batch and streaming workloads
Question 17
In which scenario will a data team want to utilize cluster pools?
A. An automated report needs to be version-controlled across multiple collaborators
B. An automated report needs to be runnable by all stakeholders
C. An automated report needs to be refreshed as quickly as possible
D. An automated report needs to be made reproducible
Correct Answer:
C. An automated report needs to be refreshed as quickly as possible
Question 18
A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables. Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?
A. CREATE TABLE all_transactions AS SELECT * FROM march_transactions INNER JOIN SELECT * FROM april_transactions;
B. CREATE TABLE all_transactions AS SELECT * FROM march_transactions UNION SELECT * FROM april_transactions;
C. CREATE TABLE all_transactions AS SELECT * FROM march_transactions OUTER JOIN SELECT * FROM april_transactions;
D. CREATE TABLE all_transactions AS SELECT * FROM march_transactions INTERSECT SELECT * FROM april_transactions;
Correct Answer:
B. CREATE TABLE all_transactions AS SELECT * FROM march_transactions UNION SELECT * FROM april_transactions;
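The difference between UNION and INTERSECT in Question 18 can be checked with Python's built-in sqlite3 module. SQLite stands in for Spark SQL here, and the id and amount columns and the sample rows are hypothetical, since the question does not give a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE march_transactions (id INTEGER, amount REAL);
    CREATE TABLE april_transactions (id INTEGER, amount REAL);
    INSERT INTO march_transactions VALUES (1, 10.0), (2, 20.0);
    INSERT INTO april_transactions VALUES (3, 30.0), (4, 40.0);

    -- UNION stacks both result sets and removes duplicate rows.
    CREATE TABLE all_transactions AS
        SELECT * FROM march_transactions
        UNION
        SELECT * FROM april_transactions;
""")

union_rows = conn.execute(
    "SELECT id, amount FROM all_transactions ORDER BY id"
).fetchall()

# INTERSECT keeps only rows present in both tables, so it is empty here.
intersect_rows = conn.execute(
    "SELECT * FROM march_transactions INTERSECT SELECT * FROM april_transactions"
).fetchall()

print(union_rows, intersect_rows)
conn.close()
```

Since the two source tables share no records, INTERSECT returns nothing, while UNION returns every row from both months.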
Question 19
Which of the following commands will return the number of null values in the member_id column?
A. SELECT count(member_id) FROM my_table;
B. SELECT count(member_id) - count_null(member_id) FROM my_table;
C. SELECT count_if(member_id IS NULL) FROM my_table;
D. SELECT null(member_id) FROM my_table;
Correct Answer:
C. SELECT count_if(member_id IS NULL) FROM my_table;
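Option A in Question 19 fails because count(column) skips NULLs, so it returns the number of non-null values. The sketch below shows that with Python's built-in sqlite3 module as a stand-in for Spark SQL. It does not assume count_if is available in SQLite, so the null count is written as a CASE expression that expresses the same condition; the sample member_id values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(1,), (None,), (3,), (None,), (5,)],
)

# count(member_id) ignores NULLs, so it counts the non-null values.
non_null_count = conn.execute(
    "SELECT count(member_id) FROM my_table"
).fetchone()[0]

# Same condition as count_if(member_id IS NULL), written portably.
null_count = conn.execute(
    "SELECT sum(CASE WHEN member_id IS NULL THEN 1 ELSE 0 END) FROM my_table"
).fetchone()[0]

print(non_null_count, null_count)
conn.close()
```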
Question 20
Which tool is used by Auto Loader to process data incrementally?
A. Checkpointing
B. Spark Structured Streaming
C. Databricks SQL
D. Unity Catalog
Correct Answer:
B. Spark Structured Streaming