Exam Details

  • Exam Code: DP-203
  • Exam Name: Data Engineering on Microsoft Azure
  • Certification: Microsoft Certifications
  • Vendor: Microsoft
  • Total Questions: 398 Q&As
  • Last Updated: Apr 07, 2025

Microsoft Certifications DP-203 Questions & Answers

  • Question 221:

    You are monitoring an Azure Stream Analytics job by using metrics in Azure.

    You discover that during the last 12 hours, the average watermark delay is consistently greater than the configured late arrival tolerance.

    What is a possible cause of this behavior?

    A. Events whose application timestamp is earlier than their arrival time by more than five minutes arrive as inputs.

    B. There are errors in the input data.

    C. The late arrival policy causes events to be dropped.

    D. The job lacks the resources to process the volume of incoming data.

  • Question 222:

    You use Azure Stream Analytics to receive data from Azure Event Hubs and to output the data to an Azure Blob Storage account.

    You need to output the count of records received from the last five minutes every minute.

    Which windowing function should you use?

    A. Session

    B. Tumbling

    C. Sliding

    D. Hopping
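
    For reference, Stream Analytics windowing functions differ in how windows overlap: a hopping window has a fixed size and a hop interval, so a five-minute window that hops every minute emits an aggregate each minute over the trailing five minutes. A minimal Stream Analytics query sketch; the input and output alias names are hypothetical:

        -- Emit, every minute, the count of events received over the
        -- preceding five minutes: HoppingWindow(timeunit, windowsize, hopsize).
        -- [eventhub-input] and [blob-output] are hypothetical alias names.
        SELECT
            COUNT(*) AS RecordCount,
            System.Timestamp() AS WindowEnd
        INTO [blob-output]
        FROM [eventhub-input]
        GROUP BY HoppingWindow(minute, 5, 1)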

  • Question 223:

    You have the following Azure Data Factory pipelines:

    1. Ingest Data from System1
    2. Ingest Data from System2
    3. Populate Dimensions
    4. Populate Facts

    Ingest Data from System1 and Ingest Data from System2 have no dependencies. Populate Dimensions must execute after both Ingest Data from System1 and Ingest Data from System2. Populate Facts must execute after the Populate Dimensions pipeline. All four pipelines must execute every eight hours.

    What should you do to schedule the pipelines for execution?

    A. Add an event trigger to all four pipelines.

    B. Add a schedule trigger to all four pipelines.

    C. Create a parent pipeline that contains the four pipelines and use a schedule trigger.

    D. Create a parent pipeline that contains the four pipelines and use an event trigger.

  • Question 224:

    You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream processing solution use Java.

    Which service should you recommend using to process the streaming data?

    A. Azure Event Hubs

    B. Azure Data Factory

    C. Azure Stream Analytics

    D. Azure Databricks

  • Question 225:

    You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:

    1. TransactionType: 40 million rows per transaction type
    2. CustomerSegment: 4 million rows per customer segment
    3. TransactionMonth: 65 million rows per month
    4. AccountType: 500 million rows per account type

    You have the following query requirements:

    1. Analysts will most commonly analyze transactions for a given month.
    2. Transaction analysis will typically summarize transactions by transaction type, customer segment, and/or account type.

    You need to recommend a partition strategy for the table to minimize query times.

    On which column should you recommend partitioning the table?

    A. CustomerSegment

    B. AccountType

    C. TransactionType

    D. TransactionMonth
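
    As an illustration of the month-based option, a dedicated SQL pool table partitioned on TransactionMonth might be sketched as follows; the column types and boundary values are hypothetical. At 65 million rows per month spread over 60 distributions, each partition still holds roughly a million rows per distribution, enough for healthy columnstore rowgroups:

        CREATE TABLE dbo.FactTransactions
        (
            TransactionID    bigint        NOT NULL,
            TransactionType  varchar(20)   NOT NULL,
            CustomerSegment  varchar(20)   NOT NULL,
            AccountType      varchar(20)   NOT NULL,
            TransactionMonth date          NOT NULL,  -- partitioning column
            Amount           decimal(18,2) NOT NULL
        )
        WITH
        (
            CLUSTERED COLUMNSTORE INDEX,
            DISTRIBUTION = HASH(TransactionID),
            PARTITION
            (
                TransactionMonth RANGE RIGHT FOR VALUES
                ('2025-01-01', '2025-02-01', '2025-03-01')  -- one boundary per month
            )
        );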

  • Question 226:

    You are implementing a batch dataset in the Parquet format.

    Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool.

    You need to minimize storage costs for the solution.

    What should you do?

    A. Use Snappy compression for files.

    B. Use OPENROWSET to query the Parquet files.

    C. Create an external table that contains a subset of columns from the Parquet files.

    D. Store all data as string in the Parquet files.
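
    As background, Data Factory writes Parquet with Snappy compression by default (the codec is a setting on the Parquet dataset, not something expressed in SQL), and a serverless SQL pool reads the compressed files transparently. A minimal consumption sketch, assuming a hypothetical storage account and path:

        -- Serverless SQL pool query over Snappy-compressed Parquet files;
        -- the codec is detected from the Parquet file metadata automatically.
        SELECT TOP 10 *
        FROM OPENROWSET(
            BULK 'https://contosolake.dfs.core.windows.net/data/sales/*.parquet',
            FORMAT = 'PARQUET'
        ) AS rows;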

  • Question 227:

    You are designing a data mart for the human resources (HR) department at your company. The data mart will contain employee information and employee transactions.

    From a source system, you have a flat extract that has the following fields:

    1. EmployeeID
    2. FirstName
    3. LastName
    4. Recipient
    5. GrossAmount
    6. TransactionID
    7. GovernmentID
    8. NetAmountPaid
    9. TransactionDate

    You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for the data mart.

    Which two tables should you create? Each correct answer presents part of the solution.

    NOTE: Each correct selection is worth one point.

    A. a dimension table for Transaction

    B. a dimension table for EmployeeTransaction

    C. a dimension table for Employee

    D. a fact table for Employee

    E. a fact table for Transaction
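
    For context, a star schema separates descriptive attributes from measurable events. One plausible split of the extract, sketched in dedicated SQL pool T-SQL with hypothetical data types and key columns:

        CREATE TABLE dbo.DimEmployee
        (
            EmployeeKey  int           NOT NULL,  -- surrogate key
            EmployeeID   int           NOT NULL,  -- business key from the source
            FirstName    nvarchar(50)  NOT NULL,
            LastName     nvarchar(50)  NOT NULL,
            GovernmentID nvarchar(20)  NOT NULL
        )
        WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);

        CREATE TABLE dbo.FactTransaction
        (
            TransactionID   bigint        NOT NULL,
            EmployeeKey     int           NOT NULL,  -- references DimEmployee
            Recipient       nvarchar(100) NOT NULL,
            GrossAmount     decimal(18,2) NOT NULL,
            NetAmountPaid   decimal(18,2) NOT NULL,
            TransactionDate date          NOT NULL
        )
        WITH (DISTRIBUTION = HASH(EmployeeKey), CLUSTERED COLUMNSTORE INDEX);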

  • Question 228:

    You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1.

    You plan to create a database named DB1 in Pool1.

    You need to ensure that when tables are created in DB1, the tables are available automatically as external tables to the built-in serverless SQL pool.

    Which format should you use for the tables in DB1?

    A. CSV

    B. ORC

    C. JSON

    D. Parquet
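
    As background, Azure Synapse synchronizes Spark database metadata to the built-in serverless SQL pool, and Parquet-backed (and delimited-text) Spark tables surface there automatically as external tables. A minimal Spark SQL sketch with a hypothetical table name, followed by the matching serverless query:

        -- Run in the Spark pool (Spark SQL):
        CREATE DATABASE DB1;
        CREATE TABLE DB1.Suppliers (SupplierId INT, Name STRING) USING PARQUET;

        -- The table is then queryable from the built-in serverless SQL pool
        -- (synchronized tables land in the dbo schema):
        SELECT * FROM DB1.dbo.Suppliers;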

  • Question 229:

    You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool.

    Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters depending on the report. The reports will be produced once daily.

    You need to implement a solution to make the dataset available for the reports. The solution must minimize query times.

    What should you implement?

    A. an ordered clustered columnstore index

    B. a materialized view

    C. result set caching

    D. a replicated table
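
    For context, a materialized view in a dedicated SQL pool stores the pre-computed result of a query and is maintained automatically as the base tables change, so each report's additional WHERE clause runs against already-transformed data. A minimal sketch over hypothetical inventory tables; note that dedicated SQL pool materialized views require an aggregate in the SELECT list, and COUNT_BIG(*) when GROUP BY is used:

        CREATE MATERIALIZED VIEW dbo.mvInventorySummary
        WITH (DISTRIBUTION = HASH(ProductKey))
        AS
        SELECT
            p.ProductKey,
            w.WarehouseRegion,
            SUM(i.QuantityOnHand) AS TotalOnHand,
            COUNT_BIG(*)          AS RowCnt   -- required alongside GROUP BY
        FROM dbo.FactInventory AS i
        JOIN dbo.DimProduct    AS p ON i.ProductKey   = p.ProductKey
        JOIN dbo.DimWarehouse  AS w ON i.WarehouseKey = w.WarehouseKey
        GROUP BY p.ProductKey, w.WarehouseRegion;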

  • Question 230:

    You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.

    You plan to keep a record of changes to the available fields.

    The supplier data contains the following columns.

    Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

    A. surrogate primary key

    B. effective start date

    C. business key

    D. last modified date

    E. effective end date

    F. foreign key
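
    For reference, a Type 2 SCD preserves history by inserting a new row for each change and bounding each row's validity with dates. A sketch of such a dimension in dedicated SQL pool T-SQL, with hypothetical supplier columns:

        CREATE TABLE dbo.DimSupplier
        (
            SupplierKey         int IDENTITY(1,1) NOT NULL, -- surrogate primary key
            SupplierBusinessKey int           NOT NULL,     -- natural key from the source
            SupplierName        nvarchar(100) NOT NULL,
            EffectiveStartDate  datetime2     NOT NULL,     -- when this version became current
            EffectiveEndDate    datetime2     NULL          -- NULL while the row is current
        )
        WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);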

Tips on How to Prepare for the Exams

Nowadays, the certification exams become more and more important and required by more and more enterprises when applying for a job. But how to prepare for the exam effectively? How to prepare for the exam in a short time with less efforts? How to get a ideal result and how to find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provide not only Microsoft exam questions, answers and explanations but also complete assistance on your exam preparation and certification application. If you are confused on your DP-203 exam preparations and Microsoft certification application, do not hesitate to visit our Vcedump.com to find your solutions here.