Microsoft Certifications DP-203 Questions & Answers
Question 31:
You have an Azure Synapse Analytics dedicated SQL pool.
You plan to create a fact table named Table1 that will contain a clustered columnstore index.
You need to optimize data compression and query performance for Table1.
What is the minimum number of rows that Table1 should contain before you create partitions?
A. 100,000
B. 600,000
C. 1 million
D. 60 million
Correct Answer: A
Target size for rowgroups
For best query performance, the goal is to maximize the number of rows per rowgroup in a columnstore index. A rowgroup can have a maximum of 1,048,576 rows.
It's okay to not have the maximum number of rows per rowgroup. Columnstore indexes achieve good performance when rowgroups have at least 100,000 rows.
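As an illustration, here is a hedged sketch of how Table1 might be defined in the dedicated SQL pool; the column names, distribution column, and partition boundaries are assumptions, since the question does not specify them:

-- Illustrative only: columns, HASH column, and boundary values are assumed.
CREATE TABLE dbo.Table1
(
    SaleId     BIGINT         NOT NULL,
    SaleDate   DATE           NOT NULL,
    ProductKey INT            NOT NULL,
    Amount     DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),   -- rows are spread across 60 distributions
    CLUSTERED COLUMNSTORE INDEX,       -- rowgroups hold up to 1,048,576 rows each
    PARTITION
    (
        SaleDate RANGE RIGHT FOR VALUES ('2024-01-01', '2024-07-01')
    )
);

Each partition of each distribution should still end up with enough rows (at least 100,000 per rowgroup) for the columnstore to compress and perform well.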
You have an Azure subscription that contains the resources shown in the following table.
You need to read the files in storage1 by using ad-hoc queries and the OPENROWSET function. The solution must ensure that each rowset contains a single JSON record. To what should you set the FORMAT option of the OPENROWSET function?
A. JSON
B. DELTA
C. PARQUET
D. CSV
Correct Answer: D
Query JSON files using serverless SQL pool in Azure Synapse Analytics
The easiest way to see the content of your JSON file is to provide the file URL to the OPENROWSET function, specify the CSV FORMAT, and set the value 0x0b for both FIELDTERMINATOR and FIELDQUOTE. If you need to read line-delimited JSON files, this is enough. If you have a classic JSON file, you also need to set the value 0x0b for ROWTERMINATOR.
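A minimal sketch of such a query, assuming a hypothetical path in storage1 and line-delimited JSON files:

-- Hypothetical URL; the 0x0b terminators return each JSON line as one rowset value.
SELECT
    JSON_VALUE(doc, '$.id')   AS id,
    JSON_VALUE(doc, '$.name') AS name
FROM OPENROWSET(
        BULK 'https://storage1.dfs.core.windows.net/files/*.jsonl',
        FORMAT = 'CSV',
        FIELDTERMINATOR = '0x0b',
        FIELDQUOTE = '0x0b'
     )
     WITH (doc NVARCHAR(MAX)) AS rows;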
You have an Azure subscription that contains an Azure Synapse Analytics workspace named ws1 and an Azure Cosmos DB database account named Cosmos1. Cosmos1 contains a container named container1, and ws1 contains a serverless SQL pool.
You need to ensure that you can query the data in container1 by using the serverless SQL pool.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Enable Azure Synapse Link for Cosmos1.
B. Disable the analytical store for container1.
C. In ws1, create a linked service that references Cosmos1.
D. Enable the analytical store for container1.
E. Disable indexing for container1.
Correct Answer: ACD
Query Azure Cosmos DB data with a serverless SQL pool in Azure Synapse Link
Prerequisites include:
(D, not B) Make sure that you have prepared the analytical store:
Enable the analytical store on your Azure Cosmos DB containers.
(A) An Azure Cosmos DB database account that has Azure Synapse Link enabled. Etc.
Note: A serverless SQL pool allows you to analyze data in your Azure Cosmos DB containers that are enabled with Azure Synapse Link in near real time, without affecting the performance of your transactional workloads. It offers a familiar T-SQL syntax to query data from the analytical store and integrated connectivity to a wide range of business intelligence (BI) and ad-hoc querying tools via the T-SQL interface.
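For example, once Synapse Link and the analytical store are enabled and a linked service exists, a query from the serverless SQL pool might look like the following sketch; the database name and the key placeholder are assumptions, not given in the question:

-- Hypothetical database name (db1) and key placeholder; the Cosmos1 account
-- and container1 come from the question.
SELECT TOP 10 *
FROM OPENROWSET(
        'CosmosDB',
        'Account=cosmos1;Database=db1;Key=<account-key>',
        container1
     ) AS documents;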
You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes directly to ADF1.
You need to implement version control for the changes made to pipeline artifacts. The solution must ensure that you can apply version control to the resources currently defined in the Azure Data Factory Studio for ADF1.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. From the Azure Data Factory Studio, run Publish All.
B. Create an Azure Data Factory trigger.
C. Create a Git repository.
D. Create a GitHub action.
E. From the Azure Data Factory Studio, select Set up code repository.
F. From the Azure Data Factory Studio, select Publish.
Correct Answer: CE
Source control in Azure Data Factory
C: By default, the Azure Data Factory user interface experience (UX) authors directly against the data factory service. This experience has the following limitations:
The Data Factory service doesn't include a repository for storing the JSON entities for your changes. The only way to save changes is via the Publish All button and all changes are published directly to the data factory service.
The Data Factory service isn't optimized for collaboration and version control.
The Azure Resource Manager template required to deploy Data Factory itself is not included.
E: To provide a better authoring experience, Azure Data Factory allows you to configure a Git repository with either Azure Repos or GitHub. Git is a version control system that allows for easier change tracking and collaboration.
Connect to a Git repository: There are four different ways to connect a Git repository to your data factory for both Azure Repos and GitHub. After you connect to a Git repository, you can view and manage your configuration in the management hub, under Git configuration in the Source control section.
Configuration method 1: Home page
1. In the Azure Data Factory home page, select Set up code repository at the top.
You have an Azure data factory named ADF1 that contains a pipeline named Pipeline1.
Pipeline1 must execute every 30 minutes with a 15-minute offset.
You need to create a trigger for Pipeline1. The trigger must meet the following requirements:
1. Backfill data from the beginning of the day to the current time.
2. If Pipeline1 fails, ensure that the pipeline can re-execute within the same 30-minute period.
3. Ensure that only one concurrent pipeline execution can occur.
4. Minimize development and configuration effort.
Which type of trigger should you create?
A. schedule
B. event-based
C. manual
D. tumbling window
Correct Answer: D
Tumbling window triggers are a type of trigger that fires at a periodic time interval from a specified start time, while retaining state. Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. A tumbling window trigger has a one-to-one relationship with a pipeline and can only reference a single pipeline. The tumbling window trigger is a heavier-weight alternative to the schedule trigger, offering a suite of features for complex scenarios.
The following is a comparison of the tumbling window trigger and the schedule trigger:
* Backfill scenarios: Tumbling window: supported. Schedule: not supported.
* Retry capability: Tumbling window: supported. Failed pipeline runs have a default retry policy of 0, or a policy that's specified by the user in the trigger definition.
You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a pipeline that will execute a stored procedure in the dedicated SQL pool and use the returned result set as the input for a downstream activity. The solution must minimize development effort.
Which type of activity should you use in the pipeline?
A. U-SQL
B. Stored Procedure
C. Script
D. Notebook
Correct Answer: B
You can use the Stored Procedure activity to invoke a stored procedure in one of the following data stores in your enterprise or on an Azure virtual machine (VM): Azure SQL Database, Azure Synapse Analytics, or SQL Server Database.
Note: Create a Stored Procedure activity with UI.
To use a Stored Procedure activity in a pipeline, complete the following steps:
Search for Stored Procedure in the pipeline Activities pane, and drag a Stored Procedure activity to the pipeline canvas.
Select the new Stored Procedure activity on the canvas if it is not already selected, and then select its Settings tab to edit its details.
Select an existing or create a new linked service to an Azure SQL Database, Azure Synapse Analytics, or SQL Server.
Choose a stored procedure, and provide any parameters for its execution.
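For context, the stored procedure that the activity invokes lives in the dedicated SQL pool. The following sketch is purely illustrative; the procedure name, parameter, and tables are hypothetical and not taken from the question:

-- Hypothetical procedure that a Stored Procedure activity could call,
-- passing @LoadDate as an activity parameter.
CREATE PROCEDURE dbo.usp_LoadDailySales
    @LoadDate DATE
AS
BEGIN
    INSERT INTO dbo.FactSales (SaleDate, ProductKey, Amount)
    SELECT SaleDate, ProductKey, Amount
    FROM   stg.DailySales
    WHERE  SaleDate = @LoadDate;
END;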
Incorrect:
* U-SQL: You can process data by running U-SQL scripts on Azure Data Lake Analytics with Azure Data Factory and Synapse Analytics, but U-SQL does not run against a dedicated SQL pool.
You have an Azure data factory named ADF1 and an Azure Synapse Analytics workspace that contains a pipeline named SynPipeLine1. SynPipeLine1 includes a Notebook activity.
You create a pipeline in ADF1 named ADFPipeline1.
You need to invoke SynPipeLine1 from ADFPipeline1.
Which type of activity should you use?
A. Web
B. Spark
C. Custom
D. Notebook
Correct Answer: A
Call Synapse pipeline with a Notebook activity: Azure Data Factory has no native activity for running a Synapse pipeline, so use a Web activity in ADFPipeline1 to call the run API of SynPipeLine1.
You have an Azure Data Factory pipeline named pipeline1.
You need to execute pipeline1 at 2 AM every day. The solution must ensure that if the trigger for pipeline1 stops, the next pipeline execution will occur at 2 AM, following a restart of the trigger.
Which type of trigger should you create?
A. schedule
B. tumbling
C. storage event
D. custom event
Correct Answer: A
Azure Data Factory provides three types of triggers that specify when a pipeline will be fired: the Schedule trigger, which lets you specify the date and time when the pipeline will be executed; the Tumbling window trigger, in which the pipeline is executed on a periodic interval, with the ability to save the pipeline state; and the Event-based trigger, which executes the pipeline in response to a blob-related event.
You have an Azure Data Factory pipeline named pipeline1 that contains a data flow activity named activity1.
You need to run pipeline1.
Which runtime will be used to run activity1?
A. Azure Integration runtime
B. Self-hosted integration runtime
C. SSIS integration runtime
Correct Answer: A
Data Flow activity in Azure Data Factory and Azure Synapse Analytics: Data Flow integration runtime. Choose which integration runtime to use for your Data Flow activity execution. By default, the service will use the auto-resolve Azure Integration runtime with four worker cores.
You use Azure Data Factory to create data pipelines.
You are evaluating whether to integrate Data Factory and GitHub for source and version control.
What are two advantages of the integration? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. additional triggers
B. lower pipeline execution times
C. the ability to save without publishing
D. the ability to save pipelines that have validation issues
Correct Answer: CD
Advantages of Git integration: below is a list of some of the advantages that Git integration provides to the authoring experience:
Partial saves: When authoring against the data factory service, you can't save changes as a draft, and all publishes must pass data factory validation. Whether your pipelines are not finished or you simply don't want to lose changes if your computer crashes, Git integration allows for incremental changes to data factory resources regardless of what state they are in. Configuring a Git repository lets you save changes and publish only when you have tested your changes to your satisfaction. Etc.
Note: By default, the Azure Data Factory user interface experience (UX) authors directly against the data factory service. This experience has the following limitations:
(C) The Data Factory service doesn't include a repository for storing the JSON entities for your changes. The only way to save changes is via the Publish All button and all changes are published directly to the data factory service.
The Data Factory service isn't optimized for collaboration and version control.
The Azure Resource Manager template required to deploy Data Factory itself is not included.
To provide a better authoring experience, Azure Data Factory allows you to configure a Git repository with either Azure Repos or GitHub. Git is a version control system that allows for easier change tracking and collaboration.