Designing and Implementing a Data Science Solution on Azure
Exam Details
Exam Code: DP-100
Exam Name: Designing and Implementing a Data Science Solution on Azure
Certification: Microsoft Certifications
Vendor: Microsoft
Total Questions: 564 Q&As
Last Updated: Mar 29, 2025
Microsoft Certifications DP-100 Questions & Answers
Question 411:
You plan to create a compute instance as part of an Azure Machine Learning development workspace.
You must interactively debug code running on the compute instance by using Visual Studio Code Remote.
You need to provision the compute instance.
What should you do?
A. Enable Remote Desktop Protocol (RDP) access.
B. Modify role-based access control (RBAC) settings at the workspace level.
C. Enable Secure Shell Protocol (SSH) access.
D. Modify role-based access control (RBAC) settings at the compute instance level.
Correct Answer: C
Question 412:
You have a dataset that contains salary information for users. You plan to generate an aggregate salary report that shows average salaries by city.
Privacy of individuals must be preserved without impacting accuracy, completeness, or reliability of the data. The aggregation must be statistically consistent with the distribution of the original data. You must return an approximation of the
data instead of the raw data.
You need to apply a differential privacy approach.
What should you do?
A. Add noise to the salary data during the analysis
B. Encrypt the salary data before analysis
C. Remove the salary data
D. Convert the salary data to the average column value
Correct Answer: A
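As an illustration only (a minimal numpy sketch with hypothetical salary figures, not part of the original question), differential privacy keeps the aggregate statistically consistent by adding calibrated noise to the computed statistic rather than altering or removing the underlying data:
Python
import numpy as np

# Hypothetical salary data for one city
salaries = np.array([52000, 61000, 58000, 73000, 49000], dtype=float)

# Simplified sensitivity bound for a bounded mean, and an assumed privacy budget
sensitivity = (salaries.max() - salaries.min()) / len(salaries)
epsilon = 1.0

true_avg = salaries.mean()
# Laplace noise scaled to sensitivity/epsilon: the reported value approximates the true average
noisy_avg = true_avg + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
print(f"True average: {true_avg:.0f}, reported private average: {noisy_avg:.0f}")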
Question 413:
You have an Azure Machine Learning (ML) model deployed to an online endpoint.
You need to review container logs from the endpoint by using Azure ML Python SDK v2. The logs must include the console log from the inference server, with print/log statements from the model's scoring script.
What should you do first?
A. Connect by using SSH to the inference server.
B. Create an instance of the MLClient class.
C. Connect by using Docker tools to the inference server.
D. Create an instance of the OnlineDeploymentOperations class.
Correct Answer: B
Get container logs
To see log output from a container, use the get_logs method as follows:
Python
ml_client.online_deployments.get_logs(
name="<deployment-name>", endpoint_name="<endpoint-name>", lines=100
)
You can also get logs from the storage initializer container by adding the container_type="storage-initializer" option.
Python
ml_client.online_deployments.get_logs(
name="<deployment-name>", endpoint_name="<endpoint-name>", lines=100, container_type="storage-initializer"
)
Note (B, not D): OnlineDeploymentOperations class. You should not instantiate this class directly. Instead, create an MLClient instance, which instantiates it for you and attaches it as an attribute.
Question 414:
You need to select a feature extraction method. Which method should you use?
A. Mutual information
B. Mood's median test
C. Kendall correlation
D. Permutation Feature Importance
Correct Answer: C
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association between two measured quantities. It is one of the methods supported by feature selection in Azure Machine Learning.
Scenario: You train a Linear Regression module using a property dataset containing property prices for a large city, and you need to determine the best features to use in a model. You can choose standard metrics provided to measure performance before and after the feature importance process completes. You must ensure that the distribution of the features across multiple training models is consistent.
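As a quick illustration outside the Studio module (a scipy sketch with hypothetical property data), Kendall's tau measures how consistently two quantities rank in the same order:
Python
from scipy.stats import kendalltau

# Hypothetical feature and target values
square_feet = [1100, 1350, 1500, 1720, 2100, 2400]
sale_price = [210000, 255000, 262000, 305000, 380000, 415000]

tau, p_value = kendalltau(square_feet, sale_price)
# tau close to 1 indicates a strong ordinal association between the feature and the price
print(f"Kendall's tau: {tau:.3f} (p = {p_value:.3f})")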
Question 415:
You need to select a feature extraction method. Which method should you use?
A. Mutual information
B. Pearson's correlation
C. Spearman correlation
D. Fisher Linear Discriminant Analysis
Correct Answer: C
Spearman's rank correlation coefficient assesses how well the relationship between two variables can be described using a monotonic function.
Note: Both Spearman's and Kendall's can be formulated as special cases of a more general correlation coefficient, and they are both appropriate in this scenario.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You need to select a feature selection algorithm to analyze the relationship between the two columns in more detail.
Incorrect Answers:
B: The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not).
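To illustrate the distinction (a scipy sketch with made-up values standing in for AvgRoomsInHouse and MedianValue), Spearman's coefficient reaches 1.0 for any perfectly monotonic relationship even when Pearson's does not:
Python
import numpy as np
from scipy.stats import pearsonr, spearmanr

avg_rooms = np.linspace(2, 10, 50)        # stand-in for AvgRoomsInHouse
median_value = np.exp(avg_rooms / 2)      # monotonic but strongly non-linear stand-in for MedianValue

print("Pearson r:    %.3f" % pearsonr(avg_rooms, median_value)[0])   # below 1: relationship is not linear
print("Spearman rho: %.3f" % spearmanr(avg_rooms, median_value)[0])  # exactly 1: relationship is monotonic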
Question 416:
You need to visually identify whether outliers exist in the Age column and quantify the outliers before the outliers are removed. Which three Azure Machine Learning Studio modules should you use? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
A. Create Scatterplot
B. Summarize Data
C. Clip Values
D. Replace Discrete Values
E. Build Counting Transform
Correct Answer: ABC
B: To get a global view, use the Summarize Data module. Add the module and connect it to the dataset that needs to be visualized.
A: One way to quickly identify outliers visually is to create scatter plots.
C: The easiest way to treat the outliers in Azure ML is to use the Clip Values module. It identifies and optionally replaces data values that are above or below a specified threshold, which is useful when you want to remove outliers or replace them with a mean, a constant, or another substitute value.
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clip-values
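Outside Studio, the same three steps can be sketched with pandas and matplotlib (hypothetical Age values, used only to illustrate the visualize, quantify, then clip workflow):
Python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data containing two obvious Age outliers
df = pd.DataFrame({"Age": [23, 31, 27, 45, 38, 29, 120, 33, 41, 135]})

# 1. Visual check (Create Scatterplot)
plt.scatter(df.index, df["Age"])
plt.ylabel("Age")
plt.show()

# 2. Quantify (Summarize Data): count, mean, quartiles, and max expose the extreme values
print(df["Age"].describe())

# 3. Treat (Clip Values): cap anything above an assumed threshold of 90
df["Age"] = df["Age"].clip(upper=90)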
Question Set 3
Question 417:
You need to implement a scaling strategy for the local penalty detection data. Which normalization type should you use?
A. Streaming
B. Weight
C. Batch
D. Cosine
Correct Answer: C
Post batch normalization statistics (PBN) is the Microsoft Cognitive Toolkit (CNTK) approach to evaluating the population mean and variance of Batch Normalization for use at inference time. In CNTK, custom networks are defined using the BrainScriptNetworkBuilder and described in the CNTK network description language, BrainScript.
Scenario:
Local penalty detection models must be written by using BrainScript.
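Conceptually (a plain numpy sketch, not BrainScript or CNTK code), batch normalization standardizes each feature with the mean and variance of the current mini-batch; PBN later replaces these with population statistics for inference:
Python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)               # per-feature mean over the mini-batch
    var = x.var(axis=0)                 # per-feature variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta         # learnable scale and shift

# Hypothetical mini-batch with two features on very different scales
batch = np.array([[1.0, 200.0], [2.0, 220.0], [3.0, 180.0]])
print(batch_norm(batch))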
Question 418:
You need to implement a feature engineering strategy for the crowd sentiment local models. What should you do?
A. Apply an analysis of variance (ANOVA).
B. Apply a Pearson correlation coefficient.
C. Apply a Spearman correlation coefficient.
D. Apply a linear discriminant analysis.
Correct Answer: D
The linear discriminant analysis method works only on continuous variables, not categorical or ordinal variables.
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables.
Scenario:
Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines.
Experiments for local crowd sentiment models must combine local penalty detection data.
All shared features for local models are continuous variables.
Incorrect Answers:
B: The Pearson correlation coefficient, sometimes called Pearson's R test, is a statistical value that measures the linear relationship between two variables. By examining the coefficient values, you can infer something about the strength of the relationship between the two variables, and whether they are positively correlated or negatively correlated.
C: Spearman's correlation coefficient is designed for use with non-parametric and non-normally distributed data. Spearman's coefficient is a nonparametric measure of statistical dependence between two variables, and is sometimes denoted by the Greek letter rho. The Spearman's coefficient expresses the degree to which two variables are monotonically related. It is also called Spearman rank correlation, because it can be used with ordinal variables.
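As a small scikit-learn illustration (hypothetical feature values and class labels), linear discriminant analysis projects continuous features onto the direction that best separates the classes, which is why it requires continuous inputs:
Python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical continuous features and two sentiment classes
X = np.array([[2.1, 3.5], [1.9, 3.0], [3.2, 4.1], [8.7, 9.2], [9.1, 8.8], [8.4, 9.5]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)   # one engineered feature that separates the classes
print(X_reduced.ravel())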
Question 419:
You need to implement a model development strategy to determine a user's tendency to respond to an ad. Which technique should you use?
A. Use a Relative Expression Split module to partition the data based on centroid distance.
B. Use a Relative Expression Split module to partition the data based on distance travelled to the event.
C. Use a Split Rows module to partition the data based on distance travelled to the event.
D. Use a Split Rows module to partition the data based on centroid distance.
Correct Answer: A
Split Data partitions the rows of a dataset into two distinct sets.
The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio is helpful when you need to divide a dataset into training and testing datasets using a numerical expression.
Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number could be a date/time field, a column containing age or dollar amounts, or even a percentage. For example, you might want to
divide your data set depending on the cost of the items, group people by age ranges, or separate data by a calendar date.
Scenario:
Local market segmentation models will be applied before determining a user's propensity to respond to an advertisement.
The distribution of features across training and production data is not consistent.
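The same idea can be sketched in pandas (hypothetical column name and threshold): a relative-expression style split partitions rows by a condition on a numeric column, here distance from a segment centroid, rather than by row counts:
Python
import pandas as pd

# Hypothetical users with a distance-to-centroid feature
df = pd.DataFrame({"user_id": [1, 2, 3, 4, 5],
                   "centroid_distance": [0.4, 1.8, 0.9, 2.5, 1.1]})

condition = df["centroid_distance"] < 1.0   # the "relative expression"
train_set = df[condition]                   # rows that satisfy the expression
test_set = df[~condition]                   # remaining rows
print(len(train_set), len(test_set))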