You have been assigned to study the daily revenue effect of a pricing model for online transactions. All the data currently available to you has been loaded into your analytics database: revenue data, pricing data, and online transaction data. You find that the data comes at different levels of granularity. The transaction data has timestamps (day, hour, minutes, seconds), pricing is stored at the daily level, and revenue data is only reported monthly. What is your next step?
A. Report back to the business owner that the current data model does not support the business question.
B. Interpolate a daily model for revenue from the monthly revenue data.
C. Aggregate all data to the monthly level in order to create a monthly revenue model.
D. Disregard revenue as a driver in the pricing model, and create a daily model based on pricing and transactions only.
Which data type value is used for the observed response variable in a logistic regression model?
A. Any positive real number
B. Any integer
C. A binary value
D. Any real number
In data visualization, what is used to focus the audience on a key part of a chart?
A. Emphasis colors
B. Detailed text
C. Pastel colors
D. A data table
You do a Student's t-test to compare the average test scores of sample groups from populations A and B. Group A averaged 10 points higher than group B. You find that this difference is significant, with a p-value of 0.03. What does that mean?
A. There is a 3% chance that you have identified a difference between the populations when in reality there is none.
B. The difference in scores between a sample from population A and a sample from population B will tend to be within 3% of 10 points.
C. There is a 3% chance that a sample group from population A will score 10 points higher than a sample group from population B.
D. There is a 97% chance that a sample group from population A will score 10 points higher than a sample group from population B.
A data scientist is given an R data frame, "empdata", with the columns Age, Salary, Occupation, Education, and Gender. The data scientist would like to examine only the Salary and Occupation columns for ages greater than 40. Which command extracts the appropriate rows and columns from the data frame?
A. empdata[empdata$Age > 40, c("Salary", "Occupation")]
B. empdata[c("Salary", "Occupation"), empdata$Age > 40]
C. empdata[Age > 40, ("Salary", "Occupation")]
D. empdata[, c("Salary", "Occupation")]$Age > 40
You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing. What should you do?
A. Ensure that the TaskTracker is running.
B. Ensure that the JobTracker is running.
C. Ensure that the NameNode is running.
D. Ensure that a DataNode is running.
Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
The minimum support is 25%. Which rule has a confidence equal to 50%?
A. {bread, milk} => {cheese}
B. {bread} => {milk}
C. {juice} => {soda}
D. {bread} => {cheese}
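For reference, the confidence of a rule X => Y is support(X and Y together) divided by support(X), where support is the fraction of transactions containing the itemset. The following is a minimal sketch of that calculation over the four transactions above; the function names `support` and `confidence` are illustrative choices, not part of the original question.

```python
from fractions import Fraction

# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


# The four candidate rules, as (antecedent, consequent) pairs.
rules = [
    ({"bread", "milk"}, {"cheese"}),
    ({"bread"}, {"milk"}),
    ({"juice"}, {"soda"}),
    ({"bread"}, {"cheese"}),
]

for antecedent, consequent in rules:
    print(
        sorted(antecedent), "=>", sorted(consequent),
        "support:", support(antecedent | consequent),
        "confidence:", confidence(antecedent, consequent),
    )
```

Exact fractions are used instead of floats so that each result can be compared directly against the 25% minimum support and the 50% confidence target.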
You are testing two new weight-gain formulas for puppies. The test gives the following results:
Control group: 1% weight gain
Formula A: 3% weight gain
Formula B: 4% weight gain
A one-way ANOVA returns a p-value of 0.027. What can you conclude?
A. Either Formula A or Formula B is effective at promoting weight gain.
B. Formula B is more effective at promoting weight gain than Formula A.
C. Formula A and Formula B are both effective at promoting weight gain.
D. Formula A and Formula B are about equally effective at promoting weight gain.
You have completed your model and are handing it off to be deployed in production. What should you deliver to the production team, along with your commented code?
A. The production team needs to understand how your model will interact with the processes they already support. Give them documentation on expected model inputs and outputs, and guidance on error-handling.
B. The production team are technical, and they need to understand how the processes that they support work, so give them the same presentation that you prepared for the analysts.
C. The production team supports the processes that run the organization, and they need context to understand how your model interacts with the processes they already support. Give them the same presentation that you prepared for the project sponsor.
D. The production team supports the processes that run the organization, and they need context to understand how your model interacts with the processes they already support. Give them the executive summary.
Which word or phrase completes the statement? Unix is to bash as Hadoop is to:
A. Pig
B. HDFS
C. Sqoop
D. NameNode