What do lemmatization and stemming have in common?
A. Use WordNet
B. Remove common words in a natural language
C. Reduce the high dimensionality in text
D. Use a set of heuristics
You conduct a TFIDF analysis on 3 documents containing raw text and derive TFIDF ("data", document y) = 1.908. You know that the term "data" only appears in document 2.
What is the TF of "data" in document 2?
A. 2 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3^2) = 0.954. Therefore, TFIDF = TF*0.954 = 1.908. TF will then round to 2.
B. 4 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3/1) = 0.477. Therefore, TFIDF = TF*0.477 = 1.908. TF will then round to 4.
C. 6 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal 3/1 = 3. Therefore, TFIDF = TF/3 = 1.908. TF will then round to 6.
D. 11 based on the following reasoning: TFIDF = TF*IDF = 1.908. You then know that IDF will equal LOG(3/2) = 0.176. Therefore, TFIDF = TF*0.176 = 1.908. TF will then round to 11.
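The arithmetic in the options above can be checked with a short sketch. It assumes the unsmoothed definition TFIDF = TF * IDF with IDF = log10(N / df), where N is the number of documents and df is the number of documents containing the term; the base-10 logarithm is an assumption inferred from the values quoted in the options, and the function names are illustrative only.

```python
import math

def idf(num_docs, docs_with_term):
    """Unsmoothed inverse document frequency, base-10 log (assumed)."""
    return math.log10(num_docs / docs_with_term)

def tf_from_tfidf(tfidf, num_docs, docs_with_term):
    """Solve TFIDF = TF * IDF for TF."""
    return tfidf / idf(num_docs, docs_with_term)

# 3 documents in total; "data" appears in exactly one of them.
idf_value = idf(3, 1)
tf_value = tf_from_tfidf(1.908, 3, 1)

print(round(idf_value, 3))  # IDF rounded to three decimals
print(round(tf_value))      # TF rounded to the nearest integer
```

Other TF-IDF variants (natural logarithm, smoothed IDF, normalized TF) would give different numbers, so the result only holds under the stated definition.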
What is a typical use of a UDF in Pig?
A. Creating functionality outside of what is provided by the built-in functions
B. Providing functional access to user-defined data in HDFS
C. Providing advanced analytics to Hadoop
D. Providing an interface from Pig to Microsoft Excel for easier data manipulation
You develop a Python script "logistic.py" to evaluate the logistic function, denoted as f(y), for a given value y. The script is used by the following Pig code:
Register 'logistic.py' using jython as udf;
z = FOREACH y GENERATE $0, udf.logistic ($0);
DUMP z;
What is the expected output when the Pig code is executed?
A. 0
B. Jython is not a supported language
C. Value of f(y) for all y
D. Tuples (y, f(y))
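The question does not show the contents of logistic.py, so the following is only a hypothetical sketch of what such a Jython UDF might look like. It assumes the standard logistic function f(y) = 1 / (1 + e^(-y)) and uses the outputSchema decorator that Pig makes available to scripts registered with jython; the fallback definition is an addition so the file also runs outside Pig, and the schema string "f:double" is an illustrative choice.

```python
import math

try:
    outputSchema  # injected by Pig when the script is registered using jython
except NameError:
    # Fallback so the script can be run and tested outside Pig (assumption).
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema("f:double")
def logistic(y):
    """Standard logistic function: 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

print(logistic(0))  # midpoint of the curve
```

In the Pig statement, GENERATE $0, udf.logistic($0) emits the first field together with the value returned by this function for each input row.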
What is the maximum degree of a node in an undirected graph with 50 nodes?
A. 49
B. 50
C. 1250
D. 2500
Which scenario is a proper use case for multinomial logistic regression?
A. A marketing firm wants to estimate the personal income of a group of potential customers. Using inputs such as age, education, marital status, and credit card expenditures, a data scientist is building a model that will estimate a person's income.
B. A logistic distribution company wants to minimize the distance traveled by its delivery trucks. A data scientist is building a model to determine the optimal route for each of its trucks.
C. To improve the initial routing of a loan application, a financial institution plans to classify a loan application as Approve, Reject, or Possibly_Approve. Based on the company's historical loan application data, a data scientist is building a model to assign one of these three outcomes to each submitted application.
D. A manufacturer plans to determine the optimal number of workers to employ in an assembly line process. Utilizing the observed distributions of the task durations of each process step, a data scientist is building a model to mimic the interactions and dependencies between each stage in the manufacturing process.
In a connected, undirected graph of 5 nodes with 10 edges, how many more edges need to be added to make the clustering coefficient of every node equal 1?
A. 0
B. 5
C. 10
D. 15
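A minimal sketch of the local clustering coefficient may help with this kind of question. It assumes a simple undirected graph (no self-loops or parallel edges) and the usual definition: the number of edges among a node's neighbors divided by k(k-1)/2, where k is the node's degree. The adjacency-dictionary representation and the function name are illustrative choices.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention here
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Build the 5-node graph that contains every possible edge.
nodes = range(5)
edges = list(combinations(nodes, 2))
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

coefficients = [local_clustering(adj, n) for n in nodes]
print(len(edges), coefficients)
```

Under the simple-graph assumption, 5 nodes admit at most 5*4/2 = 10 edges, so a 5-node graph with 10 edges is the graph built here.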
What is the maximum number of edges in an undirected graph of 10 nodes?
A. 45
B. 90
C. 100
D. 9
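The counting behind the graph-size questions can be sketched as follows, assuming a simple undirected graph (no self-loops, at most one edge per node pair). Each of the n nodes can connect to the other n - 1 nodes, and every edge is shared by two endpoints, giving n(n-1)/2 edges at most. The function names are illustrative.

```python
def max_degree(n):
    """Largest possible degree of a node in a simple undirected graph on n nodes."""
    return n - 1

def max_edges(n):
    """Largest possible edge count: one edge per unordered pair of distinct nodes."""
    return n * (n - 1) // 2

print(max_edges(10))   # edge limit for 10 nodes
print(max_degree(50))  # degree limit for 50 nodes
```

If self-loops or parallel edges were allowed, neither bound would hold.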
What is an intended application of the MapReduce framework?
A. Processing can be broken into smaller pieces
B. Processing a large number of small files
C. Processing in real time is required
D. Processing a small subset of data
What is NOT a category of a NoSQL data store?
A. Columnar
B. Document
C. Key/Value
D. Flat File