
1z0-449.44q

Number: 1z0-449
Passing Score: 800
Time Limit: 120 min

1z0-449


Oracle Big Data 2017 Implementation Essentials


Exam A

QUESTION 1
You need to place the results of a Pig Latin script into an HDFS output directory.

What is the correct syntax in Apache Pig?

A. update hdfs set D as ‘./output’;

B. store D into ‘./output’;

C. place D into ‘./output’;

D. write D as ‘./output’;

E. hdfsstore D into ‘./output’;

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Use the STORE operator to run (execute) Pig Latin statements and save (persist) results to the file system. Use STORE for production scripts and batch mode processing.

Syntax: STORE alias INTO 'directory' [USING function];

Example: In this example data is stored using PigStorage and the asterisk character (*) as the field delimiter.

A = LOAD 'data' AS (a1:int,a2:int,a3:int);

DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)

STORE A INTO 'myoutput' USING PigStorage('*');

CAT myoutput;
1*2*3
4*2*1
8*3*4
4*3*3
7*2*5
8*4*3

References: https://pig.apache.org/docs/r0.13.0/basic.html#store

QUESTION 2
How is Oracle Loader for Hadoop (OLH) better than Apache Sqoop?


A. OLH performs a great deal of preprocessing of the data on Hadoop before loading it into the database.
B. OLH performs a great deal of preprocessing of the data on the Oracle database before loading it into NoSQL.
C. OLH does not use MapReduce to process any of the data, thereby increasing performance.
D. OLH performs a great deal of preprocessing of the data on the Oracle database before loading it into Hadoop.
E. OLH is fully supported on the Big Data Appliance. Apache Sqoop is not supported on the Big Data Appliance.

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Oracle Loader for Hadoop provides an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. Oracle Loader for Hadoop prepartitions the data if necessary and transforms it into a database-ready format. It optionally sorts records by primary key or user-defined columns before loading the data or creating output files.

Note: Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Incorrect Answers:
B, D: Oracle Loader for Hadoop provides an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database.

C: Oracle Loader for Hadoop is a MapReduce application that is invoked as a command-line utility. It accepts the generic command-line options that are supported by the org.apache.hadoop.util.Tool interface.

Page 4: Oracle.Testkings.1z0-449.v2019-03-15.by.Marcus · You can connect to Hive and manage objects using R functions that have an ore prefix, such as ore.connect. To attach the current

https://www.gratisexam.com/

E: The Oracle Linux operating system and Cloudera's Distribution including Apache Hadoop (CDH) underlie all other software components installed on Oracle Big Data Appliance. CDH includes Apache projects for MapReduce and HDFS, such as Hive, Pig, Oozie, ZooKeeper, HBase, Sqoop, and Spark.

References:
https://docs.oracle.com/cd/E37231_01/doc.20/e36961/start.htm#BDCUG326
https://docs.oracle.com/cd/E55905_01/doc.40/e55814/concepts.htm#BIGUG117
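
In practice, a hedged sketch of how an OLH job is typically submitted from a Hadoop client node (the jar path and configuration file name are illustrative; oracle.hadoop.loader.OraLoader is the documented driver class, but verify paths against your OLH release):

# Submit the Oracle Loader for Hadoop job with a prepared configuration file
hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
  -conf olh_config.xml \
  -libjars $OLH_HOME/jlib/oraloader.jar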

QUESTION 3
Which three pieces of hardware are present on each node of the Big Data Appliance? (Choose three.)

A. high capacity SAS disks
B. memory
C. redundant Power Delivery Units
D. InfiniBand ports
E. InfiniBand leaf switches

Correct Answer: ABD
Section: (none)
Explanation

Explanation/Reference:
Big Data Appliance Hardware Specification and Details, example:
Per node:
2 x Eight-Core Intel® Xeon® E5-2260 Processors (2.2 GHz)
64 GB Memory (expandable to 256 GB)
Disk Controller HBA with 512 MB battery-backed write cache
12 x 3 TB 7,200 RPM High Capacity SAS Disks
2 x QDR (Quad Data Rate InfiniBand) (40 Gb/s) Ports
4 x 10 Gb Ethernet Ports
1 x ILOM Ethernet Port

References: http://www.oracle.com/technetwork/server-storage/engineered-systems/bigdata-appliance/overview/bigdataappliancev2-datasheet-1871638.pdf

QUESTION 4
What two actions do the following commands perform in the Oracle R Advanced Analytics for Hadoop Connector? (Choose two.)

ore.connect(type="HIVE")
ore.attach()

A. Connect to Hive.


B. Attach the Hadoop libraries to R.
C. Attach the current environment to the search path of R.
D. Connect to NoSQL via Hive.

Correct Answer: AC
Section: (none)
Explanation

Explanation/Reference:
You can connect to Hive and manage objects using R functions that have an ore prefix, such as ore.connect.

To attach the current environment to the search path of R, use:
ore.attach()

References: https://docs.oracle.com/cd/E49465_01/doc.23/e49333/orch.htm#BDCUG400

QUESTION 5
Your customer’s security team needs to understand how the Oracle Loader for Hadoop Connector writes data to the Oracle database.

Which service performs the actual writing?

A. OLH agent
B. reduce tasks
C. write tasks
D. map tasks
E. NameNode

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Oracle Loader for Hadoop has online and offline load options. In the online load option, the data is both preprocessed and loaded into the database as part of the Oracle Loader for Hadoop job. Each reduce task makes a connection to Oracle Database, loading into the database in parallel. The database has to be available during the execution of Oracle Loader for Hadoop.

References: http://www.oracle.com/technetwork/bdc/hadoop-loader/connectors-hdfs-wp-1674035.pdf

QUESTION 6
Your customer needs to manage configuration information on the Big Data Appliance.


Which service would you choose?

A. SparkPlug
B. ApacheManager
C. Zookeeper
D. Hive Server
E. JobMonitor

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
The ZooKeeper utility provides configuration and state management and distributed coordination services to Dgraph nodes of the Big Data Discovery cluster. It ensures high availability of the query processing by the Dgraph nodes in the cluster.

References: https://docs.oracle.com/cd/E57471_01/bigData.100/admin_bdd/src/cadm_cluster_zookeeper.html

QUESTION 7
You are helping your customer troubleshoot the use of the Oracle Loader for Hadoop Connector in online mode. You have performed steps 1, 2, 4, and 5.

STEP 1: Connect to the Oracle database and create a target table.
STEP 2: Log in to the Hadoop cluster (or client).
STEP 3: Missing step
STEP 4: Create a shell script to run the OLH job.
STEP 5: Run the OLH job.

What step is missing between step 2 and step 4?

A. Diagnose the job failure and correct the error.
B. Copy the table metadata to the Hadoop system.
C. Create an XML configuration file.
D. Query the table to check the data.
E. Create an OLH metadata file.

Correct Answer: C
Section: (none)
Explanation


Explanation/Reference:
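The missing step is creating the XML configuration file that the OLH job reads. A minimal, hedged sketch written as a shell heredoc is shown below; the input path, table, and connection values are illustrative, and exact property names can differ between OLH releases, so confirm them against the Oracle Loader for Hadoop documentation.

cat > olh_config.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Input: delimited text files in HDFS (illustrative path and format) -->
  <property>
    <name>mapreduce.job.inputformat.class</name>
    <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
  </property>
  <property>
    <name>mapreduce.input.fileinputformat.inputdir</name>
    <value>/user/oracle/sales_input</value>
  </property>
  <!-- Output: load directly into the database over JDBC (online mode) -->
  <property>
    <name>mapreduce.job.outputformat.class</name>
    <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
  </property>
  <!-- Target table and connection details (illustrative values) -->
  <property>
    <name>oracle.hadoop.loader.loaderMap.targetTable</name>
    <value>SALES</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//dbhost.example.com:1521/orcl</value>
  </property>
</configuration>
EOF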

QUESTION 8
The hdfs_stream script is used by the Oracle SQL Connector for HDFS to perform a specific task to access data.

What is the purpose of this script?

A. It is the preprocessor script for the Impala table.
B. It is the preprocessor script for the HDFS external table.
C. It is the streaming script that creates a database directory.
D. It is the preprocessor script for the Oracle partitioned table.
E. It defines the jar file that points to the directory where Hive is installed.

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
The hdfs_stream script is the preprocessor for the Oracle Database external table created by Oracle SQL Connector for HDFS.

References: https://docs.oracle.com/cd/E37231_01/doc.20/e36961/start.htm#BDCUG107
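
As a hedged sketch, the external table whose access parameters reference hdfs_stream as the preprocessor is normally generated with the OSCH command-line tool; the jar path and configuration file name here are illustrative, while oracle.hadoop.exttab.ExternalTable and -createTable are the documented tool class and operation:

# Generate and create the HDFS-backed external table in the Oracle database
hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable \
  -conf osch_config.xml \
  -createTable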

QUESTION 9
How should you encrypt the Hadoop data that sits on disk?

A. Enable Transparent Data Encryption by using the Mammoth utility.
B. Enable HDFS Transparent Encryption by using bdacli on a Kerberos-secured cluster.
C. Enable HDFS Transparent Encryption on a non-Kerberos secured cluster.
D. Enable Audit Vault and Database Firewall for Hadoop by using the Mammoth utility.

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
HDFS Transparent Encryption protects Hadoop data that’s at rest on disk. When the encryption is enabled for a cluster, data write and read operations on encrypted zones (HDFS directories) on the disk are automatically encrypted and decrypted. This process is “transparent” because it’s invisible to the application working with the data.

The cluster where you want to use HDFS Transparent Encryption must have Kerberos enabled.

Incorrect Answers:
C: The cluster where you want to use HDFS Transparent Encryption must have Kerberos enabled.

References: https://docs.oracle.com/en/cloud/paas/big-data-cloud/csbdi/using-hdfs-transparent-encryption.html#GUID-16649C5A-2C88-4E75-809A-BBF8DE250EA3
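
A hedged sketch of the enablement flow on a Kerberos-secured cluster follows; the bdacli service name is an assumption that should be confirmed against the bdacli reference for your BDA release, and the key and zone names are illustrative. The encryption-zone commands themselves are standard HDFS operations.

# Enable HDFS Transparent Encryption for the cluster (assumed bdacli service name)
bdacli enable hdfs_transparent_encryption

# Create a key and an encryption zone once the service is enabled
hadoop key create demo_key
hdfs crypto -createZone -keyName demo_key -path /user/oracle/secure
hdfs crypto -listZones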

QUESTION 10
What two things does Big Data SQL push down to the storage cell on the Big Data Appliance? (Choose two.)

A. Transparent Data Encrypted data
B. the column selection of data from individual Hadoop nodes
C. WHERE clause evaluations
D. PL/SQL evaluation
E. Business Intelligence queries from connected Exalytics servers

Correct Answer: AB
Section: (none)
Explanation

Explanation/Reference:

QUESTION 11
You want to set up access control lists on your NameNode in your Big Data Appliance. However, when you try to do so, you get an error stating “the NameNode disallows creation of ACLs.”

What is the cause of the error?

A. During the Big Data Appliance setup, Cloudera's ACLSecurity product was not installed.
B. Access control lists are set up on the DataNode and HadoopNode, not the NameNode.
C. During the Big Data Appliance setup, the Oracle Audit Vault product was not installed.
D. dfs.namenode.acls.enabled must be set to true in the NameNode configuration.

Correct Answer: D
Section: (none)


Explanation

Explanation/Reference:
To use ACLs, first you’ll need to enable ACLs on the NameNode by adding the following configuration property to hdfs-site.xml and restarting the NameNode.

<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>

References: https://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
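
Once the property is in place, a brief hedged sketch of working with HDFS ACLs from a client (the path, user, and group names are illustrative):

# Grant an additional user and group read/execute access through ACL entries
hdfs dfs -setfacl -m user:analyst1:r-x /user/oracle/logs
hdfs dfs -setfacl -m group:marketing:r-x /user/oracle/logs
# Inspect the resulting ACL entries
hdfs dfs -getfacl /user/oracle/logs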

QUESTION 12
Your customer has an older starter rack Big Data Appliance (BDA) that was purchased in 2013. The customer would like to know what the options are for growing the storage footprint of its server.

Which two options are valid for expanding the customer’s BDA footprint? (Choose two.)

A. Elastically expand the footprint by adding additional high capacity nodes.
B. Elastically expand the footprint by adding additional Big Data Oracle Database Servers.
C. Elastically expand the footprint by adding additional Big Data Storage Servers.
D. Racks manufactured before 2014 are no longer eligible for expansion.
E. Upgrade to a full 18-node Big Data Appliance.

Correct Answer: DE
Section: (none)
Explanation

Explanation/Reference:

QUESTION 13


What are three correct results of executing the preceding query? (Choose three.)

A. Values longer than 100 characters for the DESCRIPTION column are truncated.

B. ORDER_LINE_ITEM_COUNT in the HDFS file matches ITEM_CNT in the external table.

C. ITEM_CNT in the HDFS file matches ORDER_LINE_ITEM_COUNT in the external table.

D. Errors in the data for CUST_NUM or ORDER_NUM set the value to INVALID_NUM.

E. Errors in the data for CUST_NUM or ORDER_NUM set the value to 0000000000 .

F. Values longer than 100 characters for any column are truncated.

Correct Answer: ACD
Section: (none)
Explanation

Explanation/Reference:
com.oracle.bigdata.overflow: Truncates string data. Values longer than 100 characters for the DESCRIPTION column are truncated.


com.oracle.bigdata.erroropt: Replaces bad data. Errors in the data for CUST_NUM or ORDER_NUM set the value to INVALID_NUM.

References: https://docs.oracle.com/cd/E55905_01/doc.40/e55814/bigsql.htm#BIGUG76679

QUESTION 14
What does the following line do in Apache Pig?

products = LOAD '/user/oracle/products' AS (prod_id, item);

A. The products table is loaded by using data pump with prod_id and item.

B. The LOAD table is populated with prod_id and item.

C. The contents of /user/oracle/products are loaded as tuples and aliased to products.

D. The contents of /user/oracle/products are dumped to the screen.

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
The LOAD function loads data from the file system.

Syntax: LOAD 'data' [USING function] [AS schema];
Terms: 'data' is the name of the file or directory, in single quotes.

References: https://pig.apache.org/docs/r0.11.1/basic.html#load
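
A minimal runnable sketch of the same LOAD as a shell session; the delimiter and field types are assumptions about the data, not part of the original statement:

# Write a small Pig Latin script and run it
cat > load_products.pig <<'EOF'
-- Field types and the comma delimiter are illustrative assumptions
products = LOAD '/user/oracle/products' USING PigStorage(',') AS (prod_id:chararray, item:chararray);
DUMP products;   -- print the tuples aliased to 'products'
EOF
pig load_products.pig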

QUESTION 15
What is the output of the following six commands when they are executed by using the Oracle XML Extensions for Hive in the Oracle XQuery for Hadoop Connector?

1. $ echo "xxx" > src.txt
2. $ hive --auxpath $OXH_HOME/hive/lib -i $OXH_HOME/hive/init.sql
3. hive> CREATE TABLE src (dummy STRING);
4. hive> LOAD DATA LOCAL INPATH 'src.txt' OVERWRITE INTO TABLE src;
5. hive> SELECT * FROM src;
OK
xxx


6. hive> SELECT xml_query("x/y", "<x><y>123</y><z>456</z></x>") FROM src;

A. xyz
B. 123
C. 456
D. xxx
E. x/y

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Using the Hive Extensions

To enable the Oracle XQuery for Hadoop extensions, use the --auxpath and -i arguments when starting Hive:

$ hive --auxpath $OXH_HOME/hive/lib -i $OXH_HOME/hive/init.sql
The first time you use the extensions, verify that they are accessible. The following procedure creates a table named SRC, loads one row into it, and calls the xml_query function.

To verify that the extensions are accessible:
1. Log in to an Oracle Big Data Appliance server where you plan to work.
2. Create a text file named src.txt that contains one line:
$ echo "XXX" > src.txt
3. Start the Hive command-line interface (CLI):
$ hive --auxpath $OXH_HOME/hive/lib -i $OXH_HOME/hive/init.sql
The init.sql file contains the CREATE TEMPORARY FUNCTION statements that declare the XML functions.
4. Create a simple table:
hive> CREATE TABLE src(dummy STRING);
The SRC table is needed only to fulfill a SELECT syntax requirement. It is like the DUAL table in Oracle Database, which is referenced in SELECT statements to test SQL functions.
5. Load data from src.txt into the table:
hive> LOAD DATA LOCAL INPATH 'src.txt' OVERWRITE INTO TABLE src;
6. Query the table using Hive SELECT statements:
hive> SELECT * FROM src;
OK
xxx
7. Call an Oracle XQuery for Hadoop function for Hive. This example calls the xml_query function to parse an XML string:
hive> SELECT xml_query("x/y", "<x><y>123</y><z>456</z></x>") FROM src;
. . .
["123"]
If the extensions are accessible, then the query returns ["123"], as shown in the example.

References: https://docs.oracle.com/cd/E53356_01/doc.30/e53067/oxh_hive.htm#BDCUG693

QUESTION 16
The NoSQL KVStore experiences a node failure. One of the replicas is promoted to primary.

How will the NoSQL client that accesses the store know that there has been a change in the architecture?

A. The KVLite utility updates the NoSQL client with the status of the master and replica.
B. KVStoreConfig sends the status of the master and replica to the NoSQL client.
C. The NoSQL admin agent updates the NoSQL client with the status of the master and replica.
D. The Shard State Table (SST) contains information about each shard and the master and replica status for the shard.

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Given a shard, the Client Driver next consults the Shard State Table (SST). For each shard, the SST contains information about each replication node comprising the group (step 5). Based upon information in the SST, such as the identity of the master and the load on the various nodes in a shard, the Client Driver selects the node to which to send the request and forwards the request to the appropriate node. In this case, since we are issuing a write operation, the request must go to the master node.

Note: If the machine hosting the master should fail in any way, then the master automatically fails over to one of the other nodes in the shard. That is, one of the replica nodes is automatically promoted to master.

References: http://www.oracle.com/technetwork/products/nosqldb/learnmore/nosql-wp-1436762.pdf

QUESTION 17
Your customer is experiencing significant degradation in the performance of Hive queries. The customer wants to continue using SQL as the main query language for the HDFS store.


Which option can the customer use to improve performance?

A. native MapReduce Java programs
B. Impala
C. HiveFastQL
D. Apache Grunt

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.

Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation.

References: https://en.wikipedia.org/wiki/Cloudera_Impala
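
A hedged sketch of pointing an existing HiveQL-style query at Impala from the command line; the coordinator host name, table, and query are illustrative, and the table must already exist in the shared metastore:

# Run a low-latency SQL query through an impalad coordinator instead of Hive/MapReduce
impala-shell -i impalad-host.example.com -q "SELECT COUNT(*) FROM web_logs WHERE status = 404;"
# After changing table metadata in Hive, refresh Impala's view of it
impala-shell -i impalad-host.example.com -q "INVALIDATE METADATA web_logs;"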

QUESTION 18
Your customer keeps getting an error when writing a key/value pair to a NoSQL replica.

What is causing the error?

A. The master may be in read-only mode and, as a result, writes to replicas are not being allowed.
B. The replica may be out of sync with the master and is not able to maintain consistency.
C. The writes must be done to the master.
D. The replica is in read-only mode.
E. The data file for the replica is corrupt.

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and which copies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations.


Note: Oracle NoSQL Database provides multi-terabyte distributed key/value pair storage that offers scalable throughput and performance. That is, it services network requests to store and retrieve data which is organized into key-value pairs.

References: https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/introduction.html

QUESTION 19
The log data for your customer's Apache web server has seven string columns.

What is the correct command to load the log data from the file 'sample.log' into a new Hive table LOGS that does not currently exist?

A. hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

B. hive> create table logs as select * from sample.log;

C. hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
hive> LOAD DATA LOCAL INPATH 'sample.log' OVERWRITE INTO TABLE logs;

D. hive> LOAD DATA LOCAL INPATH 'sample.log' OVERWRITE INTO TABLE logs;
hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

E. hive> create table logs as load sample.log from hadoop;

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
The CREATE TABLE command creates a table with the given name. Load files into existing tables with the LOAD DATA command.

References:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

QUESTION 20
Your customer’s Oracle NoSQL store has a replication factor of 3. One of the customer’s replica nodes goes down.

What will be the long-term performance impact on the customer’s NoSQL database if the node is replaced?

A. There will be no performance impact.
B. The database read performance will be impacted.


C. The database read and write performance will be impacted.
D. The database will be unavailable for reading or writing.
E. The database write performance will be impacted.

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
The number of nodes belonging to a shard is called its Replication Factor. The larger a shard's Replication Factor, the faster its read throughput (because there are more machines to service the read requests) but the slower its write performance (because there are more machines to which writes must be copied).

Note: Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and which copies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations.

References: https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/introduction.html#replicationfactor

QUESTION 21
Your customer is using the IKM SQL to HDFS File (Sqoop) module to move data from Oracle to HDFS. However, the customer is experiencing performance issues.

What change should you make to the default configuration to improve performance?

A. Change the ODI configuration to high performance mode.
B. Increase the number of Sqoop mappers.
C. Add additional tables.
D. Change the HDFS server I/O settings to duplex mode.

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Controlling the amount of parallelism that Sqoop will use to transfer data is the main way to control the load on your database. Using more mappers will lead to a higher number of concurrent data transfer tasks, which can result in faster job completion. However, it will also increase the load on the database as Sqoop will execute more concurrent queries.

References: https://community.hortonworks.com/articles/70258/sqoop-performance-tuning.html
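
Outside ODI, the same knob is Sqoop's mapper count. A hedged sketch of a direct import follows; the connection string, credentials, table, and target directory are illustrative:

# Raise parallelism from the default 4 mappers to 8 concurrent transfer tasks
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username SCOTT -P \
  --table SALES \
  --num-mappers 8 \
  --target-dir /user/oracle/sales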

QUESTION 22


What is the result when a flume event occurs for the following single node configuration?

A. The event is written to memory.
B. The event is logged to the screen.
C. The event output is not defined in this section.
D. The event is sent out on port 44444.
E. The event is written to the netcat process.

Correct Answer: B
Section: (none)
Explanation


Explanation/Reference:
This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.

Note: A sink stores the data into centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the destination. The destination of the sink might be another agent or the central stores.

A source is the component of an Agent which receives data from the data generators and transfers it to one or more channels in the form of Flume events.

Incorrect Answers:
D: Port 44444 is part of the source, not the sink.

References: https://flume.apache.org/FlumeUserGuide.html
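
A hedged reconstruction of the kind of single-node configuration the question describes, written to a file and launched with the flume-ng agent command (the file and component names are illustrative):

cat > example.conf <<'EOF'
# Single agent a1: netcat source on port 44444, in-memory channel, logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console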

QUESTION 23
What kind of workload is MapReduce designed to handle?

A. batch processing
B. interactive
C. computational
D. real time
E. commodity

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Hadoop was designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch rather than real-time processing. As data grows, Hadoop lets you scale the cluster horizontally by adding commodity nodes to keep up with the workload. A MapReduce job takes a large amount of data and processes it in batch; it does not give immediate output, and its runtime depends on the configuration of the system, NameNode, TaskTracker, JobTracker, and so on.

References: https://www.quora.com/What-is-batch-processing-in-hadoop

QUESTION 24
Your customer uses LDAP for centralized user/group management.

How will you integrate permissions management for the customer’s Big Data Appliance into the existing architecture?


A. Make Oracle Identity Management for Big Data the single source of truth and point LDAP to its keystore for user lookup.
B. Enable Oracle Identity Management for Big Data and point its keystore to the LDAP directory for user lookup.
C. Make Kerberos the single source of truth and have LDAP use the Key Distribution Center for user lookup.
D. Enable Kerberos and have the Key Distribution Center use the LDAP directory for user lookup.

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository.
The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server set up already, which will then link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket.

References: https://www.rittmanmead.com/blog/2015/04/setting-up-security-and-access-control-on-a-big-data-appliance/

QUESTION 25
Your customer collects diagnostic data from its storage systems that are deployed at customer sites. The customer needs to capture and process this data by country in batches.

Why should the customer choose Hadoop to process this data?

A. Hadoop processes data on large clusters (10-50 max) on commodity hardware.
B. Hadoop is a batch data processing architecture.
C. Hadoop supports centralized computing of large data sets on large clusters.
D. Node failures can be dealt with by configuring failover with clusterware.
E. Hadoop processes data serially.

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Hadoop was designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch rather than real-time processing. As data grows, Hadoop lets you scale the cluster horizontally by adding commodity nodes to keep up with the workload. A MapReduce job takes a large amount of data and processes it in batch; it does not give immediate output, and its runtime depends on the configuration of the system, NameNode, TaskTracker, JobTracker, and so on.


Incorrect Answers:
A: Yahoo! has by far the largest number of nodes in its massive Hadoop clusters, at over 42,000 nodes as of July 2011.

C: Hadoop supports distributed computing of large data sets on large clusters.

E: Hadoop processes data in parallel.

References: https://www.quora.com/What-is-batch-processing-in-hadoop

QUESTION 26
Your customer wants to architect a system that helps to make real-time recommendations to users based on their past search history.

Which solution should the customer use?

A. Oracle Container Database
B. Oracle Exadata
C. Oracle NoSQL
D. Oracle Data Integrator

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Oracle Data Integration (both Oracle GoldenGate and Oracle Data Integrator) helps to integrate data end-to-end between big data (NoSQL, Hadoop-based) environments and SQL-based environments. These data integration technologies are the key ingredient to Oracle’s Big Data Connectors. Oracle Big Data Connectors provide integration from Oracle Big Data Appliance to relational Oracle Databases, where in-database analytics can be performed.

Oracle’s data integration solutions speed the loads of the Oracle Exadata Database Machine by 500% while providing continuous access to business-critical information across heterogeneous sources.

References: http://www.oracle.com/us/solutions/fastdata/fast-data-gets-real-time-wp-1927038.pdf

QUESTION 27
How should you control the Sqoop parallel imports if the data does not have a primary key?

A. by specifying no primary key with the --no-primary argument

B. by specifying the number of maps by using the -m option

C. by indicating the split size by using the --direct-split-size option

D. by choosing a different column that contains unique data with the --split-by argument


Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
If the actual values for the primary key are not uniformly distributed across its range, then this can result in unbalanced tasks. You should explicitly choose a different column with the --split-by argument. For example, --split-by employee_id.

Note: When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.

References: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hbase
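
A hedged sketch of an import that supplies an explicit split column because the source table has no usable primary key; the connection details, table, and column names are illustrative:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username SCOTT -P \
  --table EMPLOYEES \
  --split-by EMPLOYEE_ID \
  --num-mappers 4 \
  --target-dir /user/oracle/employees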

QUESTION 28
Your customer uses Active Directory to manage user accounts. You are setting up Hadoop Security for the customer’s Big Data Appliance.

How will you integrate Hadoop and Active Directory?

A. Set up Kerberos’ Key Distribution Center to be the Active Directory keystore.
B. Configure Active Directory to use Kerberos’ Key Distribution Center.
C. Set up a one-way cross-realm trust from the Kerberos realm to the Active Directory realm.
D. Set up a one-way cross-realm trust from the Active Directory realm to the Kerberos realm.

Correct Answer: C
Section: (none)
Explanation

Explanation/Reference:
If direct integration with AD is not currently possible, use the following instructions to configure a local MIT KDC to trust your AD server:
1. Run an MIT Kerberos KDC and realm local to the cluster and create all service principals in this realm.
2. Set up one-way cross-realm trust from this realm to the Active Directory realm. Using this method, there is no need to create service principals in Active Directory, but Active Directory principals (users) can be authenticated to Hadoop.

Incorrect Answers:
B: The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server set up already, which will then link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket.

References: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_hadoop_security_active_directory_integrate.html#topic_15_1


QUESTION 29
What is the main purpose of the Oracle Loader for Hadoop (OLH) Connector?

A. runs transformations expressed in XQuery by translating them into a series of MapReduce jobs that are executed in parallel on a Hadoop cluster
B. pre-partitions, sorts, and transforms data into an Oracle ready format on Hadoop and loads it into the Oracle database
C. accesses and analyzes data in place on HDFS by using external tables
D. performs scalable joins between Hadoop and Oracle Database data
E. provides a SQL-like interface to data that is stored in HDFS
F. is the single SQL point-of-entry to access all data

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It prepartitions the data if necessary and transforms it into a database-ready format.

References: https://docs.oracle.com/cd/E37231_01/doc.20/e36961/olh.htm#BDCUG140

QUESTION 30
Your customer has three XML files in HDFS with the following contents. Each XML file contains comments made by users on a specific day. Each comment can have zero or more “likes” from other users. The customer wants you to query this data and load it into the Oracle Database on Exadata.

How should you parse this data?


A. by creating a table in Hive and using MapReduce to parse the XML data by column
B. by configuring the Oracle SQL Connector for HDFS and parsing by using SerDe
C. by using the XML file module in the Oracle XQuery for Hadoop Connector
D. by using the built-in functions for reading JSON in the Oracle XQuery for Hadoop Connector

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:


Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in Apache Hadoop in these formats:
Data Pump files in HDFS
Delimited text files in HDFS
Delimited text files in Apache Hive tables

SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization and also interpreting the results of serialization as individual fields for processing.
A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

References:
https://docs.oracle.com/cd/E53356_01/doc.30/e53067/osch.htm#BDCUG126
https://cwiki.apache.org/confluence/display/Hive/SerDe

QUESTION 31
Identify two ways to create an external table to access Hive data on the Big Data Appliance by using Big Data SQL. (Choose two.)

A. Use Cloudera Manager's Big Data SQL Query builder.
B. You can use the dbms_hadoop.create_extddl_for_hive package to return the text of the CREATE TABLE command.

C. Use a CREATE table statement with ORGANIZATION EXTERNAL and the ORACLE_BDSQL access parameter.

D. Use a CREATE table statement with ORGANIZATION EXTERNAL and the ORACLE_HIVE access parameter.

E. Use the Enterprise Manager Big Data SQL Configuration page to create the table.

Correct Answer: BD
Section: (none)
Explanation

Explanation/Reference:
CREATE_EXTDDL_FOR_HIVE returns a SQL CREATE TABLE ORGANIZATION EXTERNAL statement for a Hive table. It uses the ORACLE_HIVE access driver.

References: https://docs.oracle.com/cd/E55905_01/doc.40/e55814/bigsqlref.htm#BIGUG76630

QUESTION 32
What are two of the main steps for setting up Oracle XQuery for Hadoop? (Choose two.)

A. unpacking the contents of oxh-version.zip into the installation directory

B. installing the Oracle SQL Connector for Hadoop
C. configuring an Oracle wallet
D. installing the Oracle Loader for Hadoop


Correct Answer: AD
Section: (none)
Explanation

Explanation/Reference:
To install Oracle XQuery for Hadoop:
1. Unpack the contents of oxh-version.zip into the installation directory.
2. To support data loads into Oracle Database, install Oracle Loader for Hadoop.

References: https://docs.oracle.com/cd/E49465_01/doc.23/e49333/start.htm#BDCUG509
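
A hedged sketch of step 1 from a shell; "version" stands for the actual release string and the installation directory is illustrative:

# Unpack the OXH distribution and point OXH_HOME at it
unzip oxh-version.zip -d /opt/oracle
export OXH_HOME=/opt/oracle/oxh-version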

QUESTION 33
Identify two features of the Hadoop Distributed File System (HDFS). (Choose two.)

A. It is written to store large amounts of data.
B. The file system is written in C#.
C. It consists of Mappers, Reducers, and Combiners.
D. The file system is written in Java.

Correct Answer: AD
Section: (none)
Explanation

Explanation/Reference:
HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.

The Hadoop framework, which HDFS is a part of, is itself mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

References: https://en.wikipedia.org/wiki/Apache_Hadoop

QUESTION 34
What does the flume sink do in a flume configuration?


A. sinks the log file that is transmitted into Hadoop
B. hosts the components through which events flow from an external source to the next destination
C. forwards events to the source
D. consumes events delivered to it by an external source such as a web server
E. removes events from the channel and puts them into an external repository

Correct Answer: E
Section: (none)
Explanation

Explanation/Reference:
A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it’s consumed by a Flume sink. The sink then removes the event from the channel and puts it into an external repository like HDFS, or forwards it to the next agent in the flow.

References: https://flume.apache.org/FlumeUserGuide.html

QUESTION 35
Your customer is spending a lot of money on archiving data to comply with government regulations to retain data for 10 years.

How should you reduce your customer’s archival costs?

A. Denormalize the data.
B. Offload the data into Hadoop.
C. Use Oracle Data Integrator to improve performance.
D. Move the data into the warehousing database.


Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Extend Information Lifecycle Management to Hadoop
For many years, Oracle Database has provided rich support for Information Lifecycle Management (ILM). Numerous capabilities are available for data tiering – or storing data in different media based on access requirements and storage cost considerations. These tiers may scale from 1) in-memory for real-time data analysis, 2) Database Flash for frequently accessed data, 3) Database Storage and Exadata Cells for queries of operational data, and 4) Hadoop for infrequently accessed raw and archive data.

References: http://www.oracle.com/technetwork/database/bigdata-appliance/overview/bigdatasql-datasheet-2934203.pdf

QUESTION 36
What access driver does the Oracle SQL Connector for HDFS use when reading HDFS data by using external tables?

A. ORACLE_DATA_PUMP

B. ORACLE_LOADER

C. ORACLE_HDP

D. ORACLE_BDSQL

E. HADOOP_LOADER

F. ORACLE_HIVE_LOADER

Correct Answer: B
Section: (none)


Explanation

Explanation/Reference:
Oracle SQL Connector for HDFS creates the external table definition for Data Pump files by using the metadata from the Data Pump file header. It uses the ORACLE_LOADER access driver with the preprocessor access parameter. It also uses a special access parameter named EXTERNAL VARIABLE DATA, which enables ORACLE_LOADER to read the Data Pump format files generated by Oracle Loader for Hadoop.

References: https://docs.oracle.com/cd/E37231_01/doc.20/e36961/sqlch.htm#BDCUG356

QUESTION 37
You recently set up a customer’s Big Data Appliance. At the time, all users wanted access to all the Hadoop data. Now, the customer wants more control over the data that is stored in Hadoop.

How should you accommodate this request?

A. Configure Audit Vault and Database Firewall protection policies for the Hadoop data.
B. Update the MySQL metadata for Hadoop to define access control lists.
C. Configure an /etc/sudoers file to restrict the Hadoop data.
D. Configure Apache Sentry policies to protect the Hadoop data.

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
Apache Sentry is a new project that delivers fine-grained access control; both Cloudera and Oracle are the project’s founding members. Sentry satisfies the following three authorization requirements:

Secure Authorization: the ability to control access to data and/or privileges on data for authenticated users.
Fine-Grained Authorization: the ability to give users access to a subset of the data (e.g. a column) in a database.
Role-Based Authorization: the ability to create/apply template-based privileges based on functional roles.

Incorrect Answers:
C: The file /etc/sudoers contains a list of users or user groups with permission to execute a subset of commands while having the privileges of the root user or another specified user. The program may be configured to require a password.

References: https://blogs.oracle.com/datawarehousing/new-big-data-appliance-security-features
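
A hedged sketch of the kind of role-based policy Sentry enables, issued through Beeline against HiveServer2; the JDBC URL, user, role, group, and database names are all illustrative:

beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default" -n oracle -e "
CREATE ROLE analyst;
GRANT ROLE analyst TO GROUP marketing;
GRANT SELECT ON DATABASE weblogs TO ROLE analyst;
"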

QUESTION 38
You are working with a client who does not allow the storage of user or schema passwords in plain text.


How can you configure the Oracle Loader for Hadoop configuration file to meet the requirements of this client?

A. Store the password in an Access Control List and configure the ACL location in the configuration file.
B. Encrypt the password in the configuration file by using Transparent Data Encryption.
C. Configure the configuration file to prompt for the password during remote job executions.
D. Store the information in an Oracle wallet and configure the wallet location in the configuration file.

Correct Answer: D
Section: (none)
Explanation

Explanation/Reference:
In online database mode, Oracle Loader for Hadoop can connect to the target database using the credentials provided in the job configuration file or in an Oracle wallet. Oracle Wallet Manager is an application that wallet owners use to manage and edit the security credentials in their Oracle wallets. A wallet is a password-protected container used to store authentication and signing credentials, including private keys, certificates, and trusted certificates needed by SSL.

Note: Oracle Wallet Manager provides the following features:
Wallet Password Management
Strong Wallet Encryption
Microsoft Windows Registry Wallet Storage
Backward Compatibility
Public-Key Cryptography Standards (PKCS) Support
Multiple Certificate Support
LDAP Directory Support

References: https://docs.oracle.com/cd/B28359_01/network.111/b28530/asowalet.htm#BABHEDIE
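
A hedged sketch of creating a client-side wallet with the mkstore utility and storing a database credential in it; the wallet path, TNS alias, and user name are illustrative, and the OLH configuration property that points at the wallet location varies by release, so take it from the Oracle Loader for Hadoop documentation rather than this sketch:

# Create a password-protected wallet and add a database credential to it
mkstore -wrl /u01/app/oracle/wallet -create
mkstore -wrl /u01/app/oracle/wallet -createCredential orcl_alias scott
# Confirm which credentials are stored
mkstore -wrl /u01/app/oracle/wallet -listCredential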

QUESTION 39
Your customer needs the data that is generated from social media such as Facebook and Twitter, and the customer’s website to be consumed and sent to an HDFS directory for analysis by the marketing team.

Identify the architecture that you should configure.

A. multiple flume agents with collectors that output to a logger that writes to the Oracle Loader for Hadoop agent
B. multiple flume agents with sinks that write to a consolidated source with a sink to the customer's HDFS directory
C. a single flume agent that collects data from the customer's website, which is connected to both Facebook and Twitter, and writes via the collector to the customer's HDFS directory
D. multiple HDFS agents that write to a consolidated HDFS directory
E. a single HDFS agent that collects data from the customer's website, which is connected to both Facebook and Twitter, and writes via the Hive to the customer's HDFS directory

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
Apache Flume - Fetching Twitter Data. In this case, Flume is responsible for capturing the tweets from Twitter at very high velocity and volume, buffering them in a memory channel (possibly with some aggregation, since the events arrive as JSON), and eventually sinking them into HDFS.

References: https://www.tutorialspoint.com/apache_flume/fetching_twitter_data.htm

QUESTION 40
What are the two advantages of using Hive over MapReduce? (Choose two.)

A. Hive is much faster than MapReduce because it accesses data directly.
B. Hive allows for sophisticated analytics on large data sets.
C. Hive does not require MapReduce to run in order to analyze data.
D. Hive is a free tool; Hadoop requires a license.
E. Hive simplifies Hadoop for new users.

Correct Answer: BE
Section: (none)
Explanation

Explanation/Reference:
E: A comparison of the performance of the Hadoop/Pig implementation of MapReduce with Hadoop/Hive.

Both Hive and Pig are platforms optimized for analyzing large data sets and are built on top of Hadoop. Hive is a platform that provides a declarative SQL-like language, whereas Pig requires users to write a procedural language called Pig Latin. Because writing MapReduce jobs in Java can be difficult, Hive and Pig were developed as platforms on top of Hadoop. Hive and Pig give users easier access to data compared to implementing their own MapReduce jobs in Hadoop.

Incorrect Answers:
A: Hive and Pig were developed and work as platforms on top of Hadoop.

C: Apache Hive provides an SQL-like query language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez and Spark jobs.

D: Apache Hadoop is an open-source software framework, licensed through Apache License, Version 2.0 (ALv2), which is a permissive free software license written by the Apache Software Foundation (ASF).

References: https://www.kth.se/social/files/55802074f2765474cb6f543c/7.pdf

QUESTION 41
During a meeting with your customer’s IT security team, you are asked the names of the main OS users and groups for the Big Data Appliance.

Which users are created automatically during the installation of the Oracle Big Data Appliance?

A. flume, hbase, and hdfs
B. mapred, bda, and engsys
C. hbase, cdh5, and oracle
D. bda, cdh5, and oracle

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:

QUESTION 42
Which command should you use to view the contents of the HDFS directory, /user/oracle/logs?

A. hadoop fs -cat /user/oracle/logs

B. hadoop fs -ls /user/oracle/logs

C. cd /user/oracle
hadoop fs -ls logs

D. cd /user/oracle/logs
hadoop fs -ls *

E. hadoop fs -listfiles /user/oracle/logs

F. hive> select * from /user/oracle/logs

Correct Answer: B
Section: (none)
Explanation

Explanation/Reference:
To list the contents of a directory named /user/training/hadoop in HDFS:


#hadoop fs -ls /user/training/hadoop

Incorrect Answers:
A: hadoop fs -cat displays the content of a file.

References: http://princetonits.com/blog/technology/33-frequently-used-hdfs-shell-commands/

QUESTION 43
Your customer receives data in JSON format.

Which option should you use to load this data into Hive tables?

A. Python
B. Sqoop
C. a custom Java program
D. Flume
E. SerDe

Correct Answer: E
Section: (none)
Explanation

Explanation/Reference:
SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization and also interpreting the results of serialization as individual fields for processing.
A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

The JsonSerDe for JSON files is available in Hive 0.12 and later.

References: https://cwiki.apache.org/confluence/display/Hive/SerDe
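
A hedged sketch of loading JSON with the Hive-shipped JsonSerDe; the table name, columns, and file path are illustrative, and on some Hive builds you must first ADD JAR the hive-hcatalog-core jar that contains the SerDe class:

cat > json_events.hql <<'EOF'
-- Columns must match the top-level JSON field names (illustrative schema)
CREATE TABLE events (user_id STRING, action STRING, ts STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

LOAD DATA LOCAL INPATH 'events.json' OVERWRITE INTO TABLE events;
SELECT user_id, action FROM events LIMIT 10;
EOF
hive -f json_events.hql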

QUESTION 44
Your customer needs to move data from Hive to the Oracle database but does not have any connectors purchased.

What is another architectural choice that the customer can make?

A. Use Apache Sqoop.
B. Use Apache Sentry.
C. Use Apache Pig.


D. Export data from Hive by using export/import.

Correct Answer: A
Section: (none)
Explanation

Explanation/Reference:
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system to relational databases.

Incorrect Answers:
B: Apache Sentry is an authorization module for Hadoop that provides the granular, role-based authorization required to provide precise levels of access to the right users and applications.

C: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

References: https://www.tutorialspoint.com/sqoop/
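
A hedged sketch of exporting a Hive-managed table's files to an existing Oracle table with Sqoop; the connection details, table names, and warehouse path are illustrative, and the field delimiter assumes Hive's default \001 control character:

sqoop export \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username SCOTT -P \
  --table SALES_SUMMARY \
  --export-dir /user/hive/warehouse/sales_summary \
  --input-fields-terminated-by '\001'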
