In a previous article we showed how to set up HBase & Phoenix, but how do we hook Spark up to Phoenix? Normally you would use JDBC, but today I will demonstrate two other ways besides JDBC; these are the approaches Phoenix recommends! Before starting you must install Spark, and then follow the link below to load the sample data into HBase:

https://blogs.apache.org/phoenix/entry/spark_integration_in_apache_phoenix

If you can complete the example at that link, we can start this tutorial. If you successfully loaded the data and wrote it into the second table EMAIL_ENRON_PAGERANK, count(*) should return 36692.

-------------------------------------------------

Now I will show you how Spark calls Phoenix through the Apache Spark plugin, with two ways to get a DataFrame that differ from a JDBC connection. Please also see https://phoenix.apache.org/phoenix_spark.html for how to get an RDD. Here we go. First, start spark-shell without passing --driver-class-path. [Remember] When we installed Phoenix, we already added the driver classpath to the environment; see the previous installation tutorial.

------------------------------------------------------------------------------------------------------------------------

Example 1: use a Configuration to get a DataFrame

    $ spark-shell

    import org.apache.phoenix.spark._
    import org.apache.hadoop.conf.Configuration

    val configuration = new Configuration()
    val df = sqlContext.phoenixTableAsDataFrame("EMAIL_ENRON", Seq("MAIL_FROM", "MAIL_TO"), conf = configuration)
    df.show(5)

The result is shown in the figure below.

--------------------------------------------------------------------------------------------------------------------------

Example 2: use the Data Source API to get a DataFrame

    import org.apache.phoenix.spark._

    val df = sqlContext.load(
      "org.apache.phoenix.spark",
      Map("table" -> "EMAIL_ENRON", "zkUrl" -> "192.168.13.61:2181")
    )
    df.show(5)

The result is shown in the figure below. (Note that zkUrl can vary, depending on how many ZooKeeper servers you set up for HBase.) The result is the same as in Example 1!

Between these two methods: if you want consistency during development, the Configuration approach is usually recommended, since developers do not have to memorize the ZooKeeper addresses and it takes full advantage of ZooKeeper. The second approach is also fine, but then you must include all of the ZooKeeper addresses; when I previously set up 5 ZooKeeper nodes, I had to enter all 5 IP addresses (for fault tolerance). A sketch of that variant follows.
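As a hedged illustration of that last point, here is a minimal sketch of Example 2 with a full ZooKeeper quorum passed as zkUrl, run inside spark-shell. The five IP addresses are hypothetical placeholders; substitute your own quorum.

    import org.apache.phoenix.spark._

    // List every ZooKeeper node so the client can fail over if one is down.
    // These addresses are hypothetical; use your own quorum.
    val zkQuorum = Seq(
      "192.168.13.61", "192.168.13.62", "192.168.13.63",
      "192.168.13.64", "192.168.13.65"
    ).map(_ + ":2181").mkString(",")

    val df = sqlContext.load(
      "org.apache.phoenix.spark",
      Map("table" -> "EMAIL_ENRON", "zkUrl" -> zkQuorum)
    )
    df.show(5)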
Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Download from http://www.apache.org/dyn/closer.lua/phoenix/. Be careful: you must download a version that matches your HBase. Because my HBase version is 1.1.2, I downloaded phoenix-4.6.0-HBase-1.1-bin.tar.gz.

1. Unzip your Phoenix archive.
2. Move it to the place you want.
3. From the Phoenix folder, copy phoenix-{Phoenix-version}-HBase-{hbase-version}-server.jar and phoenix-core-{Phoenix-version}-HBase-{hbase-version}.jar to $HBASE_HOME/lib. You must repeat this on all of your HBase (region) servers (see the sketch after this list).
4. Edit your environment variables to add the phoenix-client driver to the classpath. This step is what lets clients call HBase.
5. After starting HBase and ZooKeeper, go to $PHOENIX_HOME/bin and execute ./sqlline.py localhost. (I ran the command on the HBase master, because I installed Phoenix there.)
6. Press Ctrl+D to exit Phoenix; now you can try connecting through another ZooKeeper node to exercise ZooKeeper's failover.

Congratulations!
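To make steps 3 to 5 concrete, here is a minimal shell sketch assuming the versions above (Phoenix 4.6.0, HBase 1.1) and a hypothetical install path; adjust the paths and version strings for your own setup.

    # Step 3: copy the server jars into HBase's lib directory
    # (repeat on every region server, including the master)
    cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-server.jar $HBASE_HOME/lib/
    cp $PHOENIX_HOME/phoenix-core-4.6.0-HBase-1.1.jar   $HBASE_HOME/lib/

    # Step 4: add the client driver to the classpath, e.g. in ~/.bashrc
    export PHOENIX_HOME=/opt/phoenix-4.6.0-HBase-1.1-bin   # assumed install location
    export CLASSPATH=$CLASSPATH:$PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-client.jar

    # Step 5: with HBase and ZooKeeper running, connect through the local ZooKeeper
    cd $PHOENIX_HOME/bin && ./sqlline.py localhost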
Today, let's talk about HBase in fully distributed mode. My environment, downloaded from Apache, is as follows:

JDK 1.8.0_65
HBase 1.1.2

I have one HBase master and four HBase region servers. HBase must be installed on every server (including the HBase master), and they all share the same configuration files. My HBase master is also my HDFS NameNode; I deployed them on the same machine. You can also deploy the HBase master on a machine other than the Hadoop NameNode. Now let's go.

1. Download HBase from http://www.apache.org/dyn/closer.cgi/hbase/, copy the file to your HBase master machine, untar it, and move the folder to the place you want (see the sketch below).
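A minimal shell sketch of step 1, assuming HBase 1.1.2 and hypothetical host and path names; adjust them to your environment.

    # Copy the downloaded archive to the master, unpack it, and move it into place.
    scp hbase-1.1.2-bin.tar.gz hadoop-master:/tmp/   # "hadoop-master" is an assumed hostname
    ssh hadoop-master
    tar -zxvf /tmp/hbase-1.1.2-bin.tar.gz -C /tmp/
    mv /tmp/hbase-1.1.2 /opt/hbase                   # "/opt/hbase" is an assumed install location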
2. Go to your HBase conf directory and edit hbase-site.xml, adding content like the example for your environment (a hedged sketch of the files in steps 2 to 4 appears at the end of this post).
3. Edit hbase-env.sh and add content as below. The HBASE_MANAGES_ZK setting means start-hbase.sh also starts ZooKeeper; otherwise you must start ZooKeeper with a separate script.
4. Edit the regionservers file: remove the default localhost entry and add the hostnames of your HBase region servers.
5. Compress your HBase folder and scp it to the other region servers.
6. Uncompress hbase.tar.gz on all of your region servers to the place you want.
7. Start HBase and ZooKeeper, then view the HBase master UI at http://hadoop-master:16010. You can see the ZooKeeper config there as well.

Congratulations! If you need to use Phoenix with HBase, please continue on to the next tutorial.
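To make steps 2 to 4 concrete, here is a minimal sketch of what the three files might contain for my layout (one master, hadoop-master, doubling as the HDFS NameNode, plus four region servers). All hostnames, ports, and paths here are assumptions; adjust them to your cluster.

    <!-- hbase-site.xml (step 2): minimal fully distributed settings -->
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop-master:9000/hbase</value>  <!-- assumed NameNode host/port -->
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop-master,regionserver1,regionserver2</value>  <!-- assumed quorum hosts -->
      </property>
    </configuration>

    # hbase-env.sh (step 3): let start-hbase.sh also manage ZooKeeper
    export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_65   # assumed JDK path
    export HBASE_MANAGES_ZK=true

    # regionservers (step 4): one region-server hostname per line (names assumed)
    regionserver1
    regionserver2
    regionserver3
    regionserver4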