Thursday, January 28, 2016

Spark save(insert) to Phoenix sample code

Previously we learned how to read Phoenix data with Spark. Now let's learn how to save (insert) data into Phoenix,
and then how to copy data from Table1 into Table2.

1. First, go to $PHOENIX_HOME/bin and run sqlline.py localhost (or your ZooKeeper address).
Create the tables with the following DDL commands:
CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
CREATE TABLE OUTPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);


2. Use spark-shell to insert the data!
import org.apache.phoenix.spark._

val dataSet = List((4L, "4", 4), (5L, "5", 5), (6L, "6", 6))

sc.parallelize(dataSet).saveToPhoenix("INPUT_TABLE",Seq("ID","COL1","COL2"),zkUrl = Some("192.168.13.61:2181"))
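
One detail worth knowing here: Phoenix has no INSERT statement. Writes are UPSERTs, so saving a row whose ID already exists overwrites that row instead of failing. The sketch below is plain Scala (no Spark needed) and only illustrates how the column list passed to saveToPhoenix lines up, by position, with the fields of each tuple. The helper names upsertSql and bind are hypothetical, and the statement text illustrates the idea; it is not claimed to be the exact SQL the plugin generates.

```scala
// Hypothetical helper: build a parameterized UPSERT for a table and its columns.
def upsertSql(table: String, columns: Seq[String]): String = {
  val placeholders = columns.map(_ => "?").mkString(", ")
  s"UPSERT INTO $table (${columns.mkString(", ")}) VALUES ($placeholders)"
}

// Hypothetical helper: pair each tuple field with the column at the same position.
def bind(columns: Seq[String], row: Product): Seq[(String, Any)] = {
  require(columns.length == row.productArity, "column count must match tuple arity")
  columns.zip(row.productIterator.toList)
}

val columns = Seq("ID", "COL1", "COL2")
val dataSet = List((4L, "4", 4), (5L, "5", 5), (6L, "6", 6))

println(upsertSql("INPUT_TABLE", columns))
dataSet.foreach(row => println(bind(columns, row)))
```

Because of the UPSERT behavior, running the save in step 2 twice should simply rewrite the same three rows rather than duplicate them.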


 

3. Go back to Phoenix and query the result!



Nice, the data was saved successfully!

------------------------------------------------------------------------------------------------------------------------
Now let's try writing a short program that copies the INPUT_TABLE data into OUTPUT_TABLE!

import org.apache.spark.sql._
import org.apache.phoenix.spark._

// Load INPUT_TABLE
val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "INPUT_TABLE","zkUrl" -> "localhost"))

// Save to OUTPUT_TABLE
df.save("org.apache.phoenix.spark",SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE","zkUrl" -> "localhost"))

# Note: I set the ZooKeeper URL to localhost here because the server running my spark-shell is also one of the ZooKeeper nodes.






At this point the program has finished. Next, go to Phoenix/bin and use sqlline.py to check whether the data was written successfully.



Is the data the same as in INPUT_TABLE? If it is, the copy succeeded!
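
If you would rather check this programmatically than by eye, one option is to collect both tables to the driver (fine for tables this small) and compare them as sets. The sketch below uses plain tuples in place of collected rows so that it runs without a cluster; sameRows is a hypothetical helper.

```scala
// Hypothetical helper: two tables hold the same data if they contain the same
// rows regardless of order. ID is the primary key, so rows are unique.
def sameRows[A](left: Seq[A], right: Seq[A]): Boolean =
  left.size == right.size && left.toSet == right.toSet

// Stand-ins for the rows read from INPUT_TABLE and OUTPUT_TABLE.
val inputRows  = Seq((4L, "4", 4), (5L, "5", 5), (6L, "6", 6))
val outputRows = Seq((6L, "6", 6), (4L, "4", 4), (5L, "5", 5))

println(sameRows(inputRows, outputRows))
```

In spark-shell the two sequences could come from something like df.collect().toSeq on a DataFrame loaded from each table.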

Spark call Phoenix to get Phoenix data with the phoenix-spark plugin: sample code tutorial

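In earlier posts we showed how to set up HBase and Phoenix.
But if I want to use Spark, how do I connect it to Phoenix?
Normally you would use JDBC, but today I'll demonstrate two other ways besides JDBC.
These are the approaches Phoenix recommends!
Before you start you need to install Spark, then follow the link below
to load the sample data into Phoenix/HBase:
https://blogs.apache.org/phoenix/entry/spark_integration_in_apache_phoenix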

If you can complete the example at the link above, we can start this tutorial.

If you succeeded in loading the data and writing the results to the second table, EMAIL_ENRON_PAGERANK,
you should get a count(*) of 36692.
-------------------------------------------------
Now I will show you how Spark can call Phoenix through the phoenix-spark plugin,
with two ways of getting a DataFrame that differ from a plain JDBC connection.
See also https://phoenix.apache.org/phoenix_spark.html for how to get an RDD instead.
Here we go.

First, start spark-shell without passing --driver-class-path.
[Remember] When we installed Phoenix, we already added the driver classpath to the environment.
Please refer to the previous installation tutorial.
------------------------------------------------------------------------------------------------------------------------
Example 1: Get a DataFrame using a Configuration
$ spark-shell

import org.apache.phoenix.spark._
import org.apache.hadoop.conf.Configuration

val configuration = new Configuration()

val df = sqlContext.phoenixTableAsDataFrame("EMAIL_ENRON",Seq("MAIL_FROM","MAIL_TO"),conf = configuration)

df.show(5)

The result is shown in the screenshot below.




--------------------------------------------------------------------------------------------------------------------------
Example 2: Get a DataFrame using the Data Source API


import org.apache.phoenix.spark._

val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "EMAIL_ENRON", "zkUrl" -> "192.168.13.61:2181")
)

df.show(5)

The result is shown in the screenshot below.
(# Note: the zkUrl can vary, depending on how many ZooKeeper servers you have set up for HBase.)



The result is the same as in Example 1!

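Of these two methods, the Configuration approach is usually recommended if you want consistency across your development: developers don't need to remember the ZooKeeper addresses, and it takes full advantage of ZooKeeper. You can also use the second approach during development if you prefer, but make sure you include every ZooKeeper address. For example, when I previously set up five ZooKeeper nodes, I had to enter all five IP addresses (for fault tolerance).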
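
As a minimal sketch of that last point: Phoenix connection strings generally take the quorum as a comma-separated host list followed by a single client port, and I am assuming the zkUrl option accepts the same form, so verify this against your Phoenix version. The helper buildZkUrl is hypothetical, and every address except 192.168.13.61 is a placeholder.

```scala
// Hypothetical helper: join ZooKeeper hosts into "host1,host2,...:port".
// Assumes every node listens on the same client port (2181 is ZooKeeper's default).
def buildZkUrl(hosts: Seq[String], port: Int = 2181): String = {
  require(hosts.nonEmpty, "at least one ZooKeeper host is required")
  hosts.mkString(",") + ":" + port
}

// Placeholder addresses; replace them with your own ZooKeeper nodes.
val zkHosts = Seq("192.168.13.61", "192.168.13.62", "192.168.13.63")
val zkUrl = buildZkUrl(zkHosts)

println(zkUrl)
```

The resulting string would then be passed as the "zkUrl" value in the Map from Example 2.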


Phoenix for HBase install tutorial

Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Download from http://www.apache.org/dyn/closer.lua/phoenix/
Be careful: you must download the Phoenix build that matches your HBase version.
My HBase version is 1.1.2, so I downloaded phoenix-4.6.0-HBase-1.1-bin.tar.gz
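
To make the version matching concrete, here is a small sketch that derives the file names used in this post from a Phoenix version and an HBase version. It assumes the naming pattern seen above (the HBase major.minor line in the name) holds for the release you download; hbaseLine and phoenixArtifacts are hypothetical helpers, not part of Phoenix.

```scala
// Hypothetical helper: reduce a full HBase version to its major.minor line,
// e.g. "1.1.2" -> "1.1", which is what appears in the Phoenix file names.
def hbaseLine(hbaseVersion: String): String =
  hbaseVersion.split('.').take(2).mkString(".")

// Hypothetical helper: the archive to download and the two jars copied in step 3.
def phoenixArtifacts(phoenixVersion: String, hbaseVersion: String): Seq[String] = {
  val tag = s"$phoenixVersion-HBase-${hbaseLine(hbaseVersion)}"
  Seq(
    s"phoenix-$tag-bin.tar.gz", // download archive
    s"phoenix-$tag-server.jar", // goes into the HBase lib directory
    s"phoenix-core-$tag.jar"    // goes into the HBase lib directory
  )
}

phoenixArtifacts("4.6.0", "1.1.2").foreach(println)
```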

1. Unpack the Phoenix archive.


2. Move the extracted folder to wherever you want Phoenix installed.


3. Go to the Phoenix folder and copy phoenix-{Phoenix-version}-HBase-{hbase-version}-server.jar
and phoenix-core-{Phoenix-version}-HBase-{hbase-version}.jar to $HBASE_HOME/lib.
Repeat this on every one of your HBase (region) servers.


4. Edit your environment variables to add the phoenix-client driver to the classpath.
 This step is what lets a client call HBase through Phoenix.


5. After starting HBase and ZooKeeper, go to $PHOENIX_HOME/bin and run ./sqlline.py localhost (I run the command on hbase-master because that is where I installed Phoenix.)


6. Press Ctrl+D to exit Phoenix. You can now try connecting through one of your other ZooKeeper nodes to see the ZooKeeper quorum at work.


Congratulations, it works~*

Wednesday, January 27, 2016

HBase Fully Distributed Mode Install Step by Step

Today, let's talk about running HBase in Fully Distributed Mode.
My environment is as follows (HBase downloaded from Apache):
JDK 1.8.0_65
HBase 1.1.2

I have one HBase master and four HBase RegionServers.

HBase must be installed on all of these servers (including the HBase master), and they all use the same configuration files.
My HBase master is also my HDFS NameNode; I deployed them on the same machine.
You can also deploy the HBase master on a different machine from the Hadoop NameNode.

Now let's go.

1. Download HBase from http://www.apache.org/dyn/closer.cgi/hbase/
Copy the file to your HBase master machine, untar it, and move the folder to wherever you want it installed.





2. Go to your HBase conf directory and edit hbase-site.xml, adding the content below adapted to your environment.


3. Edit hbase-env.sh and add the content below. Setting HBASE_MANAGES_ZK to true means start-hbase.sh also starts ZooKeeper; otherwise you must start ZooKeeper yourself with a separate script.



4. Edit the regionservers file: remove the default localhost entry and add the host names of your HBase RegionServers.


5. Compress your HBase folder and scp it to the other RegionServers.



6. Uncompress hbase.tar.gz on every RegionServer, into the location you want.

7. Start HBase and ZooKeeper, then open the HBase master web UI at http://hadoop-master:16010
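
For orientation, the settings touched in steps 2 to 4 can be sketched as follows. This is not the exact configuration used in this post: the property names are standard HBase settings, but every value here (the HDFS port 9000, the hadoop-slave host names, the quorum members) is an assumption made for illustration, and renderSite is a hypothetical helper that just prints the Hadoop-style XML layout.

```scala
// Assumed values for a cluster shaped like the one in this post; adjust to yours.
val hbaseSite = Seq(
  "hbase.rootdir"             -> "hdfs://hadoop-master:9000/hbase", // assumed NameNode address
  "hbase.cluster.distributed" -> "true",                            // fully distributed mode
  "hbase.zookeeper.quorum"    -> "hadoop-master,hadoop-slave1,hadoop-slave2" // assumed hosts
)

// Hypothetical helper: render properties in the layout used by hbase-site.xml (step 2).
def renderSite(props: Seq[(String, String)]): String = {
  val body = props.map { case (name, value) =>
    s"  <property>\n    <name>$name</name>\n    <value>$value</value>\n  </property>"
  }.mkString("\n")
  s"<configuration>\n$body\n</configuration>"
}

// hbase-env.sh line that lets start-hbase.sh manage ZooKeeper as well (step 3).
val hbaseEnvLine = "export HBASE_MANAGES_ZK=true"

// regionservers file: one host per line (step 4); host names are placeholders.
val regionServers =
  Seq("hadoop-slave1", "hadoop-slave2", "hadoop-slave3", "hadoop-slave4").mkString("\n")

println(renderSite(hbaseSite))
println(hbaseEnvLine)
println(regionServers)
```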




You can see the ZooKeeper configuration as below.



Congratulations~* If you need to use Phoenix + HBase, please continue with the next tutorial.