
      Spark Installation

      Environment used throughout: Mac + Parallels Desktop + CentOS 7 + JDK 7 + Hadoop 2.6 + Scala 2.10.4 + IDEA 14.0.5


      ——————————————————————————————————————————————————

      1. CentOS Installation

      ■ Remember to save a snapshot after the installation completes

      ■ Preparation
        CentOS 7 download: http://mirrors.163.com/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1511.iso

      ■ Installing CentOS 7 in Parallels Desktop on Mac - http://www.linuxidc.com/Linux/2016-08/133827.htm
      Configure the network interface (not required here)
        [root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
      After saving, restart the network:
        /etc/init.d/network stop
        /etc/init.d/network start

      Install network tool packages (not required here)
        yum install net-tools
        yum install wget

      PackageKit issue: yum fails with "/var/run/yum.pid is locked"; force-remove the lock:
        rm -f /var/run/yum.pid

      Switch the yum source to the Aliyun mirror
        cd /etc/yum.repos.d/
        mv CentOS-Base.repo CentOS-Base.repo.bak
        wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
        yum clean all
        yum makecache

      ■ CentOS 7 GNOME desktop (not required here)
        yum groupinstall "X Window System"
        yum groupinstall "GNOME Desktop"
        startx --> enter the graphical interface
        runlevel --> check the current runlevel

      ■ Post-install configuration for CentOS 7
        http://www.rzrgm.cn/pinnsvin/p/5889857.html
      ——————————————————————————————————————————————————

      2. JDK Installation

      Uninstalling the bundled OpenJDK

        Removing the preinstalled OpenJDK on CentOS 7 x64 and installing Oracle JDK 7 - http://www.rzrgm.cn/CuteNet/p/3947193.html

        rpm -qa | grep java

        Adjust the following commands to match the output of the previous command:

        rpm -e --nodeps python-javapackages-3.4.1-11.el7.noarch
        rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
        rpm -e --nodeps java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
        rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
        rpm -e --nodeps tzdata-java-2015g-1.el7.noarch
        rpm -e --nodeps javapackages-tools-3.4.1-11.el7.noarch
        rpm -e --nodeps java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64


      Installing Oracle JDK 1.7 on CentOS

        Download jdk-7u79-linux-x64.tar.gz from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

        mkdir /usr/local/java
        cp jdk-7u79-linux-x64.tar.gz /usr/local/java
        cd /usr/local/java
        tar xvf jdk-7u79-linux-x64.tar.gz
        rm jdk-7u79-linux-x64.tar.gz

      Set the JDK environment variables

        vim /etc/profile

      Append the following at the end of the file:
        export JAVA_HOME=/usr/local/java/jdk1.7.0_79
        export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
        export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
        export PATH=$JAVA_HOME/bin:$PATH

      Source the file so the changes take effect immediately:
        source /etc/profile

      Verify the installation:
        java -version
      ——————————————————————————————————————————————————

      3. Hadoop Installation

      http://dblab.xmu.edu.cn/blog/install-hadoop-in-centos/

      Create a dedicated hadoop user and grant it sudo rights:

      su
      useradd -m hadoop -s /bin/bash
      passwd hadoop # set a password for the hadoop user (e.g. hadoop)
      visudo # add the line below underneath "root ALL=(ALL) ALL"
      hadoop ALL=(ALL) ALL

       

      rpm -qa | grep ssh # check that the SSH client and server are installed

      cd ~/.ssh/
      ssh-keygen -t rsa # press Enter at every prompt
      cat id_rsa.pub >> authorized_keys
      chmod 600 ./authorized_keys
      ssh localhost # should now log in without a password

       

      Download Hadoop: http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

      sudo tar -zxf ~/桌面/hadoop-2.6.0.tar.gz -C /usr/local # adjust the path to wherever the archive was saved
      cd /usr/local/
      sudo mv ./hadoop-2.6.0/ ./hadoop
      sudo chown -R hadoop:hadoop ./hadoop

      Check that Hadoop is usable:
      cd /usr/local/hadoop
      ./bin/hadoop version
      Hadoop 2.6.0
      Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
      Compiled by jenkins on 2014-11-13T21:10Z
      Compiled with protoc 2.5.0
      From source with checksum 18e43357c8f927c0695f1e9522859d6a
      This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar

      --> The basic Hadoop environment is ready

       

      Standalone (non-distributed) Hadoop configuration

      cd /usr/local/hadoop
      mkdir ./input
      cp ./etc/hadoop/*.xml ./input # use the config files as the input
      ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
      cat ./output/* # view the results

      rm -r ./output

      gedit ~/.bashrc

      export HADOOP_HOME=/usr/local/hadoop
      export HADOOP_INSTALL=$HADOOP_HOME
      export HADOOP_MAPRED_HOME=$HADOOP_HOME
      export HADOOP_COMMON_HOME=$HADOOP_HOME
      export HADOOP_HDFS_HOME=$HADOOP_HOME
      export YARN_HOME=$HADOOP_HOME
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
      export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

      export JAVA_HOME=/usr/local/java/jdk1.7.0_79
      export JRE_HOME=/usr/local/java/jdk1.7.0_79/jre
      export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
      export PATH=$JAVA_HOME/bin:$PATH


      source ~/.bashrc


      gedit ./etc/hadoop/core-site.xml
      <configuration>
          <property>
              <name>hadoop.tmp.dir</name>
              <value>file:/usr/local/hadoop/tmp</value>
              <description>A base for other temporary directories.</description>
          </property>
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://localhost:9000</value>
          </property>
      </configuration>


      gedit ./etc/hadoop/hdfs-site.xml
      <configuration>
          <property>
              <name>dfs.replication</name>
              <value>1</value>
          </property>
          <property>
              <name>dfs.namenode.name.dir</name>
              <value>file:/usr/local/hadoop/tmp/dfs/name</value>
          </property>
          <property>
              <name>dfs.datanode.data.dir</name>
              <value>file:/usr/local/hadoop/tmp/dfs/data</value>
          </property>
      </configuration>

       

      ./bin/hdfs namenode -format

      ./sbin/start-dfs.sh
      jps should then show something like:
      [hadoop@localhost hadoop]$ jps
      27710 NameNode
      28315 SecondaryNameNode
      28683 Jps
      27973 DataNode

       

      Troubleshooting

      WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        tar -x hadoop-native-64-2.6.0.tar -C /usr/local/hadoop/lib/native/ # overwrite the bundled native libs with 64-bit builds
        cp /usr/local/hadoop/lib/native/* /usr/local/hadoop/lib/


      Add to the environment:
      export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native
      export HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib"
      export HADOOP_ROOT_LOGGER=DEBUG,console

       

      The root cause is that two native libraries, libhadoop.so and libsnappy.so, are missing from the JRE's library directory. Specifically, spark-shell depends on Scala, and Scala runs on the JDK at JAVA_HOME, so both files need to be placed under $JAVA_HOME/jre/lib/amd64.
      libhadoop.so can be found under HADOOP_HOME, e.g. in hadoop/lib/native. libsnappy.so has to be built from snappy-1.1.0.tar.gz (run ./configure, then make); the compiled library ends up in the .libs directory.
      Once both files are in place, starting spark-shell no longer shows this problem.
      Source: https://www.zhihu.com/question/23974067/answer/26267153

       

      Issue: Java was installed as root, so the hadoop user lacks permission on the Java directory

      sudo chown -R hadoop:hadoop /usr/local/java

       

      Toggling Hadoop debug output

      Enable:  export HADOOP_ROOT_LOGGER=DEBUG,console
      Disable: export HADOOP_ROOT_LOGGER=INFO,console

       

      Hadoop pseudo-distributed example

      ./bin/hdfs dfs -mkdir -p /user/hadoop

      ./bin/hdfs dfs -mkdir input
      ./bin/hdfs dfs -put ./etc/hadoop/*.xml input
      ./bin/hdfs dfs -ls input

      ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

      ./bin/hdfs dfs -cat output/*

      rm -r ./output # first remove the local output folder (if it exists)
      ./bin/hdfs dfs -get output ./output # copy the output folder from HDFS to the local machine
      cat ./output/*


      When Hadoop runs a job, the output directory must not already exist; otherwise it fails with "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/hadoop/output already exists". To run the job again, delete the output folder first:
      ./bin/hdfs dfs -rm -r output # delete the output folder

      Stopping Hadoop
      ./sbin/stop-dfs.sh

      The next time you start Hadoop there is no need to format the NameNode again; just run
      ./sbin/start-dfs.sh

      Starting YARN

      YARN was split out of MapReduce v1 and is responsible for resource management and job scheduling; MapReduce itself now runs as an application on top of YARN, which also brings better availability and scalability. A fuller introduction to YARN is beyond the scope of these notes.
      Starting Hadoop with ./sbin/start-dfs.sh above only brings up HDFS. We can start YARN as well and let it take over resource management and job scheduling.

      ./sbin/start-dfs.sh

      mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
      gedit ./etc/hadoop/mapred-site.xml
      <configuration>
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
      </configuration>

      gedit ./etc/hadoop/yarn-site.xml
      <configuration>
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
      </configuration>

      ./sbin/start-yarn.sh # start YARN
      ./sbin/mr-jobhistory-daemon.sh start historyserver # start the history server so job status can be viewed in the web UI


      [hadoop@localhost hadoop]$ jps
      11148 JobHistoryServer
      9788 NameNode
      10059 DataNode
      11702 Jps
      10428 SecondaryNameNode
      10991 NodeManager
      10874 ResourceManager


      YARN web UI: http://localhost:8088/cluster


      Stopping YARN

      ./sbin/stop-yarn.sh
      ./sbin/mr-jobhistory-daemon.sh stop historyserver

      ——————————————————————————————————————————————————

      4. Spark Installation

      "Spark Quick Start Guide: Spark Installation and Basic Use" - http://dblab.xmu.edu.cn/blog/spark-quick-start-guide/

      Download

        spark-1.6.0-bin-hadoop2.6.tgz
        http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz

      Extract
        sudo tar -zxf ~/下載/spark-1.6.0-bin-hadoop2.6.tgz -C /usr/local/
        cd /usr/local
        sudo mv ./spark-1.6.0-bin-hadoop2.6/ ./spark
        sudo chown -R hadoop:hadoop ./spark # here hadoop is your username

      After installation, Spark's classpath needs to be set in ./conf/spark-env.sh; copy the template first:
        cd /usr/local/spark
        cp ./conf/spark-env.sh.template ./conf/spark-env.sh

        gedit ./conf/spark-env.sh
        export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

      Global environment variables:
      sudo gedit /etc/profile

      export JAVA_HOME=/usr/local/java/jdk1.7.0_79
      export HADOOP_HOME=/usr/local/hadoop
      export SCALA_HOME=/usr/lib/scala-2.10.4
      export SPARK_HOME=/usr/local/spark

      source /etc/profile


      Configure the Spark environment variables
      cd $SPARK_HOME/conf
      cp spark-env.sh.template spark-env.sh
      gedit spark-env.sh

       

      spark-env.sh settings

      export SCALA_HOME=/usr/lib/scala-2.10.4
      export HADOOP_HOME=/usr/local/hadoop
      export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
      export SPARK_HOME=/usr/local/spark
      export SPARK_PID_DIR=$SPARK_HOME/tmp
      export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

       

      export SPARK_MASTER_IP=127.0.0.1
      export SPARK_MASTER_PORT=7077
      export SPARK_MASTER_WEBUI_PORT=8099

      export SPARK_WORKER_CORES=1 # CPU cores used by each Worker
      export SPARK_WORKER_INSTANCES=1 # number of Worker instances started on each slave
      export SPARK_WORKER_MEMORY=512m # memory used by each Worker
      export SPARK_WORKER_WEBUI_PORT=8081 # Worker web UI port
      export SPARK_EXECUTOR_CORES=1 # cores used by each Executor
      export SPARK_EXECUTOR_MEMORY=128m # memory used by each Executor

      export SPARK_CLASSPATH=$SPARK_HOME/conf/:$SPARK_HOME/lib/*:/usr/local/hadoop/lib/native:$SPARK_CLASSPATH

       

      Running the Spark examples

      With Spark's installation directory (/usr/local/spark) as the current directory:

      cd /usr/local/spark
      ./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"


      The Python version of SparkPi must be run through spark-submit instead:
      ./bin/spark-submit examples/src/main/python/pi.py 2>&1 | grep "Pi is roughly"

      Running an example on Hadoop and YARN

      cd /usr/local/hadoop

      ./sbin/start-dfs.sh
      ./sbin/start-yarn.sh

      Run the example:
      cd /usr/local/spark
      bin/spark-submit --master yarn ./examples/src/main/python/wordcount.py file:///usr/local/spark/LICENSE

       

      (Snapshot: Spark example runs successfully)

      Interactive analysis with the Spark shell

      ./bin/spark-shell

      val textFile = sc.textFile("file:///usr/local/spark/README.md")
      // textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27

      textFile.count() // number of items in the RDD; for a text file this is the line count
      // res0: Long = 95

      textFile.first() // the first item in the RDD; for a text file this is the first line
      // res1: String = # Apache Spark

      val linesWithSpark = textFile.filter(line => line.contains("Spark")) // keep only the lines containing "Spark"

      linesWithSpark.count() // count those lines
      // res4: Long = 17

      textFile.filter(line => line.contains("Spark")).count() // the same count as a single chained expression
      // res4: Long = 17

      More operations on RDDs

      textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
      // res5: Int = 14

      import java.lang.Math
      textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
      // res6: Int = 14

      val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b) // word count
      // wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:29
      wordCounts.collect() // output the word-count results
      // res7: Array[(String, Int)] = Array((package,1), (For,2), (Programs,1), (processing.,1), (Because,1), (The,1)...)
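
      To push the chained operations one step further, the counts can be ordered by frequency. A minimal sketch, assuming the wordCounts RDD built above; the top-10 cut-off is an arbitrary choice:

      // swap each pair so the count becomes the sort key, sort descending, take the ten most frequent words
      val topWords = wordCounts.map(p => (p._2, p._1)).sortByKey(false).take(10)
      topWords.foreach(println)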

      Spark SQL and DataFrames
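
      A minimal sketch of the DataFrame API, assuming the Spark 1.6 spark-shell (where sqlContext is created automatically) and the people.json sample data that ships with Spark; a fuller walk-through appears in the Spark SQL example section near the end of these notes.

      val df = sqlContext.read.json("file:///usr/local/spark/examples/src/main/resources/people.json")
      df.printSchema() // show the schema inferred from the JSON
      df.filter(df("age") > 20).show() // DataFrame-style filtering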

       

      Spark Streaming

      Option 1:

      wget http://downloads.sourceforge.net/project/netcat/netcat/0.6.1/netcat-0.6.1-1.i386.rpm -O ~/netcat-0.6.1-1.i386.rpm # download
      sudo rpm -iUv ~/netcat-0.6.1-1.i386.rpm # install

      Option 2:

      wget http://sourceforge.net/projects/netcat/files/netcat/0.7.1/netcat-0.7.1-1.i386.rpm
      rpm -ihv netcat-0.7.1-1.i386.rpm # may fail if the 32-bit glibc is missing
      yum list glibc* # check for (and if needed install) the 32-bit glibc, then retry
      rpm -ihv netcat-0.7.1-1.i386.rpm

      # terminal 1
      nc -l -p 9999

      # open another terminal (terminal 2) and run:
      /usr/local/spark/bin/run-example streaming.NetworkWordCount localhost 9999 2>/dev/null
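
      For reference, the core of NetworkWordCount is roughly the sketch below (a condensed paraphrase of the bundled example, not a verbatim copy): it opens a socket text stream on the given host and port and word-counts each one-second batch.

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Seconds, StreamingContext}

      val conf = new SparkConf().setAppName("NetworkWordCount")
      val ssc = new StreamingContext(conf, Seconds(1)) // one-second micro-batches
      val lines = ssc.socketTextStream("localhost", 9999) // the text typed into the nc terminal
      val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
      wordCounts.print() // print each batch's counts
      ssc.start()
      ssc.awaitTermination()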

      (Snapshot: Spark Streaming example completed)


      Silencing Spark debug output

      In spark/conf/log4j.properties, raise the log levels to ERROR:
      log4j.rootCategory=ERROR, console
      log4j.logger.org.spark-project.jetty=ERROR


      ——————————————————————————————————————————————————

      5. Scala Installation

      Install Scala 2.10.4: download scala-2.10.4.tgz from http://www.scala-lang.org/ and copy it to /usr/lib
      sudo tar -zxf scala-2.10.4.tgz -C /usr/lib

      Use the global approach: edit /etc/profile so the environment variables are shared by all users
      sudo gedit /etc/profile
      export SCALA_HOME=/usr/lib/scala-2.10.4
      export PATH=$SCALA_HOME/bin:$PATH

      source /etc/profile
      scala -version

      [hadoop@localhost 下載]$ scala -version
      Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
      ——————————————————————————————————————————————————

      6. Installing IntelliJ IDEA on CentOS

      Reference: http://dongxicheng.org/framework-on-yarn/apache-spark-intellij-idea/

      Setting up a Spark integrated development environment

      ● "Installing and using IntelliJ IDEA on Linux" - http://www.linuxdiyf.com/linux/19143.html
      Eclipse is not recommended for developing Spark programs or reading the Spark source; IntelliJ IDEA is the better choice.

      ● Download IDEA 14.0.5:
      http://confluence.jetbrains.com/display/IntelliJIDEA/Previous+IntelliJ+IDEA+Releases
      http://download.jetbrains.8686c.com/idea/ideaIU-14.0.5.tar.gz

      https://download.jetbrains.8686c.com/idea/ideaIU-2016.2.5-no-jdk.tar.gz (requires JDK 1.8 or later; under JDK 7 it fails with:)

      Unsupported Java Version: Cannot start under Java 1.7.0_79-b15: Java 1.8 or later is required.


      Extract it, then run the following from the bin directory of the extracted folder:
      tar -zxvf ideaIU-14.tar.gz -C /usr/intellijIDEA
      export IDEA_JDK=/usr/local/java/jdk1.7.0_79
      ./idea.sh

      key:IDEA
      value:61156-YRN2M-5MNCN-NZ8D2-7B4EW-U12L4

      Installing the Scala plugin

      http://www.linuxdiyf.com/linux/19143.html

      Download: http://plugins.jetbrains.com/files/1347/19005/scala-intellij-bin-1.4.zip

      After installing the plugin, choose "Create New Project" on the start screen; a "Scala" project type will now appear. Select Scala -> Scala.

      Click Next, give the project a name, and select the Scala and JDK installations you set up earlier. Important: do not pick a Scala 2.11.x version here, or you will run into serious trouble later. Then click Finish.

      Next, open File -> Project Structure and click "Libraries"; the right-hand panel is empty at first. Click "+", browse into the extracted spark-XXX-bin-hadoopXX directory, and under its lib directory select spark-assembly-XXX-hadoopXX.jar. Click Apply, then OK.

       

      Spark development environment setup and workflow (IntelliJ IDEA)

      "IntelliJ Scala plugin installation explained"
      http://blog.csdn.net/a2011480169/article/details/52712421
      The information shown there is: Updated: 2016/7/13,
      so look for a matching plugin build at http://plugins.jetbrains.com/plugin/?idea&id=1347
      After downloading, put the .zip Scala plugin into IntelliJ's plugins directory,
      then install it from there (note: install the .zip file directly; do not unpack it).

      Setting up the Spark development environment
      Create a Scala project in IntelliJ IDEA, then choose "File" -> "Project Structure" -> "Libraries", click "+", and import the spark-hadoop assembly jar.

      "Spark Hands-On Series, 3: The Spark Programming Model (Part 2) -- IDEA Setup and Practice"
      http://www.rzrgm.cn/shishanyuan/p/4721120.html

      Scala example code

      package class3

      import org.apache.spark.SparkConf
      import org.apache.spark.SparkContext

      object WordCount {

        def main(args: Array[String]) {

          // run Spark locally with two worker threads
          val conf = new SparkConf().setAppName("TrySparkStreaming").setMaster("local[2]")
          val sc = new SparkContext(conf)

          val txtFile = "/root/test"
          val txtData = sc.textFile(txtFile)

          txtData.cache() // keep the RDD in memory for reuse
          txtData.count()

          // split each line on commas, emit (word, 1) pairs, and sum the counts per word
          val wcData = txtData.flatMap { line => line.split(",") }.map { word => (word, 1) }.reduceByKey(_ + _)

          wcData.collect().foreach(println)

          sc.stop()
        }
      }
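
      A small variation, assuming the pseudo-distributed HDFS from section 3 is running on localhost:9000 and that an input file has already been uploaded (the HDFS path below is illustrative): point txtFile at an HDFS URI instead of a local path, and the rest of the program stays unchanged.

          // read the input from HDFS instead of the local filesystem
          val txtFile = "hdfs://localhost:9000/user/hadoop/input/core-site.xml"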

       

      ——————————————————————————————————————————————————

      Processing HDFS data with Spark

      [hadoop@localhost spark]$ hdfs dfs -put LICENSE /zhaohang
      hdfs dfs -ls
      hdfs dfs -cat /zhaohang | wc -l

      cd /usr/local/spark/bin
      ./pyspark --master yarn

      lines=sc.textFile("hdfs://localhost:9000/zhaohang",1)
      16/11/17 19:36:34 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 228.8 KB, free 228.8 KB)
      16/11/17 19:36:34 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.5 KB, free 248.3 KB)
      16/11/17 19:36:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.211.55.8:60185 (size: 19.5 KB, free: 511.5 MB)
      16/11/17 19:36:34 INFO spark.SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2

      temp1 = lines.flatMap(lambda x:x.split(' '))

      temp1.collect()
      map = temp1.map(lambda x: (x,1))
      map.collect()


      rdd = sc.parallelize([1,2,3,4],2)
      def f(iterator): yield sum(iterator)
      rdd.mapPartitions(f).collect() # => [3, 7]

       

      rdd = sc.parallelize(["a","b","c"])

      test = rdd.flatMap(lambda x:(x,1))
      test.count()

      sorted(test.collect()) # => [1, 1, 1, 'a', 'b', 'c']


      Spark web UI (via the YARN proxy): http://localhost:8088/proxy/application_1479381551764_0002/jobs/

      Stopping YARN and HDFS

      cd /usr/local/hadoop
      ./sbin/stop-dfs.sh
      ./sbin/stop-yarn.sh

      ——————————————————————————————————————————————————

      Spark SQL example

      Start Hadoop and YARN
      cd /usr/local/hadoop
      ./sbin/start-dfs.sh
      ./sbin/start-yarn.sh
       
      Inspect the sample JSON data
      cd /usr/local/spark
      cat ./examples/src/main/resources/people.json
       
      Start the Spark shell
      cd /usr/local/spark
      ./bin/spark-shell
       
      Import the data source:
      val df = sqlContext.read.json("file:///usr/local/spark/examples/src/main/resources/people.json")
      df.show()
       
      Some basic DataFrame operations on structured data
      df.select("name").show() // show only the "name" column
      df.select(df("name"), df("age") + 1).show() // add 1 to "age"
      df.filter(df("age") > 21).show() // conditional filtering
      df.groupBy("age").count().show() // groupBy operation
       
      Operating with SQL statements
      df.registerTempTable("people") // register the DataFrame as a temporary table named people
      val result = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19") // run a SQL query
      result.show() // print the result
       
      Stop Hadoop and YARN
      cd /usr/local/hadoop
      ./sbin/stop-dfs.sh
      ./sbin/stop-yarn.sh

      ——————————————————————————————————————————————————
