Hadoop (Part 1) - 创新互联

Contents


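1. Introduction

2. Download

3. Deployment

Pseudo-distributed mode

1. Deploy the JDK

2. Deploy Hadoop

3. Configure HDFS

4. Set up passwordless SSH

5. Start HDFS

6. Open the NameNode web UI

7. Configure YARN

8. Start YARN

9. Open the RM web UI

10. Start and stop commands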

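1. Introduction

Broad sense: the ecosystem built around the Apache Hadoop software: Hive, Flume, HBase, Kafka, Spark, Flink
Narrow sense: the Apache Hadoop software itself

hdfs: stores massive amounts of data
mapreduce: computation and analysis
yarn: resource and job scheduling

1. hdfs: stores massive amounts of data
namenode: directs where data is stored (it manages the file system metadata)
datanode: stores the actual data blocks
secondarynamenode: assists the namenode (it periodically merges the edit log into a checkpoint)
2. yarn: resource and job scheduling
resourcemanager: directs resource allocation across the cluster
nodemanager: provides the actual resources on each node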

2. Download

1. Official site: hadoop.apache.org / project.apache.org

2. Release archive: https://archive.apache.org/dist

3. Deployment

3.1 Pseudo-distributed mode

All daemons run on a single machine, and every step is performed as the hadoop user.

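1. Deploy the JDK
tar -zxvf ./jdk-8u45-linux-x64.gz -C ~/app/  # extract the archive
cd ~/app
ln -s ./jdk1.8.0_45/ java  # create a symlink so that later configuration does not depend on the version number
# directory layout
drwxr-xr-x. 2 hadoop hadoop     4096 Apr 11  2015 bin  Java command-line tools (java, javac, jps, ...)
drwxr-xr-x. 3 hadoop hadoop     4096 Apr 11  2015 include  C header files for native (JNI) code
drwxr-xr-x. 5 hadoop hadoop     4096 Apr 11  2015 jre  the bundled Java runtime
drwxr-xr-x. 5 hadoop hadoop     4096 Apr 11  2015 lib  jars needed by the Java tools at run time
-rw-r--r--. 1 hadoop hadoop 21099089 Apr 11  2015 src.zip  Java source archive

Configure the environment variables so that the tools under java/bin can be run from any directory:

vim ~/.bashrc

export JAVA_HOME=/home/hadoop/app/java
export PATH=${JAVA_HOME}/bin:$PATH

source ~/.bashrc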

java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
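
The ~/.bashrc edit above can also be scripted, which helps when the same lines must be added on several machines. The following is a minimal sketch, not part of the original procedure; the helper name ensure_line is hypothetical. It appends a line only if an identical line is not already present, so running it twice does not duplicate the export.

```shell
#!/bin/bash
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# (Hypothetical helper, shown only to illustrate an idempotent edit.)
ensure_line() {
    local file=$1 line=$2
    touch "$file"
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls mirroring the two export lines above (left commented out
# so that sourcing this file does not modify anything):
# ensure_line ~/.bashrc 'export JAVA_HOME=/home/hadoop/app/java'
# ensure_line ~/.bashrc 'export PATH=${JAVA_HOME}/bin:$PATH'
```

After the lines are in place, source ~/.bashrc is still needed in the current shell, exactly as in the manual steps.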
2. Deploy Hadoop
tar -zxvf ./hadoop-3.3.4.tar.gz -C ~/app/
cd ~/app
ln -s ./hadoop-3.3.4/ hadoop

# directory layout
drwxr-xr-x. 2 hadoop hadoop  4096 Jul 29 21:44 bin  Hadoop command scripts
drwxr-xr-x. 3 hadoop hadoop  4096 Jul 29 20:35 etc  Hadoop configuration files
drwxr-xr-x. 2 hadoop hadoop  4096 Jul 29 21:44 include
drwxr-xr-x. 3 hadoop hadoop  4096 Jul 29 21:44 lib
drwxr-xr-x. 3 hadoop hadoop  4096 Jul 29 20:35 sbin scripts that start and stop the Hadoop daemons
drwxr-xr-x. 4 hadoop hadoop  4096 Jul 29 22:21 share Hadoop jars and bundled examples

Configure the environment variables:

vim ~/.bashrc

#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

source ~/.bashrc

Set JAVA_HOME for Hadoop (hadoop-env.sh is in ${HADOOP_HOME}/etc/hadoop):

cd ~/app/hadoop/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/java
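3. Configure HDFS
# 1. core-site.xml
# fs.defaultFS specifies the machine (and port) where the namenode runs
cd ~/app/hadoop/etc/hadoop
vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://fang02:9000</value>
    </property>
</configuration>
# 2. hdfs-site.xml
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>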
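
Both files above use the same structure: each setting is a name/value pair wrapped in a property element inside configuration. As an illustration only, the sketch below prints such a file from NAME=VALUE arguments. The function name write_site_xml is hypothetical, and the sketch assumes the values contain no XML special characters (it does no escaping).

```shell
#!/bin/bash
# write_site_xml NAME=VALUE ...: print a Hadoop *-site.xml document to stdout.
# (Hypothetical helper; values are written as-is, without XML escaping.)
write_site_xml() {
    local pair
    printf '<?xml version="1.0" encoding="UTF-8"?>\n<configuration>\n'
    for pair in "$@"; do
        printf '    <property>\n'
        # text before the first "=" is the name, everything after it the value
        printf '        <name>%s</name>\n' "${pair%%=*}"
        printf '        <value>%s</value>\n' "${pair#*=}"
        printf '    </property>\n'
    done
    printf '</configuration>\n'
}

# Example (commented out so nothing is overwritten by accident):
# write_site_xml fs.defaultFS=hdfs://fang02:9000 > core-site.xml
# write_site_xml dfs.replication=1 > hdfs-site.xml
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS prints the value Hadoop actually picked up, which is a quick way to confirm the file was read.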
4. Set up passwordless SSH
# ssh to localhost without a passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Format the file system (only once, before the first start):

hdfs namenode -format

2022-11-11 22:25:33,783 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
5. Start HDFS
start-dfs.sh  # start the HDFS daemons
Check the HDFS processes with either jps or ps -ef | grep hdfs:
jps
4642 NameNode
4761 DataNode
4974 SecondaryNameNode
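
Reading the jps listing by eye works for three daemons, but the check can also be automated. The sketch below is one possible way to do it, assuming jps prints one "pid name" pair per line as in the output above; the function name check_daemons is hypothetical.

```shell
#!/bin/bash
# check_daemons NAME...: read jps output on stdin and report every expected
# daemon that does not appear. Returns non-zero if any daemon is missing.
check_daemons() {
    local listing name missing=0
    listing=$(cat)
    for name in "$@"; do
        # Compare the second column exactly, so that "NameNode" does not
        # match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out; requires a running JDK with jps on the PATH):
# jps | check_daemons NameNode DataNode SecondaryNameNode
```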

6. Open the NameNode web UI

http://fang02:9870/
http://192.168.41.12:9870/

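7. Configure YARN
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>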
8. Start YARN
start-yarn.sh
9. Open the RM web UI

http://fang02:8088/
http://192.168.41.12:8088/

10. Start and stop commands
start-all.sh  # start Hadoop (HDFS and YARN)
stop-all.sh  # stop Hadoop
3.2 Fully distributed mode

1. Cluster layout

hdfs:
     namenode nn
     datanode dn
     secondarynamenode  snn
yarn:
     resourcemanager rm
     nodemanager     nm

 bigdata32 : nn  dn      nm
 bigdata33 :     dn  rm  nm
 bigdata34 : snn dn      nm
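
The layout matters later because start-dfs.sh is run on the namenode host and start-yarn.sh on the resourcemanager host. As a small illustration, the table above can be encoded as data and queried by role. This is only a sketch; the function names cluster_layout and hosts_with_role are hypothetical.

```shell
#!/bin/bash
# Role layout copied from the table above: "host role role ..." per machine.
cluster_layout() {
    cat <<'EOF'
bigdata32 nn dn nm
bigdata33 dn rm nm
bigdata34 snn dn nm
EOF
}

# hosts_with_role ROLE: print every host that runs ROLE, one per line.
hosts_with_role() {
    cluster_layout |
        awk -v role="$1" '{ for (i = 2; i <= NF; i++) if ($i == role) print $1 }'
}

# Example: which machine should run start-dfs.sh and start-yarn.sh?
echo "namenode host:        $(hosts_with_role nn)"
echo "resourcemanager host: $(hosts_with_role rm)"
```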

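2. Prepare the machines

Three cloned machines, each with 4 GB of memory, 2 CPUs, and a 40 GB disk. After cloning, change the following on each machine:

(1) IP address:              vim /etc/sysconfig/network-scripts/ifcfg-ens33
(2) hostname:                vim /etc/hostname
(3) IP-to-hostname mapping:  vim /etc/hosts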
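
A missing /etc/hosts entry on any one machine tends to surface only later as an SSH or daemon startup failure, so it can be worth checking the mapping right after editing it. The sketch below is one way to do that, assuming the usual hosts-file format ("ip name [alias...]", with # starting a comment). The function name missing_hosts is hypothetical.

```shell
#!/bin/bash
# missing_hosts FILE HOST...: print each HOST that has no entry in FILE,
# where FILE is in /etc/hosts format.
missing_hosts() {
    local file=$1 host
    shift
    for host in "$@"; do
        awk -v h="$host" '
            { sub(/#.*/, "") }   # drop comments before looking at fields
            { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$file" || echo "$host"
    done
}

# Example (commented out; run it on each of the three machines):
# missing_hosts /etc/hosts bigdata32 bigdata33 bigdata34
```

Empty output means every hostname has a mapping.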

3. Passwordless SSH [do this on all three machines]
[hadoop@bigdata32 ~]$ mkdir app software data shell project
[hadoop@bigdata32 ~]$ ssh-keygen -t rsa 
# copy the public key to every machine [do this on all three machines]
ssh-copy-id bigdata32
ssh-copy-id bigdata33
ssh-copy-id bigdata34
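
Before moving on, it can help to confirm that the keys really work from each machine. The sketch below is an assumption-labeled example rather than part of the original steps: it relies on the OpenSSH options BatchMode (fail instead of prompting for a password) and ConnectTimeout, and the function name check_ssh is hypothetical.

```shell
#!/bin/bash
# check_ssh HOST...: try a non-interactive login to each host and report
# the result. Returns non-zero if any login fails.
check_ssh() {
    local host failed=0
    for host in "$@"; do
        # BatchMode=yes: never prompt, so a missing key shows up as a failure
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: passwordless login failed"
            failed=1
        fi
    done
    return "$failed"
}

# Example (commented out; run it on each of the three machines):
# check_ssh bigdata32 bigdata33 bigdata34
```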
4. JDK deployment [all three machines]: tools for copying files between machines
# 1. scp:
scp [[user@]host1:]file1 ... [[user@]host2:]file2
scp bigdata32:~/1.log  bigdata33:~
# 2. rsync:
rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
rsync ~/1.log bigdata34:~
# after the content of ~/1.log on bigdata32 changes, send it again:
rsync -av ~/1.log bigdata34:~
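5. Write a file synchronization script (saved as ~/shell/xsync, the name used in the steps below)
#!/bin/bash
# Distribute files to all three machines
if [ $# -lt 1 ];then
	echo "not enough arguments"
	echo "eg: $0 filename..."
	exit 1
fi
# Loop over the three machines
for host in bigdata32 bigdata33 bigdata34
do
	echo "=============$host=================="
	# 1. Loop over the files and directories to send
	for file in "$@"
	do
	# 2. Check that the file exists
	if [ -e "${file}" ];then
		pathdir=$(cd "$(dirname "${file}")" && pwd)
		filename=$(basename "${file}")
		# 3. Create the parent directory on the remote host, then sync
		ssh "$host" "mkdir -p $pathdir"
		rsync -av "$pathdir/$filename" "$host:$pathdir"
	else
		echo "${file} does not exist"
	fi
	done
done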
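
The key step in the script is turning whatever path the user typed (relative, absolute, with or without a trailing slash) into an absolute parent directory plus a base name, so that the file lands at the same absolute path on every machine. The sketch below isolates just that step for illustration; the function name resolve_target is hypothetical and is not part of the script itself.

```shell
#!/bin/bash
# resolve_target FILE: print the absolute parent directory and the base name
# of FILE, the same two values the sync script computes before calling rsync.
resolve_target() {
    local file=$1 pathdir filename
    # cd into the parent directory in a subshell and ask for its full path
    pathdir=$(cd "$(dirname "$file")" && pwd)
    # basename drops a trailing slash, so "java/" becomes "java"
    filename=$(basename "$file")
    printf '%s %s\n' "$pathdir" "$filename"
}

# Example: resolve a relative path with a trailing slash.
demo=$(mktemp -d)
mkdir -p "$demo/app/jdk1.8.0_45"
(cd "$demo/app" && resolve_target jdk1.8.0_45/)
rm -rf "$demo"
```

Because the base name has no trailing slash, rsync copies the directory itself (or, with -a, a symlink as a symlink) rather than only its contents.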

Make the script executable (chmod +x ~/shell/xsync) and add its directory to PATH:

vim ~/.bashrc

export SHELL_HOME=/home/hadoop/shell
export PATH=${PATH}:${SHELL_HOME}

source ~/.bashrc
6. JDK deployment [install on all three machines]
# 1. Install the JDK on bigdata32 first
[hadoop@bigdata32 software]$ tar -zxvf jdk-8u45-linux-x64.gz -C ~/app/
[hadoop@bigdata32 app]$ ln -s jdk1.8.0_45/ java
[hadoop@bigdata32 app]$ vim ~/.bashrc

#JAVA_HOME
export JAVA_HOME=/home/hadoop/app/java
export PATH=${PATH}:${JAVA_HOME}/bin

[hadoop@bigdata32 app]$ source ~/.bashrc

[hadoop@bigdata32 app]$ which java
~/app/java/bin/java

[hadoop@bigdata32 app]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

# 2. Sync the JDK and ~/.bashrc to the other machines
[hadoop@bigdata32 app]$ xsync java/
[hadoop@bigdata32 app]$ xsync jdk1.8.0_45
[hadoop@bigdata32 app]$ xsync ~/.bashrc
# then run source ~/.bashrc on all three machines
7. Deploy Hadoop

 bigdata32 : nn  dn      nm
 bigdata33 :     dn  rm  nm
 bigdata34 : snn dn      nm

[hadoop@bigdata32 software]$ tar -zxvf hadoop-3.3.4.tar.gz -C ~/app/
[hadoop@bigdata32 app]$ ln -s hadoop-3.3.4/ hadoop

[hadoop@bigdata32 app]$ vim ~/.bashrc

#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin

[hadoop@bigdata32 app]$ source ~/.bashrc

[hadoop@bigdata32 app]$ which hadoop
~/app/hadoop/bin/hadoop

# create the Hadoop data directory [do this on all three machines]
[hadoop@bigdata32 data]$ mkdir hadoop
[hadoop@bigdata32 data]$ cd hadoop
[hadoop@bigdata32 hadoop]$ pwd
/home/hadoop/data/hadoop
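8. Configure HDFS
vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata32:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/data/hadoop</value>
    </property>
</configuration>
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bigdata34:9868</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>bigdata34:9869</value>
    </property>
</configuration>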
[hadoop@bigdata32 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@bigdata32 hadoop]$ cat workers
bigdata32
bigdata33
bigdata34
# sync the contents of bigdata32 to bigdata33 and bigdata34
[hadoop@bigdata32 app]$ xsync hadoop
[hadoop@bigdata32 app]$ xsync hadoop-3.3.4
[hadoop@bigdata32 app]$ xsync ~/.bashrc
# then run source ~/.bashrc on all three machines

# format: do this only once, at deployment time, on the machine that hosts the namenode
[hadoop@bigdata32 app]$ hdfs namenode -format

# start hdfs:
start-dfs.sh  # run it on the machine that hosts the namenode

Open the NameNode web UI: http://bigdata32:9870/

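9. Configure YARN
# configure bigdata32 first, then sync
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata33</value>
    </property>
</configuration>
# distribute the configuration files from bigdata32 to bigdata33 and bigdata34:
[hadoop@bigdata32 app]$ xsync hadoop-3.3.4

# start yarn:
start-yarn.sh  # run it on the machine that hosts the resourcemanager (bigdata33)

Open the RM web UI: http://bigdata33:8088/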

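3.3 Starting and stopping Hadoop

1. Pseudo-distributed mode
hdfs: start-dfs.sh
yarn: start-yarn.sh
start-all.sh  # start Hadoop
stop-all.sh  # stop Hadoop
2. Fully distributed mode

Write a script that starts or stops the whole cluster: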

[hadoop@bigdata32 ~]$ vim shell/hadoop-cluster

#!/bin/bash
if [ $# -lt 1 ];then
	echo "Usage:$0 start|stop"
	exit 1
fi

case $1 in
 "start")
	echo "========starting the hadoop cluster========"
	echo "========starting hdfs========"
	ssh bigdata32 "/home/hadoop/app/hadoop/sbin/start-dfs.sh"
	echo "========starting yarn========"
	ssh bigdata33 "/home/hadoop/app/hadoop/sbin/start-yarn.sh"
 ;;
  "stop")
	echo "========stopping the hadoop cluster========"
	echo "========stopping yarn========"
	ssh bigdata33 "/home/hadoop/app/hadoop/sbin/stop-yarn.sh"
	echo "========stopping hdfs========"
	ssh bigdata32 "/home/hadoop/app/hadoop/sbin/stop-dfs.sh"
 ;;
   *)
	echo "Usage:$0 start|stop"
 ;;
esac

Write a script that lists the Java processes on every machine:

[hadoop@bigdata32 ~]$ vim shell/jpsall

#!/bin/bash
for host in bigdata32 bigdata33 bigdata34
do
    echo "==========$host========="
    ssh "$host" "/home/hadoop/app/java/bin/jps | grep -v Jps"
done



Current URL: http://pcwzsj.com/article/dhcjog.html