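Oozie, a Big Data Development Staple (Part 1): Basic Usage and Troubleshooting
1. Introduction to Oozie's Functional Modules
The name Oozie means "elephant handler" (mahout). It is an open-source framework built around a workflow engine, originally developed at Yahoo and later contributed to Apache, that schedules and coordinates Hadoop MapReduce and Pig jobs. Oozie runs inside a Java Servlet container. It is mainly used to run jobs on a schedule and to run multiple jobs in their logical order of execution.
1.1 Modules
1) Workflow
Runs the flow's nodes in order; supports fork (branching into several nodes) and join (merging several nodes back into one).
2) Coordinator
Triggers workflows on a schedule.
3) Bundle Job
Groups multiple Coordinators together.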
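To make the Coordinator role concrete, here is a minimal sketch, assuming a once-a-day schedule. The app name daily-shell-coord, the properties start, end and workflowAppUri, and the temp-file location are illustrative assumptions, not values from this cluster. The script only writes the definition to a file; it does not submit anything.

```shell
#!/bin/bash
# Sketch: write a minimal coordinator definition that triggers a workflow once a day.
# All names (daily-shell-coord, start, end, workflowAppUri) are illustrative assumptions.
coord_file="$(mktemp)"

# The quoted 'EOF' keeps ${...} literal so that Oozie, not the shell, expands it.
cat > "$coord_file" <<'EOF'
<coordinator-app name="daily-shell-coord" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF

echo "coordinator definition written to $coord_file"
```

The start, end and workflowAppUri values would normally come from the job.properties file used at submission time.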
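1.2 Common Nodes
1) Control Flow Nodes
Control flow nodes mark where a workflow begins and ends (start, end, kill) and provide the mechanisms that determine its execution path (decision, fork, join).
2) Action Nodes
Nodes that carry out a concrete action, such as copying files or running a shell script.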
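The interplay of control flow nodes and action nodes can be sketched as a small workflow skeleton. This is only an illustration under assumptions: the node names (split, task-a, task-b, merge) and the /tmp/fork-join-sketch paths are made up, and the two actions are simple fs mkdir actions standing in for real work. The script writes the definition to a temp file.

```shell
#!/bin/bash
# Sketch: a workflow whose fork runs two actions in parallel and whose join
# waits for both. Node names and HDFS paths are illustrative assumptions.
wf_file="$(mktemp)"

# The quoted 'EOF' keeps ${nameNode} and the wf: functions literal for Oozie.
cat > "$wf_file" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fork-join-sketch">
  <start to="split"/>
  <fork name="split">
    <path start="task-a"/>
    <path start="task-b"/>
  </fork>
  <action name="task-a">
    <fs><mkdir path="${nameNode}/tmp/fork-join-sketch/a"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="task-b">
    <fs><mkdir path="${nameNode}/tmp/fork-join-sketch/b"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <join name="merge" to="end"/>
  <kill name="fail">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

echo "workflow definition written to $wf_file"
```

Both branches point their ok transition at the same join node, which is what lets the workflow continue only after every forked path has finished.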
1.3 Configuration Files
Oozie configuration files
Command line:
[root@homaybd01 yarn]# jps
9456 ResourceManager
9634 NodeManager
29990 Jps
7335 NameNode
6716
6909 DataNode
10286 -- process information unavailable
[root@homaybd01 yarn]# jps -l
9456 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
9634 org.apache.hadoop.yarn.server.nodemanager.NodeManager
30147 sun.tools.jps.Jps
7335 org.apache.hadoop.hdfs.server.namenode.NameNode
6716
6909 org.apache.hadoop.hdfs.server.datanode.DataNode
10286 -- process information unavailable
[root@homaybd05 /]# jps
12850 DataNode
24551 AmbariServer
12520 QuorumPeerMain
13916 -- process information unavailable
15246 ZeppelinServer
7630 Jps
13486 NodeManager
21311 RemoteInterpreterServer
22015 Bootstrap
[root@homaybd05 /]# jps -l
12850 org.apache.hadoop.hdfs.server.datanode.DataNode
24551 org.apache.ambari.server.controller.AmbariServer
12520 org.apache.zookeeper.server.quorum.QuorumPeerMain
13916 -- process information unavailable
7645 sun.tools.jps.Jps
15246 org.apache.zeppelin.server.ZeppelinServer
13486 org.apache.hadoop.yarn.server.nodemanager.NodeManager
21311 org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
22015 org.apache.catalina.startup.Bootstrap
[root@homaybd05 /]#
Locate the Oozie installation directories:
[root@homaybd05 /]# find / -name "oozie"
find: ‘/proc/24873/task/9967’: No such file or directory
/run/oozie
/etc/oozie
/var/tmp/oozie
/var/lib/mysql/oozie
/var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/org/apache/oozie
/var/lib/smartsense/hst-agent/resources/collection-scripts/oozie
/var/lib/oozie
/var/log/oozie
/var/spool/mail/oozie
/usr/bin/oozie
/usr/hdp/3.1.0.0-78/etc/oozie
/usr/hdp/3.1.0.0-78/oozie
/usr/hdp/3.1.0.0-78/oozie/bin/oozie
/usr/hdp/3.1.0.0-78/oozie/share/lib/oozie
/usr/hdp/3.1.0.0-78/oozie/oozie-server/webapps/oozie
/usr/hdp/3.1.0.0-78/oozie/oozie-server/work/Catalina/localhost/oozie
/usr/hdp/3.1.0.0-78/oozie/webapps/oozie
/usr/hdp/3.1.0.0-78/oozie/var/tmp/oozie
/home/oozie
/hadoop/oozie
[root@homaybd05 /]#
The HDP installation directory:
[root@homaybd05 3.1.0.0-78]# pwd
/usr/hdp/3.1.0.0-78
[root@homaybd05 3.1.0.0-78]# ls -l
total 40
drwxr-xr-x. 4 root root 34 Mar 4 16:19 atlas
drwxr-xr-x. 9 root root 119 Mar 4 16:21 etc
drwxr-xr-x. 9 root root 4096 Mar 4 16:20 hadoop
drwxr-xr-x. 7 root root 4096 Mar 4 16:19 hadoop-hdfs
drwxr-xr-x. 6 root root 8192 Mar 4 16:19 hadoop-mapreduce
drwxr-xr-x. 9 root root 4096 Mar 4 16:19 hadoop-yarn
drwxr-xr-x. 11 root root 133 Mar 4 16:20 hbase
drwxr-xr-x. 16 oozie hadoop 4096 Mar 4 16:34 oozie
drwxr-xr-x. 4 root root 4096 Mar 4 16:19 ranger-hbase-plugin
drwxr-xr-x. 4 root root 4096 Mar 4 16:18 ranger-hdfs-plugin
drwxr-xr-x. 4 root root 224 Mar 4 16:18 ranger-yarn-plugin
drwxr-xr-x. 3 root root 17 Mar 4 16:18 spark2
drwxr-xr-x. 4 root root 32 Mar 4 16:20 usr
drwxr-xr-x. 7 root root 132 Mar 4 16:20 zookeeper
[root@homaybd05 3.1.0.0-78]#
Inspect the Oozie directory structure:
[root@homaybd05 oozie]# pwd
/usr/hdp/3.1.0.0-78/oozie
[root@homaybd05 oozie]# ls -l
total 1910352
drwxr-xr-x. 2 root root 220 Mar 4 16:21 bin
lrwxrwxrwx. 1 root root 23 Mar 4 16:26 conf -> /etc/oozie/3.1.0.0-78/0
drwxr-xr-x. 2 root root 138 Mar 4 16:21 doc
drwxr-xr-x. 3 root root 18 Mar 4 16:21 etc
drwxr-xr-x. 2 root root 8192 Mar 4 16:21 lib
drwxr-xr-x. 2 root root 38 Mar 4 16:34 libext
drwxr-xr-x. 2 root root 16384 Mar 4 16:22 libserver
drwxr-xr-x. 2 root root 16384 Mar 4 16:22 libtools
drwxr-xr-x. 3 root root 18 Mar 4 16:21 man
drwxr-xr-x. 4 oozie hadoop 45 Mar 4 16:35 oozie-server
-rw-r--r--. 1 root root 1647990764 Mar 4 16:26 oozie-sharelib.tar.gz
-rwxr-xr-x. 1 root root 308150911 Dec 6 2018 oozie.war
drwxr-xr-x. 2 root root 34 Mar 4 16:22 schema
drwxr-xr-x. 3 root root 17 Mar 4 16:21 share
drwxr-xr-x. 4 oozie oozie 48 Mar 4 16:22 tomcat-deployment
drwxr-xr-x. 3 root root 17 Mar 4 16:34 var
drwxr-xr-x. 4 root root 48 Mar 4 16:22 webapps
[root@homaybd05 oozie]#
The oozie-site.xml file:
[root@homaybd05 conf]# cat oozie-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>credentialStoreClassPath</name>
<value>/var/lib/ambari-agent/cred/lib/*</value>
</property>
<property>
<name>hadoop.security.credential.provider.path</name>
<value>localjceks://file/usr/hdp/current/oozie-server/conf/oozie-site.jceks</value>
</property>
<property>
<name>oozie.action.retry.interval</name>
<value>30</value>
</property>
<property>
<name>oozie.action.sharelib.for.spark.exclude</name>
<value>oozie/jackson.*</value>
</property>
<property>
<name>oozie.authentication.authentication.provider.url</name>
<value></value>
</property>
<property>
<name>oozie.authentication.expected.jwt.audiences</name>
<value></value>
</property>
<property>
<name>oozie.authentication.jwt.cookie</name>
<value>hadoop-jwt</value>
</property>
<property>
<name>oozie.authentication.public.key.pem</name>
<value></value>
</property>
<property>
<name>oozie.authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>
<property>
<name>oozie.authentication.type</name>
<value>simple</value>
</property>
<property>
<name>oozie.base.url</name>
<value>http://homaybd05:11000/oozie</value>
</property>
<property>
<name>oozie.credentials.credentialclasses</name>
<value>hcat=org.apache.oozie.action.hadoop.HiveCredentials,hive2=org.apache.oozie.action.hadoop.Hive2Credentials</value>
</property>
<property>
<name>oozie.db.schema.name</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value>
org.apache.oozie.action.email.EmailActionExecutor,
org.apache.oozie.action.hadoop.ShellActionExecutor,
org.apache.oozie.action.hadoop.SqoopActionExecutor,
org.apache.oozie.action.hadoop.DistcpActionExecutor</value>
</property>
<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>true</value>
</property>
<property>
<name>oozie.service.CallableQueueService.callable.concurrency</name>
<value>3</value>
</property>
<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>1000</value>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>10</value>
</property>
<property>
<name>oozie.service.coord.normal.default.timeout</name>
<value>120</value>
</property>
<property>
<name>oozie.service.coord.push.check.requeue.interval</name>
<value>30000</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/usr/hdp/3.1.0.0-78/hadoop/conf</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
<value>false</value>
</property>
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>false</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://homaybd05/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.pool.max.active.conn</name>
<value>10</value>
</property>
<property>
<name>oozie.service.PurgeService.older.than</name>
<value>30</value>
</property>
<property>
<name>oozie.service.PurgeService.purge.interval</name>
<value>3600</value>
</property>
<property>
<name>oozie.service.SchemaService.wf.ext.schemas</name>
<value>shell-action-0.1.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,sqoop-action-0.2.xsd,ssh-action-0.1.xsd,distcp-action-0.1.xsd,shell-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,hive-action-0.3.xsd</value>
</property>
<property>
<name>oozie.service.SparkConfigurationService.spark.configurations</name>
<value>*=/usr/hdp/current/spark-client/conf</value>
</property>
<property>
<name>oozie.service.URIHandlerService.uri.handlers</name>
<value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/${user.name}/share/lib</value>
</property>
<property>
<name>oozie.services</name>
<value>
org.apache.oozie.service.SchedulerService,
org.apache.oozie.service.InstrumentationService,
org.apache.oozie.service.CallableQueueService,
org.apache.oozie.service.UUIDService,
org.apache.oozie.service.ELService,
org.apache.oozie.service.AuthorizationService,
org.apache.oozie.service.UserGroupInformationService,
org.apache.oozie.service.HadoopAccessorService,
org.apache.oozie.service.URIHandlerService,
org.apache.oozie.service.MemoryLocksService,
org.apache.oozie.service.DagXLogInfoService,
org.apache.oozie.service.SchemaService,
org.apache.oozie.service.LiteWorkflowAppService,
org.apache.oozie.service.JPAService,
org.apache.oozie.service.StoreService,
org.apache.oozie.service.SLAStoreService,
org.apache.oozie.service.DBLiteWorkflowStoreService,
org.apache.oozie.service.CallbackService,
org.apache.oozie.service.ActionService,
org.apache.oozie.service.ActionCheckerService,
org.apache.oozie.service.RecoveryService,
org.apache.oozie.service.PurgeService,
org.apache.oozie.service.CoordinatorEngineService,
org.apache.oozie.service.BundleEngineService,
org.apache.oozie.service.DagEngineService,
org.apache.oozie.service.CoordMaterializeTriggerService,
org.apache.oozie.service.StatusTransitService,
org.apache.oozie.service.PauseTransitService,
org.apache.oozie.service.GroupsService,
org.apache.oozie.service.ProxyUserService,
org.apache.oozie.service.JobsConcurrencyService,
org.apache.oozie.service.ShareLibService,
org.apache.oozie.service.SparkConfigurationService,
org.apache.oozie.service.XLogStreamingService</value>
</property>
<property>
<name>oozie.services.ext</name>
<value>
org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService</value>
</property>
<property>
<name>oozie.system.id</name>
<value>oozie-${user.name}</value>
</property>
<property>
<name>oozie.systemmode</name>
<value>NORMAL</value>
</property>
<property>
<name>use.system.libpath.for.mapreduce.and.pig.jobs</name>
<value>false</value>
</property>
</configuration>
[root@homaybd05 conf]#
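Reading a single setting out of a file this long is easier with a small helper. The following is a sketch, not an official tool: get_prop is a made-up name, and it assumes each name and value sits on its own line, as in the listing above, so multi-line values such as oozie.services are not handled.

```shell
#!/bin/bash
# Sketch: print the value of one property from a Hadoop-style XML config file.
# $1 = config file, $2 = property name. Handles single-line <value> entries only.
get_prop() {
  awk -v key="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == key {
      val = $0
      sub(/.*<value>/, "", val)
      sub(/<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}

# Intended use on the Oozie server node:
#   get_prop /usr/hdp/3.1.0.0-78/oozie/conf/oozie-site.xml oozie.base.url
```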
1.4 Configuring the Ext JS Library
Oozie web console is disabled.
To enable Oozie web console install the Ext JS library.
Refer to Oozie Quick Start documentation for details.
The ext-2.2.zip extension package is missing. Ext is a JavaScript framework that Oozie uses to render its web front end, so it has to be downloaded:
wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
Alternatively, download it locally from this link: http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
Steps:
[root@homaybd05 libext]# pwd
/usr/hdp/3.1.0.0-78/oozie/libext
[root@homaybd05 libext]# wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
--2022-03-10 15:38:47-- http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.108.167
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.108.167|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6800612 (6.5M) [application/zip]
Saving to: ‘ext-2.2.zip’
100%[==========================================>] 6,800,612 3.43MB/s in 1.9s
2022-03-10 15:38:49 (3.43 MB/s) - ‘ext-2.2.zip’ saved [6800612/6800612]
[root@homaybd05 libext]# ls -l
total 9064
-rw-r--r--. 1 root root 6800612 Feb 22 2018 ext-2.2.zip
-rw-r--r--. 1 oozie hadoop 2475087 Mar 4 16:34 mysql-connector-java.jar
[root@homaybd05 libext]#
Then restart the service, either from the command line or manually through the management UI.
Start command:
[root@homaybd05]$ bin/oozied.sh start
Stop command:
[root@homaybd05]$ bin/oozied.sh stop
Then open the Oozie web console again:
http://homaybd05:11000/oozie/?user.name=admin
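Besides opening the console in a browser, the restart can be checked from the command line. The sketch below assumes that the Oozie client prints a line of the form "System mode: NORMAL" for the admin -status command; is_oozie_normal is a made-up helper name, and the actual client call is left as a comment because it needs a live server.

```shell
#!/bin/bash
# Sketch: decide from `oozie admin -status` output whether the server is in NORMAL mode.
# The default URL is the oozie.base.url value from oozie-site.xml.
OOZIE_URL="${OOZIE_URL:-http://homaybd05:11000/oozie}"

# Reads the status output on stdin; succeeds only if it reports NORMAL mode.
is_oozie_normal() {
  grep -q 'System mode: NORMAL'
}

# Intended use on a node with the Oozie client installed:
#   oozie admin -oozie "$OOZIE_URL" -status | is_oozie_normal && echo "Oozie is up"
```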
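2. Oozie Examples
We use the official Oozie example templates for testing.
Example 1: Scheduling a shell script with Oozie
Goal: use Oozie to schedule a shell script
Step by step:
1) Extract the official example templates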
[root@homaybd05 doc]# pwd
/usr/hdp/3.1.0.0-78/oozie/doc
[root@homaybd05 doc]# ls -l
total 224
-rw-r--r--. 1 oozie hadoop 1366 Dec 6 2018 configuration.xsl
-rw-r--r--. 1 oozie hadoop 37664 Dec 6 2018 LICENSE.txt
-rw-r--r--. 1 oozie hadoop 909 Dec 6 2018 NOTICE.txt
-rwxr-xr-x. 1 oozie hadoop 46394 Dec 6 2018 oozie-examples.tar.gz
-rw-r--r--. 1 oozie hadoop 3084 Dec 6 2018 README.txt
-rw-r--r--. 1 oozie hadoop 124815 Dec 6 2018 release-log.txt
[root@homaybd05 doc]#
Go to the examples directory and extract the archive:
[root@homaybd05 doc]# tar -zxvf oozie-examples.tar.gz
List the extracted directory:
[root@homaybd05 examples]# ls -l
total 4
drwxr-xr-x. 27 hive users 4096 Dec 6 2018 apps
drwxr-xr-x. 5 root root 46 Mar 10 00:00 input-data
drwxr-xr-x. 3 hive users 17 Dec 6 2018 src
List the bundled examples:
[root@homaybd05 apps]# ls -l
total 0
drwxr-xr-x. 3 hive users 151 Mar 10 00:00 aggregator
drwxr-xr-x. 2 hive users 46 Mar 10 00:00 bundle
drwxr-xr-x. 3 hive users 82 Mar 10 00:00 coord-input-logic
drwxr-xr-x. 2 hive users 71 Mar 10 00:00 cron
drwxr-xr-x. 2 hive users 71 Mar 10 00:00 cron-schedule
drwxr-xr-x. 3 hive users 73 Mar 10 00:00 custom-main
drwxr-xr-x. 3 hive users 59 Mar 10 00:00 datelist-java-main
drwxr-xr-x. 3 hive users 103 Mar 10 00:00 demo
drwxr-xr-x. 2 hive users 48 Mar 10 00:00 distcp
drwxr-xr-x. 3 hive users 59 Mar 10 00:00 hadoop-el
drwxr-xr-x. 2 hive users 159 Mar 10 00:00 hcatalog
drwxr-xr-x. 2 hive users 107 Mar 10 00:00 hive
drwxr-xr-x. 2 hive users 138 Mar 10 00:00 hive2
drwxr-xr-x. 3 hive users 59 Mar 10 00:00 java-main
drwxr-xr-x. 3 hive users 137 Mar 10 00:00 map-reduce
drwxr-xr-x. 2 hive users 48 Mar 10 00:00 no-op
drwxr-xr-x. 2 hive users 62 Mar 10 00:00 pig
drwxr-xr-x. 2 hive users 48 Mar 10 00:00 shell
drwxr-xr-x. 2 hive users 71 Mar 10 00:00 sla
drwxr-xr-x. 3 hive users 59 Mar 10 00:00 spark
drwxr-xr-x. 2 hive users 100 Mar 10 00:00 sqoop
drwxr-xr-x. 2 hive users 100 Mar 10 00:00 sqoop-freeform
drwxr-xr-x. 2 hive users 48 Mar 10 00:00 ssh
drwxr-xr-x. 2 hive users 78 Mar 10 00:00 streaming
drwxr-xr-x. 2 hive users 48 Mar 10 00:00 subwf
2) Create a working directory
[root@homaybd05 oozie]# cd /usr/hdp/3.1.0.0-78/oozie
[root@homaybd05 oozie]# mkdir oozie-apps/
3) Copy the job template into the oozie-apps/ directory
[root@homaybd05 oozie]# cp -r doc/examples/apps/shell/ oozie-apps
List the files in the shell directory:
[root@homaybd05 shell]# ls -l
total 8
-rw-r--r--. 1 root root 971 Mar 10 00:12 job.properties
-rw-r--r--. 1 root root 2075 Mar 10 00:12 workflow.xml
4) Write the script p1.sh
[root@homaybd05 oozie]# vi oozie-apps/shell/p1.sh
Contents:
#!/bin/bash
date > /opt/module/p1.log
5) Edit job.properties and workflow.xml
job.properties
# HDFS address
nameNode=hdfs://homaybd01:8020
# ResourceManager address (YARN service port)
jobTracker=homaybd01:8050
# Queue name
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC=p1.sh
The NameNode host used above can be identified on the first server:
[root@homaybd01 hadoop]# jps
9456 ResourceManager
9634 NodeManager
7335 NameNode
2488 HMaster
6716
6909 DataNode
32398 Jps
[root@homaybd01 hadoop]#
Directory that holds the script:
[root@homaybd05 shell]# pwd
/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell
The HDFS address itself comes from core-site.xml:
[root@homaybd01 hadoop]# pwd
/usr/hdp/3.1.0.0-78/hadoop/etc/hadoop
[root@homaybd01 hadoop]# cat core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://homaybd01:8020</value>
<final>true</final>
</property>
The ResourceManager address can be found in yarn-site.xml:
<property>
<name>yarn.resourcemanager.address</name>
<value>homaybd01:8050</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>homaybd01:8141</value>
</property>
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<!-- Submit the job to YARN, which allocates the resources -->
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<!-- <argument>my_output=Hello Oozie</argument> -->
<file>/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<decision name="check-output">
<switch>
<case to="end">
${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
</case>
<default to="fail-output"/>
</switch>
</decision>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<kill name="fail-output">
<message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
</kill>
<end name="end"/>
</workflow-app>
6) Upload the job configuration
List the HDFS root directory:
[root@homaybd05 ~]# hadoop fs -ls /
Found 13 items
drwxrwxrwt - yarn hadoop 0 2022-03-09 14:58 /app-logs
drwxr-xr-x - hdfs hdfs 0 2022-03-09 09:15 /apps
drwxr-xr-x - yarn hadoop 0 2022-03-04 16:32 /ats
drwxr-xr-x - hdfs hdfs 0 2022-03-04 16:32 /atsv2
drwxr-xr-x - hdfs hdfs 0 2022-03-04 16:32 /hdp
drwx------ - livy hdfs 0 2022-03-04 16:32 /livy2-recovery
drwxr-xr-x - mapred hdfs 0 2022-03-04 16:32 /mapred
drwxrwxrwx - mapred hadoop 0 2022-03-04 16:32 /mr-history
drwxr-xr-x - hdfs hdfs 0 2022-03-04 16:32 /services
drwxrwxrwx - spark hadoop 0 2022-03-10 11:58 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2022-03-09 10:51 /tmp
drwxrwxrwx - hdfs hdfs 0 2022-03-09 14:58 /user
drwxr-xr-x - hdfs hdfs 0 2022-03-04 16:32 /warehouse
Create the HDFS directory:
[root@homaybd05 ~]# hadoop fs -mkdir /user/oozie-test/
List the directory:
[root@homaybd05 ~]# hadoop fs -ls /user
Found 10 items
drwxrwxrwx - ambari-qa hdfs 0 2022-03-04 16:37 /user/ambari-qa
drwxrwxrwx - hbase hdfs 0 2022-03-09 09:15 /user/hbase
drwxrwxrwx - hdfs hdfs 0 2022-03-09 14:45 /user/hdfs
drwxr-xr-x - hive hdfs 0 2022-03-09 17:03 /user/hive
drwxrwxrwx - livy hdfs 0 2022-03-04 16:32 /user/livy
drwxrwxrwx - oozie hdfs 0 2022-03-04 16:34 /user/oozie
drwxr-xr-x - root hdfs 0 2022-03-10 12:04 /user/oozie-test
drwxr-xr-x - root hdfs 0 2022-03-09 16:10 /user/root
drwxrwxrwx - spark hdfs 0 2022-03-09 17:22 /user/spark
drwxrwxrwx - yarn-ats hadoop 0 2022-03-04 16:32 /user/yarn-ats
Upload to the HDFS directory:
[root@homaybd05 ~]# cd /usr/hdp/3.1.0.0-78/oozie
[root@homaybd05 oozie]$ hadoop fs -put oozie-apps/ /user/oozie-test
Check the upload:
[root@homaybd05 oozie]# hadoop fs -ls /user/oozie-test
Found 1 items
drwxr-xr-x - root hdfs 0 2022-03-10 14:55 /user/oozie-test/oozie-apps
[root@homaybd05 oozie]#
7) Run the job
[root@homaybd05 oozie]$ bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
Running it produced this error:
[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
Error: E0504 : E0504: App directory [hdfs://homaybd01:8020/user/root/oozie-apps/apps/shell] does not exist
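We submit as the root account, so ${user.name} in oozie.wf.application.path resolves to root and Oozie looks under /user/root on HDFS, where nothing has been uploaded yet. The files uploaded earlier to oozie-test therefore need to go under the root user's directory instead: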
[root@homaybd05 oozie]# hadoop fs -put oozie-apps/ /user/root
Check:
[root@homaybd05 oozie]# hadoop fs -ls /user/root
Found 3 items
drwx------ - root hdfs 0 2022-03-10 02:00 /user/root/.Trash
drwxr-xr-x - root hdfs 0 2022-03-10 15:07 /user/root/.sparkStaging
drwxr-xr-x - root hdfs 0 2022-03-10 16:08 /user/root/oozie-apps
[root@homaybd05 oozie]#
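The dependence of the application path on the submitting user can be reproduced with plain shell substitution. This is a sketch using the values from job.properties above; user_name is a shell stand-in for Oozie's ${user.name}, and the hadoop fs -test call is left as a comment because it needs a cluster.

```shell
#!/bin/bash
# Sketch: expand oozie.wf.application.path the way job.properties defines it.
# user_name stands in for Oozie's ${user.name}; root is the account used in this article.
nameNode="hdfs://homaybd01:8020"
examplesRoot="oozie-apps"
user_name="root"

app_path="${nameNode}/user/${user_name}/${examplesRoot}/shell"
echo "$app_path"

# On a cluster node, the directory could be checked before submitting:
#   hadoop fs -test -d "$app_path" || echo "missing: $app_path"
```

Uploading to /user/oozie-test therefore cannot satisfy this path unless job.properties is changed to point there.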
If something was uploaded incorrectly, delete it and upload again:
[root@homaybd05 oozie]# hadoop fs -rm -r /user/root/oozie-apps/
[root@homaybd05 oozie]# hadoop fs -put oozie-apps/ /user/root
Run the job again:
[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000000-220310154506139-oozie-oozi-W
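The job was submitted successfully and a job ID was returned.
The web console shows that the job has finished running, but with an error.
If the error details do not appear in the console, it may be a browser compatibility problem; switching to Firefox displays them.
The error message: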
JA008: File does not exist: hdfs://homaybd01:8020/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell/p1.sh#p1.sh
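It turns out the path to the shell script in workflow.xml was wrong. The <file> element must point to an HDFS path, not a local one:
<!-- wrong: local filesystem path -->
<file>/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell/${EXEC}#${EXEC}</file>
<!-- correct: HDFS path -->
<file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>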
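A JA008 error of this kind can be caught before submitting by listing the paths that workflow.xml refers to. The helper below is a sketch: workflow_file_paths is a made-up name, it assumes each <file> element sits on one line, and it assumes ${EXEC} is the only variable used inside it.

```shell
#!/bin/bash
# Sketch: print the paths referenced by <file> elements in a workflow.xml.
# $1 = workflow file, $2 = value to substitute for ${EXEC}.
workflow_file_paths() {
  # 1st sed: keep only the text between <file> and </file>.
  # 2nd sed: drop the "#symlink" suffix, then fill in ${EXEC}.
  sed -n 's|.*<file>\(.*\)</file>.*|\1|p' "$1" |
    sed -e 's|#.*$||' -e "s|[\$]{EXEC}|$2|g"
}

# Intended use on a cluster node:
#   workflow_file_paths oozie-apps/shell/workflow.xml p1.sh | while read -r p; do
#     hadoop fs -test -e "$p" || echo "missing on HDFS: $p"
#   done
```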
The modified file:
[root@homaybd05 shell]# cat workflow.xml
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<!-- <argument>my_output=Hello Oozie</argument> -->
<file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="check-output"/>
<error to="fail"/>
</action>
<decision name="check-output">
<switch>
<case to="end">
${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
</case>
<default to="fail-output"/>
</switch>
</decision>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<kill name="fail-output">
<message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
</kill>
<end name="end"/>
</workflow-app>
After the change, delete the old files from HDFS and upload again:
[root@homaybd05 oozie]# hadoop fs -rm -r /user/root/oozie-apps/
[root@homaybd05 oozie]# hadoop fs -put oozie-apps/ /user/root
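Because this delete-and-upload pair is repeated after every edit, it can be wrapped in a small helper. The following is a sketch: redeploy and the RUN switch are names invented here, and by default the function only prints the two commands so that it can be tried safely away from the cluster.

```shell
#!/bin/bash
# Sketch: re-upload a local app directory to HDFS.
# Prints the commands by default; set RUN=1 to actually execute them.
redeploy() {
  local local_dir="$1" hdfs_parent="$2"
  local target="${hdfs_parent%/}/$(basename "$local_dir")"
  local runner="echo"
  if [ "${RUN:-0}" = "1" ]; then
    runner=""
  fi
  # Remove the old copy first, then upload the fresh one.
  $runner hadoop fs -rm -r "$target"
  $runner hadoop fs -put "$local_dir" "$hdfs_parent"
}

redeploy oozie-apps/ /user/root
```

Running it as RUN=1 with the same arguments from /usr/hdp/3.1.0.0-78/oozie would issue the two hadoop fs commands shown above.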
Run the job again:
[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000002-220310154506139-oozie-oozi-W
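This time the shell action was launched, but another error was reported.
The log shows that the job was submitted to the server homaybd01 for execution:
http://homaybd01:8088/proxy/application_1646382766050_0070/
Open http://homaybd01:8088 to see the Hadoop applications on homaybd01, then click through to the application's execution log.
The YARN application itself ran to completion; the failure comes from the script. The container ran on homaybd03, and that machine has no /opt/module/ directory, hence the error: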
Log Type: stderr
Log Upload Time: Thu Mar 10 17:46:37 +0800 2022
Log Length: 156
./p1.sh: line 2: /opt/module/p1.log: No such file or directory
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
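The same stderr can be fetched without the web UI once the application ID is known. This sketch pulls the ID out of the proxy URL shown earlier; app_id_from_url is a made-up helper name, and the yarn logs call is left as a comment because it needs a cluster with log aggregation available.

```shell
#!/bin/bash
# Sketch: extract the YARN application ID from a ResourceManager proxy URL.
app_id_from_url() {
  printf '%s\n' "$1" | grep -o 'application_[0-9]*_[0-9]*'
}

app_id="$(app_id_from_url 'http://homaybd01:8088/proxy/application_1646382766050_0070/')"
echo "$app_id"

# On a cluster node, the aggregated container logs could then be fetched with:
#   yarn logs -applicationId "$app_id"
```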
The fix is to create the directory on every node:
[root@homaybd03 opt]# cd module
-bash: cd: module: No such file or directory
[root@homaybd03 opt]# mkdir module
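An alternative to preparing every node by hand is to make the script tolerant of a missing directory. This is a sketch of a variant of p1.sh, not the script used in this article: LOG_DIR is an assumed override, and the default deliberately falls back to the system temp directory instead of /opt/module.

```shell
#!/bin/bash
# Sketch: variant of p1.sh that does not depend on /opt/module existing on every node.
# LOG_DIR is an assumed override; without it the log goes to the temp directory.
write_timestamp() {
  local log_dir="${LOG_DIR:-${TMPDIR:-/tmp}}"
  # Create the directory if needed and stop early if that is not possible.
  mkdir -p "$log_dir" || return 1
  date >> "$log_dir/p1.log"
}

write_timestamp
```

Since YARN may place the container on any NodeManager, writing to a location that exists on every node avoids having to create directories cluster-wide.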
Run the job again:
[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000004-220310154506139-oozie-oozi-W
Another error...
Check the Hadoop log again.
Error details:
Log Type: stderr
Log Upload Time: Thu Mar 10 21:29:08 +0800 2022
Log Length: 148
./p1.sh: line 2: /opt/module/p1.log: Permission denied
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
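This was read as the script lacking execute permission, so execute permission is set on it. Note that the message names /opt/module/p1.log rather than p1.sh, so the user the container runs as (yarn on this cluster) also needs write access to /opt/module.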
[root@homaybd05 shell]# chmod 755 p1.sh
Then set the execute permission on the copy already uploaded to HDFS and check the result:
[root@homaybd05 oozie]# hadoop fs -chmod 755 /user/root/oozie-apps/shell/p1.sh
[root@homaybd05 oozie]# hadoop fs -ls /user/root/oozie-apps/shell/
Found 3 items
-rw-r--r-- 3 root hdfs 979 2022-03-11 00:40 /user/root/oozie-apps/shell/job.properties
-rwxr-xr-x 3 root hdfs 48 2022-03-11 00:40 /user/root/oozie-apps/shell/p1.sh
-rw-r--r-- 3 root hdfs 2145 2022-03-11 00:40 /user/root/oozie-apps/shell/workflow.xml
[root@homaybd05 oozie]#
As shown, the file is now executable.
Run the job one last time:
[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000008-220310154506139-oozie-oozi-W
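This time it finally succeeded ^_^
Note also that the whole job is handed to YARN, which picks a NodeManager to run it, and the output is written on whichever node runs the container. This run executed on homaybd01, so we check that node for the generated log: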
[root@homaybd01 module]# ls -l
total 0
-rw-r--r--. 1 yarn hadoop 0 Mar 11 00:56 p1.log
[root@homaybd01 module]# cat p1.log
[root@homaybd01 module]#
8) Kill a job
[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -kill 0000000-220310154506139-oozie-oozi-W
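The job ID passed to -kill is the one printed by -run. A sketch for capturing it in a script follows; extract_job_id is a made-up helper name, and the client calls are left as comments because they need a running Oozie server.

```shell
#!/bin/bash
# Sketch: pull the job ID out of the "job: <id>" line printed by `oozie job ... -run`.
extract_job_id() {
  sed -n 's/^job: *//p'
}

job_id="$(echo 'job: 0000000-220310154506139-oozie-oozi-W' | extract_job_id)"
echo "$job_id"

# With the Oozie client available, the ID could then be passed on, for example:
#   bin/oozie job -oozie http://homaybd05:11000/oozie -info "$job_id"
#   bin/oozie job -oozie http://homaybd05:11000/oozie -kill "$job_id"
```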
Free to repost: non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)