Oozie in Big Data Development (Part 1): Basic Usage and Troubleshooting

1. Overview of Oozie Modules

Oozie (a Burmese word for "elephant handler", i.e. a mahout) is an open-source framework built around a workflow engine. Originally developed at Yahoo! and later donated to Apache, it provides scheduling and coordination for Hadoop MapReduce and Pig jobs. Oozie runs inside a Java servlet container. It is mainly used for scheduled job execution, and multiple tasks can be scheduled according to their logical execution order.

1.1 Modules

1) Workflow
Sequentially executed flow nodes; supports fork (splitting into multiple parallel paths) and join (merging multiple paths back into one).
2) Coordinator
Triggers workflows on a schedule.
3) Bundle Job
Groups multiple Coordinators together.
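To make this concrete, a minimal Coordinator definition looks roughly like the following (a sketch only; the name, time window, and app-path are illustrative assumptions, not taken from this article's cluster):

```xml
<!-- Minimal Coordinator sketch: trigger the referenced workflow once a day.
     Name, time window, and app-path are illustrative assumptions. -->
<coordinator-app name="daily-shell-coord" frequency="${coord:days(1)}"
                 start="2022-03-10T00:00Z" end="2022-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/user/${user.name}/oozie-apps/shell</app-path>
    </workflow>
  </action>
</coordinator-app>
```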

1.2 Common Node Types

1) Control flow nodes (Control Flow Nodes)
Control flow nodes mark the beginning and end of a workflow (start, end, kill) and provide routing mechanisms along the execution path (decision, fork, join).

2) Action nodes (Action Nodes)
Action nodes perform concrete work, such as copying files or running a shell script.
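The two node families combine like this in a workflow definition (a skeleton sketch; the action bodies are omitted and all names are illustrative):

```xml
<!-- Control-flow skeleton: start, fork into two parallel actions,
     join, then end. Action bodies are omitted. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fork-join-skeleton">
  <start to="forking"/>
  <fork name="forking">
    <path start="task-a"/>
    <path start="task-b"/>
  </fork>
  <action name="task-a">
    <!-- a shell/fs/map-reduce action body goes here -->
    <ok to="joining"/>
    <error to="fail"/>
  </action>
  <action name="task-b">
    <!-- another action body goes here -->
    <ok to="joining"/>
    <error to="fail"/>
  </action>
  <join name="joining" to="end"/>
  <kill name="fail">
    <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```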

1.3 Configuration Files

The Oozie configuration files:

Command line:

[root@homaybd01 yarn]# jps
9456 ResourceManager
9634 NodeManager
29990 Jps
7335 NameNode
6716 
6909 DataNode
10286 -- process information unavailable
[root@homaybd01 yarn]# jps -l
9456 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
9634 org.apache.hadoop.yarn.server.nodemanager.NodeManager
30147 sun.tools.jps.Jps
7335 org.apache.hadoop.hdfs.server.namenode.NameNode
6716 
6909 org.apache.hadoop.hdfs.server.datanode.DataNode
10286 -- process information unavailable
[root@homaybd05 /]# jps
12850 DataNode
24551 AmbariServer
12520 QuorumPeerMain
13916 -- process information unavailable
15246 ZeppelinServer
7630 Jps
13486 NodeManager
21311 RemoteInterpreterServer
22015 Bootstrap
[root@homaybd05 /]# jps -l
12850 org.apache.hadoop.hdfs.server.datanode.DataNode
24551 org.apache.ambari.server.controller.AmbariServer
12520 org.apache.zookeeper.server.quorum.QuorumPeerMain
13916 -- process information unavailable
7645 sun.tools.jps.Jps
15246 org.apache.zeppelin.server.ZeppelinServer
13486 org.apache.hadoop.yarn.server.nodemanager.NodeManager
21311 org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
22015 org.apache.catalina.startup.Bootstrap
[root@homaybd05 /]# 

Find where Oozie is installed:

[root@homaybd05 /]# find / -name "oozie"
find: ‘/proc/24873/task/9967’: No such file or directory
/run/oozie
/etc/oozie
/var/tmp/oozie
/var/lib/mysql/oozie
/var/lib/ambari-server/resources/views/work/WORKFLOW_MANAGER{1.0.0}/org/apache/oozie
/var/lib/smartsense/hst-agent/resources/collection-scripts/oozie
/var/lib/oozie
/var/log/oozie
/var/spool/mail/oozie
/usr/bin/oozie
/usr/hdp/3.1.0.0-78/etc/oozie
/usr/hdp/3.1.0.0-78/oozie
/usr/hdp/3.1.0.0-78/oozie/bin/oozie
/usr/hdp/3.1.0.0-78/oozie/share/lib/oozie
/usr/hdp/3.1.0.0-78/oozie/oozie-server/webapps/oozie
/usr/hdp/3.1.0.0-78/oozie/oozie-server/work/Catalina/localhost/oozie
/usr/hdp/3.1.0.0-78/oozie/webapps/oozie
/usr/hdp/3.1.0.0-78/oozie/var/tmp/oozie
/home/oozie
/hadoop/oozie
[root@homaybd05 /]# 

The installation directory:

[root@homaybd05 3.1.0.0-78]# pwd
/usr/hdp/3.1.0.0-78
[root@homaybd05 3.1.0.0-78]# ls -l
total 40
drwxr-xr-x.  4 root  root     34 Mar  4 16:19 atlas
drwxr-xr-x.  9 root  root    119 Mar  4 16:21 etc
drwxr-xr-x.  9 root  root   4096 Mar  4 16:20 hadoop
drwxr-xr-x.  7 root  root   4096 Mar  4 16:19 hadoop-hdfs
drwxr-xr-x.  6 root  root   8192 Mar  4 16:19 hadoop-mapreduce
drwxr-xr-x.  9 root  root   4096 Mar  4 16:19 hadoop-yarn
drwxr-xr-x. 11 root  root    133 Mar  4 16:20 hbase
drwxr-xr-x. 16 oozie hadoop 4096 Mar  4 16:34 oozie
drwxr-xr-x.  4 root  root   4096 Mar  4 16:19 ranger-hbase-plugin
drwxr-xr-x.  4 root  root   4096 Mar  4 16:18 ranger-hdfs-plugin
drwxr-xr-x.  4 root  root    224 Mar  4 16:18 ranger-yarn-plugin
drwxr-xr-x.  3 root  root     17 Mar  4 16:18 spark2
drwxr-xr-x.  4 root  root     32 Mar  4 16:20 usr
drwxr-xr-x.  7 root  root    132 Mar  4 16:20 zookeeper
[root@homaybd05 3.1.0.0-78]# 

Inspect the Oozie directory structure:

[root@homaybd05 oozie]# pwd
/usr/hdp/3.1.0.0-78/oozie
[root@homaybd05 oozie]# ls -l
total 1910352
drwxr-xr-x. 2 root  root          220 Mar  4 16:21 bin
lrwxrwxrwx. 1 root  root           23 Mar  4 16:26 conf -> /etc/oozie/3.1.0.0-78/0
drwxr-xr-x. 2 root  root          138 Mar  4 16:21 doc
drwxr-xr-x. 3 root  root           18 Mar  4 16:21 etc
drwxr-xr-x. 2 root  root         8192 Mar  4 16:21 lib
drwxr-xr-x. 2 root  root           38 Mar  4 16:34 libext
drwxr-xr-x. 2 root  root        16384 Mar  4 16:22 libserver
drwxr-xr-x. 2 root  root        16384 Mar  4 16:22 libtools
drwxr-xr-x. 3 root  root           18 Mar  4 16:21 man
drwxr-xr-x. 4 oozie hadoop         45 Mar  4 16:35 oozie-server
-rw-r--r--. 1 root  root   1647990764 Mar  4 16:26 oozie-sharelib.tar.gz
-rwxr-xr-x. 1 root  root    308150911 Dec  6  2018 oozie.war
drwxr-xr-x. 2 root  root           34 Mar  4 16:22 schema
drwxr-xr-x. 3 root  root           17 Mar  4 16:21 share
drwxr-xr-x. 4 oozie oozie          48 Mar  4 16:22 tomcat-deployment
drwxr-xr-x. 3 root  root           17 Mar  4 16:34 var
drwxr-xr-x. 4 root  root           48 Mar  4 16:22 webapps
[root@homaybd05 oozie]# 

The oozie-site.xml file:

[root@homaybd05 conf]# cat oozie-site.xml 
  <configuration  xmlns:xi="http://www.w3.org/2001/XInclude">

    <property>
      <name>credentialStoreClassPath</name>
      <value>/var/lib/ambari-agent/cred/lib/*</value>
    </property>

    <property>
      <name>hadoop.security.credential.provider.path</name>
      <value>localjceks://file/usr/hdp/current/oozie-server/conf/oozie-site.jceks</value>
    </property>

    <property>
      <name>oozie.action.retry.interval</name>
      <value>30</value>
    </property>

    <property>
      <name>oozie.action.sharelib.for.spark.exclude</name>
      <value>oozie/jackson.*</value>
    </property>

    <property>
      <name>oozie.authentication.authentication.provider.url</name>
      <value></value>
    </property>

    <property>
      <name>oozie.authentication.expected.jwt.audiences</name>
      <value></value>
    </property>

    <property>
      <name>oozie.authentication.jwt.cookie</name>
      <value>hadoop-jwt</value>
    </property>

    <property>
      <name>oozie.authentication.public.key.pem</name>
      <value></value>
    </property>

    <property>
      <name>oozie.authentication.simple.anonymous.allowed</name>
      <value>true</value>
    </property>

    <property>
      <name>oozie.authentication.type</name>
      <value>simple</value>
    </property>

    <property>
      <name>oozie.base.url</name>
      <value>http://homaybd05:11000/oozie</value>
    </property>

    <property>
      <name>oozie.credentials.credentialclasses</name>
      <value>hcat=org.apache.oozie.action.hadoop.HiveCredentials,hive2=org.apache.oozie.action.hadoop.Hive2Credentials</value>
    </property>

    <property>
      <name>oozie.db.schema.name</name>
      <value>oozie</value>
    </property>

    <property>
      <name>oozie.service.ActionService.executor.ext.classes</name>
      <value>
      org.apache.oozie.action.email.EmailActionExecutor,
      org.apache.oozie.action.hadoop.ShellActionExecutor,
      org.apache.oozie.action.hadoop.SqoopActionExecutor,
      org.apache.oozie.action.hadoop.DistcpActionExecutor</value>
    </property>

    <property>
      <name>oozie.service.AuthorizationService.security.enabled</name>
      <value>true</value>
    </property>

    <property>
      <name>oozie.service.CallableQueueService.callable.concurrency</name>
      <value>3</value>
    </property>

    <property>
      <name>oozie.service.CallableQueueService.queue.size</name>
      <value>1000</value>
    </property>

    <property>
      <name>oozie.service.CallableQueueService.threads</name>
      <value>10</value>
    </property>

    <property>
      <name>oozie.service.coord.normal.default.timeout</name>
      <value>120</value>
    </property>

    <property>
      <name>oozie.service.coord.push.check.requeue.interval</name>
      <value>30000</value>
    </property>

    <property>
      <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
      <value>*=/usr/hdp/3.1.0.0-78/hadoop/conf</value>
    </property>

    <property>
      <name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
      <value>false</value>
    </property>

    <property>
      <name>oozie.service.JPAService.create.db.schema</name>
      <value>false</value>
    </property>

    <property>
      <name>oozie.service.JPAService.jdbc.driver</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
      <name>oozie.service.JPAService.jdbc.url</name>
      <value>jdbc:mysql://homaybd05/oozie</value>
    </property>

    <property>
      <name>oozie.service.JPAService.jdbc.username</name>
      <value>oozie</value>
    </property>

    <property>
      <name>oozie.service.JPAService.pool.max.active.conn</name>
      <value>10</value>
    </property>

    <property>
      <name>oozie.service.PurgeService.older.than</name>
      <value>30</value>
    </property>

    <property>
      <name>oozie.service.PurgeService.purge.interval</name>
      <value>3600</value>
    </property>

    <property>
      <name>oozie.service.SchemaService.wf.ext.schemas</name>
      <value>shell-action-0.1.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,sqoop-action-0.2.xsd,ssh-action-0.1.xsd,distcp-action-0.1.xsd,shell-action-0.2.xsd,oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,hive-action-0.3.xsd</value>
    </property>

    <property>
      <name>oozie.service.SparkConfigurationService.spark.configurations</name>
      <value>*=/usr/hdp/current/spark-client/conf</value>
    </property>

    <property>
      <name>oozie.service.URIHandlerService.uri.handlers</name>
      <value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value>
    </property>

    <property>
      <name>oozie.service.WorkflowAppService.system.libpath</name>
      <value>/user/${user.name}/share/lib</value>
    </property>

    <property>
      <name>oozie.services</name>
      <value>
      org.apache.oozie.service.SchedulerService,
      org.apache.oozie.service.InstrumentationService,
      org.apache.oozie.service.CallableQueueService,
      org.apache.oozie.service.UUIDService,
      org.apache.oozie.service.ELService,
      org.apache.oozie.service.AuthorizationService,
      org.apache.oozie.service.UserGroupInformationService,
      org.apache.oozie.service.HadoopAccessorService,
      org.apache.oozie.service.URIHandlerService,
      org.apache.oozie.service.MemoryLocksService,
      org.apache.oozie.service.DagXLogInfoService,
      org.apache.oozie.service.SchemaService,
      org.apache.oozie.service.LiteWorkflowAppService,
      org.apache.oozie.service.JPAService,
      org.apache.oozie.service.StoreService,
      org.apache.oozie.service.SLAStoreService,
      org.apache.oozie.service.DBLiteWorkflowStoreService,
      org.apache.oozie.service.CallbackService,
      org.apache.oozie.service.ActionService,
      org.apache.oozie.service.ActionCheckerService,
      org.apache.oozie.service.RecoveryService,
      org.apache.oozie.service.PurgeService,
      org.apache.oozie.service.CoordinatorEngineService,
      org.apache.oozie.service.BundleEngineService,
      org.apache.oozie.service.DagEngineService,
      org.apache.oozie.service.CoordMaterializeTriggerService,
      org.apache.oozie.service.StatusTransitService,
      org.apache.oozie.service.PauseTransitService,
      org.apache.oozie.service.GroupsService,
      org.apache.oozie.service.ProxyUserService,
      org.apache.oozie.service.JobsConcurrencyService,
      org.apache.oozie.service.ShareLibService,
      org.apache.oozie.service.SparkConfigurationService,
      org.apache.oozie.service.XLogStreamingService</value>
    </property>

    <property>
      <name>oozie.services.ext</name>
      <value>
      org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService</value>
    </property>

    <property>
      <name>oozie.system.id</name>
      <value>oozie-${user.name}</value>
    </property>

    <property>
      <name>oozie.systemmode</name>
      <value>NORMAL</value>
    </property>

    <property>
      <name>use.system.libpath.for.mapreduce.and.pig.jobs</name>
      <value>false</value>
    </property>

  </configuration>[root@homaybd05 conf]# 

1.4 Installing the Ext JS Library

Oozie web console is disabled.

To enable Oozie web console install the Ext JS library.

Refer to Oozie Quick Start documentation for details.



The ext-2.2.zip package is missing. Ext is a JavaScript framework used to render the Oozie web console, so the package needs to be downloaded.

wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip

Alternatively, copy the link http://archive.cloudera.com/gplextras/misc/ext-2.2.zip and download it locally.

Steps:

[root@homaybd05 libext]# pwd
/usr/hdp/3.1.0.0-78/oozie/libext
[root@homaybd05 libext]# wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
--2022-03-10 15:38:47--  http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.108.167
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.108.167|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6800612 (6.5M) [application/zip]
Saving to: ‘ext-2.2.zip’

100%[==========================================>] 6,800,612   3.43MB/s   in 1.9s   

2022-03-10 15:38:49 (3.43 MB/s) - ‘ext-2.2.zip’ saved [6800612/6800612]

[root@homaybd05 libext]# ls -l
total 9064
-rw-r--r--. 1 root  root   6800612 Feb 22  2018 ext-2.2.zip
-rw-r--r--. 1 oozie hadoop 2475087 Mar  4 16:34 mysql-connector-java.jar
[root@homaybd05 libext]# 

Then restart the service manually, either from the command line or from the management UI:

Start command:
[root@homaybd05]$ bin/oozied.sh start
Stop command:
[root@homaybd05]$ bin/oozied.sh stop


Then reopen the Oozie management console:
http://homaybd05:11000/oozie/?user.name=admin


2. Oozie Examples

We will use the official Oozie example templates for testing.

Example 1: Scheduling a shell script with Oozie

Goal: use Oozie to schedule a shell script.
Step-by-step:

1) Extract the official example templates

[root@homaybd05 doc]# pwd
/usr/hdp/3.1.0.0-78/oozie/doc
[root@homaybd05 doc]# ls -l
total 224
-rw-r--r--. 1 oozie hadoop   1366 Dec  6  2018 configuration.xsl
-rw-r--r--. 1 oozie hadoop  37664 Dec  6  2018 LICENSE.txt
-rw-r--r--. 1 oozie hadoop    909 Dec  6  2018 NOTICE.txt
-rwxr-xr-x. 1 oozie hadoop  46394 Dec  6  2018 oozie-examples.tar.gz
-rw-r--r--. 1 oozie hadoop   3084 Dec  6  2018 README.txt
-rw-r--r--. 1 oozie hadoop 124815 Dec  6  2018 release-log.txt
[root@homaybd05 doc]# 

Go to the examples directory and extract the archive:

[root@homaybd05 doc]# tar -zxvf oozie-examples.tar.gz

The extracted directory:

[root@homaybd05 examples]# ls -l
total 4
drwxr-xr-x. 27 hive users 4096 Dec  6  2018 apps
drwxr-xr-x.  5 root root    46 Mar 10 00:00 input-data
drwxr-xr-x.  3 hive users   17 Dec  6  2018 src

The provided examples:

[root@homaybd05 apps]# ls -l
total 0
drwxr-xr-x. 3 hive users 151 Mar 10 00:00 aggregator
drwxr-xr-x. 2 hive users  46 Mar 10 00:00 bundle
drwxr-xr-x. 3 hive users  82 Mar 10 00:00 coord-input-logic
drwxr-xr-x. 2 hive users  71 Mar 10 00:00 cron
drwxr-xr-x. 2 hive users  71 Mar 10 00:00 cron-schedule
drwxr-xr-x. 3 hive users  73 Mar 10 00:00 custom-main
drwxr-xr-x. 3 hive users  59 Mar 10 00:00 datelist-java-main
drwxr-xr-x. 3 hive users 103 Mar 10 00:00 demo
drwxr-xr-x. 2 hive users  48 Mar 10 00:00 distcp
drwxr-xr-x. 3 hive users  59 Mar 10 00:00 hadoop-el
drwxr-xr-x. 2 hive users 159 Mar 10 00:00 hcatalog
drwxr-xr-x. 2 hive users 107 Mar 10 00:00 hive
drwxr-xr-x. 2 hive users 138 Mar 10 00:00 hive2
drwxr-xr-x. 3 hive users  59 Mar 10 00:00 java-main
drwxr-xr-x. 3 hive users 137 Mar 10 00:00 map-reduce
drwxr-xr-x. 2 hive users  48 Mar 10 00:00 no-op
drwxr-xr-x. 2 hive users  62 Mar 10 00:00 pig
drwxr-xr-x. 2 hive users  48 Mar 10 00:00 shell
drwxr-xr-x. 2 hive users  71 Mar 10 00:00 sla
drwxr-xr-x. 3 hive users  59 Mar 10 00:00 spark
drwxr-xr-x. 2 hive users 100 Mar 10 00:00 sqoop
drwxr-xr-x. 2 hive users 100 Mar 10 00:00 sqoop-freeform
drwxr-xr-x. 2 hive users  48 Mar 10 00:00 ssh
drwxr-xr-x. 2 hive users  78 Mar 10 00:00 streaming
drwxr-xr-x. 2 hive users  48 Mar 10 00:00 subwf

2) Create a working directory

[root@homaybd05 oozie]# cd /usr/hdp/3.1.0.0-78/oozie
[root@homaybd05 oozie]#  mkdir oozie-apps/

3) Copy the shell task template into the oozie-apps/ directory

[root@homaybd05 oozie]# cp -r doc/examples/apps/shell/ oozie-apps

Files in the shell directory:

[root@homaybd05 shell]# ls -l
total 8
-rw-r--r--. 1 root root  971 Mar 10 00:12 job.properties
-rw-r--r--. 1 root root 2075 Mar 10 00:12 workflow.xml

4) Write the script p1.sh

[root@homaybd05 oozie]# vi oozie-apps/shell/p1.sh

Contents:

#!/bin/bash
date > /opt/module/p1.log
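The script simply writes the current date into a log file under /opt/module. A slightly more defensive sketch is shown below; the LOG_DIR parameterization and the mkdir -p are additions for illustration (they also sidestep the missing-directory error encountered later in this article):

```shell
#!/bin/bash
# Sketch of a more defensive p1.sh. LOG_DIR is an illustrative addition:
# on the cluster you would set LOG_DIR=/opt/module; it defaults to a
# temporary directory here so the script can run anywhere.
LOG_DIR="${LOG_DIR:-$(mktemp -d)}"
mkdir -p "$LOG_DIR"          # create the directory if it does not exist yet
date > "$LOG_DIR/p1.log"     # same effect as the original: log the current date
```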

5) Edit job.properties and workflow.xml
job.properties

# HDFS address
nameNode=hdfs://homaybd01:8020
# ResourceManager (YARN) address and service port
jobTracker=homaybd01:8050
# queue name
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC=p1.sh
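Before submitting, the values can be sanity-checked from the shell. The snippet below recreates the file above in the current directory and reads one value back; note that placeholders such as ${nameNode} are expanded by Oozie at submission time, not by the shell:

```shell
# Recreate the job.properties from above and read one value back.
# The quoted 'EOF' keeps the shell from touching ${...} placeholders,
# which Oozie (not the shell) expands at submission time.
cat > job.properties.check <<'EOF'
nameNode=hdfs://homaybd01:8020
jobTracker=homaybd01:8050
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC=p1.sh
EOF
nameNode=$(grep '^nameNode=' job.properties.check | cut -d= -f2-)
echo "$nameNode"
```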

The nameNode above runs on the first server, as jps confirms:

[root@homaybd01 hadoop]# jps
9456 ResourceManager
9634 NodeManager
7335 NameNode
2488 HMaster
6716 
6909 DataNode
32398 Jps
[root@homaybd01 hadoop]# 

The script's directory:

[root@homaybd05 shell]# pwd
/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell

Check fs.defaultFS in core-site.xml:

[root@homaybd01 hadoop]# pwd
/usr/hdp/3.1.0.0-78/hadoop/etc/hadoop
[root@homaybd01 hadoop]# cat core-site.xml
 <property>
      <name>fs.defaultFS</name>
      <value>hdfs://homaybd01:8020</value>
      <final>true</final>
    </property>

The ResourceManager address can be found in yarn-site.xml:

 <property>
      <name>yarn.resourcemanager.address</name>
      <value>homaybd01:8050</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>homaybd01:8141</value>
    </property>

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
            <!-- submit this task to YARN for resource allocation -->
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>${EXEC}</exec>
        <!-- <argument>my_output=Hello Oozie</argument> -->
        <file>/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell/${EXEC}#${EXEC}</file>
        <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
<decision name="check-output">
    <switch>
        <case to="end">
            ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
        </case>
        <default to="fail-output"/>
    </switch>
</decision>
<kill name="fail">
    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<kill name="fail-output">
    <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
</kill>
<end name="end"/>
</workflow-app>

6) Upload the task configuration

Inspect the HDFS root directory:

[root@homaybd05 ~]# hadoop fs -ls /
Found 13 items
drwxrwxrwt   - yarn   hadoop          0 2022-03-09 14:58 /app-logs
drwxr-xr-x   - hdfs   hdfs            0 2022-03-09 09:15 /apps
drwxr-xr-x   - yarn   hadoop          0 2022-03-04 16:32 /ats
drwxr-xr-x   - hdfs   hdfs            0 2022-03-04 16:32 /atsv2
drwxr-xr-x   - hdfs   hdfs            0 2022-03-04 16:32 /hdp
drwx------   - livy   hdfs            0 2022-03-04 16:32 /livy2-recovery
drwxr-xr-x   - mapred hdfs            0 2022-03-04 16:32 /mapred
drwxrwxrwx   - mapred hadoop          0 2022-03-04 16:32 /mr-history
drwxr-xr-x   - hdfs   hdfs            0 2022-03-04 16:32 /services
drwxrwxrwx   - spark  hadoop          0 2022-03-10 11:58 /spark2-history
drwxrwxrwx   - hdfs   hdfs            0 2022-03-09 10:51 /tmp
drwxrwxrwx   - hdfs   hdfs            0 2022-03-09 14:58 /user
drwxr-xr-x   - hdfs   hdfs            0 2022-03-04 16:32 /warehouse

Create an HDFS directory:

[root@homaybd05 ~]# hadoop fs  -mkdir /user/oozie-test/

List it:

[root@homaybd05 ~]# hadoop fs -ls /user
Found 10 items
drwxrwxrwx   - ambari-qa hdfs            0 2022-03-04 16:37 /user/ambari-qa
drwxrwxrwx   - hbase     hdfs            0 2022-03-09 09:15 /user/hbase
drwxrwxrwx   - hdfs      hdfs            0 2022-03-09 14:45 /user/hdfs
drwxr-xr-x   - hive      hdfs            0 2022-03-09 17:03 /user/hive
drwxrwxrwx   - livy      hdfs            0 2022-03-04 16:32 /user/livy
drwxrwxrwx   - oozie     hdfs            0 2022-03-04 16:34 /user/oozie
drwxr-xr-x   - root      hdfs            0 2022-03-10 12:04 /user/oozie-test
drwxr-xr-x   - root      hdfs            0 2022-03-09 16:10 /user/root
drwxrwxrwx   - spark     hdfs            0 2022-03-09 17:22 /user/spark
drwxrwxrwx   - yarn-ats  hadoop          0 2022-03-04 16:32 /user/yarn-ats

Upload to the HDFS directory:

[root@homaybd05 ~]# cd /usr/hdp/3.1.0.0-78/oozie
[root@homaybd05 oozie]$ hadoop fs -put oozie-apps/ /user/oozie-test

Verify the upload:

[root@homaybd05 oozie]# hadoop fs -ls /user/oozie-test
Found 1 items
drwxr-xr-x   - root hdfs          0 2022-03-10 14:55 /user/oozie-test/oozie-apps
[root@homaybd05 oozie]# 

7) Run the job

[root@homaybd05 oozie]$ bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run

Running it produced this error:

[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
Error: E0504 : E0504: App directory [hdfs://homaybd01:8020/user/root/oozie-apps/apps/shell] does not exist

Because we submitted as the root user, Oozie resolved the application path against /user/root on HDFS and found nothing there; the files uploaded earlier under /user/oozie-test need to be uploaded under the root user's directory instead:

[root@homaybd05 oozie]# hadoop fs -put oozie-apps/ /user/root

Check:

[root@homaybd05 oozie]# hadoop fs -ls /user/root
Found 3 items
drwx------   - root hdfs          0 2022-03-10 02:00 /user/root/.Trash
drwxr-xr-x   - root hdfs          0 2022-03-10 15:07 /user/root/.sparkStaging
drwxr-xr-x   - root hdfs          0 2022-03-10 16:08 /user/root/oozie-apps
[root@homaybd05 oozie]# 

If the upload was wrong, it can be deleted and redone:

[root@homaybd05 oozie]# hadoop fs -rm -r /user/root/oozie-apps/
[root@homaybd05 oozie]# hadoop fs -put oozie-apps/ /user/root

Run the job again:

[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000000-220310154506139-oozie-oozi-W

We can see the job was submitted successfully.


In the management console we can see the job has finished.

If you cannot see the error log above, it may be a browser-compatibility issue.

Switching to Firefox reveals the error details.

The error message:

JA008: File does not exist: hdfs://homaybd01:8020/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell/p1.sh#p1.sh

The shell script path configured in workflow.xml was wrong; the correct value is:


<file>/usr/hdp/3.1.0.0-78/oozie/oozie-apps/shell/${EXEC}#${EXEC}</file>   <!-- wrong: local filesystem path -->

<!-- note: the path here must be an HDFS path -->
<file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
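In the <file> element, the portion before # names the file to ship into the action's working directory, and the portion after # is the local name it is exposed under, much like a symlink. A purely local analogy (illustrative only, no HDFS involved):

```shell
# Local analogy for <file>/user/root/oozie-apps/shell/p1.sh#p1.sh</file>:
# the source file is made available in the working directory under the
# name that follows '#'.
workdir=$(mktemp -d)                                   # stands in for the YARN container dir
echo 'echo hello-from-p1' > "$workdir/p1-on-hdfs.sh"   # stands in for the HDFS copy
ln -s "$workdir/p1-on-hdfs.sh" "$workdir/p1.sh"        # the '#p1.sh' name
out=$(sh "$workdir/p1.sh")
echo "$out"
```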

The corrected file:

[root@homaybd05 shell]# cat workflow.xml
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC}</exec>
           <!-- <argument>my_output=Hello Oozie</argument> -->
           <file>/user/root/oozie-apps/shell/${EXEC}#${EXEC}</file>
            <capture-output/>
        </shell>
        <ok to="check-output"/>
        <error to="fail"/>
    </action>
    <decision name="check-output">
        <switch>
            <case to="end">
                ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
            </case>
            <default to="fail-output"/>
        </switch>
    </decision>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

After editing, delete the old files from HDFS and re-upload:

[root@homaybd05 oozie]# hadoop fs -rm -r /user/root/oozie-apps/
[root@homaybd05 oozie]# hadoop fs -put oozie-apps/ /user/root

Run the job again:

[root@homaybd05 oozie]#  bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000002-220310154506139-oozie-oozi-W

We can see the shell script now executes:


But another error appeared.

From the log we can see the task was handed to homaybd01 for execution:

http://homaybd01:8088/proxy/application_1646382766050_0070/

Then open http://homaybd01:8088 to view the running Hadoop applications on homaybd01.

View the execution logs.

Drill into the container logs.

We can see the YARN application itself ran successfully.

Found the cause: the action ran on homaybd03, which does not have the /opt/module/ directory, so the script failed.

 Log Type: stderr

Log Upload Time: Thu Mar 10 17:46:37 +0800 2022

Log Length: 156

./p1.sh: line 2: /opt/module/p1.log: No such file or directory
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Create the directory on each node:

[root@homaybd03 opt]# cd module
-bash: cd: module: No such file or directory
[root@homaybd03 opt]#  mkdir module
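Because YARN may schedule the shell action on any NodeManager, the directory must exist on every node. A loop along these lines can prepare them all (a sketch: the host list is assumed from this article, and the echo makes it a dry run; drop it to actually execute):

```shell
# Dry-run sketch: create /opt/module on every NodeManager host.
# Host names are the ones appearing in this article; adjust to your cluster.
# Each command is only printed here; remove the echo to run it over ssh.
cmds=$(for host in homaybd01 homaybd03 homaybd05; do
  echo "ssh $host mkdir -p /opt/module"
done)
printf '%s\n' "$cmds"
```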

Run the job again:

[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000004-220310154506139-oozie-oozi-W

It failed again...

Check the Hadoop logs again.

Error details:

 Log Type: stderr

Log Upload Time: Thu Mar 10 21:29:08 +0800 2022

Log Length: 148

./p1.sh: line 2: /opt/module/p1.log: Permission denied
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

./p1.sh: line 2: /opt/module/p1.log: Permission denied indicates a permissions problem: the shell action runs as the yarn user, so p1.sh needs execute permission and /opt/module must be writable by that user. First make the script executable:


[root@homaybd05 shell]# chmod 755 p1.sh


Check the permissions of the files uploaded to HDFS.

Then fix the execute permission on the HDFS copy as well:

[root@homaybd05 oozie]# hadoop fs -chmod 755 /user/root/oozie-apps/shell/p1.sh
[root@homaybd05 oozie]# hadoop fs -ls /user/root/oozie-apps/shell/
Found 3 items
-rw-r--r--   3 root hdfs        979 2022-03-11 00:40 /user/root/oozie-apps/shell/job.properties
-rwxr-xr-x   3 root hdfs         48 2022-03-11 00:40 /user/root/oozie-apps/shell/p1.sh
-rw-r--r--   3 root hdfs       2145 2022-03-11 00:40 /user/root/oozie-apps/shell/workflow.xml
[root@homaybd05 oozie]# 

The script now has execute permission.

Run the job one last time:

[root@homaybd05 oozie]# bin/oozie job -oozie http://homaybd05:11000/oozie -config oozie-apps/shell/job.properties -run
job: 0000008-220310154506139-oozie-oozi-W

We can see it finally succeeded ^_^

Note that the whole job is handed to YARN, which picks a NodeManager to run it; the log is written on whichever node executes the action. The run above landed on homaybd01, so check that node for the generated log:

[root@homaybd01 module]# ls -l
total 0
-rw-r--r--. 1 yarn hadoop 0 Mar 11 00:56 p1.log
[root@homaybd01 module]# cat p1.log
[root@homaybd01 module]# 

8) Kill a job

[root@homaybd05 oozie]#  bin/oozie job -oozie http://homaybd05:11000/oozie -kill 0000000-220310154506139-oozie-oozi-W

Those who keep doing will succeed; those who keep walking will arrive.