Oozie, a Big Data Development Staple (Part 3): Hands-On Examples

1. Workflow Recap

How to define a workflow

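  • job.properties
    • Key point: it points to the HDFS location of workflow.xml (the oozie.wf.application.path property)
  • workflow.xml
    • The workflow definition file
    • An XML file
    • Main elements:
      • start
      • action
        for example MapReduce, Hive, Sqoop, Shell
        • ok
        • fail (the error transition)
          • kill
      • end
  • lib directory
    JAR files the workflow depends on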

Writing workflow.xml
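
As a rough illustration of the elements recapped above (start, action with its ok and error transitions, kill, end), a minimal workflow.xml might look like the sketch below. The names demo-wf, shell-node, and fail are hypothetical, and ${resourceManager} and ${nameNode} are assumed to be defined in job.properties; none of them come from the original article.

```xml
<!-- Minimal sketch; node names and parameters are illustrative assumptions. -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:1.0">
    <!-- start: the entry point, names the first node to run -->
    <start to="shell-node"/>

    <!-- action: one unit of work (here a Shell action) -->
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>hello</argument>
        </shell>
        <!-- ok: where to go when the action succeeds -->
        <ok to="end"/>
        <!-- error: where to go when the action fails -->
        <error to="fail"/>
    </action>

    <!-- kill: ends the workflow in a failed state with a message -->
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <!-- end: the workflow finished successfully -->
    <end name="end"/>
</workflow-app>
```

Routing the error transition to a kill node keeps failures visible in the job status instead of letting the workflow end as if it had succeeded.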

2. HiveAction

HiveAction official documentation

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="[NODE-NAME]">
        <hive xmlns="uri:oozie:hive-action:1.0">
            <resource-manager>[RESOURCE-MANAGER]</resource-manager>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
               <delete path="[PATH]"/>
               ...
               <mkdir path="[PATH]"/>
               ...
            </prepare>
            <job-xml>[HIVE SETTINGS FILE]</job-xml>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <script>[HIVE-SCRIPT]</script>
            <param>[PARAM-VALUE]</param>
                ...
            <param>[PARAM-VALUE]</param>
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
        </hive>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>

Example:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
    ...
    <action name="myfirsthivejob">
        <hive xmlns="uri:oozie:hive-action:1.0">
            <resource-manager>foo:8032</resource-manager>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <script>myscript.q</script>
            <param>InputDir=/home/tucu/input-data</param>
            <param>OutputDir=${jobOutput}</param>
        </hive>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>

3. Coordinator

Oozie Coordinator Specification (official documentation)

Triggering Mechanism
As of now, the Oozie coordinator supports two of the most common triggering mechanisms: time (run the workflow at a fixed frequency) and data availability (run it once the required input data exists).
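
To make the two triggers concrete, here is a hedged sketch of a coordinator.xml that combines them. The frequency, start, and end attributes provide the time trigger, while the dataset and input-events sections make each run wait for its input data. The names daily-coord, logs, input, and wfInput, the dates, the HDFS paths, and the ${nameNode} parameter are assumptions for illustration only.

```xml
<!-- Sketch only; names, dates, and paths are illustrative assumptions. -->
<coordinator-app name="daily-coord"
                 frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-01-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <!-- Data availability: describe where each daily data instance lives -->
    <datasets>
        <dataset name="logs" frequency="${coord:days(1)}"
                 initial-instance="2024-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
        </dataset>
    </datasets>
    <!-- Each run waits until the current day's instance of "logs" exists -->
    <input-events>
        <data-in name="input" dataset="logs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <!-- The workflow to launch once both time and data conditions are met -->
    <action>
        <workflow>
            <app-path>${nameNode}/user/demo/apps/demo-wf</app-path>
            <configuration>
                <property>
                    <name>wfInput</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

Dropping the datasets and input-events sections would leave a purely time-triggered coordinator that launches the workflow once per day.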

4. Oozie vs. DolphinScheduler

Those who act often succeed; those who keep walking often arrive.