Running Hive SQL statements from a file with hive -e / -f and Beeline
I. Hive Beeline in detail
Beeline is set to replace the Hive CLI as Hive's client tool, and later releases will deprecate the Hive CLI outright. Beeline is a command-line client introduced in Hive 0.11, built as a JDBC client on top of the SQLLine CLI.
Beeline supports an embedded mode and a remote mode. In embedded mode it runs an embedded Hive (similar to the Hive CLI); in remote mode it connects over Thrift to a standalone HiveServer2 process. Starting with Hive 0.14, when Beeline works against HiveServer2 it also relays HiveServer2's log messages to STDERR.
1. Common Beeline parameters
Usage: java org.apache.hive.cli.beeline.BeeLine
-u <database url> the JDBC URL to connect to
-n <username> the username to connect as
-p <password> the password to connect as
-d <driver class> the driver class to use
-i <init file> script file for initialization
-e <query> query that should be executed
-f <exec file> script file that should be executed
-w (or) --password-file <password file> the password file to read password from
--hiveconf property=value Use value for given property
--hivevar name=value hive variable name and value
This is Hive specific settings in which variables
can be set at session level and referenced in Hive
commands or queries.
--color=[true/false] control whether color is used for display
--showHeader=[true/false] show column names in query results
--headerInterval=ROWS; the interval between which headers are displayed
--fastConnect=[true/false] skip building table/column list for tab-completion
--autoCommit=[true/false] enable/disable automatic transaction commit
--verbose=[true/false] show verbose error messages and debug info
--showWarnings=[true/false] display connection warnings
--showNestedErrs=[true/false] display nested errors
--numberFormat=[pattern] format numbers using DecimalFormat pattern
--force=[true/false] continue running script even after errors
--maxWidth=MAXWIDTH the maximum width of the terminal
--maxColumnWidth=MAXCOLWIDTH the maximum width to use when displaying columns
--silent=[true/false] be more silent
--autosave=[true/false] automatically save preferences
--outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv] format mode for result display
Note that csv, and tsv are deprecated - use csv2, tsv2 instead
--truncateTable=[true/false] truncate table column when it exceeds length
--delimiterForDSV=DELIMITER specify the delimiter for delimiter-separated values output format (default: |)
--isolation=LEVEL set the transaction isolation level
--nullemptystring=[true/false] set to true to get historic behavior of printing null as empty string
--addlocaldriverjar=DRIVERJARNAME Add driver jar file in the beeline client side
--addlocaldrivername=DRIVERNAME Add driver name to be supported in the beeline client side
--help display this message
Beeline version 2.3.4.spark2 by Apache Hive
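Scripts run with -f can be parameterized: variables passed with --hivevar on the command line are referenced as ${name} inside the script. A minimal sketch (params.sql and the variable names db/n are made up here; the beeline call is commented out because it assumes the HiveServer2 instance used above):

```shell
# params.sql references ${db} and ${n}, which are supplied at invocation time
cat > params.sql <<'EOF'
select * from ${db}.student limit ${n};
EOF

# beeline -u jdbc:hive2://192.168.1.124:10000 -n hive -p hive \
#     --hivevar db=ds_db --hivevar n=5 -f params.sql
```

Beeline substitutes the variables before the statement is sent, so the same script can target different databases without editing the file.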
[root@homaybd03 hive]# cd /usr/hdp/3.1.0.0-78/hive/
[root@homaybd03 hive]# bin/beeline -u jdbc:hive2://192.168.1.124:10000 -n hive -p hive
[root@homaybd03 hive]# /usr/hdp/3.1.0.0-78/hive/bin/beeline -u jdbc:hive2://192.168.1.124:10000 -n hive -p hive -f simple.sql
Output:
[hive@homaybd03 ~]$ /usr/hdp/3.1.0.0-78/hive/bin/beeline -u jdbc:hive2://192.168.1.124:10000 -n hive -p hive -f simple.sql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://192.168.1.124:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.1.124:10000> drop table if exists ds_db.yuce_ma_moxing_scr;
INFO : Compiling command(queryId=hive_20221005220907_f73ef1d2-b2d8-456a-9bd7-b29449312fc9): drop table if exists ds_db.pre_ma_cshx_12to24mons_scr
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20221005220907_f73ef1d2-b2d8-456a-9bd7-b29449312fc9); Time taken: 0.047 seconds
INFO : Executing command(queryId=hive_20221005220907_f73ef1d2-b2d8-456a-9bd7-b29449312fc9): drop table if exists ds_db.yuce_ma_moxing_scr;
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20221005220907_f73ef1d2-b2d8-456a-9bd7-b29449312fc9); Time taken: 0.38 seconds
INFO : OK
No rows affected (0.511 seconds)
II. Examples
Sometimes you need to run Hive SQL statements directly from the server command line without entering the interactive Hive shell; hive -e (inline statement) and hive -f (script file) do exactly that.
Beeline offers the same with beeline -f:
beeline -u "jdbc:hive2://localhost:10000" -n yourname -p password -f hivescript.sql
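As a concrete sketch, the script file can be generated on the fly and then handed to beeline with -f (the localhost URL and credentials are placeholders; the beeline line is commented out because it needs a reachable HiveServer2):

```shell
# Write the statements to a file, one per line, each terminated by a semicolon.
cat > hivescript.sql <<'EOF'
select * from ds_db.student;
select * from ds_db.student limit 1;
EOF

# beeline -u "jdbc:hive2://localhost:10000" -n yourname -p password -f hivescript.sql
echo "wrote $(wc -l < hivescript.sql) statement(s) to hivescript.sql"
```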
Query example:
[root@homaybd04 ~]# su hive
[hive@homaybd04 root]$ hive -e "select * from ds_db.student;select * from ds_db.student limit 1"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://homaybd04:2181,homaybd05:2181,homaybd03:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
22/10/05 21:26:53 [main]: INFO jdbc.HiveConnection: Connected to homaybd04:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Compiling command(queryId=hive_20221005212653_98b57836-abee-4a6d-b5d2-fada18a27bfd): select * from ds_db.student
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:student.id, type:int, comment:null), FieldSchema(name:student.name, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20221005212653_98b57836-abee-4a6d-b5d2-fada18a27bfd); Time taken: 0.161 seconds
INFO : Executing command(queryId=hive_20221005212653_98b57836-abee-4a6d-b5d2-fada18a27bfd): select * from ds_db.student
INFO : Completed executing command(queryId=hive_20221005212653_98b57836-abee-4a6d-b5d2-fada18a27bfd); Time taken: 0.006 seconds
INFO : OK
+-------------+---------------+
| student.id | student.name |
+-------------+---------------+
| 1 | zhangsan |
| 2 | lisi |
+-------------+---------------+
2 rows selected (0.655 seconds)
INFO : Compiling command(queryId=hive_20221005212653_dc1edb81-2e3e-453d-8e93-a009d9dfc31e): select * from ds_db.student limit 1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:student.id, type:int, comment:null), FieldSchema(name:student.name, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20221005212653_dc1edb81-2e3e-453d-8e93-a009d9dfc31e); Time taken: 0.162 seconds
INFO : Executing command(queryId=hive_20221005212653_dc1edb81-2e3e-453d-8e93-a009d9dfc31e): select * from ds_db.student limit 1
INFO : Completed executing command(queryId=hive_20221005212653_dc1edb81-2e3e-453d-8e93-a009d9dfc31e); Time taken: 0.005 seconds
INFO : OK
+-------------+---------------+
| student.id | student.name |
+-------------+---------------+
| 1 | zhangsan |
+-------------+---------------+
1 row selected (0.247 seconds)
Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
Closing: 0: jdbc:hive2://homaybd04:2181,homaybd05:2181,homaybd03:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
[hive@homaybd04 root]$
When a script contains multiple SQL statements, they are split on semicolons and executed one at a time. To see this, deliberately misspell the table name student as student4 and observe the result: execution is serial, and as soon as one statement fails, the remaining statements are not executed:
[hive@homaybd04 ~]$ hive -e "select * from ds_db.student4;select * from ds_db.student limit 1"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://homaybd04:2181,homaybd05:2181,homaybd03:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
22/10/05 22:02:12 [main]: INFO jdbc.HiveConnection: Connected to homaybd04:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'student4' (state=42S02,code=10001)
Closing: 0: jdbc:hive2://homaybd04:2181,homaybd05:2181,homaybd03:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
[hive@homaybd04 ~]$ hive -e "select * from ds_db.student;select * from ds_db.student4 limit 1"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://homaybd04:2181,homaybd05:2181,homaybd03:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
22/10/05 22:02:40 [main]: INFO jdbc.HiveConnection: Connected to homaybd04:10000
Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Compiling command(queryId=hive_20221005220240_e3666c67-231a-4c35-baed-4bf4620aa9a9): select * from ds_db.student
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:student.id, type:int, comment:null), FieldSchema(name:student.name, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20221005220240_e3666c67-231a-4c35-baed-4bf4620aa9a9); Time taken: 0.122 seconds
INFO : Executing command(queryId=hive_20221005220240_e3666c67-231a-4c35-baed-4bf4620aa9a9): select * from ds_db.student
INFO : Completed executing command(queryId=hive_20221005220240_e3666c67-231a-4c35-baed-4bf4620aa9a9); Time taken: 0.004 seconds
INFO : OK
+-------------+---------------+
| student.id | student.name |
+-------------+---------------+
| 1 | zhangsan |
| 2 | lisi |
+-------------+---------------+
2 rows selected (0.337 seconds)
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'student4' (state=42S02,code=10001)
Closing: 0: jdbc:hive2://homaybd04:2181,homaybd05:2181,homaybd03:2181/default;password=hive;serviceDiscoveryMode=zooKeeper;user=hive;zooKeeperNamespace=hiveserver2
[hive@homaybd04 ~]$
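The fail-fast behavior seen above can be modeled in a few lines of Python: the script text is split on semicolons and the statements run one by one, and the first error stops the rest. Here run_statement/fake_exec are hypothetical stand-ins for a real JDBC/Beeline call, not Hive's actual parser:

```python
def run_script(sql_text, run_statement):
    """Split a script on semicolons and execute serially, stopping at the first error."""
    executed = []
    for stmt in (s.strip() for s in sql_text.split(";")):
        if not stmt:
            continue
        try:
            run_statement(stmt)
        except Exception as err:
            # mirrors Beeline: report the error and skip everything after it
            print(f"Error: {err}; remaining statements skipped")
            break
        executed.append(stmt)
    return executed


def fake_exec(stmt):
    # stand-in executor that only knows the 'student' table
    if "student4" in stmt:
        raise RuntimeError("SemanticException: Table not found 'student4'")


done = run_script(
    "select * from ds_db.student4; select * from ds_db.student limit 1",
    fake_exec,
)
```

Because the first statement fails, `done` stays empty and the second statement is never attempted, matching the log above.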
View the backend HiveServer2 web UI: http://192.168.1.124:10002/
The SQL script to execute can live on the local filesystem or on HDFS:
# local filesystem
hive -f /usr/file/simple.sql;
# HDFS
hive -f hdfs://hadoop001:8020/tmp/simple.sql;
where simple.sql contains:
select * from emp;
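As a sketch of both variants, the script can be written locally and optionally staged to HDFS first (the hive and hdfs invocations are commented out since they assume a configured client and the hadoop001 host from the example):

```shell
# Write the script locally, then point hive -f at it.
cat > simple.sql <<'EOF'
select * from emp;
EOF

# Local filesystem (assumes a configured Hive client):
# hive -f ./simple.sql
# Or stage it to HDFS first and run it from there:
# hdfs dfs -put -f simple.sql /tmp/simple.sql
# hive -f hdfs://hadoop001:8020/tmp/simple.sql

echo "statements in simple.sql: $(grep -c ';' simple.sql)"
```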
III. Inserting data
hive_test.py
from pyhive import hive
import sqlalchemy as sa
from sqlalchemy import create_engine, event
import pandas as pd
import time
import sys


def hive_cli(username, host, port, database):
    conn = hive.connect(host=host, port=port, username=username, database=database)
    return conn


def hive_engine(username, password, host, port, database):
    """
    https://github.com/pandas-dev/pandas/issues/15276
    :param username:
    :param password:
    :param host:
    :param port:
    :param database:
    :return:
    """
    engine = create_engine('hive://{}:{}@{}:{}/{}?auth=LDAP'.format(username, password, host, port, database))
    return engine


def hive_engine_old(username, password, host, port, database):
    """
    https://github.com/pandas-dev/pandas/issues/15276
    :param username:
    :param password:
    :param host:
    :param port:
    :param database:
    :return:
    """
    # engine = create_engine('hive://{}:{}@{}:{}/{}?auth=LDAP'.format(username, password, host, port, database))
    engine = create_engine('hive://{}:{}@{}:{}/{}?auth=CUSTOM'.format(username, password, host, port, database))
    return engine


def get_table_data(engine, table_name, offset=1, limit=10, order_by=None) -> pd.DataFrame:
    # reflect the table definition from Hive, then page through it
    table = sa.Table(table_name, sa.MetaData(), autoload=True, autoload_with=engine)
    if order_by is not None and order_by != "":
        data = pd.read_sql(table.select().order_by(order_by).offset(offset).limit(limit), engine)
    else:
        data = pd.read_sql(table.select().offset(offset).limit(limit), engine)
    return data


def save_data(engine, table_name, data, if_exists="replace"):
    result = data.to_sql(table_name, con=engine, index=False, if_exists=if_exists, method="multi")
    return result


def timeit(f):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        res = f(*args, **kwargs)
        end_time = time.time()
        print("%s ran in %.2f seconds" % (f.__name__, end_time - start_time))
        return res
    return wrapper


@timeit
def main():
    df = pd.read_csv("./train.csv")
    # duplicate the frame 125 times to build a larger test dataset
    total_list = []
    for i in range(0, 125):
        print(f"appending batch {i}")
        total_list.append(df)
    username = "hive"
    password = "hive"
    host = '192.168.1.124'
    port = 10000
    database = 'test_db'
    print("connecting to the database")
    engine = hive_engine(username, password, host, port, database)
    print("database connected")
    print("batches:", len(total_list))
    res_df = pd.concat(total_list)
    print("total rows:", len(res_df))
    res = save_data(engine, "fanqi_train_tbl", res_df, if_exists="replace")
    print(res)


if __name__ == '__main__':
    main()
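The timeit decorator in hive_test.py has no Hive dependency and can be exercised on its own; in this sketch, build_rows is a hypothetical stand-in for the 125-way concat done in main():

```python
import time


def timeit(f):
    # same idea as the decorator in hive_test.py: report wall-clock run time
    def wrapper(*args, **kwargs):
        start = time.time()
        res = f(*args, **kwargs)
        print("%s ran in %.2f seconds" % (f.__name__, time.time() - start))
        return res
    return wrapper


@timeit
def build_rows(n):
    # hypothetical stand-in for appending the same DataFrame n times
    return [{"id": 1, "name": "zhangsan"}] * n


rows = build_rows(125)
```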
To watch the execution:
HiveServer2 web UI: http://192.168.1.124:10002/
Hadoop cluster UI: http://192.168.1.121:8088/cluster
Related articles:
Hive Beeline 详解
官网 | HiveServer2 Clients
Replace Hive CLI with Beeline on a cluster with Sentry
Basic usage of the Hive CLI and Beeline command lines
Those who act often succeed; those who walk often arrive.
Free to repost, non-commercial, no derivatives, keep attribution (Creative Commons 3.0 license)