第八章 Centos下单机部署sqoop-1.99.7
2021/8/18 7:07:09
本文主要是介绍第八章 Centos下单机部署sqoop-1.99.7,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
Sqoop官网下载地址:http://archive.apache.org/dist/sqoop/ Sqoop是建立Hadoop、Hive环境安装之上的,所以需要Hadoop和Hive的环境搭建,Hive即需要依赖于HDFS又需要依赖YARN。安装好Hadoop后需要进行启动HDFS和YARN。
一、下载安装包
[root@stg-airflow001 ~]$ cd /data/software/ [root@stg-airflow001 /data/software]$ wget http://archive.apache.org/dist/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz [root@stg-airflow001 /data/software]$ ll 总用量 1398724 -rw-r--r-- 1 root root 102436055 7月 6 2020 sqoop-1.99.7-bin-hadoop200.tar.gz
二、解压安装包
[root@stg-airflow001 /data/software]$ tar xf sqoop-1.99.7-bin-hadoop200.tar.gz -C /opt/ [root@stg-airflow001 /data/software]$ cd /opt/ [root@stg-airflow001 /opt]$ ll 总用量 0 drwxr-xr-x 8 root root 157 8月 6 12:58 sqoop-1.99.7-bin-hadoop200
三、创建软连接
[root@stg-airflow001 /opt]$ ln -s sqoop-1.99.7-bin-hadoop200 sqoop [root@stg-airflow001 /opt]$ ll 总用量 0 lrwxrwxrwx 1 root root 26 8月 6 14:03 sqoop -> sqoop-1.99.7-bin-hadoop200 drwxr-xr-x 8 root root 157 8月 6 12:58 sqoop-1.99.7-bin-hadoop200
四、配置环境变量
[root@stg-airflow001 /opt]$ vim /etc/profile.d/sqoop.sh export PATH=/opt/sqoop/bin:$PATH
五、刷新环境变量
[root@stg-airflow001 /opt]$ source /etc/profile.d/sqoop.sh
六、备份配置文件
[root@stg-airflow001 /opt]$ cd /opt/sqoop/conf/ [root@stg-airflow001 /opt/sqoop/conf]$ ll 总用量 28 -rw-r--r-- 1 root root 1584 7月 20 2016 sqoop_bootstrap.properties -rw-r--r-- 1 root root 9797 8月 6 14:42 sqoop.properties [root@stg-airflow001 /opt/sqoop/conf]$ cp sqoop.properties sqoop.properties.bak [root@stg-airflow001 /opt/sqoop/conf]$ ll 总用量 28 -rw-r--r-- 1 root root 1584 7月 20 2016 sqoop_bootstrap.properties -rw-r--r-- 1 root root 9797 8月 6 14:42 sqoop.properties -rw-r--r-- 1 root root 9714 7月 20 2016 sqoop.properties.bak
七、修改配置文件
# 1.方式一:默认连接 [root@stg-airflow001 /opt/sqoop/conf]$ vim sqoop.properties ... ... # Hadoop configuration directory org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/etc/hadoop/ # 进行全局替换 # 设置日志文件的目录 0,%s/@LOGDIR@/\/data\/hadoop\/sqoop\/logs/g # 设置数据的存放目录 0,%s/@BASEDIR@/\/data\/hadoop\/sqoop\/bin\/BASEDIR/g
# 2.方式二:连接mysql [root@stg-airflow001 /opt/sqoop/conf]$ vim sqoop.properties ... ... # Hadoop configuration directory org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/etc/hadoop/ # JDBC repository provider configuration org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED org.apache.sqoop.repository.jdbc.maximum.connections=10 org.apache.sqoop.repository.jdbc.url=jdbmysql://161.89.107.66:6301/sqoop?createDatabaseIfNotExist=true org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver org.apache.sqoop.repository.jdbc.user=hive org.apache.sqoop.repository.jdbc.password=hive # System properties for embedded Derby configuration org.apache.sqoop.repository.sysprop.derby.stream.error.file=@LOGDIR@/mysqlrepo.log org.apache.sqoop.jetty.port=12000 #配置第三方jar包位置(没有配置) org.apache.sqoop.connector.external.loadpath=/usr/local/sqoop-1.99.7/lib
八、上传驱动包
[root@stg-airflow001 ~]$ cd /opt/sqoop/server/lib/ [root@stg-airflow001 /opt/sqoop/server/lib]$ rz [root@stg-airflow001 /opt/sqoop/server/lib]$ ll 总用量 68656 -rw-r--r-- 1 root root 2415211 8月 6 14:44 mysql-connector-java-8.0.23.jar
九、启动Sqoop
1.初始化Sqoop
[root@stg-airflow001 ~]$ sqoop2-tool upgrade Setting conf dir: /opt/sqoop/bin/../conf Sqoop home directory: /opt/sqoop Sqoop tool executor: Version: 1.99.7 Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine Running tool: class org.apache.sqoop.tools.tool.UpgradeTool 0 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
2.sqoop验证
[root@stg-airflow001 ~]$ sqoop2-tool verify Setting conf dir: /opt/sqoop/bin/../conf Sqoop home directory: /opt/sqoop Sqoop tool executor: Version: 1.99.7 Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine Running tool: class org.apache.sqoop.tools.tool.VerifyTool 0 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server. 15 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread Verification was successful. Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly. ps:此步骤后,mysql数据库中会创建数据库名为SQOOP的数据库,用以存储sqoop的metadata 在执行sqoop2-tool verify时报错 2019-05-24T12:15:45,449 ERROR [main] org.apache.sqoop.repository.common.CommonRepositoryHandler - Can't execute query: CREATE DATABASE IF NOT EXISTS"SQOOP" 在mysql中执行 set global sql_mode ='ANSI_QUOTES'; 或者在mysql的配置文档my-default.cnf中添加 sql_mode=ANSI_QUOTES
3.启动sqoop2服务
[root@stg-airflow001 ~]$ sqoop2-server start Setting conf dir: /opt/sqoop/bin/../conf Sqoop home directory: /opt/sqoop Starting the Sqoop2 server... 0 [main] INFO org.apache.sqoop.core.SqoopServer - Initializing Sqoop server. 28 [main] INFO org.apache.sqoop.core.PropertiesConfigurationProvider - Starting config file poller thread Sqoop2 server started.
十、运行sqoop客户端
1.运行sqoop客户端
[root@stg-airflow001 ~]$ sqoop2-shell Setting conf dir: /opt/sqoop/bin/../conf Sqoop home directory: /opt/sqoop 八月 13, 2021 9:03:52 下午 java.util.prefs.FileSystemPreferences$1 run 信息: Created user preferences directory. Sqoop Shell: Type 'help' or '\h' for help. sqoop:000>
2.设置显示错误信息
sqoop:000> set option --name verbose --value true Verbose option was changed to true
3.连接sqoop服务端
sqoop:000> set server --host localhost Server is set successfully
4.验证连接
sqoop:000> show version -all client version: Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016 0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable server version: Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016 API versions: [v1]
5.创建link对象
# 1.检查sqoop服务已经注册的connectors sqoop:000> show connector +------------------------+---------+------------------------------------------------------------+----------------------+ | Name | Version | Class | Supported Directions | +------------------------+---------+------------------------------------------------------------+----------------------+ | generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO | | kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO | | oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO | | ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO | | hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO | | kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO | | sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO | +------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000> create link -c generic-jdbc-connector Creating link for connector with name generic-jdbc-connector Please fill following values to create new link object Name: mysql-link Database connection Driver class: com.mysql.jdbc.Driver Connection String: jdbmysql://node1:3306/ecs Username: root Password: **** Fetch Size: Connection Properties: There are currently 0 values in the map: entry# SQL Dialect Identifier enclose: 备注:定界符(输入的空格) New link was successfully created with validation status OK and name mysql-link sqoop:000> show link +------------+------------------------+---------+ | Name | Connector Name | Enabled | +------------+------------------------+---------+ | mysql-link | generic-jdbc-connector | true | +------------+------------------------+---------+ sqoop:000> show link -a 1 link(s) to show: link with name mysql-link (Enabled: true, Created by root at 5/24/19 1:22 PM, Updated by root at 5/24/19 1:22 PM) Using Connector generic-jdbc-connector with name {1} Database connection Driver class: com.mysql.jdbc.Driver Connection String: jdbmysql://node1:3306/ecs Username: root Password: Fetch Size: Connection Properties: SQL Dialect Identifier enclose: sqoop:000> create link -c hdfs-connector Creating link for connector with name hdfs-connector Please fill following values to create new link object Name: hdfs-link HDFS cluster URI: hdfs://node1:9820 Conf directory: /home/hadoop-3.2.0/etc/hadoop Additional configs:: There are currently 0 values in the map: entry# New link was successfully created with validation status OK and name hdfs-link sqoop:000> HDFS 导出到mysql sqoop:000> create job -f "hdfs-link" -t "mysql-link" Creating job for links with from name hdfs-link and to name mysql-link Please fill following values to create new job object Name: hdfs_mysql_job Input configuration Input directory: /user/data Override null value: Null value: Incremental import Incremental type: 0 : NONE 1 : NEW_FILES Choose: 0 Last imported date: Database target Schema name: ecs Table name: users Column names: There are currently 0 values in the list: element# Staging table: Clear stage table: Throttling resources Extractors: Loaders: Classpath configuration Extra mapper jars: There are currently 0 values in the list: element# New job was successfully created with validation status OK and name hdfs_mysql_job sqoop:000> show job +----+----------------+----------------------------+-------------------------------------+---------+ | Id | Name | From Connector | To Connector | Enabled | +----+----------------+----------------------------+-------------------------------------+---------+ | 1 | hdfs_mysql_job | hdfs-link (hdfs-connector) | mysql-link (generic-jdbc-connector) | true | +----+----------------+----------------------------+-------------------------------------+---------+ 修改job sqoop:000> update job -n hdfs_mysql_job 启动job sqoop:000> start job -n hdfs_mysql_job Submission details Job Name: hdfs_mysql_job Server URL: http://node1:12000/sqoop/ Created by: root Creation date: 2019-05-24 14:27:45 HKT Lastly updated by: root External ID: job_1558650816925_0003 http://node1:8088/proxy/application_1558650816925_0003/ 2019-05-24 14:27:45 HKT: BOOTING - Progress is not available 查看job状态 sqoop:000> status job -n hdfs_mysql_job Submission details Job Name: hdfs_mysql_job Server URL: http://node1:12000/sqoop/ Created by: root Creation date: 2019-05-24 14:27:45 HKT Lastly updated by: root External ID: job_1558650816925_0003 http://node1:8088/proxy/application_1558650816925_0003/ 2019-05-24 14:30:10 HKT: BOOTING - 0.00 % 启动job,并显示执行状态 sqoop:000> start job -n hdfs_mysql_job -s 停止job sqoop:000> stop job -n hdfs_mysql_job mysql导入到hadoop sqoop:000> create job -f "mysql-link" -t "hdfs-link" Name: mysql_hdfs_job Database source Schema name: ecs Table name: role SQL statement: Column names: There are currently 0 values in the list: element# Partition column: Partition column nullable: Boundary query: Incremental read Check column: Last value: Target configuration Override null value: Null value: File format: 0 : TEXT_FILE 1 : SEQUENCE_FILE 2 : PARQUET_FILE Choose: 0 Compression code 0 : NONE 1 : DEFAULT 2 : DEFLATE 3 : GZIP 4 : BZIP2 5 : LZO 6 : LZ4 7 : SNAPPY 8 : CUSTOM Choose: 0 Custom code Error message: Can't be null nor empty Output directory: /user/data Append mode: Throttling resources Extractors: Loaders: Classpath configuration Extra mapper jars: There are currently 0 values in the list: element# New job was successfully created with validation status OK and name mysql_hdfs_job 出现错误时,开启显示错误详细信息 set option --name verbose --value true 启动job时报错 Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_JDBC_CONNECTOR_0003:Unable to access meta data - 查看错误信息,使用ecs账户时错误信息 Caused by: Exception: java.sql.SQLSyntaxErrorException Message: Unknown table 'role' in information_schema 使用root账户时,错误信息是: Caused by: Exception: java.sql.SQLSyntaxErrorException Message: Table 'SQOOP.role' doesn't exist 看起来是识别不到设置的数据ecs,默认取的第一个数据库 使用sqoop2从mysql导入到hdfs的测试失败 sqoop2-shell报错 [root@node1 ~]# sqoop2-shell Setting conf dir: /usr/local/sqoop-1.99.7/bin/../conf Sqoop home directory: /usr/local/sqoop-1.99.7 Exception in thread "main" java.lang.ExceptionInInitializerError at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66) at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallConstructorSite(CallSiteArray.java:87) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247) at org.codehaus.groovy.tools.shell.Groovysh.<clinit>(Groovysh.groovy:54) at org.apache.sqoop.shell.SqoopShell.main(SqoopShell.java:75) Caused by: groovy.lang.GroovyRuntimeException: Conflicting module versions. Module [groovy-all is loaded in version 2.4.11 and you are trying to load version 2.4.0 解决方法:找出groovy.jar删除 [root@node1 sqoop-1.99.7]# find . -name 'groo*' [root@node1 sqoop-1.99.7]# rm -r ./shell/lib/groovy-all-2.4.0.jar rm: remove regular file ‘./shell/lib/groovy-all-2.4.0.jar’? y
这篇关于第八章 Centos下单机部署sqoop-1.99.7的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2025-01-03用LangChain构建会检索和搜索的智能聊天机器人指南
- 2025-01-03图像文字理解,OCR、大模型还是多模态模型?PalliGema2在QLoRA技术上的微调与应用
- 2025-01-03混合搜索:用LanceDB实现语义和关键词结合的搜索技术(应用于实际项目)
- 2025-01-03停止思考数据管道,开始构建数据平台:介绍Analytics Engineering Framework
- 2025-01-03如果 Azure-Samples/aks-store-demo 使用了 Score 会怎样?
- 2025-01-03Apache Flink概述:实时数据处理的利器
- 2025-01-01使用 SVN合并操作时,怎么解决冲突的情况?-icode9专业技术文章分享
- 2025-01-01告别Anaconda?试试这些替代品吧
- 2024-12-31自学记录鸿蒙API 13:实现人脸比对Core Vision Face Comparator
- 2024-12-31自学记录鸿蒙 API 13:骨骼点检测应用Core Vision Skeleton Detection