Spark操作数据表入门【进行数据写入和读出】————附带详细步骤
2021/7/9 6:07:55
本文主要是介绍Spark操作数据表入门【进行数据写入和读出】————附带详细步骤,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
文章目录
- 0 准备
- 1 使用脚本运行
- 2 使用shell执行
- 3 使用脚本执行的结果
0 准备
运行路径为:/usr/app/spark-2.4.7-bin-hadoop2.7
1 使用脚本运行
执行脚本运行下面的python文件:
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native export PATH=$PATH:$LD_LIBRARY_PATH bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.3 bin/testSpark.py /usr/app/spark-2.4.7-bin-hadoop2.7/bin/spark-submit --jars $HUDI_SPARK_BUNDLE --master spark://10.20.3.72:7077 --driver-class-path $HADOOP_CONF_DIR:/usr/app/apache-hive-2.3.8-bin/conf/:/software/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --deploy-mode client --driver-memory 1G --executor-memory 1G --num-executors 3 --packages org.apache.spark:spark-avro_2.11:2.4.4 /usr/app/spark-2.4.7-bin-hadoop2.7/bin/testSpark.py
spark对数据进行读取和创建的python文件:
# coding=utf-8 from pyspark.context import SparkContext from pyspark.sql.session import SparkSession spark = SparkSession.builder \ .master("local[*]") \ .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \ .config("spark.default.parallelism", 2) \ .appName("hudi-datalake-test") \ .getOrCreate() sparkContext = spark.sparkContext; basePath = '/user/hive/warehouse/test_spark_mor' tripsSnapshotDF = spark. \ read. \ format("hudi"). \ load(basePath + "/*/*/*/*") tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot") print('原始文件:') df=spark.sql("select * from hudi_trips_snapshot where category = 'E'") df.show() # 写数据: savePath = '/user/hive/warehouse/test_spark_mor10' hudi_options = { 'hoodie.table.name': 'test_spark_mor10', 'hoodie.datasource.write.recordkey.field': 'id', 'hoodie.datasource.write.partitionpath.field': 'create_date', 'hoodie.datasource.write.table.name': 'test_spark_mor10', 'hoodie.datasource.write.operation': 'insert', 'hoodie.datasource.write.precombine.field': 'ts', 'hoodie.upsert.shuffle.parallelism': 2, 'hoodie.insert.shuffle.parallelism': 2 } df.write.format("hudi"). \ options(**hudi_options). \ mode("append"). \ save(savePath) # 再读新数据 basePath = savePath tripsSnapshotDF = spark. \ read. \ format("hudi"). \ load(basePath + "/*/*/*/*") tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot") print('新数据:') spark.sql("select * from hudi_trips_snapshot ").show()
2 使用shell执行
启动的shell的方法如下:
bin/pyspark --jars $HUDI_SPARK_BUNDLE --master spark://10.20.3.72:7077 --driver-class-path $HADOOP_CONF_DIR:/usr/app/apache-hive-2.3.8-bin/conf/:/software/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --deploy-mode client --driver-memory 1G --executor-memory 1G --num-executors 3 --packages org.apache.spark:spark-avro_2.11:2.4.4
启动后执行的指令如下:
basePath = '/user/hive/warehouse/test_spark_mor' tripsSnapshotDF = spark. \ read. \ format("hudi"). \ load(basePath + "/*/*/*/*") tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot") print('原始文件') df=spark.sql("select * from hudi_trips_snapshot where category = 'E'") df.show() # 写数据: savePath = '/user/hive/warehouse/test_spark_mor10' hudi_options = { 'hoodie.table.name': 'test_spark_mor10', 'hoodie.datasource.write.recordkey.field': 'id', 'hoodie.datasource.write.partitionpath.field': 'create_date', 'hoodie.datasource.write.table.name': 'test_spark_mor10', 'hoodie.datasource.write.operation': 'insert', 'hoodie.datasource.write.precombine.field': 'ts', 'hoodie.upsert.shuffle.parallelism': 2, 'hoodie.insert.shuffle.parallelism': 2 } df.write.format("hudi"). \ options(**hudi_options). \ mode("append"). \ save(savePath) # 再读新数据 basePath = savePath tripsSnapshotDF = spark. \ read. \ format("hudi"). \ load(basePath + "/*/*/*/*") tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot") print('新数据:') spark.sql("select * from hudi_trips_snapshot ").show()
3 使用脚本执行的结果
部分结果如下:
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at map at HoodieSparkSqlWriter.scala:152), which has no missing parents 21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 23.9 KB, free 365.4 MB) 21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 11.4 KB, free 365.4 MB) 21/06/30 13:54:49 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on hdp-jk-1:36125 (size: 11.4 KB, free: 366.2 MB) 21/06/30 13:54:49 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at map at HoodieSparkSqlWriter.scala:152) (first 15 tasks are for partitions Vector(0)) 21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks 21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 8527 bytes) 21/06/30 13:54:49 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2) 21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:49 INFO compress.CodecPool: Got brand-new decompressor [.gz] 21/06/30 13:54:49 INFO codegen.CodeGenerator: Code generated in 39.268569 ms 21/06/30 13:54:49 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 1400 bytes result sent to driver 21/06/30 13:54:49 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 283 ms on localhost (executor driver) (1/1) 21/06/30 13:54:49 INFO scheduler.DAGScheduler: ResultStage 2 (isEmpty at HoodieSparkSqlWriter.scala:181) finished in 0.309 s 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Job 2 finished: isEmpty at HoodieSparkSqlWriter.scala:181, took 0.317609 s 21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants [] 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants [] 21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating remote first table view 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants [] 21/06/30 13:54:49 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210630135447 action: commit 21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210630135447__commit__REQUESTED] 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__REQUESTED]] 21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating remote first table view 21/06/30 13:54:49 INFO client.SparkRDDWriteClient: Successfully synced to metadata table 21/06/30 13:54:49 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now 21/06/30 13:54:49 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Registering RDD 11 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 0 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Got job 3 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (countByKey at BaseSparkCommitActionExecutor.java:158) 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3) 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3) 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[11] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents 21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_9 stored as values in memory (estimated size 26.8 KB, free 365.3 MB) 21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 12.8 KB, free 365.3 MB) 21/06/30 13:54:49 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on hdp-jk-1:36125 (size: 12.8 KB, free: 366.2 MB) 21/06/30 13:54:49 INFO spark.SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[11] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1)) 21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks 21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, executor driver, partition 0, PROCESS_LOCAL, 8516 bytes) 21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 4, localhost, executor driver, partition 1, PROCESS_LOCAL, 8514 bytes) 21/06/30 13:54:49 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 3) 21/06/30 13:54:49 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 4) 21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:50 INFO compress.CodecPool: Got brand-new decompressor [.gz] 21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:50 INFO memory.MemoryStore: Block rdd_9_0 stored as values in memory (estimated size 196.0 B, free 365.3 MB) 21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added rdd_9_0 in memory on hdp-jk-1:36125 (size: 196.0 B, free: 366.2 MB) 21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"})) 21/06/30 13:54:50 INFO compress.CodecPool: Got brand-new decompressor [.gz] 21/06/30 13:54:50 INFO memory.MemoryStore: Block rdd_9_1 stored as values in memory (estimated size 251.0 B, free 365.3 MB) 21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added rdd_9_1 in memory on hdp-jk-1:36125 (size: 251.0 B, free: 366.2 MB) 21/06/30 13:54:50 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 3). 1370 bytes result sent to driver 21/06/30 13:54:50 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 4). 1370 bytes result sent to driver 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 392 ms on localhost (executor driver) (1/2) 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 4) in 391 ms on localhost (executor driver) (2/2) 21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 21/06/30 13:54:50 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.432 s 21/06/30 13:54:50 INFO scheduler.DAGScheduler: looking for newly runnable stages 21/06/30 13:54:50 INFO scheduler.DAGScheduler: running: Set() 21/06/30 13:54:50 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 4) 21/06/30 13:54:50 INFO scheduler.DAGScheduler: failed: Set() 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (ShuffledRDD[12] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents 21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_10 stored as values in memory (estimated size 3.7 KB, free 365.3 MB) 21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 2.1 KB, free 365.3 MB) 21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added broadcast_10_piece0 in memory on hdp-jk-1:36125 (size: 2.1 KB, free: 366.2 MB) 21/06/30 13:54:50 INFO spark.SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (ShuffledRDD[12] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1)) 21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes) 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, localhost, executor driver, partition 0, ANY, 7662 bytes) 21/06/30 13:54:50 INFO executor.Executor: Running task 1.0 in stage 4.0 (TID 5) 21/06/30 13:54:50 INFO executor.Executor: Running task 0.0 in stage 4.0 (TID 6) 21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks 21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms 21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms 21/06/30 13:54:50 INFO executor.Executor: Finished task 1.0 in stage 4.0 (TID 5). 1098 bytes result sent to driver 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 5) in 128 ms on localhost (executor driver) (1/2) 21/06/30 13:54:50 INFO executor.Executor: Finished task 0.0 in stage 4.0 (TID 6). 1176 bytes result sent to driver 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 143 ms on localhost (executor driver) (2/2) 21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 21/06/30 13:54:50 INFO scheduler.DAGScheduler: ResultStage 4 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.185 s 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.737207 s 21/06/30 13:54:50 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=4, numUpdates=0}, partitionStat={2021/06/29=WorkloadStat {numInserts=1, numUpdates=0}, 2021/06/30=WorkloadStat {numInserts=3, numUpdates=0}}, operationType=INSERT} 21/06/30 13:54:50 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.commit.requested 21/06/30 13:54:50 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.inflight 21/06/30 13:54:50 INFO commit.UpsertPartitioner: AvgRecordSize => 1024 21/06/30 13:54:50 INFO spark.SparkContext: Starting job: collectAsMap at UpsertPartitioner.java:252 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Got job 4 (collectAsMap at UpsertPartitioner.java:252) with 2 output partitions 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (collectAsMap at UpsertPartitioner.java:252) 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[14] at mapToPair at UpsertPartitioner.java:251), which has no missing parents 21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_11 stored as values in memory (estimated size 231.2 KB, free 365.1 MB) 21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 82.6 KB, free 365.0 MB) 21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on hdp-jk-1:36125 (size: 82.6 KB, free: 366.1 MB) 21/06/30 13:54:50 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[14] at mapToPair at UpsertPartitioner.java:251) (first 15 tasks are for partitions Vector(0, 1)) 21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 2 tasks 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 7733 bytes) 21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 5.0 (TID 8, localhost, executor driver, partition 1, PROCESS_LOCAL, 7733 bytes) 21/06/30 13:54:50 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 7) 21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 5.0 (TID 8) 21/06/30 13:54:51 INFO executor.Executor: Finished task 1.0 in stage 5.0 (TID 8). 705 bytes result sent to driver 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 5.0 (TID 8) in 73 ms on localhost (executor driver) (1/2) 21/06/30 13:54:51 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 7). 705 bytes result sent to driver 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 7) in 142 ms on localhost (executor driver) (2/2) 21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 21/06/30 13:54:51 INFO scheduler.DAGScheduler: ResultStage 5 (collectAsMap at UpsertPartitioner.java:252) finished in 0.225 s 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Job 4 finished: collectAsMap at UpsertPartitioner.java:252, took 0.235333 s 21/06/30 13:54:51 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:51 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:51 INFO commit.UpsertPartitioner: For partitionPath : 2021/06/29 Small Files => [] 21/06/30 13:54:51 INFO commit.UpsertPartitioner: After small file assignment: unassignedInserts => 1, totalInsertBuckets => 1, recordsPerBucket => 122880 21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total insert buckets for partition path 2021/06/29 => [(InsertBucket {bucketNumber=0, weight=1.0},1.0)] 21/06/30 13:54:51 INFO commit.UpsertPartitioner: For partitionPath : 2021/06/30 Small Files => [] 21/06/30 13:54:51 INFO commit.UpsertPartitioner: After small file assignment: unassignedInserts => 3, totalInsertBuckets => 1, recordsPerBucket => 122880 21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total insert buckets for partition path 2021/06/30 => [(InsertBucket {bucketNumber=1, weight=1.0},1.0)] 21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total Buckets :2, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=20421052-ee81-4427-ab8f-464d81b40b8f, partitionPath=2021/06/29}, 1=BucketInfo {bucketType=INSERT, fileIdPrefix=49a56794-40a5-45c2-bd4a-08a566590703, partitionPath=2021/06/30}}, Partition to insert buckets => {2021/06/29=[(InsertBucket {bucketNumber=0, weight=1.0},1.0)], 2021/06/30=[(InsertBucket {bucketNumber=1, weight=1.0},1.0)]}, UpdateLocations mapped to buckets =>{} 21/06/30 13:54:51 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210630135447 21/06/30 13:54:51 INFO spark.SparkContext: Starting job: count at HoodieSparkSqlWriter.scala:470 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Registering RDD 15 (mapToPair at BaseSparkCommitActionExecutor.java:192) as input to shuffle 1 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Got job 5 (count at HoodieSparkSqlWriter.scala:470) with 2 output partitions 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (count at HoodieSparkSqlWriter.scala:470) 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6) 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6) 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[15] at mapToPair at BaseSparkCommitActionExecutor.java:192), which has no missing parents 21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_12 stored as values in memory (estimated size 253.2 KB, free 364.8 MB) 21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 92.7 KB, free 364.7 MB) 21/06/30 13:54:51 INFO storage.BlockManagerInfo: Added broadcast_12_piece0 in memory on hdp-jk-1:36125 (size: 92.7 KB, free: 366.0 MB) 21/06/30 13:54:51 INFO spark.SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[15] at mapToPair at BaseSparkCommitActionExecutor.java:192) (first 15 tasks are for partitions Vector(0, 1)) 21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 2 tasks 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 9, localhost, executor driver, partition 0, PROCESS_LOCAL, 8516 bytes) 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 10, localhost, executor driver, partition 1, PROCESS_LOCAL, 8514 bytes) 21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 6.0 (TID 10) 21/06/30 13:54:51 INFO executor.Executor: Running task 0.0 in stage 6.0 (TID 9) 21/06/30 13:54:51 INFO storage.BlockManager: Found block rdd_9_0 locally 21/06/30 13:54:51 INFO storage.BlockManager: Found block rdd_9_1 locally 21/06/30 13:54:51 INFO executor.Executor: Finished task 0.0 in stage 6.0 (TID 9). 1370 bytes result sent to driver 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 9) in 139 ms on localhost (executor driver) (1/2) 21/06/30 13:54:51 INFO executor.Executor: Finished task 1.0 in stage 6.0 (TID 10). 1327 bytes result sent to driver 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 10) in 131 ms on localhost (executor driver) (2/2) 21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 21/06/30 13:54:51 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (mapToPair at BaseSparkCommitActionExecutor.java:192) finished in 0.195 s 21/06/30 13:54:51 INFO scheduler.DAGScheduler: looking for newly runnable stages 21/06/30 13:54:51 INFO scheduler.DAGScheduler: running: Set() 21/06/30 13:54:51 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7) 21/06/30 13:54:51 INFO scheduler.DAGScheduler: failed: Set() 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[20] at filter at HoodieSparkSqlWriter.scala:470), which has no missing parents 21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_13 stored as values in memory (estimated size 325.0 KB, free 364.3 MB) 21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 118.5 KB, free 364.2 MB) 21/06/30 13:54:51 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in memory on hdp-jk-1:36125 (size: 118.5 KB, free: 365.9 MB) 21/06/30 13:54:51 INFO spark.SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (MapPartitionsRDD[20] at filter at HoodieSparkSqlWriter.scala:470) (first 15 tasks are for partitions Vector(0, 1)) 21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 2 tasks 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 11, localhost, executor driver, partition 0, ANY, 7662 bytes) 21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 12, localhost, executor driver, partition 1, ANY, 7662 bytes) 21/06/30 13:54:51 INFO executor.Executor: Running task 0.0 in stage 7.0 (TID 11) 21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 7.0 (TID 12) 21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks 21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks 21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: starting to buffer records 21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: starting to buffer records 21/06/30 13:54:51 INFO queue.BoundedInMemoryExecutor: starting consumer thread 21/06/30 13:54:51 INFO queue.BoundedInMemoryExecutor: starting consumer thread 21/06/30 13:54:51 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: finished buffering records 21/06/30 13:54:51 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: finished buffering records 21/06/30 13:54:52 INFO table.MarkerFiles: Creating Marker Path=/user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447/2021/06/30/49a56794-40a5-45c2-bd4a-08a566590703-0_1-7-12_20210630135447.parquet.marker.CREATE 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO table.MarkerFiles: Creating Marker Path=/user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447/2021/06/29/20421052-ee81-4427-ab8f-464d81b40b8f-0_0-7-11_20210630135447.parquet.marker.CREATE 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO compress.CodecPool: Got brand-new compressor [.gz] 21/06/30 13:54:52 INFO compress.CodecPool: Got brand-new compressor [.gz] 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 99 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 102 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 85 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 155 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 128 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 93 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 111 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 94 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 101 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 105 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 140 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 83 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 131 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 72 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 84 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 59 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 123 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned shuffle 0 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 134 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 115 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 97 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 139 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_11_piece0 on hdp-jk-1:36125 in memory (size: 82.6 KB, free: 366.0 MB) 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 67 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 124 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 138 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 103 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 81 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 82 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 95 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 79 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 156 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 89 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 98 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 146 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 77 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 57 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 96 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 141 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 100 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 86 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on hdp-jk-1:36125 in memory (size: 12.8 KB, free: 366.0 MB) 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 117 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 147 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 150 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 66 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 114 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 65 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 116 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 109 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 113 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 60 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 62 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 78 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 143 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 104 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 135 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 112 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 58 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 63 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_10_piece0 on hdp-jk-1:36125 in memory (size: 2.1 KB, free: 366.0 MB) 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 142 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 76 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 107 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 121 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 71 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 129 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 137 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 91 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 125 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 74 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 148 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 133 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 144 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 73 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 64 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 88 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 120 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 149 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 122 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 92 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 87 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 126 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 154 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 106 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 136 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 152 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 61 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 108 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 118 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_12_piece0 on hdp-jk-1:36125 in memory (size: 92.7 KB, free: 366.1 MB) 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 145 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 151 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 70 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 90 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 119 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 110 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 127 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 80 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on hdp-jk-1:36125 in memory (size: 11.4 KB, free: 366.1 MB) 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:52 INFO io.HoodieCreateHandle: New CreateHandle for partition :2021/06/30 with fileId 49a56794-40a5-45c2-bd4a-08a566590703-0 21/06/30 13:54:52 INFO io.HoodieCreateHandle: New CreateHandle for partition :2021/06/29 with fileId 20421052-ee81-4427-ab8f-464d81b40b8f-0 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 68 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 132 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 69 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 153 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 130 21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 75 21/06/30 13:54:52 INFO io.HoodieCreateHandle: Closing the file 20421052-ee81-4427-ab8f-464d81b40b8f-0 as we are done with all the records 1 21/06/30 13:54:52 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 240 21/06/30 13:54:52 INFO io.HoodieCreateHandle: Closing the file 49a56794-40a5-45c2-bd4a-08a566590703-0 as we are done with all the records 3 21/06/30 13:54:52 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 461 21/06/30 13:54:52 INFO io.HoodieCreateHandle: CreateHandle for partitionPath 2021/06/30 fileID 49a56794-40a5-45c2-bd4a-08a566590703-0, took 1012 ms. 21/06/30 13:54:52 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads 21/06/30 13:54:52 INFO memory.MemoryStore: Block rdd_19_1 stored as values in memory (estimated size 301.0 B, free 365.0 MB) 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Added rdd_19_1 in memory on hdp-jk-1:36125 (size: 301.0 B, free: 366.1 MB) 21/06/30 13:54:52 INFO io.HoodieCreateHandle: CreateHandle for partitionPath 2021/06/29 fileID 20421052-ee81-4427-ab8f-464d81b40b8f-0, took 1044 ms. 21/06/30 13:54:52 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads 21/06/30 13:54:52 INFO memory.MemoryStore: Block rdd_19_0 stored as values in memory (estimated size 301.0 B, free 365.0 MB) 21/06/30 13:54:52 INFO storage.BlockManagerInfo: Added rdd_19_0 in memory on hdp-jk-1:36125 (size: 301.0 B, free: 366.1 MB) 21/06/30 13:54:52 INFO executor.Executor: Finished task 1.0 in stage 7.0 (TID 12). 1428 bytes result sent to driver 21/06/30 13:54:52 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 12) in 1256 ms on localhost (executor driver) (1/2) 21/06/30 13:54:52 INFO executor.Executor: Finished task 0.0 in stage 7.0 (TID 11). 1428 bytes result sent to driver 21/06/30 13:54:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 11) in 1261 ms on localhost (executor driver) (2/2) 21/06/30 13:54:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 21/06/30 13:54:52 INFO scheduler.DAGScheduler: ResultStage 7 (count at HoodieSparkSqlWriter.scala:470) finished in 1.351 s 21/06/30 13:54:52 INFO scheduler.DAGScheduler: Job 5 finished: count at HoodieSparkSqlWriter.scala:470, took 1.556432 s 21/06/30 13:54:52 INFO hudi.HoodieSparkSqlWriter$: No errors. Proceeding to commit the write. 21/06/30 13:54:52 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120 21/06/30 13:54:52 INFO scheduler.DAGScheduler: Got job 6 (collect at SparkRDDWriteClient.java:120) with 2 output partitions 21/06/30 13:54:52 INFO scheduler.DAGScheduler: Final stage: ResultStage 9 (collect at SparkRDDWriteClient.java:120) 21/06/30 13:54:52 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 8) 21/06/30 13:54:52 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:52 INFO scheduler.DAGScheduler: Submitting ResultStage 9 (MapPartitionsRDD[21] at map at SparkRDDWriteClient.java:120), which has no missing parents 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_14 stored as values in memory (estimated size 325.5 KB, free 364.6 MB) 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 118.8 KB, free 364.5 MB) 21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on hdp-jk-1:36125 (size: 118.8 KB, free: 366.0 MB) 21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 9 (MapPartitionsRDD[21] at map at SparkRDDWriteClient.java:120) (first 15 tasks are for partitions Vector(0, 1)) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 9.0 with 2 tasks 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 9.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes) 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 9.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes) 21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 9.0 (TID 13) 21/06/30 13:54:53 INFO executor.Executor: Running task 1.0 in stage 9.0 (TID 14) 21/06/30 13:54:53 INFO storage.BlockManager: Found block rdd_19_1 locally 21/06/30 13:54:53 INFO storage.BlockManager: Found block rdd_19_0 locally 21/06/30 13:54:53 INFO executor.Executor: Finished task 1.0 in stage 9.0 (TID 14). 1486 bytes result sent to driver 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 9.0 (TID 14) in 125 ms on localhost (executor driver) (1/2) 21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 9.0 (TID 13). 1486 bytes result sent to driver 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 9.0 (TID 13) in 130 ms on localhost (executor driver) (2/2) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool 21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 9 (collect at SparkRDDWriteClient.java:120) finished in 0.219 s 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 6 finished: collect at SparkRDDWriteClient.java:120, took 0.225906 s 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__INFLIGHT]] 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view 21/06/30 13:54:53 INFO util.CommitUtils: Creating metadata for INSERT numWriteStats:2numReplaceFileIds:0 21/06/30 13:54:53 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 7 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 10 (collect at HoodieSparkEngineContext.java:78) 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[23] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_15 stored as values in memory (estimated size 72.1 KB, free 364.4 MB) 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 26.2 KB, free 364.4 MB) 21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_15_piece0 in memory on hdp-jk-1:36125 (size: 26.2 KB, free: 366.0 MB) 21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 10 (MapPartitionsRDD[23] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0)) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 10.0 with 1 tasks 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes) 21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 10.0 (TID 15) 21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 10.0 (TID 15). 833 bytes result sent to driver 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 15) in 40 ms on localhost (executor driver) (1/1) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool 21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 10 (collect at HoodieSparkEngineContext.java:78) finished in 0.071 s 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 7 finished: collect at HoodieSparkEngineContext.java:78, took 0.075028 s 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__INFLIGHT]] 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view 21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Committing 20210630135447 action commit 21/06/30 13:54:53 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 8 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 11 (collect at HoodieSparkEngineContext.java:78) 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[25] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 72.1 KB, free 364.4 MB) 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 26.2 KB, free 364.3 MB) 21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on hdp-jk-1:36125 (size: 26.2 KB, free: 365.9 MB) 21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[25] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0)) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 11.0 with 1 tasks 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 11.0 (TID 16, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes) 21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 11.0 (TID 16) 21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 11.0 (TID 16). 876 bytes result sent to driver 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 11.0 (TID 16) in 36 ms on localhost (executor driver) (1/1) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 11 (collect at HoodieSparkEngineContext.java:78) finished in 0.067 s 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 8 finished: collect at HoodieSparkEngineContext.java:78, took 0.073743 s 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210630135447__commit__INFLIGHT] 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.inflight 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.commit 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Completed [==>20210630135447__commit__INFLIGHT] 21/06/30 13:54:53 INFO spark.SparkContext: Starting job: foreach at HoodieSparkEngineContext.java:83 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 9 (foreach at HoodieSparkEngineContext.java:83) with 1 output partitions 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 12 (foreach at HoodieSparkEngineContext.java:83) 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 12 (ParallelCollectionRDD[26] at parallelize at HoodieSparkEngineContext.java:83), which has no missing parents 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 71.2 KB, free 364.3 MB) 21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 25.7 KB, free 364.2 MB) 21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on hdp-jk-1:36125 (size: 25.7 KB, free: 365.9 MB) 21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 12 (ParallelCollectionRDD[26] at parallelize at HoodieSparkEngineContext.java:83) (first 15 tasks are for partitions Vector(0)) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 12.0 with 1 tasks 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 17, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes) 21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 12.0 (TID 17) 21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 12.0 (TID 17). 666 bytes result sent to driver 21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 17) in 37 ms on localhost (executor driver) (1/1) 21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool 21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 12 (foreach at HoodieSparkEngineContext.java:83) finished in 0.066 s 21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 9 finished: foreach at HoodieSparkEngineContext.java:83, took 0.071339 s 21/06/30 13:54:53 INFO table.MarkerFiles: Removing marker directory at /user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__REQUESTED], [==>20210630135447__commit__INFLIGHT], [20210630135447__commit__COMPLETED]] 21/06/30 13:54:53 INFO table.HoodieTimelineArchiveLog: No Instants to archive 21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now 21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210630135453 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]] 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hive/warehouse/test_spark_mor10. Server=hdp-jk-1:35915, Timeout=300 21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:53 INFO view.AbstractTableFileSystemView: Took 1 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:53 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:53 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://hdp-jk-1:35915/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhive%2Fwarehouse%2Ftest_spark_mor10&lastinstantts=20210630135447&timelinehash=09ac7cb732068e0c3926289658471433dcd40aa40b28b7eee22c414b299f5bf3) 21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:54 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:54 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]] 21/06/30 13:54:54 INFO view.AbstractTableFileSystemView: Took 1 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:54 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:54 INFO service.RequestHandler: TimeTakenMillis[Total=92, Refresh=81, handle=10, Check=1], Success=true, Query=basepath=%2Fuser%2Fhive%2Fwarehouse%2Ftest_spark_mor10&lastinstantts=20210630135447&timelinehash=09ac7cb732068e0c3926289658471433dcd40aa40b28b7eee22c414b299f5bf3, Host=hdp-jk-1:35915, synced=false 21/06/30 13:54:54 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !! 21/06/30 13:54:54 INFO clean.CleanPlanner: Nothing to clean here. It is already clean 21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Cleaner started 21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any 21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:54 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:54 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10 21/06/30 13:54:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]] 21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST 21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating remote first table view 21/06/30 13:54:54 INFO client.SparkRDDWriteClient: Successfully synced to metadata table 21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Committed 20210630135447 21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Commit 20210630135447 successful! 21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Config.inlineCompactionEnabled ? false 21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Compaction Scheduled is Optional.empty 21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Is Async Compaction Enabled ? false 21/06/30 13:54:54 INFO client.AbstractHoodieClient: Stopping Timeline service !! 21/06/30 13:54:54 INFO embedded.EmbeddedTimelineService: Closing Timeline server 21/06/30 13:54:54 INFO service.TimelineService: Closing Timeline Service 21/06/30 13:54:54 INFO javalin.Javalin: Stopping Javalin ... 21/06/30 13:54:54 INFO javalin.Javalin: Javalin has stopped 21/06/30 13:54:54 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:54 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:54 INFO service.TimelineService: Closed Timeline Service 21/06/30 13:54:54 INFO embedded.EmbeddedTimelineService: Closed Timeline server 21/06/30 13:54:54 INFO rdd.MapPartitionsRDD: Removing RDD 19 from persistence list 21/06/30 13:54:54 INFO storage.BlockManager: Removing RDD 19 21/06/30 13:54:54 INFO rdd.MapPartitionsRDD: Removing RDD 9 from persistence list 21/06/30 13:54:54 INFO storage.BlockManager: Removing RDD 9 21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:55 INFO hudi.DataSourceUtils: Getting table path.. 21/06/30 13:54:55 INFO util.TablePathUtils: Getting table path from path : hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/.aux/.bootstrap/.fileids 21/06/30 13:54:55 INFO hudi.DefaultSource: Obtained hudi table path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:55 INFO table.HoodieTableConfig: Loading table properties from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO hudi.DefaultSource: Is bootstrapped table => false 21/06/30 13:54:55 WARN hudi.DefaultSource: Loading Base File Only View. 21/06/30 13:54:55 INFO hudi.DefaultSource: Constructing hoodie (as parquet) data source with options :Map(path -> /user/hive/warehouse/test_spark_mor10/*/*/*/*, hoodie.datasource.query.type -> snapshot) 21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]] 21/06/30 13:54:55 INFO table.HoodieTableConfig: Loading table properties from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties 21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading Active commit timeline for hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]] 21/06/30 13:54:55 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 1 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Building file system view for partition (2021/06/29) 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: #files found in partition (2021/06/29) =2, Time taken =2 21/06/30 13:54:55 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2021/06/29, #FileGroups=1 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=0, StoreTimeTaken=1 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Time to load partition (2021/06/29) =4 21/06/30 13:54:55 INFO hadoop.HoodieROTablePathFilter: Based on hoodie metadata from base path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10, caching 1 files under hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/29 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 1 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:55 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Building file system view for partition (2021/06/30) 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: #files found in partition (2021/06/30) =2, Time taken =1 21/06/30 13:54:55 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2021/06/30, #FileGroups=1 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=1 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Time to load partition (2021/06/30) =4 21/06/30 13:54:55 INFO hadoop.HoodieROTablePathFilter: Based on hoodie metadata from base path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10, caching 1 files under hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/30 21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups 21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations 21/06/30 13:54:55 INFO datasources.InMemoryFileIndex: It took 93 ms to list leaf files for 6 paths. 21/06/30 13:54:55 INFO spark.SparkContext: Starting job: resolveRelation at DefaultSource.scala:193 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Got job 10 (resolveRelation at DefaultSource.scala:193) with 1 output partitions 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (resolveRelation at DefaultSource.scala:193) 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (MapPartitionsRDD[30] at resolveRelation at DefaultSource.scala:193), which has no missing parents 21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_18 stored as values in memory (estimated size 73.5 KB, free 364.2 MB) 21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 26.7 KB, free 364.1 MB) 21/06/30 13:54:55 INFO storage.BlockManagerInfo: Added broadcast_18_piece0 in memory on hdp-jk-1:36125 (size: 26.7 KB, free: 365.9 MB) 21/06/30 13:54:55 INFO spark.SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (MapPartitionsRDD[30] at resolveRelation at DefaultSource.scala:193) (first 15 tasks are for partitions Vector(0)) 21/06/30 13:54:55 INFO scheduler.TaskSchedulerImpl: Adding task set 13.0 with 1 tasks 21/06/30 13:54:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 18, localhost, executor driver, partition 0, PROCESS_LOCAL, 7869 bytes) 21/06/30 13:54:55 INFO executor.Executor: Running task 0.0 in stage 13.0 (TID 18) 21/06/30 13:54:55 INFO executor.Executor: Finished task 0.0 in stage 13.0 (TID 18). 1308 bytes result sent to driver 21/06/30 13:54:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 18) in 197 ms on localhost (executor driver) (1/1) 21/06/30 13:54:55 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 21/06/30 13:54:55 INFO scheduler.DAGScheduler: ResultStage 13 (resolveRelation at DefaultSource.scala:193) finished in 0.227 s 21/06/30 13:54:55 INFO scheduler.DAGScheduler: Job 10 finished: resolveRelation at DefaultSource.scala:193, took 0.230370 s 新数据: 21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Pruning directories with: 21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Post-Scan Filters: 21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Output Data Schema: struct<_hoodie_commit_time: string, _hoodie_commit_seqno: string, _hoodie_record_key: string, _hoodie_partition_path: string, _hoodie_file_name: string ... 9 more fields> 21/06/30 13:54:55 INFO execution.FileSourceScanExec: Pushed Filters: 21/06/30 13:54:55 INFO codegen.CodeGenerator: Code generated in 63.038365 ms 21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_19 stored as values in memory (estimated size 292.0 KB, free 363.8 MB) 21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 25.9 KB, free 363.8 MB) 21/06/30 13:54:55 INFO storage.BlockManagerInfo: Added broadcast_19_piece0 in memory on hdp-jk-1:36125 (size: 25.9 KB, free: 365.9 MB) 21/06/30 13:54:55 INFO spark.SparkContext: Created broadcast 19 from showString at NativeMethodAccessorImpl.java:0 21/06/30 13:54:56 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4629786 bytes, open cost is considered as scanning 4194304 bytes. 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 205 21/06/30 13:54:56 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 238 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Got job 11 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Final stage: ResultStage 14 (showString at NativeMethodAccessorImpl.java:0) 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 256 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 305 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 266 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 167 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 298 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 264 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting ResultStage 14 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_15_piece0 on hdp-jk-1:36125 in memory (size: 26.2 KB, free: 365.9 MB) 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 182 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 165 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 302 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 301 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 172 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 299 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_13_piece0 on hdp-jk-1:36125 in memory (size: 118.5 KB, free: 366.0 MB) 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 204 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 232 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 161 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_17_piece0 on hdp-jk-1:36125 in memory (size: 25.7 KB, free: 366.0 MB) 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 163 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 316 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 206 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 166 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 175 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 170 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 186 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 201 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 214 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 297 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 300 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 277 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 287 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 292 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 219 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 235 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 249 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 239 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 185 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 254 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 326 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 330 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned shuffle 1 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 222 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 194 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 311 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 278 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 276 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 328 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 215 21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_20 stored as values in memory (estimated size 14.6 KB, free 364.5 MB) 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_16_piece0 on hdp-jk-1:36125 in memory (size: 26.2 KB, free: 366.1 MB) 21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 5.7 KB, free 364.5 MB) 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on hdp-jk-1:36125 (size: 5.7 KB, free: 366.1 MB) 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 310 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 270 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 291 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 160 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 195 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 288 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 296 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 237 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 227 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 279 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 178 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 236 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 202 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 282 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 226 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 271 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 319 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 284 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 309 21/06/30 13:54:56 INFO spark.SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 233 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 257 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 259 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 293 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0)) 21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Adding task set 14.0 with 1 tasks 21/06/30 13:54:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 19, localhost, executor driver, partition 0, ANY, 8353 bytes) 21/06/30 13:54:56 INFO executor.Executor: Running task 0.0 in stage 14.0 (TID 19) 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_14_piece0 on hdp-jk-1:36125 in memory (size: 118.8 KB, free: 366.2 MB) 21/06/30 13:54:56 INFO datasources.FileScanRDD: Reading File path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/30/49a56794-40a5-45c2-bd4a-08a566590703-0_1-7-12_20210630135447.parquet, range: 0-435537, partition values: [empty row] 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 320 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 174 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 177 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 190 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 245 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 262 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 164 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 253 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 295 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 285 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 191 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 213 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 216 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 203 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 228 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 231 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 269 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 324 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 221 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 329 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_18_piece0 on hdp-jk-1:36125 in memory (size: 26.7 KB, free: 366.2 MB) 21/06/30 13:54:56 INFO compress.CodecPool: Got brand-new decompressor [.gz] 21/06/30 13:54:56 INFO executor.Executor: Finished task 0.0 in stage 14.0 (TID 19). 1555 bytes result sent to driver 21/06/30 13:54:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 19) in 63 ms on localhost (executor driver) (1/1) 21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool 21/06/30 13:54:56 INFO scheduler.DAGScheduler: ResultStage 14 (showString at NativeMethodAccessorImpl.java:0) finished in 0.091 s 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Job 11 finished: showString at NativeMethodAccessorImpl.java:0, took 0.130874 s 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 272 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 321 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 252 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 169 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 183 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 322 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 224 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 220 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 306 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 303 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 261 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 267 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 162 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 176 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 229 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 258 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 218 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 268 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 217 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 280 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 193 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 196 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 234 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 225 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 187 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 159 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 318 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 171 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 274 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 197 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 192 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 244 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 211 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 240 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 246 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 313 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 283 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 198 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 210 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 242 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 290 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 314 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 308 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 200 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 263 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 248 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 212 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 184 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 199 21/06/30 13:54:56 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Got job 12 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Final stage: ResultStage 15 (showString at NativeMethodAccessorImpl.java:0) 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Parents of final stage: List() 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Missing parents: List() 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents 21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_21 stored as values in memory (estimated size 14.6 KB, free 365.0 MB) 21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 5.7 KB, free 365.0 MB) 21/06/30 13:54:56 INFO storage.BlockManagerInfo: Added broadcast_21_piece0 in memory on hdp-jk-1:36125 (size: 5.7 KB, free: 366.2 MB) 21/06/30 13:54:56 INFO spark.SparkContext: Created broadcast 21 from broadcast at DAGScheduler.scala:1184 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 15 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(1)) 21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Adding task set 15.0 with 1 tasks 21/06/30 13:54:56 INFO storage.BlockManager: Removing RDD 19 21/06/30 13:54:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 15.0 (TID 20, localhost, executor driver, partition 1, ANY, 8353 bytes) 21/06/30 13:54:56 INFO executor.Executor: Running task 0.0 in stage 15.0 (TID 20) 21/06/30 13:54:56 INFO datasources.FileScanRDD: Reading File path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/29/20421052-ee81-4427-ab8f-464d81b40b8f-0_0-7-11_20210630135447.parquet, range: 0-435428, partition values: [empty row] 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned RDD 19 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 307 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 168 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 243 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 289 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 247 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 275 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 209 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 157 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 286 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 173 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 331 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 241 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 251 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 158 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 294 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 179 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 273 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 180 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 323 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 181 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 250 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 304 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 327 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 315 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 255 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 223 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 189 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 281 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 188 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 230 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 207 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 312 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 265 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 317 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 325 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 208 21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 260 21/06/30 13:54:56 INFO compress.CodecPool: Got brand-new decompressor [.gz] 21/06/30 13:54:56 INFO executor.Executor: Finished task 0.0 in stage 15.0 (TID 20). 1476 bytes result sent to driver 21/06/30 13:54:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 15.0 (TID 20) in 66 ms on localhost (executor driver) (1/1) 21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 21/06/30 13:54:56 INFO scheduler.DAGScheduler: ResultStage 15 (showString at NativeMethodAccessorImpl.java:0) finished in 0.082 s 21/06/30 13:54:56 INFO scheduler.DAGScheduler: Job 12 finished: showString at NativeMethodAccessorImpl.java:0, took 0.086565 s +-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+ |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path| _hoodie_file_name| id|category|number| create_time|create_date| update_time| +-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+ | 20210630135447| 20210630135447_1_2| 31| 2021/06/30|49a56794-40a5-45c...| 31| E| 0.5|1624989817000| 2021/06/30|1624989817000| | 20210630135447| 20210630135447_1_3| 13| 2021/06/30|49a56794-40a5-45c...| 13| E| 0.5|1624987648000| 2021/06/30|1624987648000| | 20210630135447| 20210630135447_1_4| 22| 2021/06/30|49a56794-40a5-45c...| 22| E| 0.5|1624988926000| 2021/06/30|1624988926000| | 20210630135447| 20210630135447_0_1| 6| 2021/06/29|20421052-ee81-442...| 6| E| 0.5|1624976243000| 2021/06/29|1624976243000| +-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+ 21/06/30 13:54:56 INFO spark.SparkContext: Invoking stop() from shutdown hook 21/06/30 13:54:56 INFO server.AbstractConnector: Stopped Spark@6260df1d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041} 21/06/30 13:54:56 INFO ui.SparkUI: Stopped Spark web UI at http://hdp-jk-1:4041 21/06/30 13:54:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 21/06/30 13:54:56 INFO memory.MemoryStore: MemoryStore cleared 21/06/30 13:54:56 INFO storage.BlockManager: BlockManager stopped 21/06/30 13:54:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 21/06/30 13:54:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 21/06/30 13:54:56 INFO spark.SparkContext: Successfully stopped SparkContext 21/06/30 13:54:56 INFO util.ShutdownHookManager: Shutdown hook called 21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8f70e7ed-c33c-460b-ace9-787227e65264 21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ea529532-e2c3-44a4-b5ad-4105a92b738c/pyspark-cb620c5f-4452-4845-8ac0-a25b0dc606fa 21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ea529532-e2c3-44a4-b5ad-4105a92b738c
这篇关于Spark操作数据表入门【进行数据写入和读出】————附带详细步骤的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2024-12-27OpenFeign服务间调用学习入门
- 2024-12-27OpenFeign服务间调用学习入门
- 2024-12-27OpenFeign学习入门:轻松掌握微服务通信
- 2024-12-27OpenFeign学习入门:轻松掌握微服务间的HTTP请求
- 2024-12-27JDK17新特性学习入门:简洁教程带你轻松上手
- 2024-12-27JMeter传递token学习入门教程
- 2024-12-27JMeter压测学习入门指南
- 2024-12-27JWT单点登录学习入门指南
- 2024-12-27JWT单点登录原理学习入门
- 2024-12-27JWT单点登录原理学习入门