Spark操作数据表入门【进行数据写入和读出】————附带详细步骤

2021/7/9 6:07:55

本文主要是介绍Spark操作数据表入门【进行数据写入和读出】————附带详细步骤,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!

文章目录

  • 0 准备
  • 1 使用脚本运行
  • 2 使用shell执行
  • 3 使用脚本执行的结果

0 准备

运行路径为:/usr/app/spark-2.4.7-bin-hadoop2.7

1 使用脚本运行

执行脚本运行下面的python文件:

export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export  PATH=$PATH:$LD_LIBRARY_PATH

bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.3 bin/testSpark.py


/usr/app/spark-2.4.7-bin-hadoop2.7/bin/spark-submit --jars $HUDI_SPARK_BUNDLE --master spark://10.20.3.72:7077 --driver-class-path $HADOOP_CONF_DIR:/usr/app/apache-hive-2.3.8-bin/conf/:/software/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --deploy-mode client --driver-memory 1G --executor-memory 1G --num-executors 3 --packages org.apache.spark:spark-avro_2.11:2.4.4 /usr/app/spark-2.4.7-bin-hadoop2.7/bin/testSpark.py

spark对数据进行读取和创建的python文件:

# coding=utf-8
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
spark = SparkSession.builder \
      .master("local[*]") \
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
      .config("spark.default.parallelism", 2) \
      .appName("hudi-datalake-test") \
      .getOrCreate()
sparkContext = spark.sparkContext;

basePath = '/user/hive/warehouse/test_spark_mor'
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('原始文件:')
df=spark.sql("select * from hudi_trips_snapshot where category = 'E'")
df.show()

# 写数据:
savePath = '/user/hive/warehouse/test_spark_mor10'

hudi_options = {
  'hoodie.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.recordkey.field': 'id',
  'hoodie.datasource.write.partitionpath.field': 'create_date',
  'hoodie.datasource.write.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.operation': 'insert',
  'hoodie.datasource.write.precombine.field': 'ts',
  'hoodie.upsert.shuffle.parallelism': 2,
  'hoodie.insert.shuffle.parallelism': 2
}

df.write.format("hudi"). \
  options(**hudi_options). \
  mode("append"). \
  save(savePath)

# 再读新数据
basePath =  savePath
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('新数据:')
spark.sql("select * from hudi_trips_snapshot ").show()

2 使用shell执行

启动的shell的方法如下:

bin/pyspark --jars $HUDI_SPARK_BUNDLE --master spark://10.20.3.72:7077 --driver-class-path $HADOOP_CONF_DIR:/usr/app/apache-hive-2.3.8-bin/conf/:/software/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --deploy-mode client --driver-memory 1G --executor-memory 1G --num-executors 3 --packages org.apache.spark:spark-avro_2.11:2.4.4

启动后执行的指令如下:

basePath = '/user/hive/warehouse/test_spark_mor'
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('原始文件')
df=spark.sql("select * from hudi_trips_snapshot where category = 'E'")
df.show()

# 写数据:
savePath = '/user/hive/warehouse/test_spark_mor10'

hudi_options = {
  'hoodie.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.recordkey.field': 'id',
  'hoodie.datasource.write.partitionpath.field': 'create_date',
  'hoodie.datasource.write.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.operation': 'insert',
  'hoodie.datasource.write.precombine.field': 'ts',
  'hoodie.upsert.shuffle.parallelism': 2,
  'hoodie.insert.shuffle.parallelism': 2
}

df.write.format("hudi"). \
  options(**hudi_options). \
  mode("append"). \
  save(savePath)

# 再读新数据
basePath =  savePath
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('新数据:')
spark.sql("select * from hudi_trips_snapshot ").show()

3 使用脚本执行的结果

部分结果如下:

21/06/30 13:54:49 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at map at HoodieSparkSqlWriter.scala:152), which has no missing parents
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 23.9 KB, free 365.4 MB)
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 11.4 KB, free 365.4 MB)
21/06/30 13:54:49 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on hdp-jk-1:36125 (size: 11.4 KB, free: 366.2 MB)
21/06/30 13:54:49 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at map at HoodieSparkSqlWriter.scala:152) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 8527 bytes)
21/06/30 13:54:49 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:49 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:49 INFO codegen.CodeGenerator: Code generated in 39.268569 ms
21/06/30 13:54:49 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 1400 bytes result sent to driver
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 283 ms on localhost (executor driver) (1/1)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: ResultStage 2 (isEmpty at HoodieSparkSqlWriter.scala:181) finished in 0.309 s
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Job 2 finished: isEmpty at HoodieSparkSqlWriter.scala:181, took 0.317609 s
21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/06/30 13:54:49 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210630135447 action: commit
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210630135447__commit__REQUESTED]
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__REQUESTED]]
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:49 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/06/30 13:54:49 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/06/30 13:54:49 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Registering RDD 11 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 0
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Got job 3 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[11] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_9 stored as values in memory (estimated size 26.8 KB, free 365.3 MB)
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 12.8 KB, free 365.3 MB)
21/06/30 13:54:49 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on hdp-jk-1:36125 (size: 12.8 KB, free: 366.2 MB)
21/06/30 13:54:49 INFO spark.SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[11] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, executor driver, partition 0, PROCESS_LOCAL, 8516 bytes)
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 4, localhost, executor driver, partition 1, PROCESS_LOCAL, 8514 bytes)
21/06/30 13:54:49 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 3)
21/06/30 13:54:49 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 4)
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO memory.MemoryStore: Block rdd_9_0 stored as values in memory (estimated size 196.0 B, free 365.3 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added rdd_9_0 in memory on hdp-jk-1:36125 (size: 196.0 B, free: 366.2 MB)
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:50 INFO memory.MemoryStore: Block rdd_9_1 stored as values in memory (estimated size 251.0 B, free 365.3 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added rdd_9_1 in memory on hdp-jk-1:36125 (size: 251.0 B, free: 366.2 MB)
21/06/30 13:54:50 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 3). 1370 bytes result sent to driver
21/06/30 13:54:50 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 4). 1370 bytes result sent to driver
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 392 ms on localhost (executor driver) (1/2)
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 4) in 391 ms on localhost (executor driver) (2/2)
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/06/30 13:54:50 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.432 s
21/06/30 13:54:50 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/06/30 13:54:50 INFO scheduler.DAGScheduler: running: Set()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 4)
21/06/30 13:54:50 INFO scheduler.DAGScheduler: failed: Set()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (ShuffledRDD[12] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_10 stored as values in memory (estimated size 3.7 KB, free 365.3 MB)
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 2.1 KB, free 365.3 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added broadcast_10_piece0 in memory on hdp-jk-1:36125 (size: 2.1 KB, free: 366.2 MB)
21/06/30 13:54:50 INFO spark.SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (ShuffledRDD[12] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes)
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/06/30 13:54:50 INFO executor.Executor: Running task 1.0 in stage 4.0 (TID 5)
21/06/30 13:54:50 INFO executor.Executor: Running task 0.0 in stage 4.0 (TID 6)
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms
21/06/30 13:54:50 INFO executor.Executor: Finished task 1.0 in stage 4.0 (TID 5). 1098 bytes result sent to driver
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 5) in 128 ms on localhost (executor driver) (1/2)
21/06/30 13:54:50 INFO executor.Executor: Finished task 0.0 in stage 4.0 (TID 6). 1176 bytes result sent to driver
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 143 ms on localhost (executor driver) (2/2)
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/06/30 13:54:50 INFO scheduler.DAGScheduler: ResultStage 4 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.185 s
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.737207 s
21/06/30 13:54:50 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=4, numUpdates=0}, partitionStat={2021/06/29=WorkloadStat {numInserts=1, numUpdates=0}, 2021/06/30=WorkloadStat {numInserts=3, numUpdates=0}}, operationType=INSERT}
21/06/30 13:54:50 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.commit.requested
21/06/30 13:54:50 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.inflight
21/06/30 13:54:50 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/06/30 13:54:50 INFO spark.SparkContext: Starting job: collectAsMap at UpsertPartitioner.java:252
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Got job 4 (collectAsMap at UpsertPartitioner.java:252) with 2 output partitions
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (collectAsMap at UpsertPartitioner.java:252)
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[14] at mapToPair at UpsertPartitioner.java:251), which has no missing parents
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_11 stored as values in memory (estimated size 231.2 KB, free 365.1 MB)
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 82.6 KB, free 365.0 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on hdp-jk-1:36125 (size: 82.6 KB, free: 366.1 MB)
21/06/30 13:54:50 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[14] at mapToPair at UpsertPartitioner.java:251) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 7733 bytes)
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 5.0 (TID 8, localhost, executor driver, partition 1, PROCESS_LOCAL, 7733 bytes)
21/06/30 13:54:50 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 7)
21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 5.0 (TID 8)
21/06/30 13:54:51 INFO executor.Executor: Finished task 1.0 in stage 5.0 (TID 8). 705 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 5.0 (TID 8) in 73 ms on localhost (executor driver) (1/2)
21/06/30 13:54:51 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 7). 705 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 7) in 142 ms on localhost (executor driver) (2/2)
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
21/06/30 13:54:51 INFO scheduler.DAGScheduler: ResultStage 5 (collectAsMap at UpsertPartitioner.java:252) finished in 0.225 s
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Job 4 finished: collectAsMap at UpsertPartitioner.java:252, took 0.235333 s
21/06/30 13:54:51 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:51 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:51 INFO commit.UpsertPartitioner: For partitionPath : 2021/06/29 Small Files => []
21/06/30 13:54:51 INFO commit.UpsertPartitioner: After small file assignment: unassignedInserts => 1, totalInsertBuckets => 1, recordsPerBucket => 122880
21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total insert buckets for partition path 2021/06/29 => [(InsertBucket {bucketNumber=0, weight=1.0},1.0)]
21/06/30 13:54:51 INFO commit.UpsertPartitioner: For partitionPath : 2021/06/30 Small Files => []
21/06/30 13:54:51 INFO commit.UpsertPartitioner: After small file assignment: unassignedInserts => 3, totalInsertBuckets => 1, recordsPerBucket => 122880
21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total insert buckets for partition path 2021/06/30 => [(InsertBucket {bucketNumber=1, weight=1.0},1.0)]
21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total Buckets :2, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=20421052-ee81-4427-ab8f-464d81b40b8f, partitionPath=2021/06/29}, 1=BucketInfo {bucketType=INSERT, fileIdPrefix=49a56794-40a5-45c2-bd4a-08a566590703, partitionPath=2021/06/30}},
Partition to insert buckets => {2021/06/29=[(InsertBucket {bucketNumber=0, weight=1.0},1.0)], 2021/06/30=[(InsertBucket {bucketNumber=1, weight=1.0},1.0)]},
UpdateLocations mapped to buckets =>{}
21/06/30 13:54:51 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210630135447
21/06/30 13:54:51 INFO spark.SparkContext: Starting job: count at HoodieSparkSqlWriter.scala:470
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Registering RDD 15 (mapToPair at BaseSparkCommitActionExecutor.java:192) as input to shuffle 1
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Got job 5 (count at HoodieSparkSqlWriter.scala:470) with 2 output partitions
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (count at HoodieSparkSqlWriter.scala:470)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[15] at mapToPair at BaseSparkCommitActionExecutor.java:192), which has no missing parents
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_12 stored as values in memory (estimated size 253.2 KB, free 364.8 MB)
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 92.7 KB, free 364.7 MB)
21/06/30 13:54:51 INFO storage.BlockManagerInfo: Added broadcast_12_piece0 in memory on hdp-jk-1:36125 (size: 92.7 KB, free: 366.0 MB)
21/06/30 13:54:51 INFO spark.SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[15] at mapToPair at BaseSparkCommitActionExecutor.java:192) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 9, localhost, executor driver, partition 0, PROCESS_LOCAL, 8516 bytes)
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 10, localhost, executor driver, partition 1, PROCESS_LOCAL, 8514 bytes)
21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 6.0 (TID 10)
21/06/30 13:54:51 INFO executor.Executor: Running task 0.0 in stage 6.0 (TID 9)
21/06/30 13:54:51 INFO storage.BlockManager: Found block rdd_9_0 locally
21/06/30 13:54:51 INFO storage.BlockManager: Found block rdd_9_1 locally
21/06/30 13:54:51 INFO executor.Executor: Finished task 0.0 in stage 6.0 (TID 9). 1370 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 9) in 139 ms on localhost (executor driver) (1/2)
21/06/30 13:54:51 INFO executor.Executor: Finished task 1.0 in stage 6.0 (TID 10). 1327 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 10) in 131 ms on localhost (executor driver) (2/2)
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/06/30 13:54:51 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (mapToPair at BaseSparkCommitActionExecutor.java:192) finished in 0.195 s
21/06/30 13:54:51 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/06/30 13:54:51 INFO scheduler.DAGScheduler: running: Set()
21/06/30 13:54:51 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: failed: Set()
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[20] at filter at HoodieSparkSqlWriter.scala:470), which has no missing parents
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_13 stored as values in memory (estimated size 325.0 KB, free 364.3 MB)
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 118.5 KB, free 364.2 MB)
21/06/30 13:54:51 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in memory on hdp-jk-1:36125 (size: 118.5 KB, free: 365.9 MB)
21/06/30 13:54:51 INFO spark.SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (MapPartitionsRDD[20] at filter at HoodieSparkSqlWriter.scala:470) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 11, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 12, localhost, executor driver, partition 1, ANY, 7662 bytes)
21/06/30 13:54:51 INFO executor.Executor: Running task 0.0 in stage 7.0 (TID 11)
21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 7.0 (TID 12)
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: starting to buffer records
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: starting to buffer records
21/06/30 13:54:51 INFO queue.BoundedInMemoryExecutor: starting consumer thread
21/06/30 13:54:51 INFO queue.BoundedInMemoryExecutor: starting consumer thread
21/06/30 13:54:51 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: finished buffering records
21/06/30 13:54:51 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: finished buffering records
21/06/30 13:54:52 INFO table.MarkerFiles: Creating Marker Path=/user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447/2021/06/30/49a56794-40a5-45c2-bd4a-08a566590703-0_1-7-12_20210630135447.parquet.marker.CREATE
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO table.MarkerFiles: Creating Marker Path=/user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447/2021/06/29/20421052-ee81-4427-ab8f-464d81b40b8f-0_0-7-11_20210630135447.parquet.marker.CREATE
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO compress.CodecPool: Got brand-new compressor [.gz]
21/06/30 13:54:52 INFO compress.CodecPool: Got brand-new compressor [.gz]
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 99
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 102
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 85
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 155
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 128
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 93
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 111
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 94
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 101
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 105
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 140
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 83
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 131
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 72
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 84
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 59
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 123
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned shuffle 0
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 134
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 115
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 97
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 139
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_11_piece0 on hdp-jk-1:36125 in memory (size: 82.6 KB, free: 366.0 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 67
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 124
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 138
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 103
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 81
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 82
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 95
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 79
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 156
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 89
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 98
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 146
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 77
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 57
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 96
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 141
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 100
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 86
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on hdp-jk-1:36125 in memory (size: 12.8 KB, free: 366.0 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 117
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 147
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 150
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 66
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 114
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 65
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 116
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 109
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 113
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 60
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 62
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 78
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 143
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 104
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 135
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 112
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 58
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 63
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_10_piece0 on hdp-jk-1:36125 in memory (size: 2.1 KB, free: 366.0 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 142
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 76
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 107
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 121
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 71
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 129
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 137
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 91
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 125
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 74
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 148
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 133
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 144
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 73
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 64
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 88
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 120
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 149
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 122
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 92
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 87
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 126
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 154
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 106
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 136
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 152
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 61
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 108
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 118
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_12_piece0 on hdp-jk-1:36125 in memory (size: 92.7 KB, free: 366.1 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 145
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 151
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 70
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 90
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 119
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 110
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 127
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 80
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on hdp-jk-1:36125 in memory (size: 11.4 KB, free: 366.1 MB)
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO io.HoodieCreateHandle: New CreateHandle for partition :2021/06/30 with fileId 49a56794-40a5-45c2-bd4a-08a566590703-0
21/06/30 13:54:52 INFO io.HoodieCreateHandle: New CreateHandle for partition :2021/06/29 with fileId 20421052-ee81-4427-ab8f-464d81b40b8f-0
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 68
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 132
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 69
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 153
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 130
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 75
21/06/30 13:54:52 INFO io.HoodieCreateHandle: Closing the file 20421052-ee81-4427-ab8f-464d81b40b8f-0 as we are done with all the records 1
21/06/30 13:54:52 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 240
21/06/30 13:54:52 INFO io.HoodieCreateHandle: Closing the file 49a56794-40a5-45c2-bd4a-08a566590703-0 as we are done with all the records 3
21/06/30 13:54:52 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 461
21/06/30 13:54:52 INFO io.HoodieCreateHandle: CreateHandle for partitionPath 2021/06/30 fileID 49a56794-40a5-45c2-bd4a-08a566590703-0, took 1012 ms.
21/06/30 13:54:52 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
21/06/30 13:54:52 INFO memory.MemoryStore: Block rdd_19_1 stored as values in memory (estimated size 301.0 B, free 365.0 MB)
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Added rdd_19_1 in memory on hdp-jk-1:36125 (size: 301.0 B, free: 366.1 MB)
21/06/30 13:54:52 INFO io.HoodieCreateHandle: CreateHandle for partitionPath 2021/06/29 fileID 20421052-ee81-4427-ab8f-464d81b40b8f-0, took 1044 ms.
21/06/30 13:54:52 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
21/06/30 13:54:52 INFO memory.MemoryStore: Block rdd_19_0 stored as values in memory (estimated size 301.0 B, free 365.0 MB)
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Added rdd_19_0 in memory on hdp-jk-1:36125 (size: 301.0 B, free: 366.1 MB)
21/06/30 13:54:52 INFO executor.Executor: Finished task 1.0 in stage 7.0 (TID 12). 1428 bytes result sent to driver
21/06/30 13:54:52 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 12) in 1256 ms on localhost (executor driver) (1/2)
21/06/30 13:54:52 INFO executor.Executor: Finished task 0.0 in stage 7.0 (TID 11). 1428 bytes result sent to driver
21/06/30 13:54:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 11) in 1261 ms on localhost (executor driver) (2/2)
21/06/30 13:54:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/06/30 13:54:52 INFO scheduler.DAGScheduler: ResultStage 7 (count at HoodieSparkSqlWriter.scala:470) finished in 1.351 s
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Job 5 finished: count at HoodieSparkSqlWriter.scala:470, took 1.556432 s
21/06/30 13:54:52 INFO hudi.HoodieSparkSqlWriter$: No errors. Proceeding to commit the write.
21/06/30 13:54:52 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Got job 6 (collect at SparkRDDWriteClient.java:120) with 2 output partitions
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Final stage: ResultStage 9 (collect at SparkRDDWriteClient.java:120)
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 8)
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Submitting ResultStage 9 (MapPartitionsRDD[21] at map at SparkRDDWriteClient.java:120), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_14 stored as values in memory (estimated size 325.5 KB, free 364.6 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 118.8 KB, free 364.5 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on hdp-jk-1:36125 (size: 118.8 KB, free: 366.0 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 9 (MapPartitionsRDD[21] at map at SparkRDDWriteClient.java:120) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 9.0 with 2 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 9.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes)
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 9.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 9.0 (TID 13)
21/06/30 13:54:53 INFO executor.Executor: Running task 1.0 in stage 9.0 (TID 14)
21/06/30 13:54:53 INFO storage.BlockManager: Found block rdd_19_1 locally
21/06/30 13:54:53 INFO storage.BlockManager: Found block rdd_19_0 locally
21/06/30 13:54:53 INFO executor.Executor: Finished task 1.0 in stage 9.0 (TID 14). 1486 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 9.0 (TID 14) in 125 ms on localhost (executor driver) (1/2)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 9.0 (TID 13). 1486 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 9.0 (TID 13) in 130 ms on localhost (executor driver) (2/2)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 9 (collect at SparkRDDWriteClient.java:120) finished in 0.219 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 6 finished: collect at SparkRDDWriteClient.java:120, took 0.225906 s
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__INFLIGHT]]
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:53 INFO util.CommitUtils: Creating  metadata for INSERT numWriteStats:2numReplaceFileIds:0
21/06/30 13:54:53 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 7 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 10 (collect at HoodieSparkEngineContext.java:78)
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[23] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_15 stored as values in memory (estimated size 72.1 KB, free 364.4 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 26.2 KB, free 364.4 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_15_piece0 in memory on hdp-jk-1:36125 (size: 26.2 KB, free: 366.0 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 10 (MapPartitionsRDD[23] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 10.0 with 1 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 10.0 (TID 15)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 10.0 (TID 15). 833 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 15) in 40 ms on localhost (executor driver) (1/1)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 10 (collect at HoodieSparkEngineContext.java:78) finished in 0.071 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 7 finished: collect at HoodieSparkEngineContext.java:78, took 0.075028 s
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__INFLIGHT]]
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Committing 20210630135447 action commit
21/06/30 13:54:53 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 8 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 11 (collect at HoodieSparkEngineContext.java:78)
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[25] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 72.1 KB, free 364.4 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 26.2 KB, free 364.3 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on hdp-jk-1:36125 (size: 26.2 KB, free: 365.9 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[25] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 11.0 with 1 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 11.0 (TID 16, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 11.0 (TID 16)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 11.0 (TID 16). 876 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 11.0 (TID 16) in 36 ms on localhost (executor driver) (1/1)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 11 (collect at HoodieSparkEngineContext.java:78) finished in 0.067 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 8 finished: collect at HoodieSparkEngineContext.java:78, took 0.073743 s
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210630135447__commit__INFLIGHT]
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.inflight
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.commit
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Completed [==>20210630135447__commit__INFLIGHT]
21/06/30 13:54:53 INFO spark.SparkContext: Starting job: foreach at HoodieSparkEngineContext.java:83
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 9 (foreach at HoodieSparkEngineContext.java:83) with 1 output partitions
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 12 (foreach at HoodieSparkEngineContext.java:83)
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 12 (ParallelCollectionRDD[26] at parallelize at HoodieSparkEngineContext.java:83), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 71.2 KB, free 364.3 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 25.7 KB, free 364.2 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on hdp-jk-1:36125 (size: 25.7 KB, free: 365.9 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 12 (ParallelCollectionRDD[26] at parallelize at HoodieSparkEngineContext.java:83) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 12.0 with 1 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 17, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 12.0 (TID 17)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 12.0 (TID 17). 666 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 17) in 37 ms on localhost (executor driver) (1/1)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 12 (foreach at HoodieSparkEngineContext.java:83) finished in 0.066 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 9 finished: foreach at HoodieSparkEngineContext.java:83, took 0.071339 s
21/06/30 13:54:53 INFO table.MarkerFiles: Removing marker directory at /user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__REQUESTED], [==>20210630135447__commit__INFLIGHT], [20210630135447__commit__COMPLETED]]
21/06/30 13:54:53 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210630135453
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hive/warehouse/test_spark_mor10. Server=hdp-jk-1:35915, Timeout=300
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:53 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:53 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://hdp-jk-1:35915/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhive%2Fwarehouse%2Ftest_spark_mor10&lastinstantts=20210630135447&timelinehash=09ac7cb732068e0c3926289658471433dcd40aa40b28b7eee22c414b299f5bf3)
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:54 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:54 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:54 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:54 INFO service.RequestHandler: TimeTakenMillis[Total=92, Refresh=81, handle=10, Check=1], Success=true, Query=basepath=%2Fuser%2Fhive%2Fwarehouse%2Ftest_spark_mor10&lastinstantts=20210630135447&timelinehash=09ac7cb732068e0c3926289658471433dcd40aa40b28b7eee22c414b299f5bf3, Host=hdp-jk-1:35915, synced=false
21/06/30 13:54:54 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/06/30 13:54:54 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Cleaner started
21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:54 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:54 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Committed 20210630135447
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Commit 20210630135447 successful!
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Config.inlineCompactionEnabled ? false
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Compaction Scheduled is Optional.empty
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Is Async Compaction Enabled ? false
21/06/30 13:54:54 INFO client.AbstractHoodieClient: Stopping Timeline service !!
21/06/30 13:54:54 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/06/30 13:54:54 INFO service.TimelineService: Closing Timeline Service
21/06/30 13:54:54 INFO javalin.Javalin: Stopping Javalin ...
21/06/30 13:54:54 INFO javalin.Javalin: Javalin has stopped
21/06/30 13:54:54 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:54 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:54 INFO service.TimelineService: Closed Timeline Service
21/06/30 13:54:54 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/06/30 13:54:54 INFO rdd.MapPartitionsRDD: Removing RDD 19 from persistence list
21/06/30 13:54:54 INFO storage.BlockManager: Removing RDD 19
21/06/30 13:54:54 INFO rdd.MapPartitionsRDD: Removing RDD 9 from persistence list
21/06/30 13:54:54 INFO storage.BlockManager: Removing RDD 9
21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:55 INFO hudi.DataSourceUtils: Getting table path..
21/06/30 13:54:55 INFO util.TablePathUtils: Getting table path from path : hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/.aux/.bootstrap/.fileids
21/06/30 13:54:55 INFO hudi.DefaultSource: Obtained hudi table path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:55 INFO table.HoodieTableConfig: Loading table properties from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO hudi.DefaultSource: Is bootstrapped table => false
21/06/30 13:54:55 WARN hudi.DefaultSource: Loading Base File Only View.
21/06/30 13:54:55 INFO hudi.DefaultSource: Constructing hoodie (as parquet) data source with options :Map(path -> /user/hive/warehouse/test_spark_mor10/*/*/*/*, hoodie.datasource.query.type -> snapshot)
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:55 INFO table.HoodieTableConfig: Loading table properties from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading Active commit timeline for hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:55 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Building file system view for partition (2021/06/29)
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: #files found in partition (2021/06/29) =2, Time taken =2
21/06/30 13:54:55 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2021/06/29, #FileGroups=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=0, StoreTimeTaken=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Time to load partition (2021/06/29) =4
21/06/30 13:54:55 INFO hadoop.HoodieROTablePathFilter: Based on hoodie metadata from base path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10, caching 1 files under hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/29
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Building file system view for partition (2021/06/30)
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: #files found in partition (2021/06/30) =2, Time taken =1
21/06/30 13:54:55 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2021/06/30, #FileGroups=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Time to load partition (2021/06/30) =4
21/06/30 13:54:55 INFO hadoop.HoodieROTablePathFilter: Based on hoodie metadata from base path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10, caching 1 files under hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/30
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO datasources.InMemoryFileIndex: It took 93 ms to list leaf files for 6 paths.
21/06/30 13:54:55 INFO spark.SparkContext: Starting job: resolveRelation at DefaultSource.scala:193
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Got job 10 (resolveRelation at DefaultSource.scala:193) with 1 output partitions
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (resolveRelation at DefaultSource.scala:193)
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (MapPartitionsRDD[30] at resolveRelation at DefaultSource.scala:193), which has no missing parents
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_18 stored as values in memory (estimated size 73.5 KB, free 364.2 MB)
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 26.7 KB, free 364.1 MB)
21/06/30 13:54:55 INFO storage.BlockManagerInfo: Added broadcast_18_piece0 in memory on hdp-jk-1:36125 (size: 26.7 KB, free: 365.9 MB)
21/06/30 13:54:55 INFO spark.SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (MapPartitionsRDD[30] at resolveRelation at DefaultSource.scala:193) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:55 INFO scheduler.TaskSchedulerImpl: Adding task set 13.0 with 1 tasks
21/06/30 13:54:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 18, localhost, executor driver, partition 0, PROCESS_LOCAL, 7869 bytes)
21/06/30 13:54:55 INFO executor.Executor: Running task 0.0 in stage 13.0 (TID 18)
21/06/30 13:54:55 INFO executor.Executor: Finished task 0.0 in stage 13.0 (TID 18). 1308 bytes result sent to driver
21/06/30 13:54:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 18) in 197 ms on localhost (executor driver) (1/1)
21/06/30 13:54:55 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/06/30 13:54:55 INFO scheduler.DAGScheduler: ResultStage 13 (resolveRelation at DefaultSource.scala:193) finished in 0.227 s
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Job 10 finished: resolveRelation at DefaultSource.scala:193, took 0.230370 s
新数据:
21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Pruning directories with:
21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Post-Scan Filters:
21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Output Data Schema: struct<_hoodie_commit_time: string, _hoodie_commit_seqno: string, _hoodie_record_key: string, _hoodie_partition_path: string, _hoodie_file_name: string ... 9 more fields>
21/06/30 13:54:55 INFO execution.FileSourceScanExec: Pushed Filters:
21/06/30 13:54:55 INFO codegen.CodeGenerator: Code generated in 63.038365 ms
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_19 stored as values in memory (estimated size 292.0 KB, free 363.8 MB)
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 25.9 KB, free 363.8 MB)
21/06/30 13:54:55 INFO storage.BlockManagerInfo: Added broadcast_19_piece0 in memory on hdp-jk-1:36125 (size: 25.9 KB, free: 365.9 MB)
21/06/30 13:54:55 INFO spark.SparkContext: Created broadcast 19 from showString at NativeMethodAccessorImpl.java:0
21/06/30 13:54:56 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4629786 bytes, open cost is considered as scanning 4194304 bytes.
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 205
21/06/30 13:54:56 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 238
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Got job 11 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Final stage: ResultStage 14 (showString at NativeMethodAccessorImpl.java:0)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 256
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 305
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 266
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 167
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 298
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 264
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting ResultStage 14 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_15_piece0 on hdp-jk-1:36125 in memory (size: 26.2 KB, free: 365.9 MB)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 182
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 165
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 302
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 301
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 172
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 299
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_13_piece0 on hdp-jk-1:36125 in memory (size: 118.5 KB, free: 366.0 MB)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 204
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 232
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 161
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_17_piece0 on hdp-jk-1:36125 in memory (size: 25.7 KB, free: 366.0 MB)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 163
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 316
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 206
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 166
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 175
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 170
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 186
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 201
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 214
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 297
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 300
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 277
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 287
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 292
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 219
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 235
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 249
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 239
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 185
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 254
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 326
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 330
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned shuffle 1
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 222
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 194
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 311
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 278
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 276
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 328
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 215
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_20 stored as values in memory (estimated size 14.6 KB, free 364.5 MB)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_16_piece0 on hdp-jk-1:36125 in memory (size: 26.2 KB, free: 366.1 MB)
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 5.7 KB, free 364.5 MB)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on hdp-jk-1:36125 (size: 5.7 KB, free: 366.1 MB)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 310
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 270
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 291
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 160
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 195
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 288
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 296
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 237
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 227
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 279
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 178
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 236
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 202
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 282
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 226
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 271
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 319
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 284
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 309
21/06/30 13:54:56 INFO spark.SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 233
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 257
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 259
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 293
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 19, localhost, executor driver, partition 0, ANY, 8353 bytes)
21/06/30 13:54:56 INFO executor.Executor: Running task 0.0 in stage 14.0 (TID 19)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_14_piece0 on hdp-jk-1:36125 in memory (size: 118.8 KB, free: 366.2 MB)
21/06/30 13:54:56 INFO datasources.FileScanRDD: Reading File path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/30/49a56794-40a5-45c2-bd4a-08a566590703-0_1-7-12_20210630135447.parquet, range: 0-435537, partition values: [empty row]
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 320
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 174
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 177
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 190
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 245
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 262
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 164
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 253
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 295
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 285
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 191
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 213
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 216
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 203
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 228
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 231
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 269
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 324
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 221
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 329
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_18_piece0 on hdp-jk-1:36125 in memory (size: 26.7 KB, free: 366.2 MB)
21/06/30 13:54:56 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:56 INFO executor.Executor: Finished task 0.0 in stage 14.0 (TID 19). 1555 bytes result sent to driver
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 19) in 63 ms on localhost (executor driver) (1/1)
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool
21/06/30 13:54:56 INFO scheduler.DAGScheduler: ResultStage 14 (showString at NativeMethodAccessorImpl.java:0) finished in 0.091 s
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Job 11 finished: showString at NativeMethodAccessorImpl.java:0, took 0.130874 s
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 272
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 321
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 252
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 169
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 183
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 322
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 224
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 220
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 306
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 303
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 261
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 267
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 162
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 176
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 229
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 258
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 218
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 268
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 217
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 280
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 193
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 196
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 234
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 225
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 187
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 159
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 318
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 171
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 274
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 197
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 192
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 244
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 211
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 240
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 246
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 313
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 283
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 198
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 210
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 242
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 290
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 314
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 308
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 200
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 263
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 248
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 212
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 184
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 199
21/06/30 13:54:56 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Got job 12 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Final stage: ResultStage 15 (showString at NativeMethodAccessorImpl.java:0)
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_21 stored as values in memory (estimated size 14.6 KB, free 365.0 MB)
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 5.7 KB, free 365.0 MB)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Added broadcast_21_piece0 in memory on hdp-jk-1:36125 (size: 5.7 KB, free: 366.2 MB)
21/06/30 13:54:56 INFO spark.SparkContext: Created broadcast 21 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 15 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(1))
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Adding task set 15.0 with 1 tasks
21/06/30 13:54:56 INFO storage.BlockManager: Removing RDD 19
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 15.0 (TID 20, localhost, executor driver, partition 1, ANY, 8353 bytes)
21/06/30 13:54:56 INFO executor.Executor: Running task 0.0 in stage 15.0 (TID 20)
21/06/30 13:54:56 INFO datasources.FileScanRDD: Reading File path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/29/20421052-ee81-4427-ab8f-464d81b40b8f-0_0-7-11_20210630135447.parquet, range: 0-435428, partition values: [empty row]
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned RDD 19
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 307
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 168
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 243
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 289
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 247
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 275
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 209
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 157
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 286
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 173
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 331
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 241
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 251
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 158
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 294
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 179
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 273
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 180
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 323
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 181
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 250
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 304
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 327
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 315
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 255
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 223
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 189
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 281
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 188
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 230
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 207
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 312
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 265
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 317
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 325
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 208
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 260
21/06/30 13:54:56 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:56 INFO executor.Executor: Finished task 0.0 in stage 15.0 (TID 20). 1476 bytes result sent to driver
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 15.0 (TID 20) in 66 ms on localhost (executor driver) (1/1)
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool
21/06/30 13:54:56 INFO scheduler.DAGScheduler: ResultStage 15 (showString at NativeMethodAccessorImpl.java:0) finished in 0.082 s
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Job 12 finished: showString at NativeMethodAccessorImpl.java:0, took 0.086565 s
+-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|category|number|  create_time|create_date|  update_time|
+-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+
|     20210630135447|  20210630135447_1_2|                31|            2021/06/30|49a56794-40a5-45c...| 31|       E|   0.5|1624989817000| 2021/06/30|1624989817000|
|     20210630135447|  20210630135447_1_3|                13|            2021/06/30|49a56794-40a5-45c...| 13|       E|   0.5|1624987648000| 2021/06/30|1624987648000|
|     20210630135447|  20210630135447_1_4|                22|            2021/06/30|49a56794-40a5-45c...| 22|       E|   0.5|1624988926000| 2021/06/30|1624988926000|
|     20210630135447|  20210630135447_0_1|                 6|            2021/06/29|20421052-ee81-442...|  6|       E|   0.5|1624976243000| 2021/06/29|1624976243000|
+-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+

21/06/30 13:54:56 INFO spark.SparkContext: Invoking stop() from shutdown hook
21/06/30 13:54:56 INFO server.AbstractConnector: Stopped Spark@6260df1d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
21/06/30 13:54:56 INFO ui.SparkUI: Stopped Spark web UI at http://hdp-jk-1:4041
21/06/30 13:54:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/06/30 13:54:56 INFO memory.MemoryStore: MemoryStore cleared
21/06/30 13:54:56 INFO storage.BlockManager: BlockManager stopped
21/06/30 13:54:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/06/30 13:54:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/06/30 13:54:56 INFO spark.SparkContext: Successfully stopped SparkContext
21/06/30 13:54:56 INFO util.ShutdownHookManager: Shutdown hook called
21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8f70e7ed-c33c-460b-ace9-787227e65264
21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ea529532-e2c3-44a4-b5ad-4105a92b738c/pyspark-cb620c5f-4452-4845-8ac0-a25b0dc606fa
21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ea529532-e2c3-44a4-b5ad-4105a92b738c


这篇关于Spark操作数据表入门【进行数据写入和读出】————附带详细步骤的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!


扫一扫关注最新编程教程