Caused by: java.io.IOException: On-disk size without header provided is 6, but block header contains
2021/6/30 14:26:20
Caused by: java.io.IOException: On-disk size without header provided is 6
- Preface
- Source of the problem
- Problem analysis
- Problem resolution
Preface
The error messages in this post are pasted in full detail, so please be patient when reading through them.
Source of the problem:
I wrote a Spark program that calls the HBase API to scan and export data in batches. The first batches exported normally, but once a certain batch was reached, the job failed with the error below (a minimal sketch of the kind of scan involved is given after the log):
WARN TaskSetManager: Lost task 8.0 in stage 79.0 (TID 38487, cnbjsjqpsgjdn171, executor 4): org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue Jun 29 21:10:51 CST 2021, RpcRetryingCaller{globalStartTime=1624972251344, pause=100, retries=35}, java.io.IOException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://nameservice1/hbase/data/default/packet_v2/2b827f7067bbba7cf08dcb187d643c44/cf/7199e17027454ed99f637eb41142e8da, compression=lzo, cacheConf=blockCache=LruBlockCache{blockCount=8851, currentSize=625112608, freeSize=9585108448, maxSize=10210221056, heapSize=625112608, minSize=9699709952, minFactor=0.95, multiSize=4849854976, multiFactor=0.5, singleSize=2424927488, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=2uy5q_1624691294000_1/cf:data/1624691294413/Put, lastKey=2v2l9_1624707579000_4/cf:verify/1624707579921/Put, avgKeyLen=39, avgValueLen=77, entries=1537704, length=66621977, cur=null] to key 2v1gr_1451577600000/cf:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/seqid=0
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:218)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:350)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:199)
    at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2120)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2110)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5617)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2637)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2623)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2604)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2392)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
Caused by: java.io.IOException: On-disk size without header provided is 62081, but block header contains 0. Block offset: 55383835, data starts with: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:526)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:92)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1705)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1548)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:446)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:643)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:593)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:297)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:200)
    ... 14 more
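For context, here is a minimal sketch of the kind of scan being issued. This is assumed, not the author's actual code: the table name packet_v2 and column family cf come from the log above, while the start/stop row arguments and the row handling are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PacketScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("packet_v2"))) {
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes(args[0]));  // batch start row key (hypothetical)
            scan.setStopRow(Bytes.toBytes(args[1]));   // batch stop row key (hypothetical)
            scan.addFamily(Bytes.toBytes("cf"));       // column family seen in the log
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    // In the real job each Result would be turned into a record for Spark;
                    // here we just print the row key.
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```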
Problem analysis
The error message mentions RpcRetryingCaller, so my first thought was that the rpc timeout I had configured was too small. I compared HBase's read/write request load and its running state at the time and did see noticeable fluctuation. (The monitoring screenshot referenced here is not from the time of the error; it was only included as a reference for the kind of pattern to look for.)
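For reference, this is a minimal sketch of the kind of client-side timeout change involved, assuming the connection is built with the standard HBase Java client; the property values are illustrative, not the ones actually used.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

import java.io.IOException;

public class TimeoutTuning {
    // Raise the client-side rpc/scanner timeouts before creating the Connection
    // that the scan tasks will use. Values below are illustrative.
    static Connection connectWithLongerTimeouts() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rpc.timeout", "120000");                    // per-RPC timeout, ms
        conf.set("hbase.client.scanner.timeout.period", "120000");  // scanner RPC timeout, ms
        conf.set("hbase.client.retries.number", "35");              // matches retries=35 seen in the log
        return ConnectionFactory.createConnection(conf);
    }
}
```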
So I went back to the Spark program and increased the rpc timeout. On the next run, the same error appeared again. The error log is as follows:
21/06/29 10:17:54 WARN TaskSetManager: Lost task 8.0 in stage 39.0 (TID 18287, cnbjsjqpsgjdn61, executor 12): org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Tue Jun 29 10:08:43 CST 2021, RpcRetryingCaller{globalStartTime=1624932523329, pause=100, retries=35}, java.io.IOException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://nameservice1/hbase/data/default/packet_v2/08edc75f91a5465776db18092d3035ce/cf/14261d5166da4af59247f94b0fa77582, compression=lzo, cacheConf=blockCache=LruBlockCache{blockCount=8770, currentSize=615286456, freeSize=9594934600, maxSize=10210221056, heapSize=615286456, minSize=9699709952, minFactor=0.95, multiSize=4849854976, multiFactor=0.5, singleSize=2424927488, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=98877a5b6a834c4bbca9706fa9eb696e_1624155462000_1/cf:data/1624155462557/Put, lastKey=988d3aeca4ce43ce9836e4d600415330_1624180198000_4/cf:verify/1624180198123/Put, avgKeyLen=66, avgValueLen=79, entries=719284, length=31701109, cur=null] to key 9889f92362d64e608afa838e402b8f68_1451577600000/cf:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/seqid=0
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:218)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:350)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:199)
    at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2120)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2110)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5617)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2637)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2623)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2604)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2392)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
Caused by: java.io.IOException: On-disk size without header provided is 18784, but block header contains 0. Block offset: 12257382, data starts with: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:526)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:92)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1705)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1548)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:446)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:643)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:593)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:297)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:200)
    ... 14 more
Comparing the two logs, we find they have something in common: both error messages contain
Could not seek StoreFileScanner
which shows that our HBase API call did reach HBase and try to pull the data; the problem is that this block of data could not be scanned. Add to that the following line from the error log:
On-disk size without header provided is 62081, but block header contains 0. Block offset: 55383835, data starts with: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
Since the block header read back is 0 and the first bytes of the block are all \x00, the data actually coming off disk is zeroed-out garbage, so we can tentatively conclude that the problem is related to the disks on which HBase stores its data.
Looking at the HDFS files being read in the two error messages:
First read: hdfs://nameservice1/hbase/data/default/packet_v2/2b827f7067bbba7cf08dcb187d643c44/cf/7199e17027454ed99f637eb41142e8da
Second read: hdfs://nameservice1/hbase/data/default/packet_v2/08edc75f91a5465776db18092d3035ce/cf/14261d5166da4af59247f94b0fa77582
Open the Hadoop web management UI and check which servers in the Hadoop cluster these two files are actually stored on.
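As an alternative to clicking through the web UI, the same block-location information can be pulled programmatically. A minimal sketch, assuming the Hadoop client configuration is on the classpath; the path is the first HFile from the logs above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HFileBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
        Path hfile = new Path("hdfs://nameservice1/hbase/data/default/packet_v2/"
                + "2b827f7067bbba7cf08dcb187d643c44/cf/7199e17027454ed99f637eb41142e8da");
        try (FileSystem fs = FileSystem.get(hfile.toUri(), conf)) {
            FileStatus status = fs.getFileStatus(hfile);
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                // Print each HDFS block's byte range and the datanodes holding a replica.
                System.out.println(block.getOffset() + "-" + (block.getOffset() + block.getLength())
                        + " hosts=" + String.join(",", block.getHosts()));
            }
        }
    }
}
```

Any host that shows up for the failing offsets of both files is a candidate for the faulty node.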
Comparing the results for the two files, we find that cnbjsjqpsgjdn224 appears in both, meaning both files have data stored on that server. To verify that cnbjsjqpsgjdn224 really was the culprit, I ran my batch HBase-export script a few more times, and every failing read consistently traced back to cnbjsjqpsgjdn224. From this we can conclude that the problem lies with the cnbjsjqpsgjdn224 node. Cloudera Management confirmed the idea once more: that host is also one of the HBase RegionServer nodes, and since the cluster reads data from the nearest replica, this again supports the earlier judgment that the data stored on this node has gone bad.
Problem resolution:
- Use the "Cloudera Management Service" to mark hd01 (here, the faulty node, cnbjsjqpsgjdn224 in our case) for maintenance mode
- Use the "Cloudera Management Service" to stop all roles on hd01
- Back up the critical data on the partition (HDFS DataNode data is replicated, so we chose not to back it up)
- Unmount the "/dev/mapper/ds-data" partition mounted at "/data"
- Use the e2fsck command to check and repair the damaged sectors on that partition
For the fix you can also refer to the following two posts and adapt the repair to your own situation:
- https://www.cmdschool.org/archives/10817
- https://blog.csdn.net/shekey92/article/details/46895357
I hope this helps with your problem. If you have any questions, leave them in the comments and I will reply as soon as I see them.