Running the wordcount Example Program That Ships with Hadoop
2021/6/21 17:26:03
Tested on 2018-11-19: works.
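0. Preface
The previous article, 《Hadoop初体验:快速搭建Hadoop伪分布式环境》 ("A First Look at Hadoop: Quickly Setting Up a Pseudo-Distributed Hadoop Environment"), set up a Hadoop environment. This article uses the wordcount program that ships with Hadoop to work through a word-count example.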
http://www.linuxidc.com/Linux/2017-09/146694.htm
1. Counting words with the example program
(1) The wordcount program
The wordcount program lives under the share directory of the Hadoop installation:
[root@linuxidc mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@linuxidc mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar sources
The wordcount program is packaged in hadoop-mapreduce-examples-2.6.5.jar.
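The jar file name embeds the Hadoop version, so the exact path changes between releases. The following is a minimal sketch for locating the jar without hard-coding the version. It assumes the install root is /usr/local/hadoop as in the listing above; the function name find_examples_jar is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper: print the path of the MapReduce examples jar.
# Argument 1: Hadoop install root (assumed to be /usr/local/hadoop here).
find_examples_jar() {
    hadoop_root=${1:-/usr/local/hadoop}
    # The glob matches hadoop-mapreduce-examples-<version>.jar
    for jar in "$hadoop_root"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        # If nothing matched, the pattern stays unexpanded and the -f test fails.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "examples jar not found under $hadoop_root" >&2
    return 1
}
```

Calling find_examples_jar with no argument searches under /usr/local/hadoop; the printed path can then be passed to hadoop jar.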
(2) Create the HDFS data directories
Create a directory to hold the input files of the MapReduce job:
[root@linuxidc ~]# hadoop fs -mkdir -p /data/wordcount
Create a directory to hold the output files of the MapReduce job:
[root@linuxidc ~]# hadoop fs -mkdir /output
List the two directories just created:
[root@linuxidc ~]# hadoop fs -ls /
drwxr-xr-x - root supergroup 0 2017-09-01 20:34 /data
drwxr-xr-x - root supergroup 0 2017-09-01 20:35 /output
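When the steps are repeated, a plain hadoop fs -mkdir fails if the directory already exists. The following is a minimal sketch of an idempotent variant that uses only hadoop fs -test -d and hadoop fs -mkdir -p. The function name ensure_hdfs_dir is hypothetical, and the sketch assumes the hadoop command is on the PATH and HDFS is running.

```shell
# Hypothetical helper: make sure an HDFS directory exists.
# Argument 1: absolute HDFS path, e.g. /data/wordcount
ensure_hdfs_dir() {
    # "hadoop fs -test -d" exits with 0 when the path exists and is a directory.
    if hadoop fs -test -d "$1"; then
        echo "exists: $1"
    else
        # -p creates missing parent directories as well.
        hadoop fs -mkdir -p "$1" && echo "created: $1"
    fi
}

# Usage (needs a running HDFS):
#   ensure_hdfs_dir /data/wordcount
#   ensure_hdfs_dir /output
```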
(3) Create a word file and upload it to HDFS
The word file looks like this:
[root@linuxidc ~]# cat myword.txt
linuxidc yyh
yyh xplinuxidc
katy ling
yeyonghao linuxidc
xplinuxidc katy
Upload the file to HDFS:
[root@linuxidc ~]# hadoop fs -put myword.txt /data/wordcount
Check the uploaded file and its contents in HDFS:
[root@linuxidc ~]# hadoop fs -ls /data/wordcount
-rw-r--r-- 1 root supergroup 57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@linuxidc ~]# hadoop fs -cat /data/wordcount/myword.txt
linuxidc yyh
yyh xplinuxidc
katy ling
yeyonghao linuxidc
xplinuxidc katy
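Comparing the two listings by eye works for five lines; for larger files a byte-for-byte comparison is more reliable. The following is a minimal sketch that pipes hadoop fs -cat into cmp. The function name same_as_hdfs is made up for illustration, and the sketch assumes the hadoop command is on the PATH.

```shell
# Hypothetical helper: succeed when an HDFS file has exactly the same
# bytes as a local file.
# Argument 1: local file, argument 2: HDFS path
same_as_hdfs() {
    local_file=$1
    hdfs_path=$2
    # cmp reads the HDFS copy from stdin ("-") and compares it with the
    # local file; -s suppresses output, so only the exit status matters.
    hadoop fs -cat "$hdfs_path" | cmp -s - "$local_file"
}

# Usage (needs a running HDFS):
#   same_as_hdfs myword.txt /data/wordcount/myword.txt && echo "upload OK"
```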
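(4) Run the wordcount program
Run the following command. The last two arguments are the HDFS input directory and the HDFS output directory; the output directory (/output/wordcount here) must not exist yet, because the job creates it and refuses to run if it is already there: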
[root@linuxidc ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
...
17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=585940
FILE: Number of bytes written=1099502
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=114
HDFS: Number of bytes written=48
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=97
Map output materialized bytes=78
Input split bytes=112
Combine input records=10
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=78
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=92
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=241049600
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=57
File Output Format Counters
Bytes Written=48
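Some of the counters above can be related directly to the input text: Map input records is the number of input lines, Map output records is the number of words (the example mapper emits one pair per token), and Reduce output records is the number of distinct words. The following is a minimal sketch that estimates these three values from a local copy of the input. The function name predict_counters is hypothetical, and splitting on spaces and tabs only is an assumption that approximates how the example tokenizes text.

```shell
# Hypothetical helper: estimate three wordcount counters from a local file.
# Argument 1: local text file that ends with a newline
predict_counters() {
    file=$1
    # One map input record per line of text.
    lines=$(wc -l < "$file" | tr -d ' ')
    # One map output record per whitespace-separated token.
    words=$(wc -w < "$file" | tr -d ' ')
    # One reduce output record per distinct token: put each token on its
    # own line, drop empty lines, then count the unique values.
    distinct=$(tr -s ' \t' '\n\n' < "$file" | grep -v '^$' | LC_ALL=C sort -u | wc -l | tr -d ' ')
    printf 'Map input records=%s\n' "$lines"
    printf 'Map output records=%s\n' "$words"
    printf 'Reduce output records=%s\n' "$distinct"
}

# Usage:
#   predict_counters myword.txt
```

If the estimate and the job counters disagree, the usual suspects are a missing final newline in the file or tokens separated by characters other than spaces and tabs.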
(5) View the results
The job writes its result to the file part-r-00000 in the output directory:
[root@linuxidc ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy 2
linuxidc 2
ling 1
xplinuxidc 2
yeyonghao 1
yyh 2
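For an input this small, the counts can be cross-checked locally with standard Unix tools. The following is a minimal sketch that prints each word, a tab, and its count, sorted by byte value. The function name count_words is made up for illustration, and splitting on spaces and tabs only is an assumption that approximates the example's tokenizer.

```shell
# Hypothetical helper: count words read from standard input and print
# "word<TAB>count" lines sorted by byte value of the word.
count_words() {
    # 1. put every token on its own line, 2. drop empty lines,
    # 3. sort so equal words are adjacent, 4. count each run,
    # 5. swap the columns produced by uniq -c.
    tr -s ' \t' '\n\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage:
#   count_words < myword.txt
```

The word and count pairs should agree with what hadoop fs -cat prints for the job output; a mismatch usually points to a difference between the local file and the copy in HDFS.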