Writing a WordCount Program in IDEA and Running It with spark-submit
2021/4/23 20:25:21
This article walks through writing a WordCount program in IntelliJ IDEA and running it with spark-submit. It should serve as a practical reference for anyone building their first Spark job; follow along with the steps below.
1. Create a new Maven project in IDEA
Give the project a name of your choice.
2. Edit the pom file: set the Spark, Scala, and Hadoop versions in `properties`, then add the `dependencies` and `plugins` sections
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>sparktest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.4.5-hw-ei-302002</spark.version>
        <hadoop.version>3.1.1-hw-ei-302002</hadoop.version>
        <hbase.version>2.2.3-hw-ei-302002</hbase.version>
    </properties>

    <dependencies>
        <!-- Scala standard library -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- Spark core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Pin the hadoop-client API version -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Compile Scala sources -->
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <configuration>
                    <recompileMode>modified-only</recompileMode>
                </configuration>
                <executions>
                    <execution>
                        <id>main-scalac</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- Compile Java sources with JDK 1.8 -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
        <directory>target</directory>
        <outputDirectory>target/classes</outputDirectory>
        <testOutputDirectory>target/test-classes</testOutputDirectory>
    </build>
</project>
```
Create a directory named `scala`. Note that its color differs from the `java` directory: at this point it is not yet a source root, so you cannot add classes to it.
3. Click reimport (the small circular-arrow icon) so the directory becomes a source root
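If you prefer not to depend on the IDE marking the directory, the Scala source root can also be declared directly in the pom. This is a hypothetical addition to the `<build>` section shown above, assuming your sources live under `src/main/scala`:

```xml
<build>
    <!-- Tell Maven (and IDEA on reimport) where the Scala sources are -->
    <sourceDirectory>src/main/scala</sourceDirectory>
    <!-- ... plugins and output directories as above ... -->
</build>
```

With this in place, a reimport picks up the source root automatically.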
4. Create a Scala class and write the program
Create an `object` with a name of your choice.
5. Once the program is written, double-click `package` in the Maven panel to build the jar
```scala
package cn.study.spark.scala

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration and set the application name
    val conf = new SparkConf().setAppName("ScalaWordCount")
    // Create the entry point for Spark execution
    val sc = new SparkContext(conf)
    // Read the input file into an RDD (resilient distributed dataset).
    // The whole pipeline could also be written in one line:
    // sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false).saveAsTextFile(args(1))
    val lines: RDD[String] = sc.textFile(args(0))
    // Split each line into words and flatten
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Pair each word with a 1
    val wordAndOne: RDD[(String, Int)] = words.map((_, 1))
    // Aggregate counts by key
    val reduced: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    // Sort by count, descending
    val sorted: RDD[(String, Int)] = reduced.sortBy(_._2, false)
    // Save the result to HDFS
    sorted.saveAsTextFile(args(1))
    // Release resources
    sc.stop()
  }
}
```
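Before packaging, the transformation logic can be sanity-checked locally without a cluster. This is a minimal sketch that mirrors the same pipeline on plain Scala collections, with `groupBy` plus a sum standing in for `reduceByKey`; the input lines are made up for illustration:

```scala
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical input standing in for the HDFS text file
    val lines = Seq("spark makes word count easy", "word count with spark")
    val sorted = lines
      .flatMap(_.split(" "))                        // split and flatten, as in the RDD version
      .map((_, 1))                                  // pair each word with a 1
      .groupBy(_._1)                                // group pairs by word (local stand-in for reduceByKey)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }
      .toSeq
      .sortBy(-_._2)                                // descending by count, like sortBy(_._2, false)
    sorted.foreach(println)                         // e.g. (word,2), (count,2), (spark,2), ...
  }
}
```

The same chain of `flatMap`, `map`, and an aggregation carries over to the RDD API unchanged, which is a convenient way to debug the logic before a spark-submit round trip.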
6. Upload the jar to a cluster node and put the input text file on HDFS
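The upload can be done with the standard HDFS shell; the commands below are a sketch using the paths from the spark-submit example in the next step (the local file name `t1.txt` is assumed):

```shell
# Create the input directory on HDFS and upload the text file
hdfs dfs -mkdir -p /tmp/test
hdfs dfs -put t1.txt /tmp/test/
# Verify the file landed where expected
hdfs dfs -ls /tmp/test
```

Note that the output path (`/tmp/test/t2` below) must not exist yet, or `saveAsTextFile` will fail.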
7. Run the program
```shell
spark-submit --master yarn --deploy-mode client --class cn.study.spark.scala.WordCount sparktest-1.0-SNAPSHOT.jar /tmp/test/t1.txt /tmp/test/t2
```
8. Check the result
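Since the job writes to HDFS, the output can be inspected with the HDFS shell. `saveAsTextFile` produces one `part-*` file per partition plus a `_SUCCESS` marker:

```shell
# List the output directory and print all partition files
hdfs dfs -ls /tmp/test/t2
hdfs dfs -cat /tmp/test/t2/part-*
```

Each output line is a `(word,count)` tuple, sorted by count in descending order.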
That wraps up this article on writing a WordCount program in IDEA and running it with spark-submit. Hopefully it has been helpful.