[10-Paper Notes][03] MS MARCO Dataset Overview

2022/5/30 23:22:57


MS MARCO Dataset Overview

Paper: https://arxiv.org/pdf/1611.09268.pdf (NIPS 2016)
Related background:

  • 2016 | Major news | Microsoft releases the MS MARCO dataset, aiming to build the "ImageNet" of reading comprehension
  • 100,000-question dataset
    • NLG
    • passage ranking
    • keyphrase extraction
    • conversational search

Task 1: Document Retrieval (2020/11/08 to present)

Based on the questions in the Question Answering Dataset (the original MRC dataset) and the documents that answered those questions, a document ranking task was formulated. There are 3.2 million documents, and the goal is to rank them by their relevance to each query. In other words, the MRC task is extended into a query-to-web-page ranking task over a collection of 3.2 million web pages.
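To make the task shape concrete, here is a minimal sketch of ranking a handful of candidate documents for one query. The scoring function `overlap_score` is a toy term-overlap stand-in chosen only for illustration; it is not the method used by MS MARCO or by any leaderboard system, and the document ids and texts are made up.

```python
from collections import Counter
from typing import Dict, List


def overlap_score(query: str, document: str) -> int:
    """Toy relevance score: how often the distinct query terms occur in the document."""
    doc_terms = Counter(document.lower().split())
    return sum(doc_terms[term] for term in set(query.lower().split()))


def rank_documents(query: str, documents: Dict[str, str]) -> List[str]:
    """Return document ids ordered from most to least relevant under the toy score."""
    return sorted(
        documents,
        key=lambda doc_id: overlap_score(query, documents[doc_id]),
        reverse=True,
    )


# Hypothetical mini-collection; the real task has 3.2 million documents.
docs = {
    "D1": "the capital of france is paris",
    "D2": "bananas are rich in potassium",
    "D3": "paris is a city in france and france is in europe",
}
ranking = rank_documents("capital of france", docs)
print(ranking)  # ['D1', 'D3', 'D2']
```

A real system would replace `overlap_score` with a learned or statistical ranker; the input (one query plus a document collection) and the output (an ordered list of document ids) stay the same.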

Relevance labels are derived from which passages were marked as containing the answer in the QnA dataset, making this one of the largest relevance datasets ever. In short, the relevance labels come from the QnA dataset; see the MS MARCO website for details.
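The label derivation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the official construction script: the record layout (a `passages` list whose entries carry `is_selected` and `url`), the `url_to_doc_id` mapping, and all ids and URLs are hypothetical.

```python
from typing import Dict, List, Set


def derive_document_labels(
    qna_records: List[dict], url_to_doc_id: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Mark a document as relevant to a query if one of its passages was selected as containing the answer."""
    labels: Dict[str, Set[str]] = {}
    for record in qna_records:
        for passage in record["passages"]:
            if passage["is_selected"] != 1:
                continue  # passage was not marked as containing the answer
            doc_id = url_to_doc_id.get(passage["url"])
            if doc_id is not None:  # skip URLs missing from the document collection
                labels.setdefault(record["query_id"], set()).add(doc_id)
    return labels


# Hypothetical QnA records and URL-to-document mapping.
records = [
    {"query_id": "q1", "passages": [
        {"is_selected": 1, "url": "http://example.com/a"},
        {"is_selected": 0, "url": "http://example.com/b"},
    ]},
    {"query_id": "q2", "passages": [
        {"is_selected": 1, "url": "http://example.com/c"},  # not in the collection
        {"is_selected": 1, "url": "http://example.com/b"},
    ]},
]
mapping = {"http://example.com/a": "D100", "http://example.com/b": "D200"}
labels = derive_document_labels(records, mapping)
print(labels)
```

Each (query, document) pair in `labels` would correspond to one positive relevance judgment; pairs that do not appear are simply unjudged in this sketch, not confirmed as irrelevant.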

This dataset is the focus of the 2019 and 2020 TREC Deep Learning Tracks and has been used as a teaching aid for the ACM SIGIR/SIGKDD AFIRM Summer School on Machine Learning for Data Mining and Search. In short, the dataset is used in both competitions and conferences.

Task 2:


