Transformers Library Study Notes (1): Installation and Testing
2020/6/22 6:26:28
I had always imagined transformers to be a behemoth, but after actually working with it, it turned out to be remarkably friendly; thanks to the folks at huggingface. Original post at tmylla.github.io.
Installation
My versions: Python 3.6.9; PyTorch 1.2.0; CUDA 10.0.
pip install transformers
Before running pip, make sure PyTorch 1.1.0+ is installed.
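You can confirm the environment from Python itself; a quick sanity check (run it in the same environment you installed into):

import torch
import transformers

print(torch.__version__)         # needs to be 1.1.0 or newer
print(transformers.__version__)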
Testing
Verification command and result
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
After entering the command above in the terminal, transformers automatically downloads the model it depends on. If the following output appears, the installation succeeded.
[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
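The same check can also be run from a Python session; a pipeline accepts a single string or a list of strings. A minimal sketch (the second sample sentence is my own addition):

from transformers import pipeline

classifier = pipeline('sentiment-analysis')
# One dict with 'label' and 'score' is returned per input sentence
print(classifier(['I hate you', 'I love you']))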
Model files downloaded by the transformers pipeline
transformers saves automatically downloaded models under C:\Users\username\.cache\torch\. Once downloaded, the files can be moved elsewhere. Each file is described below:
- Each .json file holds the 'url' and 'etag' of the file it describes.
- 'a41...' is the config file distilbert-base-uncased-config.
- '26b...' is the vocabulary file bert-base-uncased-vocab.
- '437...' is the config file for the SST-2 fine-tuned model, distilbert-base-uncased-finetuned-sst-2-english-config; note how it differs from the 'a41...' file.
- '57d...' is the model card distilbert-base-uncased-finetuned-sst-2-english-modelcard.
- 'dd7...' is the model weights file distilbert-base-uncased-finetuned-sst-2-english-pytorch_model.bin.
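To see the 'url'/'etag' mapping for yourself, you can read the .json metadata files in the cache. A minimal sketch, assuming the default cache layout of transformers 2.x, where the files live in a transformers subfolder under .cache/torch (the exact path varies by OS and library version):

import json
from pathlib import Path

# Assumed default cache location; on Windows this would be
# C:\Users\username\.cache\torch\transformers
cache_dir = Path.home() / '.cache' / 'torch' / 'transformers'
for meta in cache_dir.glob('*.json'):
    info = json.loads(meta.read_text())
    print(meta.name, '->', info.get('url'), info.get('etag'))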
A brief introduction to pipeline()
As you can see, running pipeline('sentiment-analysis')('I hate you') makes transformers automatically download distilbert-base-uncased-finetuned-sst-2, a model fine-tuned on the SST-2 dataset from GLUE, and run sentiment analysis on 'I hate you'.
Pipeline is a concise interface for NLP tasks that runs Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output. Currently supported tasks include Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction, and Question Answering.
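For instance, Named Entity Recognition follows the same one-line pattern; a minimal sketch (the sample sentence is my own, and the default NER model is downloaded automatically):

from transformers import pipeline

ner = pipeline('ner')
# Prints one dict per recognized token, with 'word', 'entity', and 'score'
print(ner('Hugging Face Inc. is a company based in New York City.'))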
Take Question Answering as an example:
from transformers import pipeline

nlp = pipeline("question-answering")

context = "Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on a SQuAD task, you may leverage the `run_squad.py` script."

print(nlp(question="What is extractive question answering?", context=context))
print(nlp(question="What is a good example of a question answering dataset?", context=context))
For the QA task, transformers uses the distilbert-base-cased-distilled-squad model trained on SQuAD; its downloaded files follow the same pattern described above.
Moving the model to a custom folder
Again using QA as the example:
- First create a folder named distilbert-base-cased-distilled-squad, then move the vocabulary file, the model config file, and the model weights file into it, renaming them to vocab.txt, config.json, and pytorch_model.bin respectively (see the sketch after the full code below for a way to let the library write these files for you).
- Then point the code at the model directory, DISTILLED = './distilbert-base-cased-distilled-squad'. The full code is as follows.

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

DISTILLED = './distilbert-base-cased-distilled-squad'
tokenizer = AutoTokenizer.from_pretrained(DISTILLED)
model = AutoModelForQuestionAnswering.from_pretrained(DISTILLED)

text = """
Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
"""

questions = [
    "How many pretrained models are available in Transformers?",
    "What does Transformers provide?",
    "Transformers provides interoperability between which frameworks?",
]

for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="pt")
    input_ids = inputs["input_ids"].tolist()[0]

    # In transformers 2.x the model returns a tuple of (start_logits, end_logits)
    answer_start_scores, answer_end_scores = model(**inputs)

    answer_start = torch.argmax(answer_start_scores)  # most likely start of the answer
    answer_end = torch.argmax(answer_end_scores) + 1  # most likely end of the answer (exclusive)

    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
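As an alternative to copying and renaming the cached files by hand, you can let the library write the model directory itself. A minimal sketch (this downloads the model once over the network, then saves it under the same folder name used above):

from transformers import AutoTokenizer, AutoModelForQuestionAnswering

MODEL = 'distilbert-base-cased-distilled-squad'
SAVE_DIR = './distilbert-base-cased-distilled-squad'

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL)

tokenizer.save_pretrained(SAVE_DIR)  # writes the vocabulary and tokenizer files
model.save_pretrained(SAVE_DIR)      # writes config.json and pytorch_model.bin

After this, from_pretrained(SAVE_DIR) loads everything without touching the network.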
Reference
huggingface.co/transformer…