Fixing spaCy pretrained model download failures

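spaCy is a very handy natural language processing library. For whatever reason, though, the official pretrained models can no longer be downloaded normally from my network: the connection is reset and the command fails with ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')). As a workaround, you can pick the download link yourself and install the model archive manually.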

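1 Problem description

My research involves natural language processing and requires a series of preprocessing steps on text data, so I need spaCy.

Recently, however, the command that downloads spaCy's pretrained models has been failing with a connection reset, which makes the tool unusable.

The full error output is: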

(pytorch) shenjiayun@server3 ~/Dev/VisualEntailment $ python -m spacy download en_core_web_sm
Traceback (most recent call last):
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connection.py", line 421, in connect
tls_in_tls=tls_in_tls,
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 429, in ssl_wrap_socket
sock, context, tls_in_tls, server_hostname=server_hostname
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 423, in wrap_socket
session=session
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 870, in _create
self.do_handshake()
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/retry.py", line 531, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/connection.py", line 421, in connect
tls_in_tls=tls_in_tls,
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 429, in ssl_wrap_socket
sock, context, tls_in_tls, server_hostname=server_hostname
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 423, in wrap_socket
session=session
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 870, in _create
self.do_handshake()
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/spacy/__main__.py", line 33, in <module>
plac.call(commands[command], sys.argv[1:])
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/plac_core.py", line 348, in call
cmd, result = parser.consume(arglist)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/plac_core.py", line 217, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/spacy/cli/download.py", line 44, in download
shortcuts = get_json(about.__shortcuts__, "available shortcuts")
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/spacy/cli/download.py", line 95, in get_json
r = requests.get(url)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

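2 Problem analysis

I won't go into the root cause here. What the traceback does show is that the connection is reset during the TLS handshake, while spacy/cli/download.py is fetching the list of available shortcuts with requests.get, so the failure happens before any model file is requested.

Related issue: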

Connection Error while installing nlp models #1510
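To check whether the reset is specific to one host, the handshake step from the traceback can be reproduced in isolation. The following is only a minimal sketch using the standard library; the function name probe_tls and its return strings are made up for illustration and are not part of spaCy.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Try a TCP connect plus TLS handshake and report what happened."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake, the step that fails in the traceback.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "ok: " + str(tls.version())
    except OSError as exc:
        # ConnectionResetError, timeouts, DNS and certificate errors all derive from OSError.
        return "failed: " + type(exc).__name__
```

Calling probe_tls("github.com") and then probe_tls with the host from about.__shortcuts__ (the URL is not shown in the traceback, so it has to be looked up in the installed spaCy package) should indicate which of the two is being reset.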

3 Solution

Normally, you would install spaCy and the best-matching pretrained model with the official commands:

pip install spacy
python -m spacy download en_core_web_sm

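With the connection being reset, however, the only option is to install the model by hand.

spaCy also publishes its downloadable models as releases on GitHub: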

spaCy models

This repository contains releases of models for the spaCy NLP library. For more info on how to download, install and use the models, see the models documentation.

When the automatic download fails, you can install a specific .tar.gz archive yourself, for example:

# pip install .tar.gz archive from path or URL
pip install /Users/you/en_core_web_sm-2.1.0.tar.gz
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz
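The release URL above follows a regular pattern: the model name and version appear once as the release tag and once as the archive name. As a rough sketch under that assumption, the URL can be built and passed to pip from Python. The helper names model_url and install_model are hypothetical, not part of spaCy.

```python
import subprocess
import sys

RELEASE_BASE = "https://github.com/explosion/spacy-models/releases/download"


def model_url(name, version):
    """Build the release URL for a model archive, following the pattern shown above."""
    archive = f"{name}-{version}"
    return f"{RELEASE_BASE}/{archive}/{archive}.tar.gz"


def install_model(name, version):
    """Install the archive with the pip that belongs to the running interpreter."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", model_url(name, version)],
        check=True,  # raise CalledProcessError if pip exits with an error
    )
```

Using sys.executable instead of a bare pip keeps the install inside the active environment (here, the conda environment named pytorch). A call such as install_model("en_core_web_sm", "2.1.0") would run the same command as the second pip line above.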

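With this approach you have to browse the release assets yourself and find a suitable pretrained model. For example, at the time of writing the latest en_core_web_sm release is en_core_web_sm-2.3.1.

You can also use spaCy's official models page to work out which model version fits, taking en_core_web_sm as an example: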

English

Available pretrained statistical models for English
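Whether a model version fits depends on the installed spaCy version. In the pip log at the end of this post, en-core-web-sm 2.3.1 declares the requirement spacy<2.4.0,>=2.3.0, and the installed spaCy is 2.3.5. A small sketch of that range check follows; parse_version and in_range are illustrative helpers that only handle plain numeric versions such as 2.3.5.

```python
def parse_version(text):
    """Turn '2.3.5' into (2, 3, 5); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def in_range(installed, lower, upper):
    """Return True if lower <= installed < upper, compared component by component."""
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)


# Values taken from the pip log in this post: spaCy 2.3.5 against >=2.3.0,<2.4.0.
print(in_range("2.3.5", "2.3.0", "2.4.0"))
```

In practice the installed version can be read from spacy.__version__, and pip performs this check itself when it installs the archive. The sketch is only meant to help pick the right release before downloading.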

After running the command, the pretrained model downloads and installs successfully:

(pytorch) shenjiayun@server3 ~/Dev/VisualEntailment $ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz (12.0 MB)
|████████████████████████████████| 12.0 MB 427 kB/s
Requirement already satisfied: spacy<2.4.0,>=2.3.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from en-core-web-sm==2.3.1) (2.3.5)
Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (7.4.5)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (4.55.0)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.25.1)
Requirement already satisfied: setuptools in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (51.0.0.post20201207)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.0.4)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (3.0.2)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.0.5)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.1.0)
Requirement already satisfied: numpy>=1.15.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.19.2)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.0.0)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (0.4.1)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (0.8.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.0.5)
Requirement already satisfied: importlib-metadata>=0.20 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.0.0)
Requirement already satisfied: zipp>=0.5 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from importlib-metadata>=0.20->catalogue<1.1.0,>=0.0.7->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (3.4.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (1.26.2)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /home/shenjiayun/miniconda3/envs/pytorch/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en-core-web-sm==2.3.1) (2.10)
Building wheels for collected packages: en-core-web-sm
Building wheel for en-core-web-sm (setup.py) ... done
Created wheel for en-core-web-sm: filename=en_core_web_sm-2.3.1-py3-none-any.whl size=12047106 sha256=dd9f847a5f35d1760f70b07ab8e8a663ae2a10364a8d43ee93d2b0eab246de3d
Stored in directory: /home/shenjiayun/.cache/pip/wheels/b7/0d/f0/7ecae8427c515065d75410989e15e5785dd3975fe06e795cd9
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.3.1
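To confirm that the manually installed model is usable, it can be loaded by its package name. This is a minimal check that assumes spaCy and en_core_web_sm are installed in the active environment. The is_installed helper is illustrative, and the sample sentence is arbitrary.

```python
import importlib.util


def is_installed(package):
    """Return True if the named top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


if is_installed("spacy") and is_installed("en_core_web_sm"):
    import spacy

    # Models installed with pip are regular packages and can be loaded by name.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("spaCy is installed and the model loads.")
    print([(token.text, token.pos_) for token in doc])
else:
    print("spaCy or en_core_web_sm is not installed in this environment")
```

If the model loads and the tokens print with part-of-speech tags, the workaround has succeeded and python -m spacy download is no longer needed for this model.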