ExpatError: not well-formed (invalid token)

xmltodict is a third-party Python library for parsing XML, and it is very convenient to use.

Problem

I recently ran into a bug while using xmltodict in a project. The code was as follows:

import xmltodict

s = """<?xml version="1.0" encoding="UTF-8"?>
<prefetch-result>
<items>
<item>
<path>https://blog.yeyuqiu.com/test?Signature=Z58tstqe&AccessKeyId=319a693</path>
<status>success</status>
</item>
</items>
</prefetch-result>
"""

res = xmltodict.parse(s)

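Running this code raises ExpatError: not well-formed (invalid token).

Cause

In XML, '<' and '&' are special characters: '<' opens a tag and '&' starts an entity reference. When a URL in the document contains either of them, it must be escaped (as '&lt;' and '&amp;'). Here the parser reads '&AccessKeyId' as the start of an entity reference, finds '=' where it expects ';', and rejects the document.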
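To find where the parser gave up, the exception itself can be inspected. The sketch below uses only the standard library's xml.parsers.expat module (the parser xmltodict builds on); the helper name locate_xml_error and the sample URL are made up for illustration.

```python
from xml.parsers import expat


def locate_xml_error(xml_text):
    """Return (message, line, column) for the first parse error, or None if well-formed."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    except expat.ExpatError as err:
        # ExpatError carries a numeric code plus the position of the failure.
        return (expat.ErrorString(err.code), err.lineno, err.offset)
    return None


bad = "<path>https://example.com/test?a=1&b=2</path>"
print(locate_xml_error(bad))
```

The reported line and column point at the unescaped '&', which is usually enough to spot the offending URL in a large response.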
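If you control the code that generates the XML, the cleanest approach is to escape the URL before embedding it. A minimal sketch, assuming the document is assembled by string concatenation; build_item is a hypothetical helper, and xml.etree.ElementTree is used here only to check that the result parses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_item(url):
    """Wrap a URL in an <item> element, escaping &, < and > in the text."""
    return "<item><path>" + escape(url) + "</path></item>"


url = "https://blog.yeyuqiu.com/test?Signature=Z58tstqe&AccessKeyId=319a693"
doc = build_item(url)

# The parser turns &amp; back into &, so the original URL is recovered intact.
root = ET.fromstring(doc)
print(root.find("path").text == url)
```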

Fix

Below is one common workaround, for reference:

import xmltodict
from xml.parsers.expat import ExpatError

s = """<?xml version="1.0" encoding="UTF-8"?>
<prefetch-result>
<items>
<item>
<path>https://blog.yeyuqiu.com/test?Signature=Z58tstqe&AccessKeyId=319a693</path>
<status>success</status>
</item>
</items>
</prefetch-result>
"""

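try:
    ret = xmltodict.parse(s)
except ExpatError:
    # Escape only "&" here: replacing "<" across the whole document would
    # also rewrite the tags themselves and the retry would fail again.
    ret = xmltodict.parse(s.replace("&", "&amp;"))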
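A blanket str.replace on '&' also rewrites ampersands that already belong to a valid reference such as &amp;, turning them into &amp;amp;. One possible refinement is to escape only the "bare" ampersands with a regular expression. This is a sketch rather than a general XML repair tool: the name escape_bare_ampersands is hypothetical, it does not look inside CDATA sections, and it leaves references to undefined entities untouched.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does not start a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace only those '&' characters that are not part of a reference."""
    return _BARE_AMP.sub("&amp;", xml_text)


raw = "<path>https://example.com/test?a=1&b=2&amp;c=3</path>"
fixed = escape_bare_ampersands(raw)
print(fixed)
print(ET.fromstring(fixed).text)
```

The cleaned string can then be passed to xmltodict.parse in the except branch in place of the plain replace call.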

References

  1. What are the special characters in XML?
  2. Can't parse URL attributes with & symbol in it
Updated: 2020-05-15