python - Tags with : in name in lxml
I'm trying to use lxml.etree to parse a WordPress export document (it's XML, somewhat RSS-like). I'm only interested in published posts, so I'm using the following loop to go through them:
    for item in data.findall("item"):
        if item.find("wp:post_type").text != "post":
            continue
        if item.find("wp:status").text != "publish":
            continue
        write_post(item)
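where data is the element the item tags are found in (the <channel> element of the export). The item tags contain posts, pages, and drafts. My problem is that lxml can't find tags that have a : in their name (e.g. wp:post_type). When I try item.find("wp:post_type"), I get this error: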
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "lxml.etree.pyx", line 1279, in lxml.etree._Element.find (src/lxml/lxml.etree.c:38124)
      File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 210, in find
        it = iterfind(elem, path)
      File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 200, in iterfind
        selector = _build_path_iterator(path)
      File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 184, in _build_path_iterator
        selector.append(ops[token[0]](_next, token))
    KeyError: ':'
I assume the KeyError: ':' means that the colon in the tag name is invalid. Is there a way to escape the colon so that lxml finds the right tag? Does : have a special meaning in this context, or am I doing something wrong? Any help is appreciated.
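The : is the XML namespace separator: wp is a namespace prefix, which the document binds to a namespace URL with an xmlns:wp="..." declaration (usually on the root element). To "escape" the colon in lxml, replace the prefix with that namespace URL inside curly braces, as in item.find("{http://example.org/}status").text.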
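A minimal sketch of the question's loop rewritten this way. Assumptions: SAMPLE is a made-up, trimmed-down stand-in for a real export, and namespace_uri and published_posts are hypothetical helper names. Rather than hard-coding the wp URL (it can differ between export versions, e.g. http://wordpress.org/export/1.2/), the sketch reads it from the file's own xmlns:wp declaration using iterparse's "start-ns" events. It falls back to the standard library's xml.etree.ElementTree when lxml is not installed; both accept the same {url}tag syntax in find().

```python
import io

try:
    from lxml import etree as ET  # lxml, as used in the question
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback; same find() path syntax

# Hypothetical, trimmed-down stand-in for a WordPress export file.
SAMPLE = b"""<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
  <item><title>Hello</title><wp:post_type>post</wp:post_type><wp:status>publish</wp:status></item>
  <item><title>About</title><wp:post_type>page</wp:post_type><wp:status>publish</wp:status></item>
  <item><title>Draft</title><wp:post_type>post</wp:post_type><wp:status>draft</wp:status></item>
</channel>
</rss>"""


def namespace_uri(source, prefix):
    """Return the URL bound to `prefix` by the first matching xmlns declaration."""
    # "start-ns" events yield (prefix, url) tuples for each namespace declaration.
    for _event, (pfx, uri) in ET.iterparse(source, events=("start-ns",)):
        if pfx == prefix:
            return uri
    raise KeyError(prefix)


def published_posts(source):
    """Yield <item> elements whose wp:post_type is 'post' and wp:status is 'publish'."""
    wp = namespace_uri(io.BytesIO(source), "wp")
    data = ET.fromstring(source).find("channel")
    for item in data.findall("item"):
        # "{url}localname" replaces the "wp:" prefix that find() cannot resolve.
        if item.find("{%s}post_type" % wp).text != "post":
            continue
        if item.find("{%s}status" % wp).text != "publish":
            continue
        yield item


titles = [item.find("title").text for item in published_posts(SAMPLE)]
print(titles)  # ['Hello']
```

Recent versions of both lxml and ElementTree also accept a prefix mapping instead of braces, e.g. item.find("wp:status", namespaces={"wp": wp}), which keeps the original "wp:" paths readable.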