{"id":6,"date":"2014-02-14T12:39:46","date_gmt":"2014-02-14T04:39:46","guid":{"rendered":"http:\/\/vinta.ws\/code\/?p=6"},"modified":"2026-02-18T01:20:37","modified_gmt":"2026-02-17T17:20:37","slug":"lxml-parse-html-or-xml-via-xpath","status":"publish","type":"post","link":"https:\/\/vinta.ws\/code\/lxml-parse-html-or-xml-via-xpath.html","title":{"rendered":"lxml: Parse HTML or XML with XPath in Python"},"content":{"rendered":"<p><code>lxml<\/code> is the most feature-rich and easy-to-use library for processing XML and HTML in Python.<\/p>\n<pre class=\"line-numbers\"><code class=\"language-py\">import lxml.etree\nimport lxml.html\n\n# HTML from a string; fromstring() returns an element\nhtml_text = 'some html string'\ndoc = lxml.html.fromstring(html_text)\nimage_urls = doc.xpath('\/\/img\/@src')  # list of src attribute values\n\n# HTML from a URL or file path; parse() returns an ElementTree\nurl = 'http:\/\/www.gotceleb.com\/rosie-huntington-whiteley-vogue-brazil-magazine-april-2013-2013-03-29.html'\ndoc = lxml.html.parse(url)\nimage_urls = doc.xpath('\/\/img\/@src')\n\n# XML from a string; the text must be well-formed XML\nxml_text = 'some xml string'\ndoc = lxml.etree.fromstring(xml_text)\nimage_urls = doc.xpath('\/\/img\/@src')<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>It might look old school, but `lxml` is the most feature-rich and easy-to-use library for processing XML and HTML in 
Python.<\/p>\n","protected":false},"author":1,"featured_media":758,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,116],"tags":[2,54],"class_list":["post-6","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-python","category-about-web-development","tag-python","tag-web-crawler"],"_links":{"self":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/6","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/comments?post=6"}],"version-history":[{"count":0,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/6\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media\/758"}],"wp:attachment":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media?parent=6"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/categories?post=6"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/tags?post=6"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}