I am trying to get the whole content between an opening XML tag and its closing counterpart. Getting the content in straightforward cases like the title element is easy, but how can I get the whole content between the tags if mixed content is used and I want to preserve the inner tags?

What I want is the content between the two text tags, including any tags: Some text with data in it. It spans multiple lines: one, two or more.

For now I use regular expressions, but it gets kinda messy and I don't like this approach. I lean towards an XML parser based solution. I looked over minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (whole content, including inner tags).

That is considerably easy with lxml*, using the parse() and tostring() functions:

    from lxml.etree import parse, tostring

First you parse the doc and get your element (I am using XPath, but you can use whatever you want):

    doc = parse('test.xml')
    element = doc.xpath('//text')[0]

The tostring() function returns a text representation of your element (on Python 3, pass encoding='unicode' to get a str instead of bytes):

    >>> tostring(element, encoding='unicode')

However, you do not want the external elements, so we can remove them with a simple str.replace() call:

    >>> tostring(element, encoding='unicode').replace('<%s>' % element.tag, '', 1)

Note that str.replace() received 1 as the third parameter, so it will remove only the first occurrence of the opening tag. For the closing tag, instead of 1, we pass -1 to replace:

    >>> tostring(element, encoding='unicode').replace('</%s>' % element.tag, '', -1)

Be aware that a negative count makes str.replace() remove every occurrence of the closing tag, not only the last one, so this assumes that no nested element has the same tag name.

The solution, of course, is to do everything at once:

    >>> tostring(element, encoding='unicode').replace('<%s>' % element.tag, '', 1).replace('</%s>' % element.tag, '', -1)

EDIT: it was pointed out that this code is fragile, since the tag can have attributes. A possible yet still limited solution is to split the string at the first >:

    >>> tostring(element, encoding='unicode').split('>', 1)

Get the second resulting string:

    >>> tostring(element, encoding='unicode').split('>', 1)[1]

Then rsplit it at the last closing tag, which gives a two-item list: the inner content and the leftover 'text>\n':

    >>> tostring(element, encoding='unicode').split('>', 1)[1].rsplit('</', 1)

And finally get the first item, which is the inner content alone:

    >>> tostring(element, encoding='unicode').split('>', 1)[1].rsplit('</', 1)[0]

Nonetheless, this code is still fragile, since > is a perfectly valid char in XML, even inside attributes.

Anyway, I have to acknowledge that MattH's solution is the real, general solution.

* Actually this solution works with ElementTree, too, which is great if you do not want to depend upon lxml. The only difference is that you will not have full XPath support, only the limited subset understood by find() and findall().
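The split-based steps described above can be sketched end to end as follows. This is only a minimal sketch under stated assumptions: it uses the standard-library xml.etree.ElementTree instead of lxml (the answer notes the approach works with ElementTree too), the sample document and its inner tag names are made up, and the names SAMPLE, inner_xml_by_split and inner_xml are hypothetical. The second function is a different technique, not the split-based one: it joins the element's leading text with the serialized children, so it never has to parse the opening tag by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the inner tag names are invented for illustration.
SAMPLE = """<review>
<title>Some testing stuff</title>
<text lang="en">Some <b>text</b> with <extra>data</extra> in it.
It spans <i>multiple</i> lines.</text>
</review>"""


def inner_xml_by_split(element):
    """Split-based approach: cut off the opening and closing tags as strings."""
    # encoding="unicode" makes tostring() return a str instead of bytes.
    serialized = ET.tostring(element, encoding="unicode")
    # Drop everything up to the first '>' (the opening tag and its attributes).
    after_open = serialized.split(">", 1)[1]
    # Drop everything from the last '</' onwards (the closing tag and the tail).
    return after_open.rsplit("</", 1)[0]


def inner_xml(element):
    """Alternative: leading text plus each serialized child element."""
    # element.text is the text before the first child (None if there is none).
    parts = [element.text or ""]
    for child in element:
        # tostring() includes the child's tail, i.e. the text that follows it.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(SAMPLE)
text_element = root.find("text")

print(inner_xml_by_split(text_element))
print(inner_xml(text_element))
```

Both calls print the same inner content for this sample. The second variant is less sensitive to what the opening tag looks like, because the outer element itself is never serialized.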