XXE (XML External Entity) Attacks

What are XXE attacks?

Easy
20 min

Unexpected Features

Vulnerabilities come in a plethora of different names, but they can generally be divided into a significantly smaller number of basic concepts. One of these concepts is "unexpected features", which refers to a vulnerability that arises from a developer using a software component, tool, protocol, etc. whose functionality they do not fully understand.

One excellent example is the 20,000€ bug bounty that GitLab paid on the HackerOne platform for a remote code execution vulnerability in its markdown editor. The vulnerability stemmed from a feature of the markdown parser (Kramdown) that GitLab was not aware of, and it allowed an attacker to perform malicious actions by entering specially formatted markdown into the application. https://hackerone.com/reports/1125425

However, a much more common vulnerability in this category is XXE, which is the subject of this course. XXE stands for XML External Entity and refers to the external entity feature of XML, a part of the XML specification that developers are often unaware of.

What is XML?

XML (Extensible Markup Language) is a text-based format used to represent structured data. If you have seen HTML code, the structure of XML will already look familiar, because HTML uses a very similar tag-based syntax.

XML mainly consists of tags and attributes. Tags look like this: <tag>content</tag>. For example, a car with the brand "Lada" and the model year "1955" could be represented in XML as a car tag containing a brand tag with the value Lada and a model_year tag with the value 1955.

<car>
  <brand>Lada</brand>
  <model_year>1955</model_year>
</car>

Attributes, on the other hand, look like this:

<car brand="Lada" model_year="1955"/>

In practice, XML documents usually mix the two: tags contain text or other tags, and any tag can have attributes.
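
As a minimal sketch of the difference between the two styles, the following Python snippet reads the same car from both forms using the standard-library xml.etree.ElementTree module. The helper name read_car and the element names are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) car, once with child tags and once with attributes.
AS_TAGS = "<car><brand>Lada</brand><model_year>1955</model_year></car>"
AS_ATTRIBUTES = '<car brand="Lada" model_year="1955"/>'


def read_car(xml_text: str) -> dict:
    """Return the car's fields whether they are stored as child tags or attributes."""
    root = ET.fromstring(xml_text)
    fields = dict(root.attrib)        # attributes on the <car> tag itself
    for child in root:                # child tags nested inside <car>
        fields[child.tag] = child.text
    return fields


print(read_car(AS_TAGS))
print(read_car(AS_ATTRIBUTES))
```

Both calls should yield the same fields, which shows that tags and attributes are simply two ways of attaching data to an element.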

DTD

DTD stands for Document Type Definition, and it can be used to define the structure of an XML document. The definition is written in a DOCTYPE declaration as follows.

<!DOCTYPE root element [
    elements and entities
]>

Here is an example of a complete DOCTYPE declaration for car, which states that:

  • <!DOCTYPE car: the root of the XML document is the car tag
  • <!ELEMENT car (brand, model_year)>: the car tag must contain the brand and model_year tags
  • <!ELEMENT brand (#PCDATA)>: the content of the brand tag is of type #PCDATA
  • <!ELEMENT model_year (#PCDATA)>: the content of the model_year tag is of type #PCDATA

<!DOCTYPE car
[
  <!ELEMENT car (brand, model_year)>
  <!ELEMENT brand (#PCDATA)>
  <!ELEMENT model_year (#PCDATA)>
]>
<car>
  <brand>Lada</brand>
  <model_year>1955</model_year>
</car>

#PCDATA means text (parsed character data).

Entities

XML is a text-based format in which certain characters have special meaning. For example, the < character opens a new tag. So if you wanted to create a car whose brand is Volvo<XC60>, it would not work directly, because the XML syntax would break.

<car>
  <brand>Volvo<XC60></brand>
  <model_year>2013</model_year>
</car>

For this reason, certain characters are represented as entities in XML syntax if you do not want the characters to become part of the XML structure. For example, the < character can be expressed as the entity &lt; and the > character as the entity &gt;.

<car>
  <brand>Volvo&lt;XC60&gt;</brand>
  <model_year>2013</model_year>
</car>
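
The escaping described above can be sketched in Python with the standard library. The snippet below is only an illustration: it escapes the brand with xml.sax.saxutils.escape, parses the result with xml.etree.ElementTree, and also shows that the unescaped version is rejected. The variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_brand = "Volvo<XC60>"

# escape() replaces &, < and > with the entities &amp;, &lt; and &gt;.
escaped_brand = escape(raw_brand)
document = f"<car><brand>{escaped_brand}</brand></car>"

# The parser turns the entities back into the original characters.
parsed_brand = ET.fromstring(document).find("brand").text

# Without escaping, <XC60> is read as a tag and the document is not well-formed.
try:
    ET.fromstring(f"<car><brand>{raw_brand}</brand></car>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

print(escaped_brand)
print(parsed_brand)
print(unescaped_ok)
```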

DTD Entities

You can also define entities yourself. For example, just as the entity &gt; stands for the > character, you can define that the entity &favourite; stands for the value Volvo. This is done in the DTD as follows:

<!DOCTYPE car
[
  <!ENTITY favourite "Volvo">
]>
<car>
  <brand>&favourite;</brand>
  <model_year>1955</model_year>
</car>

The XML parser would then output:

<car>
  <brand>Volvo</brand>
  <model_year>1955</model_year>
</car>
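
To see this expansion happen, here is a small sketch using Python's standard-library xml.etree.ElementTree, which expands internal entities declared in the DTD. The document text mirrors the example above; the variable names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# An internal entity named "favourite" is declared in the DTD and used in <brand>.
DOCUMENT = """<!DOCTYPE car [
  <!ENTITY favourite "Volvo">
]>
<car>
  <brand>&favourite;</brand>
  <model_year>1955</model_year>
</car>"""

root = ET.fromstring(DOCUMENT)

# By the time the application sees the data, the entity has been replaced.
brand = root.find("brand").text
print(brand)
```

The application code never sees the &favourite; reference itself, only the replacement text, which is why entity handling is easy to overlook.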

External Entities

XML also supports external entities, so that XML documents can be assembled from several separate parts. An external entity is declared with the keyword SYSTEM or PUBLIC after the entity name. Think of a book with three chapters, for example. The book could be made up of the XML documents book.xml, chapter1.xml, chapter2.xml, and chapter3.xml.

The content of a chapter file, such as chapter1.xml, could be for example:

<paragraph>
  It was a dark and stormy night...
</paragraph>

And the content of the book.xml could be:

<!DOCTYPE book [
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
  <!ENTITY chapter2 SYSTEM "chapter2.xml">
  <!ENTITY chapter3 SYSTEM "chapter3.xml">
]>
<book>
  &chapter1;
  &chapter2;
  &chapter3;
</book>

An XML parser that resolves external entities would then output something like the following, while the book and its chapters stay nicely separated in their own files.

<book>
  <paragraph>
    It was a dark and stormy night...
  </paragraph>
  <paragraph>
    Upon reaching our heroes...
  </paragraph>
  <paragraph>
    Out of the blue...
  </paragraph>
</book>

Vulnerability

If you have followed along this far, the XXE vulnerability is quite simple to understand. The vulnerability involves the following:

  • The application has code that handles XML.
  • The application's XML handler has not been configured to prevent the use of external entities.
  • The attacker is able to input XML to the application for processing.
  • The attacker is able to refer to files on the server's disk with external entities.
  • The attacker can read the processed XML which contains the content of the file desired by the attacker.

So in such cases, it is possible for the attacker to read files from the server's disk.
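
The conditions above can be sketched with a hypothetical order document. The element names order and pizza, the file path, and the process_order helper are all assumptions made for this illustration and are not taken from any real service. The snippet uses Python's standard-library xml.etree.ElementTree, which according to the Python documentation does not expand external entities and raises a ParseError when one occurs, so here the input is rejected. A parser configured to resolve external entities would instead place the content of the referenced file inside the pizza element.

```python
import xml.etree.ElementTree as ET

# Hypothetical attacker input: the entity "xxe" points at a file on the server.
PAYLOAD = """<!DOCTYPE order [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<order>
  <pizza>&xxe;</pizza>
</order>"""


def process_order(xml_text: str) -> str:
    """Parse an order and echo the pizza name back, as a simple service might."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # ElementTree does not resolve the external entity and refuses the input.
        return f"rejected: {error}"
    return f"ordered: {root.find('pizza').text}"


print(process_order(PAYLOAD))
print(process_order("<order><pizza>Margherita</pizza></order>"))
```

Whether such an input is rejected, ignored, or resolved depends entirely on the parser and its configuration, which is the second condition in the list above.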

Good to know at this stage

There are many variations of XXE vulnerabilities and ways to exploit them. Depending on the programming language, XML parser, and platform on which the application is running, XXE can be used to do different things. The attacker does not even need to see the XML response, as there are techniques for leaking information over the network. We will revisit these topics later in this course.

Exercise

Do the exercise below. With the knowledge you have gained so far, it is possible to solve the task and read the secret recipe file from the server.

Recipe theft

In this exercise, you get to try an XXE attack against an XML-based pizza ordering service.

Objective

Steal the secret recipe from the file /secret-recipe.txt

