Find, Extract and Save Embedded XML File in PDF


I am currently trying to extract and save an embedded file from a PDF. The PDF is a German "ZUGFeRD" electronic invoice, and it contains an XML file named zugferd-invoice.xml or factur-x.xml. I need to extract this file from the PDF and save it as Factur1234.xml.

After a lot of searching, I found iText Core 9.5, and I am now trying to solve my problem with it. But I can't find a way to save the XML file.

So far, I can only locate the file, using this code:

    PdfDocument pdfDocument = new PdfDocument(new PdfReader(SRC));
    PdfNameTree names = pdfDocument.GetCatalog().GetNameTree(PdfName.EmbeddedFiles);
    foreach (var entry in names.GetNames())
    {
        if (entry.Key.ToString() == "zugferd-invoice.xml" || entry.Key.ToString() == "factur-x.xml")
        {
            //do something
        }
    }

But I can't find a way to save this as a file.
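A minimal sketch of how the `//do something` branch might be filled in, assuming iText Core 9.x for .NET. In the EmbeddedFiles name tree, each value is a file specification dictionary; the attachment's actual bytes are the stream stored under its /EF entry (key /F, or /UF). The class name `ZugferdExtractor`, the method `ExtractInvoiceXml` and the file paths are hypothetical placeholders, not part of iText.

```csharp
using System;
using System.IO;
using iText.Kernel.Pdf;

// Hypothetical helper: save the ZUGFeRD / Factur-X XML found in the EmbeddedFiles name tree.
public static class ZugferdExtractor
{
    public static bool ExtractInvoiceXml(string srcPdf, string destXml)
    {
        using PdfDocument pdfDocument = new PdfDocument(new PdfReader(srcPdf));
        PdfNameTree names = pdfDocument.GetCatalog().GetNameTree(PdfName.EmbeddedFiles);

        foreach (var entry in names.GetNames())
        {
            string key = entry.Key.ToString();
            if (key != "zugferd-invoice.xml" && key != "factur-x.xml")
            {
                continue;
            }

            // The value is the file specification dictionary; resolve it if it is a reference.
            PdfObject value = entry.Value;
            if (value != null && value.IsIndirectReference())
            {
                value = ((PdfIndirectReference)value).GetRefersTo();
            }
            PdfDictionary fileSpec = value as PdfDictionary;

            // The attachment's bytes are the stream stored under /EF -> /F (or /UF).
            PdfDictionary ef = fileSpec?.GetAsDictionary(PdfName.EF);
            PdfStream stream = ef?.GetAsStream(PdfName.F) ?? ef?.GetAsStream(PdfName.UF);
            if (stream != null)
            {
                // GetBytes() returns the decoded (e.g. Flate-decompressed) data.
                File.WriteAllBytes(destXml, stream.GetBytes());
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        // Placeholder paths: "invoice.pdf" stands in for SRC.
        bool saved = ExtractInvoiceXml("invoice.pdf", "Factur1234.xml");
        Console.WriteLine(saved ? "Saved Factur1234.xml" : "No invoice XML found");
    }
}
```

Because `GetBytes()` applies the stream's filters, the file written should be the plain XML rather than the compressed stream data.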

Alternatively, I can find and save all streams in the PDF, but then I can't find a way to get the stream names, so I can't save ONLY the correct file.

Here is my attempt:

    int numberOfPdfObject = pdfDocument.GetNumberOfPdfObjects();
    for (int i = 1; i <= numberOfPdfObject; i++)
    {
        PdfObject obj = pdfDocument.GetPdfObject(i);
        if (obj != null && obj.IsStream())
        {
            byte[] b;
            try
            {
                b = ((PdfStream)obj).GetBytes();
            }
            catch (PdfException)
            {
                b = ((PdfStream)obj).GetBytes(false);
            }
            using (FileStream fos = new FileStream(String.Format(DEST + "/extract_streams{0}.dat", i), FileMode.Create))
            {
                fos.Write(b, 0, b.Length);
            }
        }
    }

How can I save only the XML file with the correct name?
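A hedged sketch of how the object loop could be narrowed down, assuming iText Core 9.x for .NET: an embedded file stream carries no name of its own; the name lives in the file specification dictionary (/Type /Filespec) that points at the stream through /EF. Filtering the loop on those dictionaries instead of raw streams ties each stream to its name. The class name `FileSpecScanner` and the paths are hypothetical, and a producer that omits /Type on the file specification would be skipped by this filter.

```csharp
using System;
using System.IO;
using iText.Kernel.Pdf;

// Hypothetical example: scan all indirect objects for file specification
// dictionaries and save the one named like a ZUGFeRD / Factur-X invoice.
public static class FileSpecScanner
{
    public static void Main()
    {
        // "invoice.pdf" and "Factur1234.xml" are placeholder paths.
        using PdfDocument pdfDocument = new PdfDocument(new PdfReader("invoice.pdf"));
        int count = pdfDocument.GetNumberOfPdfObjects();

        for (int i = 1; i <= count; i++)
        {
            // Keep only dictionaries whose /Type is /Filespec; raw streams have no name.
            if (!(pdfDocument.GetPdfObject(i) is PdfDictionary fileSpec)
                || !PdfName.Filespec.Equals(fileSpec.GetAsName(PdfName.Type)))
            {
                continue;
            }

            // Prefer the Unicode file name (/UF), fall back to /F.
            PdfString nameString = fileSpec.GetAsString(PdfName.UF) ?? fileSpec.GetAsString(PdfName.F);
            string name = nameString?.ToUnicodeString();
            if (name != "zugferd-invoice.xml" && name != "factur-x.xml")
            {
                continue;
            }

            // The embedded bytes are the stream under /EF -> /F (or /UF).
            PdfDictionary ef = fileSpec.GetAsDictionary(PdfName.EF);
            PdfStream stream = ef?.GetAsStream(PdfName.F) ?? ef?.GetAsStream(PdfName.UF);
            if (stream != null)
            {
                // GetBytes() returns the decoded (decompressed) stream data.
                File.WriteAllBytes("Factur1234.xml", stream.GetBytes());
                Console.WriteLine("Saved " + name + " as Factur1234.xml");
                break;
            }
        }
    }
}
```

If the scan finds nothing, looking the file up through the EmbeddedFiles name tree does not depend on the optional /Type entry and may be the more reliable route.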

I forgot to mention my tech stack:

- C#
- Visual Studio 2026
- iText 9.5.0
- Windows 10 64-bit
- VMware 25.0.1.25219725

Have a nice evening

Greg
