PDF/A-3 supports the embedding of any file type into PDF documents. This enables the transition from electronic paper to an electronic container that stores both the human and machine-readable versions of a document. Applications can extract the machine-readable portion of a PDF document and process it. A PDF/A-3 document can include an unlimited number of embedded documents for various processes.
This article will teach you how to embed a plain text file in a PDF document and how to extract the attachment from it.
Adding Attachments
This sample code demonstrates how TX Text Control can be used to attach a text file to a PDF document.
// create a non-UI ServerTextControl instance
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
    tx.Create();

    // set dummy content
    tx.Text = "PDF Document Content";

    // read the content of the attachment
    string sAttachment = System.IO.File.ReadAllText("attachment.txt");

    // create the attachment
    TXTextControl.EmbeddedFile attachment = new TXTextControl.EmbeddedFile(
        "attachment.txt", sAttachment, null)
    {
        Description = "My Text File",
        Relationship = "Unspecified",
        MIMEType = "text/plain",
        CreationDate = System.DateTime.Now,
    };

    // attach the embedded file
    tx.DocumentSettings.EmbeddedFiles = new TXTextControl.EmbeddedFile[] { attachment };

    // save as PDF/A
    tx.Save("document.pdf", TXTextControl.StreamType.AdobePDFA);
}
The attached file is represented by the EmbeddedFile object. The constructor accepts the file name, the data, and optional metadata. In addition, the MIME type of the attachment (text/plain for a plain text file), a textual description, a relationship, and the creation date are set through the object's properties.
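When attachments of different types are embedded, the MIME type can be derived from the file extension instead of being hard-coded. The following is a minimal sketch of that idea; the class name MimeTypes and the method FromFileName are hypothetical helpers, not part of TX Text Control, and the table only covers a handful of common types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of TX Text Control): maps a file name
// to a MIME type based on its extension.
public static class MimeTypes
{
    // Small illustrative table; extend it as needed.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".txt",  "text/plain" },
            { ".csv",  "text/csv" },
            { ".xml",  "application/xml" },
            { ".json", "application/json" },
            { ".pdf",  "application/pdf" },
        };

    // Returns the mapped MIME type, or the generic binary type
    // "application/octet-stream" when the extension is unknown or missing.
    public static string FromFileName(string fileName)
    {
        string extension = Path.GetExtension(fileName) ?? string.Empty;
        string mimeType;
        return Map.TryGetValue(extension, out mimeType)
            ? mimeType
            : "application/octet-stream";
    }
}
```

The result could then be assigned to the MIMEType property, for example MIMEType = MimeTypes.FromFileName("attachment.txt").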
The relationship is an optional string that describes the relationship between the embedded file and the containing document. It can be a predefined value or adhere to the rules for second-class names (ISO 32000-1, Annex E). Predefined values include “Source”, “Data”, “Alternative”, “Supplement”, and “Unspecified”.
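Because the relationship is passed as a plain string, a typo such as "Suplement" would go unnoticed at compile time. As a sketch, the predefined values listed above can be checked with a small helper; AttachmentRelationship, IsPredefined, and OrUnspecified are hypothetical names, and the helper deliberately does not validate second-class names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of TX Text Control): checks a relationship
// string against the predefined values named in the article.
public static class AttachmentRelationship
{
    // Comparison is case-sensitive, as PDF names are case-sensitive.
    private static readonly HashSet<string> Predefined =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "Source", "Data", "Alternative", "Supplement", "Unspecified"
        };

    // True only for one of the five predefined values.
    public static bool IsPredefined(string relationship)
    {
        return relationship != null && Predefined.Contains(relationship);
    }

    // Falls back to "Unspecified" for null or blank input; any other value
    // is returned unchanged, so custom second-class names pass through and
    // remain the caller's responsibility.
    public static string OrUnspecified(string relationship)
    {
        return string.IsNullOrWhiteSpace(relationship) ? "Unspecified" : relationship;
    }
}
```

A caller might log a warning when IsPredefined returns false, so that misspelled values are caught before the document is saved.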
When you open the document in Adobe Acrobat Reader, the attachment will appear in the Attachments side panel.
Extracting Attachments
The code below loads the PDF file created above and loops through all embedded files to locate the attachment by its description. The matching attachment is extracted and saved as a text file.
// create a non-UI ServerTextControl instance
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
    tx.Create();

    // load the PDF document
    TXTextControl.LoadSettings ls = new TXTextControl.LoadSettings();
    tx.Load("document.pdf", TXTextControl.StreamType.AdobePDF, ls);

    // read the attachments
    TXTextControl.EmbeddedFile[] files = ls.EmbeddedFiles;

    // find the specific attachment and save it
    foreach (TXTextControl.EmbeddedFile file in files)
    {
        if (file.Description == "My Text File")
        {
            string sAttachment = System.Text.Encoding.UTF8.GetString((byte[])file.Data);
            System.IO.File.WriteAllText("attachment_read.txt", sAttachment);
            break;
        }
    }
}
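The extraction code casts the attachment data directly to a byte array, while the embedding code passed the data as a string. If the data of a loaded attachment can arrive in either form (an assumption worth checking against the TX Text Control documentation), a defensive conversion avoids an InvalidCastException. The following sketch uses a hypothetical helper named AttachmentData that is independent of the library:

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of TX Text Control): normalizes attachment
// data to raw bytes regardless of whether it is a string or a byte array.
public static class AttachmentData
{
    public static byte[] ToBytes(object data)
    {
        switch (data)
        {
            case byte[] bytes:
                // Already binary: return as-is.
                return bytes;
            case string text:
                // Text: encode as UTF-8 (assumed encoding).
                return Encoding.UTF8.GetBytes(text);
            case null:
                // No data: treat as an empty file.
                return Array.Empty<byte>();
            default:
                throw new ArgumentException(
                    "Unsupported attachment data type: " + data.GetType().FullName,
                    nameof(data));
        }
    }
}
```

Inside the loop, the attachment could then be written with System.IO.File.WriteAllBytes("attachment_read.txt", AttachmentData.ToBytes(file.Data)), which also works for binary attachments that are not valid UTF-8 text.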