PDF/A-3 supports the embedding of any file type into PDF documents. This enables the transition from electronic paper to an electronic container that stores both the human and machine-readable versions of a document. Applications can extract the machine-readable portion of a PDF document and process it. A PDF/A-3 document can include an unlimited number of embedded documents for various processes.

This article will teach you how to embed a plain text file in a PDF document and how to extract the attachment from it.

Adding Attachments

This sample code demonstrates how TX Text Control can be used to attach a text file to a PDF document.

// create a non-UI ServerTextControl instance
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {

  tx.Create();
  // set dummy content
  tx.Text = "PDF Document Content";

  // read the content of the attachment
  string sAttachment = System.IO.File.ReadAllText("attachment.txt");

  // create the attachment
  TXTextControl.EmbeddedFile attachment =
     new TXTextControl.EmbeddedFile(
        "attachment.txt",
        sAttachment,
        null) {
       Description = "My Text File",
       Relationship = "Unspecified",
       MIMEType = "text/plain",
       CreationDate = System.DateTime.Now,
     };

  // attach the embedded file to the document
  tx.DocumentSettings.EmbeddedFiles =
     new TXTextControl.EmbeddedFile[] { attachment };

  // save as PDF/A
  tx.Save("document.pdf", TXTextControl.StreamType.AdobePDFA);
}

The attached file is represented by the EmbeddedFile object. The constructor accepts the file name, the file data, and optional additional metadata (null in this sample). In addition, the object initializer sets the MIME type of the attachment (the registered type for plain text is text/plain), a textual description, a relationship, and the creation date.

The relationship is an optional string that describes the relationship between the embedded file and the containing document. It can be a predefined value or adhere to the rules for second-class names (ISO 32000-1, Annex E). Predefined values include “Source”, “Data”, “Alternative”, “Supplement”, and “Unspecified”.
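
Because the relationship is passed as a plain string, a typo such as "source" instead of "Source" is easy to miss. The following is a minimal sketch of a guard for the predefined values listed above. The class AttachmentRelationships and its method IsPredefined are hypothetical names for illustration and are not part of TX Text Control; the sketch does not validate custom second-class names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the TX Text Control API.
public static class AttachmentRelationships
{
    // The five predefined relationship values named in this article.
    // PDF names are case-sensitive, so an ordinal comparison is used.
    private static readonly HashSet<string> Predefined =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "Source", "Data", "Alternative", "Supplement", "Unspecified"
        };

    // Returns true only if the value exactly matches a predefined name.
    // Custom second-class names are not checked here and return false.
    public static bool IsPredefined(string relationship)
    {
        return relationship != null && Predefined.Contains(relationship);
    }
}
```

A caller could check AttachmentRelationships.IsPredefined("Supplement") before assigning the value to the Relationship property, and handle custom names separately.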

When you open the document in Adobe Acrobat Reader, the attachment will appear in the Attachments side panel.

Extracting Attachments

The code below loads the PDF file created above and loops through all embedded files to locate the attachment by its description. The matching attachment is then extracted and saved as a text file.

// create a non-UI ServerTextControl instance
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {

  tx.Create();

  // load the PDF document
  TXTextControl.LoadSettings ls = new TXTextControl.LoadSettings();
  tx.Load("document.pdf", TXTextControl.StreamType.AdobePDF, ls);

  // read the attachments
  TXTextControl.EmbeddedFile[] files = ls.EmbeddedFiles;

  // find the specific attachment and save it
  foreach(TXTextControl.EmbeddedFile file in files) {
    if (file.Description == "My Text File") {
      string sAttachment = System.Text.Encoding.UTF8.GetString((byte[])file.Data);
      System.IO.File.WriteAllText("attachment_read.txt", sAttachment);

      break;
    }
  }
}
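
The samples pass the attachment data as a string when embedding and cast it to a byte array when extracting. The following is a minimal sketch of a helper that handles both cases without an unchecked cast. The class AttachmentText and its method Decode are hypothetical names and not part of TX Text Control, and the sketch assumes the text is UTF-8 encoded, as in the extraction code above.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the TX Text Control API.
public static class AttachmentText
{
    // Converts attachment data to a string.
    // Assumption: byte data holds UTF-8 encoded text.
    public static string Decode(object data)
    {
        switch (data)
        {
            case string s:
                // Data is already text.
                return s;
            case byte[] bytes:
                // Data is raw bytes; decode them as UTF-8.
                return Encoding.UTF8.GetString(bytes);
            case null:
                throw new ArgumentNullException(nameof(data));
            default:
                throw new ArgumentException(
                    "Unsupported attachment data type: " + data.GetType(),
                    nameof(data));
        }
    }
}
```

With this helper, the cast in the loop could be replaced by AttachmentText.Decode(file.Data), which fails with a descriptive exception rather than an InvalidCastException if the data has an unexpected type.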
