Auto Classification in Records Management

The landscape of Records Management (RM) has changed drastically in the last 20 years. Once a manageable and largely manual task limited to storage rooms and archivists, records management has grown enormously in scale, with ever larger volumes of new records being created and stored digitally every day. High storage costs and the risk of unauthorized access to sensitive information are more common than ever. And yet, the advantages of investing in sound RM tools are often overlooked or undervalued.

What is Records Management?

Records Management is the supervision and administration of physical and digital business records throughout their life cycle, from creation and distribution to archival and disposal. RM programs establish how business records are used throughout the organization and ensure that all legal, administrative, fiscal and historical requirements established by regulatory entities, such as government agencies, are met.

One of the main tools of RM programs is the retention schedule, which establishes how long a record should be retained within the organization, as well as how and when the record should be disposed of.
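To make the idea concrete, here is a minimal sketch of a retention schedule as a lookup table. The record categories, retention periods and disposal actions are hypothetical placeholders, not values from any real regulation or product; an actual schedule would come from an organization's legal and compliance requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    years: int            # how long the record is retained
    disposal_action: str  # how the record is disposed of afterwards


# Hypothetical schedule: categories and periods are illustrative only.
SCHEDULE = {
    "invoice": RetentionRule(years=7, disposal_action="destroy"),
    "contract": RetentionRule(years=10, disposal_action="archive"),
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only happens for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def disposal_due(category: str, created: date) -> tuple[date, str]:
    """Return when and how a record of this category should be disposed of."""
    rule = SCHEDULE[category]
    return add_years(created, rule.years), rule.disposal_action


def is_due(category: str, created: date, today: date) -> bool:
    """True once the retention period for the record has elapsed."""
    due, _ = disposal_due(category, created)
    return today >= due
```

A records system could run a check like `is_due("contract", created, date.today())` on each record to flag candidates for disposal review.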

Although RM is often viewed as a mundane regulatory task that doesn’t contribute to a business’ bottom line, robust RM plans not only reduce storage costs but also reduce potential compliance and litigation costs such as sanctions, fines, and e-discovery fees. Records management plans also:

  • Ensure regulatory compliance
  • Enhance audit performance
  • Provide sound defense for records and retention management

The challenges of Records Management today

Records management is more challenging today than it used to be. Deciding what is, or is not, a record is only the beginning. Records managers have additional considerations, such as:

  • Keeping storage costs low
  • Managing large volumes of content
  • Managing inherited or historical content:
    • Incorporating content from legacy systems
    • Incorporating content from mergers or acquisitions
  • Separating records from transient or non-business-related content (“non-records”)

Records Management and Classification

With the right AI and analytics technology, much of the burden of these challenges can be eased, particularly through automated classification.

Manual Classification

Manual classification is time-consuming, resource-intensive, and cognitively taxing for those who carry it out. Here are some characteristics of a manual classification process:

  • High cost: Manual classification projects are time-consuming and require a considerable amount of manpower.
  • Lack of engagement: Data classification is often perceived as a tedious, low-value, extra task that doesn’t contribute to a business’ bottom line. Faced with this, an unspecialized employee’s natural response is to find ways to expedite the task, resulting in low-quality tagging. (Inaccuracy and inconsistency)
  • High risk for error: Even with the best intentions, humans do not do well on tedious or repetitive tasks, and monotony will eventually result in error. (Inaccuracy)
  • Subjectivity: When problems are complex, humans rely on experience and intuition to solve them. As no two human experiences are alike, trying to reach and follow a consensus is often difficult. (Inconsistency)

So, what does one do when data is too abundant to classify manually? How does one deal with a necessary classification that is tedious and expensive?


Auto-Classification

Auto-classification is an automated process in which the contents of a document are scanned and automatically given a tag or label based on examples, keywords and a classification scheme established by an expert. Auto-classification provides insight into the kinds of content a company possesses and simplifies the records management process.
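As a rough illustration of the keyword part of this process, the sketch below assigns a label by counting how many keywords from each category appear in a document. The scheme, the keyword lists and the `min_hits` threshold are all hypothetical; a real classification scheme would be defined by a records expert and would typically also draw on labelled examples.

```python
import re
from collections import Counter

# Hypothetical expert-defined scheme: label -> keywords that signal it.
SCHEME = {
    "invoice": {"invoice", "payment", "amount", "due"},
    "contract": {"agreement", "party", "term", "signature"},
    "hr": {"employee", "salary", "leave", "performance"},
}


def classify(text: str, scheme: dict = SCHEME, min_hits: int = 2) -> str:
    """Return the label with the most keyword hits, or 'unclassified'."""
    # Lowercase the text and count each alphabetic token.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each label by the total occurrences of its keywords.
    scores = {
        label: sum(tokens[keyword] for keyword in keywords)
        for label, keywords in scheme.items()
    }
    label, hits = max(scores.items(), key=lambda item: item[1])
    # Too few hits means the evidence is weak, so no tag is applied.
    return label if hits >= min_hits else "unclassified"
```

Returning "unclassified" below the threshold is a deliberate choice in this sketch: it routes weakly matched documents to human review instead of forcing a guess.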

Auto-classification simplifies and improves a company’s capacity to understand large and potentially unfamiliar bodies of unstructured content, and facilitates:

  • Inherited record exploration. (Improved legacy, merger and acquisition record management)
  • Metadata indexing. (Improved search speed, improved relevance in results, improved filtering accuracy)
  • Consistent tagging. (Improved accuracy and defensibility)
  • Disposal of ROT (redundant, obsolete, trivial) content. (Reduced storage costs)
  • Application of retention schedules. (Improved regulatory compliance)
  • E-discovery. (Reduced legal costs)
  • Identification of potentially harmful content. (Reduced legal risk)

Magellan Text Mining

OpenText™ Magellan™ Text Mining offers automated classification. It is a keyword-based solution that leverages supervised machine learning and expert-defined rules to define and identify categories based on semantics (meaning and themes) and textual markers (keywords). Even files containing text as images can be pre-processed with an integrated optical character recognition (OCR) solution.
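For readers unfamiliar with supervised machine learning, the sketch below shows the general idea of learning categories from labelled examples. It is a small multinomial Naive Bayes classifier written with the Python standard library only; it is not Magellan's algorithm or API, and the training examples and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled examples, as an expert might provide them.
EXAMPLES = [
    ("invoice payment due amount", "invoice"),
    ("payment received for invoice", "invoice"),
    ("employee salary review", "hr"),
    ("annual leave request for employee", "hr"),
]
MODEL = train(EXAMPLES)
```

With this toy training set, `predict(MODEL, "please send the invoice payment")` picks the invoice category because those words were seen more often in invoice examples. A production system would need far more examples and an evaluation step.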

How can we leverage Magellan to automatically augment Records Management classifications in a content services platform such as OpenText™ Extended ECM, OpenText™ Documentum™ or OpenText™ Core? Enter the Magellan Content Enrichment Solution.

Magellan Content Enrichment Solution

OpenText Magellan’s Content Enrichment Solution utilizes text mining technology to deliver improved efficiencies in content findability, information governance and records management through data-driven processes and automated workflows.

Unlock metadata-driven ECM experiences to:

  • Automatically tag content with relevant metadata
  • Search with semantic facets and filters
  • Discover your organization’s content
  • Improve records management processes

To learn more about OpenText Magellan solutions, contact us

Author: Alexandra Freeman is a Computational Linguist with Professional Services. Now specializing in autoclassification, ontology assessment, and machine learning, Alexandra holds a master’s degree in Applied Linguistics and has several years of postgraduate research experience in Forensic Linguistics.
