This magic phrase, Data Integrity

Many people, if not everyone, have by now heard the catchy phrase “Data Integrity”. This intuitively understood concept has quickly entered pharmaceutical practice, and a fairly large number of guidance documents devoted to it have been published in recent years:

WHO TRS 996, Annex 5, Guidance on good data and record management practices

Data Integrity and Compliance With Drug CGMP – Questions and Answers – Guidance for Industry

MHRA GxP Data Integrity Definitions and Guidance for Industry


In February 2019, the Russian Guidelines on Data Integrity and Validation of Computerized Systems, developed at SIDGP with the participation of PQE experts, were published. This is a unique document in Russian (all other documents referred to in this context are in English and/or of foreign origin) [current link in English]. In addition to data integrity itself, this Manual links the concept with the validation of computerized systems, since the two are rarely considered apart from each other.

Of course, I would note that the data integrity concept in the pharmaceutical industry also includes paper documents. However, bear with me while I state two judgments:

 1) hardcopy data integrity is something to strive for, but it is in fact a utopia if one considers the issue strictly and tries to demonstrate adherence to the ALCOA and ALCOA+ principles on an evidence basis;

 2) the sheer number of software solutions used for business process automation almost inevitably leads to data integrity being discussed in the context of computerized systems and their validation. The SIDGP Guide was therefore issued at just the right time and is organized in a very useful way. The foreign analogues, by comparison, only supplement numerous basic guides such as GAMP 5, PIC/S, OECD, EDQM, etc.

Regarding this Manual, I want to touch on only one, rather innovative aspect. The ALCOA and ALCOA+ principles themselves are no longer innovative in our rapidly changing world. They are intuitively understood, and in fact they only consolidate related GMP requirements that are “spread as a thin layer” throughout the entire GMP text. Attributable, Legible, Contemporaneous, Original and Accurate, as well as Complete, Consistent, Enduring and Available: these are the components of the above-mentioned acronyms, whose meaning is explained not only in the Manual but also in a number of earlier documents.

As the author of this outline, I found the following passage in the Manual much more significant. It is not accidental, since it is also present in the latest Data Integrity guidance from the MHRA. I mean the acronym DIRA (data integrity risk assessment). Now that only the lazy fail to mention risk assessment in season and out of season, this paragraph (6.2 in the SIDGP Manual) could be perceived as just another fashionable term. However, let’s read the following sentence:

“An example of an acceptable approach is a data integrity risk assessment (DIRA), in which the processes that produce the data or lead to the data generation are mapped, critical impacts are identified, and the inherent risks are documented.”

What is meant by “process mapping”? Does anyone have a clear idea of how this will look in practice? The MHRA says almost exactly the same:

“An example of a suitable approach is to perform a data integrity risk assessment (DIRA) where the processes that produce data or where data is obtained are mapped out and each of the formats and their controls are identified and the data criticality and inherent risks documented.”

However, a hint is given literally right next to it:

“Risk assessment should be focused on the business process, e.g., production, quality control.”
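To make the quoted wording a little more tangible, here is a minimal sketch in Python of what a DIRA register might look like if it were kept as structured data rather than as free text. Everything in it is my own assumption for illustration: the class names, the 1 to 3 scales, the example processes and the "criticality times inherent risk" score are hypothetical, and neither the Manual nor the MHRA guidance prescribes any of them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataItem:
    name: str             # what data is produced
    data_format: str      # e.g. "paper" or "electronic"
    controls: List[str]   # controls identified for this format
    criticality: int      # hypothetical scale: 1 (low) .. 3 (high)
    inherent_risk: int    # hypothetical scale: 1 (low) .. 3 (high)

    def priority(self) -> int:
        # Illustrative scoring only; the guidance does not define a formula.
        return self.criticality * self.inherent_risk


@dataclass
class MappedProcess:
    name: str
    business_area: str    # e.g. "production", "quality control"
    data_items: List[DataItem] = field(default_factory=list)


def rank_risks(processes):
    """Return (process, data item, score) rows, highest score first."""
    rows = [(p.name, d.name, d.priority()) for p in processes for d in p.data_items]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def items_without_controls(processes):
    """List data items for which no control has been identified yet."""
    return [(p.name, d.name) for p in processes for d in p.data_items if not d.controls]


# Made-up entries, for illustration only.
register = [
    MappedProcess("Batch weighing", "production", [
        DataItem("weight printout", "paper", ["second-person check"], 3, 2),
    ]),
    MappedProcess("HPLC assay", "quality control", [
        DataItem("chromatogram", "electronic", ["audit trail", "access control"], 3, 3),
        DataItem("sample log entry", "paper", [], 2, 2),
    ]),
]

for row in rank_risks(register):
    print(row)
print("No controls identified:", items_without_controls(register))
```

Even a toy register like this shows why the inventory of processes and data has to come first: the ranking and the gap list can only be as complete as the mapping behind them.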

The fact is that business process modeling, rather than “mapping”, has been well developed since the 1960s, and by the 1990s there were already software solutions that allowed this modeling to be done as more than a kitchen-table effort.

In the 1960s, the functional modeling approach (SADT = structured analysis and design technique) was developed, and in the 1990s the IDEFx family of standards emerged. In addition, other business process modeling notations were developed, e.g. ARIS, DFD, etc. Currently, the BPMN 2.0 notation has come into full force, and here is a brief outline [in Russian] of these methodologies as applied to pharma. All of them share one property: they are a kind of “crutch for the brain”, because it is almost impossible to formulate an information model for a future computerized system, even at the level of requirements, without special tools. To be more precise, you can try, but only a paper QMS can live with inconsistencies and conflicts, because paper doesn’t blush. An attempt to provide IT support for a “cluttered” business process will result in a “stillborn” application, for example for document workflow, for recording laboratory activities, or for tracking the status of raw materials, supplies and finished products, whose implementation will drag on “until the end of time”.

It would be very valuable to learn from the authors of the Manual or the expert community what they mean by terms such as “process mapping”. It seems to me that everything logically comes down to standard business analysis. An attempt to perform the declared actions in some other way would amount to reinventing the wheel.

Here is an abstract example, starting from the literal wording “the processes that produce data are mapped”. How many processes are there at a typical pharmaceutical facility? Of course it depends on the level of decomposition, but still? Obviously, these business processes first need to be “counted” and “inventoried”.

Then there is the data. How much data, and what data, does a typical pharmaceutical facility generate? The answer is similar. Anyone who tries to work this out by hand, “with a pencil in a notebook”, will inevitably get buried in routine. Such an effort may be sufficient for a notebook, or for a “formal” document with a DIRA title page. This is not to say that a functional computerized system cannot be built this way, but its development and testing will entail unacceptably high costs for resolving conflicts and contradictions that were not thought through in advance. Indeed, if, for example, your system model has not been thought through with respect to how 1C will work, then an error in the model will cause the program to abort a function or execute it incorrectly. Moreover, logical errors can take years to track down. In the meantime, the system will not create the form, will not pass the process to the next step, will not grant authority to the responsible person, and so on.

Take a look at the example of a standard (relatively simple) business process in the demo version of the PBPMS CE ELMA system:

This is an example of a business process improvement. How should this chart, drawn in BPMN 2.0 notation, be read? In broad strokes, there are lanes (pools) of responsibility: Initiator, Process Owner, Contractor.

There are key stages and types of activity: the start of the process (green circle), various operations (user task, notification, etc.) and the logical completion of the process (red circle). This notation makes it possible to find out “who was standing on whom” (c). What is the drawback of paperwork? Uncertainty: who fills out a log or form, at what time, what data is generated, and so on. BPMN 2.0 modeling largely addresses such issues, since the models are checked for consistency before they are published. This is the key difference from the “cave art” made in Visio or other graphic editors. In the simple example above, you can readily rough out a scenario “intuitively”. But what if you have thousands of such business processes, and most of the scenarios span several modules? The reader should already have a clear answer from this example. The advantage of any business modeling notation is the built-in verification of model consistency, performed before the model is implemented.
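To show what “checked for consistency before publication” can mean in practice, here is a minimal sketch in Python. It is not the ELMA API and not a validator for the BPMN 2.0 specification; the toy model format, the function name `check_model`, the task names and the particular rules (exactly one start event, at least one end event, every task assigned to a known lane, every node reachable from the start and able to reach an end) are my own assumptions for illustration. Only the three lane names come from the example above.

```python
from collections import deque

# Lanes of responsibility from the example; everything else is hypothetical.
LANES = {"Initiator", "Process Owner", "Contractor"}


def _reachable(origins, edges):
    """Breadth-first search: all nodes reachable from `origins` along `edges`."""
    seen, queue = set(origins), deque(origins)
    while queue:
        current = queue.popleft()
        for src, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def check_model(nodes, flows, lanes=LANES):
    """nodes: {id: (kind, lane)}, flows: [(src, dst)]. Returns a list of problems."""
    problems = []
    starts = [n for n, (kind, _) in nodes.items() if kind == "start"]
    ends = [n for n, (kind, _) in nodes.items() if kind == "end"]
    if len(starts) != 1:
        problems.append(f"expected exactly one start event, found {len(starts)}")
    if not ends:
        problems.append("no end event")
    for node, (kind, lane) in nodes.items():
        if kind == "task" and lane not in lanes:
            problems.append(f"task '{node}' has no responsible lane")
    for src, dst in flows:
        for node in (src, dst):
            if node not in nodes:
                problems.append(f"flow references unknown node '{node}'")
    forward = _reachable(starts, flows)
    backward = _reachable(ends, [(dst, src) for src, dst in flows])
    for node in nodes:
        if node not in forward:
            problems.append(f"'{node}' is unreachable from the start")
        elif node not in backward:
            problems.append(f"'{node}' is a dead end: no path to an end event")
    return problems


good_nodes = {
    "start": ("start", "Initiator"),
    "describe": ("task", "Initiator"),
    "review": ("task", "Process Owner"),
    "carry out": ("task", "Contractor"),
    "end": ("end", "Process Owner"),
}
good_flows = [("start", "describe"), ("describe", "review"),
              ("review", "carry out"), ("carry out", "end")]

# A "cluttered" variant: nobody is responsible for the last task and it leads nowhere.
bad_nodes = dict(good_nodes, **{"carry out": ("task", None)})
bad_flows = [("start", "describe"), ("describe", "review"),
             ("review", "carry out"), ("review", "end")]

print(check_model(good_nodes, good_flows))
for problem in check_model(bad_nodes, bad_flows):
    print(problem)
```

A drawing in a graphic editor would accept the second variant without complaint; a model that is checked refuses it before anyone builds a form or a log around it.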

The value of this notation is that it formalizes the full set of business processes at the facility and superimposes it on the established and maintained organizational structure. Its advantage over other notations is that the business logic can be automatically translated into code (so-called low-code programming), so that the output is a finished solution whose model meets everyone’s needs. For example, it may be an in-house portal.

An additional advantage is that many data integrity issues will be resolved as early as the design stage. An inconsistent model simply will not proceed. And the most common questions, i.e., who entered or has the right to enter which data and at what moment, will be addressed at the model verification level.
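As a rough illustration of the “who, at what moment, what data” questions, here is a small Python sketch. The rule table, the field names, the step names and the functions `unowned_fields` and `record_entry` are all hypothetical and invented for this example; a real BPM system would express the same idea through its own model and permissions. The sketch assumes that each data field is tied to exactly one process step and one role.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical entry rules: data field -> (process step, role allowed to enter it).
ENTRY_RULES = {
    "improvement_description": ("describe", "Initiator"),
    "review_decision": ("review", "Process Owner"),
}


@dataclass(frozen=True)
class Entry:
    field_name: str
    value: str
    user: str             # who entered the value (attributable)
    role: str
    step: str
    entered_at: datetime  # when it was entered (contemporaneous)


def unowned_fields(fields, rules=ENTRY_RULES):
    """Design-time check: fields for which the model does not say who enters them."""
    return sorted(f for f in fields if f not in rules)


def record_entry(field_name, value, user, role, step, rules=ENTRY_RULES):
    """Run-time check: accept the value only from the right role at the right step."""
    if field_name not in rules:
        raise PermissionError(f"no entry rule defined for '{field_name}'")
    if (step, role) != rules[field_name]:
        raise PermissionError(f"{role} may not enter '{field_name}' at step '{step}'")
    return Entry(field_name, value, user, role, step, datetime.now(timezone.utc))


# Design stage: one field in the planned form has no owner yet.
print(unowned_fields(["improvement_description", "review_decision", "completion_note"]))

# Run time: the entry is stamped with user, role, step and UTC time.
entry = record_entry("review_decision", "approved", "j.doe", "Process Owner", "review")
print(entry.user, entry.role, entry.entered_at.isoformat())
```

The first check is the kind of question that can be answered at the design stage; the second only enforces at run time what the model has already decided.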

I will try to develop this topic at the upcoming Russian Validation Week. It is obvious that such issues will become increasingly relevant. I think that relying on standard business process modeling practices is a feasible way to address them.

This publication was prepared specially for, and is dedicated to, the Eastern Economic Forum 2019. The translation from Russian into English was performed by the INOPHARMA team.
