Root Cause:
Database load increased because of an abnormally high number of database locks, which caused errors on some queries related to study uploads. The locks resulted from database optimization work that hit backend limits and held locks on the system for longer than expected.
Remediation:
The following actions were performed:
• Increased the existing database cluster resources.
• Created a secondary database cluster to receive new study uploads (ingestion).
• Removed a newly introduced configuration, part of the database optimization work, that was causing the increased locks. The scaling described above helped immediately with newly sent studies, but pending studies that had failed could not be processed until this configuration was removed. Removing it does not affect the end benefit of the optimization work.