Issue:
After the recent InteleShare release, a small percentage of transcoding requests began failing intermittently. In each failure, the thread handling the request entered an infinite loop, waiting for additional data that would never arrive. As these stuck threads accumulated, the resources they held caused performance to degrade. Once the thread pool reached its maximum size, new requests failed with errors or timeouts.
Root Cause:
The recent release of InteleShare included updates to a client library used for internal network communication. The new library improved overall performance, but its timeout behavior differed from the previous library: in some cases it closed slow connections without passing the resulting error on to the components using them. The thread reading from such a connection was therefore never notified that the connection had closed, and it kept waiting for data.
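To illustrate the failure mode in general terms, here is a minimal Python sketch. It is not InteleShare code: a `queue.Queue` stands in for the network connection, and `read_all`, `StreamTimeout`, and `END_OF_STREAM` are hypothetical names chosen for illustration. A reader that waits with no deadline blocks forever if the other side stops sending without signaling an error or end of stream. A bounded wait turns the same silent stall into a reported error. This bounded wait is a general defensive technique, not a description of the configuration change made in InteleShare.

```python
import queue
from typing import Optional

# Hypothetical end-of-stream marker: a healthy producer puts this on the
# queue after its last chunk.
END_OF_STREAM = None


class StreamTimeout(Exception):
    """Hypothetical error raised when no data arrives within the allowed wait."""


def read_all(chunks: "queue.Queue[Optional[bytes]]", timeout_s: Optional[float]) -> bytes:
    """Collect chunks until END_OF_STREAM arrives.

    With timeout_s=None, queue.get() blocks indefinitely, so a producer that
    silently stops leaves this thread stuck. With a finite timeout_s, the
    stall is surfaced as StreamTimeout and the thread is released.
    """
    data = bytearray()
    while True:
        try:
            chunk = chunks.get(timeout=timeout_s)
        except queue.Empty:
            raise StreamTimeout(f"no data received for {timeout_s} s") from None
        if chunk is END_OF_STREAM:
            return bytes(data)
        data.extend(chunk)


def demo() -> None:
    # Healthy stream: two chunks followed by the end-of-stream marker.
    healthy: "queue.Queue[Optional[bytes]]" = queue.Queue()
    for item in (b"abc", b"def", END_OF_STREAM):
        healthy.put(item)
    print(read_all(healthy, timeout_s=1.0))

    # Broken stream: one chunk, then the connection is dropped without an
    # error or end-of-stream marker ever reaching the reader.
    broken: "queue.Queue[Optional[bytes]]" = queue.Queue()
    broken.put(b"abc")
    try:
        read_all(broken, timeout_s=0.1)
    except StreamTimeout as exc:
        print("reader gave up:", exc)


if __name__ == "__main__":
    demo()
```

Calling `read_all(broken, timeout_s=None)` instead would never return, which mirrors the stuck threads described above.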
Resolution:
We have adjusted our configuration settings so that the new library behaves like the previous library, and the system is now stable.