[Bug]: PostgreSQL container intermittently fails to start with "Wait strategy failed. Container is removed" (TimeoutException) in CI environment #11860

@vpelikh

Module

Core

Testcontainers version

2.0.5

Using the latest Testcontainers version?

Yes

Host OS

Linux (Jenkins agent running inside a Kubernetes pod)

Host Arch

amd64

Docker version

Client: Docker Engine - Community
  Version:           24.0.6
  API version:       1.43
  Go version:        go1.20.7
  Git commit:        ed223bc
  Built:             Mon Sep  4 12:32:10 2023
  OS/Arch:           linux/amd64
  Context:           default
 
Server: Docker Engine - Community
  Engine:
   Version:          28.3.3
   API version:      1.51 (minimum version 1.24)
   Go version:       go1.24.5
   Git commit:       bea959c
   Built:            Fri Jul 25 11:33:59 2025
   OS/Arch:          linux/amd64
   Experimental:     true
  containerd:
   Version:          1.7.27
   GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
  runc:
   Version:          1.2.5
   GitCommit:        v1.2.5-0-g59923ef
  docker-init:
   Version:          0.19.0
   GitCommit:        de40ad0

What happened?

We use a plain PostgreSQLContainer (no custom wait strategies, no extra performance flags) in our integration tests. The container intermittently fails to start with a timeout. This happens a few times a day on our CI (Jenkins), while locally the container starts consistently.

When the failure occurs, we see:

java.lang.IllegalStateException: Wait strategy failed. Container is removed
Caused by: org.rnorth.ducttape.TimeoutException: java.util.concurrent.TimeoutException

The container is removed without any stdout/stderr logs available. The issue reproduces with official images (postgres:16, postgres:17, postgres:18).

What we have tried (without success):

  • Increasing startup timeout (up to 3 minutes).
  • Using different PostgreSQL image tags.
  • Running docker pull and docker run manually on the same agent – the container starts fine.
  • Removing all custom container settings (memory limits, environment variables).

None of these changed the outcome: the error remains intermittent, with tests sometimes passing and sometimes failing.

Relevant log output

Caused by:
org.testcontainers.containers.ContainerLaunchException: Container startup failed for image postgres:18
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:346)
    at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
    at org.springframework.boot.testcontainers.lifecycle.TestcontainersStartup.start(TestcontainersStartup.java:109)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.boot.testcontainers.lifecycle.TestcontainersStartup$1.start(TestcontainersStartup.java:50)
    at org.springframework.boot.testcontainers.lifecycle.TestcontainersLifecycleBeanPostProcessor.start(TestcontainersLifecycleBeanPostProcessor.java:127)
    at org.springframework.boot.testcontainers.lifecycle.TestcontainersLifecycleBeanPostProcessor.initializeStartables(TestcontainersLifecycleBeanPostProcessor.java:115)
    at org.springframework.boot.testcontainers.lifecycle.TestcontainersLifecycleBeanPostProcessor.postProcessAfterInitialization(TestcontainersLifecycleBeanPostProcessor.java:88)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsAfterInitialization(AbstractAutowireCapableBeanFactory.java:442)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1818)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:603)
    ... 44 more

Caused by:
org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
    at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:331)
    ... 54 more

Caused by:
org.testcontainers.containers.ContainerLaunchException: Could not create/start container
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:551)
    at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:341)
    at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
    ... 55 more

Caused by:
java.lang.IllegalStateException: Wait strategy failed. Container is removed
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:498)
    ... 57 more

Caused by:
org.rnorth.ducttape.TimeoutException: java.util.concurrent.TimeoutException
    at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:70)
    at org.rnorth.ducttape.timeouts.Timeouts.doWithTimeout(Timeouts.java:60)
    at org.testcontainers.containers.wait.strategy.WaitAllStrategy.waitUntilReady(WaitAllStrategy.java:54)
    at org.testcontainers.postgresql.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:142)
    at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:487)
    ... 57 more

Caused by:
java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:65)
    ... 61 more
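
Only the innermost cause in the trace above says what actually gave up (a plain `java.util.concurrent.TimeoutException`), so it can help to log the whole chain compactly from the test harness. Below is a minimal sketch using only the JDK. The class name `CauseChain` is hypothetical, and the exceptions built in `main` are JDK stand-ins that mirror the messages in the log, not the real Testcontainers or ducttape types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Testcontainers.
final class CauseChain {

    /** Returns the exceptions from the outermost one down to the root cause. */
    static List<Throwable> unwrap(Throwable top) {
        Objects.requireNonNull(top, "top");
        List<Throwable> chain = new ArrayList<>();
        // Stop on null or on a repeated element to guard against cyclic cause chains.
        for (Throwable t = top; t != null && !chain.contains(t); t = t.getCause()) {
            chain.add(t);
        }
        return chain;
    }

    /** Returns the last element of the cause chain. */
    static Throwable rootCause(Throwable top) {
        List<Throwable> chain = unwrap(top);
        return chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-ins that mimic the nesting seen in the log (assumption: simplified to three levels).
        Throwable root = new TimeoutException();
        Throwable wait = new IllegalStateException("Wait strategy failed. Container is removed", root);
        Throwable launch = new RuntimeException("Container startup failed for image postgres:18", wait);

        // One line per level, outermost first.
        for (Throwable t : unwrap(launch)) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
    }
}
```

In a real test run, `unwrap` would be called on the exception caught around the container start rather than on hand-built stand-ins.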

Additional Information

  • Java version: 25.0.1
  • CI environment: Jenkins agent inside a Kubernetes pod
  • Relevant environment variables:
    • TESTCONTAINERS_RYUK_DISABLED=true
    • TESTCONTAINERS_HOST_OVERRIDE=<docker-host-ip> (set to the host's IP)
    • TESTCONTAINERS_REUSE_ENABLE=true
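
To confirm that the JVM running the tests really sees these values (and nothing else that might matter), the variables can be dumped from inside the test process. The following is a rough sketch using only the JDK. The class name `TestcontainersEnvDump` is hypothetical, and including the `DOCKER_` prefix (for variables such as `DOCKER_HOST`) is our assumption about what else is worth capturing.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, not part of Testcontainers.
final class TestcontainersEnvDump {

    /** Selects variables whose names start with TESTCONTAINERS_ or DOCKER_, sorted by name. */
    static Map<String, String> relevant(Map<String, String> env) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            String name = e.getKey();
            if (name.startsWith("TESTCONTAINERS_") || name.startsWith("DOCKER_")) {
                out.put(name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints NAME=value for each selected variable of the current process.
        relevant(System.getenv()).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Running this on the Jenkins agent and on a developer machine and comparing the two outputs would show whether the environments differ in more than the three variables listed above.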

Questions:

  1. Are there known issues with Testcontainers 2.0.5 and Docker server 28.x, or with cgroup v2 on the host?
  2. Could the environment variables interfere with container startup or waiting?
  3. Is there a way to get more verbose logs from Testcontainers (e.g., enabling DEBUG logging for org.testcontainers, or any system property)?
  4. What could cause the container to be removed before the wait strategy completes – maybe resource pressure on the Kubernetes pod?
  5. Are there recommended workarounds for flaky PostgreSQL startup in CI?
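
Regarding question 2, one thing we can check ourselves is whether the address in TESTCONTAINERS_HOST_OVERRIDE is reachable from the agent pod at the moment a failure happens. Below is a minimal sketch of such a probe using only the JDK. The class name `PortProbe` is hypothetical, and a plain TCP connect is only our assumption of a useful signal; it is not a reproduction of what the Testcontainers wait strategy does internally.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, not part of Testcontainers.
final class PortProbe {

    /** Returns true if a TCP connection to host:port is established within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: PortProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 3000));
    }
}
```

The idea would be to run it against the host override IP and the mapped PostgreSQL port of a manually started container on the same agent. If the probe fails while `docker run` itself succeeds, that would point at networking between the pod and the Docker host rather than at PostgreSQL startup.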

We are happy to provide more details or run diagnostic commands. Thank you!
