
fix: Tune NiFi startup defaults (election timeout, archive retention)#936

Merged
lfrancke merged 3 commits into main from fix/nifi-startup-defaults
May 26, 2026

Conversation

@lfrancke

Member

Description

  • Remove the operator override of nifi.cluster.flow.election.max.wait.time, letting NiFi's upstream default of 5 mins take effect. The previous 1-min override was left over from a "for testing" TODO in the operator and may have caused flow election to settle on incomplete vote sets in cold-start scenarios.

  • Set nifi.content.repository.archive.max.retention.period to "3 days". Previously empty, which NiFi interprets as Long.MAX_VALUE and disables time-based archive purge entirely. Without a time-based ceiling the content archive directory can accumulate millions of files, which makes the synchronous startup directory scan in FileSystemRepository very slow. Users requiring a longer content-replay window can override via configOverrides. The provenance audit trail is stored separately and is unaffected by this setting.
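As a rough illustration of the two changes above, the following sketch shows what the resulting defaults could look like once the election override is gone and the retention period is set. The helper names `startup_defaults` and `apply_overrides` are hypothetical and do not come from the operator code; the override merge is only an assumption about how configOverrides take precedence.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not the operator's actual function): builds the
/// defaults touched by this PR.
fn startup_defaults() -> BTreeMap<String, String> {
    let mut properties = BTreeMap::new();
    // Deliberately no entry for nifi.cluster.flow.election.max.wait.time:
    // leaving it unset lets NiFi's upstream default of 5 mins take effect.
    properties.insert(
        "nifi.content.repository.archive.max.retention.period".to_string(),
        "3 days".to_string(),
    );
    properties
}

/// Hypothetical helper: assumes user-supplied overrides win over defaults.
fn apply_overrides(
    mut properties: BTreeMap<String, String>,
    overrides: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    for (key, value) in overrides {
        properties.insert(key.clone(), value.clone());
    }
    properties
}
```

A user who needs a longer content-replay window would then supply their own value for the retention key, and that value would replace the "3 days" default.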

Definition of Done Checklist

Author

  • Integration tests passed (for non trivial changes)
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated

Acceptance

  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added

@lfrancke

Member Author

I did not run the tests (yet) and I won't have time to do so today. I believe this should also not be necessary. If anyone could kick off a test run if you think it's needed I'd appreciate that.

This is the result of a customer support issue.

@lfrancke lfrancke moved this to Development: Waiting for Review in Stackable Engineering May 20, 2026
@lfrancke

Member Author

For reference: the default retention period was 7 days in NiFi 1.x and was changed to 3 hours in 2.x.

https://issues.apache.org/jira/browse/NIFI-12132

The reason given there was to make it more convenient to run locally, but the value should be adjusted for production, so I did that.
The percentage default changed to 90%, but for production use cases I believe 50% is probably better.
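To make the retention values mentioned in this thread easier to compare, here is a simplified model of how a setting maps to a purge window. It is only a sketch: `Retention` and `parse_retention` are hypothetical names, the parser handles just "<number> hours" and "<number> days", and NiFi's own parsing accepts more formats. The empty-value case mirrors the PR description, where an empty value disables time-based purging.

```rust
use std::time::Duration;

/// Hypothetical model of the effective archive retention.
#[derive(Debug, PartialEq)]
enum Retention {
    /// Empty setting: no time-based purge at all.
    Unbounded,
    /// Archived content older than this is eligible for purge.
    Bounded(Duration),
}

/// Simplified parser (an assumption, not NiFi's implementation).
fn parse_retention(value: &str) -> Result<Retention, String> {
    let trimmed = value.trim();
    if trimmed.is_empty() {
        return Ok(Retention::Unbounded);
    }
    let mut parts = trimmed.split_whitespace();
    let (amount, unit) = match (parts.next(), parts.next(), parts.next()) {
        (Some(amount), Some(unit), None) => (amount, unit),
        _ => return Err(format!("expected '<number> <unit>', got '{trimmed}'")),
    };
    let amount: u64 = amount
        .parse()
        .map_err(|_| format!("invalid number '{amount}'"))?;
    let seconds_per_unit: u64 = match unit {
        "hour" | "hours" => 3_600,
        "day" | "days" => 86_400,
        other => return Err(format!("unsupported unit '{other}'")),
    };
    let seconds = amount
        .checked_mul(seconds_per_unit)
        .ok_or_else(|| "retention period too large".to_string())?;
    Ok(Retention::Bounded(Duration::from_secs(seconds)))
}
```

Under this model, "7 days", "3 hours", and "3 days" all yield a bounded window, while the previously empty value yields `Retention::Unbounded`.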

@maltesander maltesander self-requested a review May 20, 2026 11:48
@maltesander maltesander moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering May 20, 2026

@maltesander maltesander left a comment

Member

Comment thread rust/operator-binary/src/config/mod.rs
Comment thread CHANGELOG.md Outdated
- Remove the operator override of nifi.cluster.flow.election.max.wait.time,
  letting NiFi's upstream default of 5 mins take effect. The previous 1-min
  override was left over from a "for testing" TODO in the operator and may
  have caused flow election to settle on incomplete vote sets in cold-start
  scenarios.

- Set nifi.content.repository.archive.max.retention.period to "3 days".
  Previously empty, which NiFi interprets as Long.MAX_VALUE and disables
  time-based archive purge entirely. Without a time-based ceiling the
  content archive directory can accumulate millions of files, which makes
  the synchronous startup directory scan in FileSystemRepository very slow.
  Users requiring a longer content-replay window can override via
  configOverrides. The provenance audit trail is stored separately and is
  unaffected by this setting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lfrancke lfrancke force-pushed the fix/nifi-startup-defaults branch from 8bdfe8c to 27671a7 Compare May 20, 2026 15:51

@maltesander maltesander left a comment

Member

LGTM!

@lfrancke lfrancke added this pull request to the merge queue May 26, 2026
Merged via the queue into main with commit dcdcdd8 May 26, 2026
12 checks passed
@lfrancke lfrancke deleted the fix/nifi-startup-defaults branch May 26, 2026 16:13
Comment on lines -603 to -606
    properties.insert(
        "nifi.cluster.flow.election.max.wait.time".to_string(),
        "1 mins".to_string(),
    );

Member

I often lower this as it's a real blocker for development.
WDYT of setting this to a very low value in case only a single replica is configured?

Member

gentle ping @lfrancke :)

Member Author

For that use case there's a better property, nifi.cluster.flow.election.max.candidates, which we can set to 1 if there is only a single replica. It looks like we don't have the replica count in this part of the code, though.

Member

That is great news! It makes so much sense to just set nifi.cluster.flow.election.max.candidates to the number of NiFi nodes (in the 99% case they are static): #953
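A minimal sketch of the idea discussed in this thread, assuming the replica count can be passed into the config-building code (which, as noted above, is not currently the case). The function name `insert_election_candidates` and its signature are hypothetical; the actual follow-up in #953 may look different.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: sets the number of election candidates to the number
/// of NiFi nodes when that number is known and non-zero.
fn insert_election_candidates(
    properties: &mut BTreeMap<String, String>,
    replicas: Option<u16>,
) {
    // If the replica count is unknown (or zero), leave the property unset so
    // that only the election wait time bounds the flow election.
    if let Some(count) = replicas.filter(|count| *count > 0) {
        properties.insert(
            "nifi.cluster.flow.election.max.candidates".to_string(),
            count.to_string(),
        );
    }
}
```

With a single replica this would write a value of 1, which addresses the development concern raised earlier without shortening the election timeout for larger clusters.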
