/*
* Copyright 2015 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.atomix.copycat.server.storage.compaction;
import io.atomix.catalyst.util.Assert;
import io.atomix.copycat.server.Commit;
import io.atomix.copycat.server.storage.Segment;
import io.atomix.copycat.server.storage.SegmentDescriptor;
import io.atomix.copycat.server.storage.SegmentManager;
import io.atomix.copycat.server.storage.entry.Entry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
/**
* Removes tombstones from the log and combines {@link Segment}s to reclaim disk space.
*
* Major compaction is a more heavyweight compaction task which is responsible both for removing tombstone
* {@link Entry entries} from the log and combining groups of neighboring log {@link Segment}s together.
*
* Combining segments
*
* As entries are written to the log and the log rolls over to new segments, entries are compacted out of individual
* segments by {@link MinorCompactionTask}s. However, the minor compaction process only rewrites individual segments
* and doesn't combine them. This will result in an ever-growing number of open file pointers. During major compaction,
* the major compaction task rewrites groups of segments provided by the {@link MajorCompactionManager}. For each group
* of segments, a single compact segment will be created with the same {@code version} and starting {@code index} as
* the first segment in the group. All entries from all segments in the group that haven't been
* {@link io.atomix.copycat.server.storage.Log#clean(long) cleaned} will then be written to the new compact segment.
* Once the rewrite is complete, the compact segment will be locked and the set of old segments deleted.
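*
* The combining step described above can be illustrated with a minimal sketch. The {@code SegmentMergeSketch},
* {@code SketchSegment} and {@code SketchEntry} types below are hypothetical stand-ins invented for this example and
* are not the actual {@link Segment} or {@link Entry} API. The sketch only assumes what is stated above: the compact
* segment takes the version and starting index of the first segment in the group, and only entries that have not
* been cleaned are copied.
*
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Copycat's Segment or Entry classes.
final class SegmentMergeSketch {

  static final class SketchEntry {
    final long index;
    final boolean cleaned;

    SketchEntry(long index, boolean cleaned) {
      this.index = index;
      this.cleaned = cleaned;
    }
  }

  static final class SketchSegment {
    final long version;
    final long firstIndex;
    final List<SketchEntry> entries = new ArrayList<>();
    boolean locked;

    SketchSegment(long version, long firstIndex) {
      this.version = version;
      this.firstIndex = firstIndex;
    }
  }

  // Rewrites a non-empty group of neighboring segments into a single compact segment.
  static SketchSegment combine(List<SketchSegment> group) {
    SketchSegment first = group.get(0);
    // The compact segment reuses the version and starting index of the first segment in the group.
    SketchSegment compact = new SketchSegment(first.version, first.firstIndex);
    for (SketchSegment segment : group) {
      for (SketchEntry entry : segment.entries) {
        // Only entries that have not been cleaned survive the rewrite.
        if (!entry.cleaned) {
          compact.entries.add(entry);
        }
      }
    }
    // Lock the compact segment once the rewrite is complete; deleting the old segments is left to the caller.
    compact.locked = true;
    return compact;
  }

  public static void main(String[] args) {
    SketchSegment a = new SketchSegment(1, 1);
    a.entries.add(new SketchEntry(1, true));
    a.entries.add(new SketchEntry(2, false));
    SketchSegment b = new SketchSegment(1, 3);
    b.entries.add(new SketchEntry(3, false));
    SketchSegment compact = combine(Arrays.asList(a, b));
    System.out.println("entries kept: " + compact.entries.size());
  }
}
```
*
* In this sketch the caller is expected to discard the old group once {@code combine} returns, mirroring the
* deletion of the old segments after the compact segment is locked.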
*
* Removing tombstones
*
* Tombstones are {@link Entry entries} in the log which amount to state changes that remove state. That is,
* tombstones are an indicator that some set of prior entries no longer contribute to the state of the system. Thus,
* it is critical that tombstones remain in the log as long as any prior related entries do. If a tombstone is removed
* from the log before its prior related entries, rebuilding state from the log will result in inconsistencies.
*
* A significant objective of the major compaction task is to remove tombstones from the log in a manner that ensures
* failures before, during, or after the compaction task will not result in inconsistencies when state is rebuilt from
* the log. In order to ensure tombstones are removed only after any prior related entries, the major compaction
* task simply compacts segments in sequential order from the {@link Segment#firstIndex()} of the first segment to the
* {@link Segment#lastIndex()} of the last segment. This ensures that if a failure occurs during the compaction process,
* only entries earlier in the log will have been removed, and potential tombstones which erase the state of those entries
* will remain.
*
* Nevertheless, there are some significant potential race conditions that must be considered in the implementation of
* major compaction. The major compaction task assumes that state machines will always clean related entries
* in monotonically increasing order. That is, if a state machine receives a {@link io.atomix.copycat.server.Commit}
* {@code remove 1} that deletes the state of a prior {@code Commit} {@code set 1}, the state machine will call
* {@link Commit#clean()} on the {@code set 1} commit before cleaning the {@code remove 1} commit. But even if applications
* clean entries from the log in monotonic order, and the major compaction task compacts segments in sequential order,
* inconsistencies can still arise. Consider the following history:
*
* - {@code set 1} is at index {@code 1} in segment {@code 1}
* - {@code remove 1} is at index {@code 12345} in segment {@code 8}
* - The major compaction task rewrites segment {@code 1}
* - The application cleans {@code set 1} at index {@code 1} in the rewritten version of segment {@code 1}
* - The application cleans {@code remove 1} at index {@code 12345} in segment {@code 8}, which the compaction task
* has yet to compact
* - The compaction task compacts segments {@code 2} through {@code 8}, removing tombstone entry {@code 12345} during
* the process
*
*
* In the scenario above, the resulting log contains {@code set 1} but not {@code remove 1}. If those entries were replayed
* as {@link Commit}s to the state machine, the result would be an inconsistent state. Worse yet, not only is this server's state
* incorrect, but it will be inconsistent with other servers which are likely to have correctly removed both entry
* {@code 1} and entry {@code 12345} during major compaction.
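*
* The history above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the
* compaction implementation: {@code TombstoneRaceSketch} and its methods are hypothetical, the log is modeled as a
* sorted map from index to command, and the index ranges chosen for segment {@code 1} (1 to 1000) and segments
* {@code 2} through {@code 8} (1001 to 12345) are invented for the example.
*
```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the race; none of these names come from Copycat.
final class TombstoneRaceSketch {
  // Log modeled as index -> command, for example "set 1" or "remove 1".
  final TreeMap<Long, String> log = new TreeMap<>();
  // Indexes the application has marked as cleaned.
  final Set<Long> cleaned = new HashSet<>();

  void clean(long index) {
    cleaned.add(index);
  }

  // One compaction pass: drops entries in [fromIndex, toIndex] that are cleaned at the time of the pass.
  void compact(long fromIndex, long toIndex) {
    log.subMap(fromIndex, true, toIndex, true).keySet().removeIf(cleaned::contains);
  }

  // Rebuilds state by replaying the surviving entries in index order.
  Set<String> replay() {
    Set<String> state = new HashSet<>();
    for (String command : log.values()) {
      String[] parts = command.split(" ");
      if (parts[0].equals("set")) {
        state.add(parts[1]);
      } else if (parts[0].equals("remove")) {
        state.remove(parts[1]);
      }
    }
    return state;
  }

  // Runs the history from the list above and returns the simulated server.
  static TombstoneRaceSketch runHistory() {
    TombstoneRaceSketch server = new TombstoneRaceSketch();
    server.log.put(1L, "set 1");        // index 1, assumed to live in segment 1
    server.log.put(12345L, "remove 1"); // index 12345, assumed to live in segment 8
    server.compact(1L, 1000L);          // segment 1 is rewritten; set 1 is not yet cleaned, so it is kept
    server.clean(1L);                   // the application cleans set 1 after segment 1 was rewritten
    server.clean(12345L);               // then cleans the tombstone, in monotonic order
    server.compact(1001L, 12345L);      // segments 2 through 8 are compacted; the tombstone is dropped
    return server;
  }

  public static void main(String[] args) {
    TombstoneRaceSketch server = runHistory();
    System.out.println("log after compaction: " + server.log);
    System.out.println("replayed state: " + server.replay());
  }
}
```
*
* Replaying the surviving log in this model leaves key {@code 1} set even though the application removed it, which
* is the inconsistency described above.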
*