
io.atomix.copycat.server.storage.compaction.MajorCompactionTask

/*
 * Copyright 2015 the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.atomix.copycat.server.storage.compaction;

import io.atomix.catalyst.util.Assert;
import io.atomix.copycat.server.Commit;
import io.atomix.copycat.server.storage.Segment;
import io.atomix.copycat.server.storage.SegmentDescriptor;
import io.atomix.copycat.server.storage.SegmentManager;
import io.atomix.copycat.server.storage.entry.Entry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Removes tombstones from the log and combines {@link Segment}s to reclaim disk space.
 * <p>
 * Major compaction is a more heavyweight compaction task which is responsible both for removing tombstone
 * {@link Entry entries} from the log and combining groups of neighboring log {@link Segment}s together.
 * <h3>Combining segments</h3>
 * As entries are written to the log and the log rolls over to new segments, entries are compacted out of individual
 * segments by {@link MinorCompactionTask}s. However, the minor compaction process only rewrites individual segments
 * and doesn't combine them, which results in an ever-growing number of open file pointers. During major compaction,
 * the major compaction task rewrites groups of segments provided by the {@link MajorCompactionManager}. For each
 * group of segments, a single compact segment will be created with a newer {@code version} and the same starting
 * {@code index} as the first segment in the group. All entries from all segments in the group that haven't been
 * {@link io.atomix.copycat.server.storage.Log#clean(long) cleaned} will then be written to the new compact segment.
 * Once the rewrite is complete, the compact segment will be locked and the set of old segments deleted.
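 * <p>
 * For example, the compact segment that replaces a group is created from the descriptor of the first segment in the
 * group (a simplified sketch of the rewrite logic implemented below; size limits and error handling omitted):
 * <pre>
 * {@code
 * Segment first = group.get(0);
 * Segment compactSegment = manager.createSegment(SegmentDescriptor.builder()
 *   .withId(first.descriptor().id())
 *   .withVersion(first.descriptor().version() + 1) // newer version of the same segment
 *   .withIndex(first.descriptor().index())         // same starting index
 *   .build());
 * // Entries that haven't been cleaned are copied from each segment in the
 * // group into compactSegment, then the old segments are replaced and deleted.
 * }
 * </pre>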
 * <h3>Removing tombstones</h3>
 * Tombstones are {@link Entry entries} in the log which amount to state changes that remove state. That is,
 * tombstones are an indicator that some set of prior entries no longer contribute to the state of the system. Thus,
 * it is critical that tombstones remain in the log as long as any prior related entries do. If a tombstone is removed
 * from the log before its prior related entries, rebuilding state from the log will result in inconsistencies.
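 * <p>
 * For example, in a map state machine a {@code remove} commit is a tombstone for a prior {@code set} commit on the
 * same key. A hypothetical sketch of the cleaning order a state machine must preserve:
 * <pre>
 * {@code
 * // The set commit may be cleaned as soon as the remove is applied, but the
 * // remove commit may only be cleaned once the prior set no longer needs to
 * // be erased on replay.
 * setCommit.clean();    // cleaned first
 * removeCommit.clean(); // cleaned only after the prior related entry
 * }
 * </pre>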
 * <p>
 * A significant objective of the major compaction task is to remove tombstones from the log in a manner that ensures
 * failures before, during, or after the compaction task will not result in inconsistencies when state is rebuilt from
 * the log. In order to ensure tombstones are removed only after any prior related entries, the major compaction
 * task simply compacts segments in sequential order from the {@link Segment#firstIndex()} of the first segment to the
 * {@link Segment#lastIndex()} of the last segment. This ensures that if a failure occurs during the compaction
 * process, only entries earlier in the log will have been removed, and potential tombstones which erase the state of
 * those entries will remain.
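 * <p>
 * A simplified sketch of this ordering (assuming groups and segments are provided oldest-first by the
 * {@link MajorCompactionManager}):
 * <pre>
 * {@code
 * for (List<Segment> group : groups) {   // groups in index order
 *   for (Segment segment : group) {      // segments in index order
 *     for (long index = segment.firstIndex(); index <= segment.lastIndex(); index++) {
 *       // entries are visited, and potentially removed, strictly in log order
 *     }
 *   }
 * }
 * }
 * </pre>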
 * <p>
 * Nevertheless, there are some significant potential race conditions that must be considered in the implementation
 * of major compaction. The major compaction task assumes that state machines will always clean related entries
 * in monotonically increasing order. That is, if a state machine receives a {@link io.atomix.copycat.server.Commit}
 * {@code remove 1} that deletes the state of a prior {@code Commit} {@code set 1}, the state machine will call
 * {@link Commit#clean()} on the {@code set 1} commit before cleaning the {@code remove 1} commit. But even if
 * applications clean entries from the log in monotonic order, and the major compaction task compacts segments in
 * sequential order, inconsistencies can still arise. Consider the following history:
 * <ul>
 * <li>{@code set 1} is at index {@code 1} in segment {@code 1}</li>
 * <li>{@code remove 1} is at index {@code 12345} in segment {@code 8}</li>
 * <li>The major compaction task rewrites segment {@code 1}</li>
 * <li>The application cleans {@code set 1} at index {@code 1} in the rewritten version of segment {@code 1}</li>
 * <li>The application cleans {@code remove 1} at index {@code 12345} in segment {@code 8}, which the compaction
 * task has yet to compact</li>
 * <li>The compaction task compacts segments {@code 2} through {@code 8}, removing tombstone entry {@code 12345}
 * during the process</li>
 * </ul>
 * <p>
 * In the scenario above, the resulting log contains {@code set 1} but not {@code remove 1}. If we replayed those
 * entries as {@link Commit}s to the log, it would result in an inconsistent state. Worse yet, not only is this
 * server's state incorrect, but it will be inconsistent with other servers which are likely to have correctly
 * removed both entry {@code 1} and entry {@code 12345} during major compaction.
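 * <p>
 * A hypothetical illustration of the resulting divergence, assuming a simple map state machine:
 * <pre>
 * {@code
 * // Replaying the compacted log on this server:
 * map.put(1, value); // set 1 at index 1 survived compaction
 * // remove 1 at index 12345 was compacted away, so key 1 is never removed.
 * // Servers that compacted correctly removed both entries, so their maps do
 * // not contain key 1 while this server's map does.
 * }
 * </pre>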
 * <p>
 * In order to prevent such a scenario from occurring, the major compaction task takes an immutable snapshot of the
 * cleaned offsets underlying all the segments to be compacted prior to rewriting any entries. This ensures that any
 * entries cleaned after the start of rewriting segments will not be considered for compaction during the execution
 * of this task.
 *
 * @author <a href="http://github.com/kuujo">Jordan Halterman</a>
 */
public final class MajorCompactionTask implements CompactionTask {
  private static final Logger LOGGER = LoggerFactory.getLogger(MajorCompactionTask.class);
  private final SegmentManager manager;
  private final List<List<Segment>> groups;
  private List<List<Predicate<Long>>> cleaners;

  MajorCompactionTask(SegmentManager manager, List<List<Segment>> groups) {
    this.manager = Assert.notNull(manager, "manager");
    this.groups = Assert.notNull(groups, "groups");
  }

  @Override
  public void run() {
    storeCleanedOffsets();
    compactGroups();
  }

  /**
   * Stores an immutable snapshot of the cleaned offsets for each segment prior to compaction.
   */
  private void storeCleanedOffsets() {
    cleaners = new ArrayList<>(groups.size());
    for (List<Segment> group : groups) {
      List<Predicate<Long>> groupCleaners = new ArrayList<>(group.size());
      for (Segment segment : group) {
        groupCleaners.add(segment.cleanPredicate());
      }
      cleaners.add(groupCleaners);
    }
  }

  /**
   * Compacts all compactable segments.
   */
  private void compactGroups() {
    for (int i = 0; i < groups.size(); i++) {
      List<Segment> group = groups.get(i);
      List<Predicate<Long>> groupCleaners = cleaners.get(i);
      Segment segment = compactGroup(group, groupCleaners);
      updateCleaned(group, groupCleaners, segment);
      deleteGroup(group);
    }
  }

  /**
   * Compacts a group.
   */
  private Segment compactGroup(List<Segment> segments, List<Predicate<Long>> cleaners) {
    // Get the first segment which contains the first index being cleaned. The compact segment will be written
    // as a newer version of the earliest segment being rewritten.
    Segment firstSegment = segments.iterator().next();

    // Create a compact segment with a newer version to which to rewrite the segment entries.
    Segment compactSegment = manager.createSegment(SegmentDescriptor.builder()
      .withId(firstSegment.descriptor().id())
      .withVersion(firstSegment.descriptor().version() + 1)
      .withIndex(firstSegment.descriptor().index())
      .withMaxSegmentSize(segments.stream().mapToLong(s -> s.descriptor().maxSegmentSize()).max().getAsLong())
      .withMaxEntries(segments.stream().mapToInt(s -> s.descriptor().maxEntries()).max().getAsInt())
      .build());

    compactGroup(segments, cleaners, compactSegment);

    // Replace the rewritten segments with the updated segment.
    manager.replaceSegments(segments, compactSegment);
    return compactSegment;
  }

  /**
   * Compacts segments in a group sequentially.
   *
   * @param segments The segments to compact.
   * @param cleaners The snapshot of cleaned offsets for each segment.
   * @param compactSegment The compact segment.
   */
  private void compactGroup(List<Segment> segments, List<Predicate<Long>> cleaners, Segment compactSegment) {
    // Iterate through all segments being compacted and write entries to a single compact segment.
    for (int i = 0; i < segments.size(); i++) {
      compactSegment(segments.get(i), cleaners.get(i), compactSegment);
    }
  }

  /**
   * Compacts the given segment.
   *
   * @param segment The segment to compact.
   * @param cleaner The snapshot of cleaned offsets for the segment.
   * @param compactSegment The segment to which to write the compacted segment.
   */
  private void compactSegment(Segment segment, Predicate<Long> cleaner, Segment compactSegment) {
    for (long i = segment.firstIndex(); i <= segment.lastIndex(); i++) {
      compactEntry(i, segment, cleaner, compactSegment);
    }
  }

  /**
   * Compacts the entry at the given index.
   *
   * @param index The index at which to compact the entry.
   * @param segment The segment to compact.
   * @param cleaner The snapshot of cleaned offsets for the segment.
   * @param compactSegment The segment to which to write the cleaned segment.
   */
  private void compactEntry(long index, Segment segment, Predicate<Long> cleaner, Segment compactSegment) {
    try (Entry entry = segment.get(index)) {
      // If an entry was found, determine whether to transfer it to the compact segment.
      if (entry != null) {
        // If the entry has been cleaned, skip the entry in the compact segment.
        // Note that for major compaction this process includes normal and tombstone entries.
        long offset = segment.offset(index);
        if (offset == -1 || cleaner.test(offset)) {
          compactSegment.skip(1);
          LOGGER.debug("Cleaned entry {} from segment {}", index, segment.descriptor().id());
        }
        // If the entry hasn't been cleaned, simply transfer it to the new segment.
        else {
          compactSegment.append(entry);
        }
      }
      // If the entry has already been compacted, skip the index in the segment.
      else {
        compactSegment.skip(1);
      }
    }
  }

  /**
   * Updates the new compact segment with entries that were cleaned during compaction.
   */
  private void updateCleaned(List<Segment> segments, List<Predicate<Long>> cleaners, Segment compactSegment) {
    for (int i = 0; i < segments.size(); i++) {
      updateCleanedOffsets(segments.get(i), cleaners.get(i), compactSegment);
    }
  }

  /**
   * Updates the new compact segment with entries that were cleaned in the given segment during compaction.
   */
  private void updateCleanedOffsets(Segment segment, Predicate<Long> cleaner, Segment compactSegment) {
    for (long i = segment.firstIndex(); i <= segment.lastIndex(); i++) {
      long offset = segment.offset(i);
      if (offset != -1 && cleaner.test(offset)) {
        compactSegment.clean(offset);
      }
    }
  }

  /**
   * Completes compaction by deleting old segments.
   */
  private void deleteGroup(List<Segment> group) {
    // Delete the old segments.
    for (Segment oldSegment : group) {
      oldSegment.delete();
    }
  }

  @Override
  public String toString() {
    return String.format("%s[segments=%s]", getClass().getSimpleName(), groups);
  }
}




