
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.addthis.basis.chars;
import com.google.common.annotations.Beta;
/**
* A variation on ByteBufs for Character Strings. This variation has three primary goals:
*
* 1. Faster serialization and deserialization. Character Strings that are only
* infrequently treated as anything more than byte sequences waste a lot of CPU
* and generate heap garbage (which in turn costs CPU). This is especially egregious for
* the all too frequent case of deserializing a string, passing it around a few
* threads, and then serializing it again, but it is almost as bad when the only
* operations are comparisons to other Strings.
*
* 1 (Example). In hydra, bundles are sent from a query worker to the master with
* many String values serialized as byte arrays in the UTF-8 format. It is entirely
* possible for that String to be passed to the user without ever being manipulated.
* That means it was deserialized and then reserialized back to the same byte array
* for essentially no reason. That worst case could be resolved by lazy loading or
* a special un-deserializable value, but this does not scale well for the long tail
* of few, low intensity operations like comparisons to other Character Strings.
* Additionally, a lazy loading implementation would likely be implemented as a wrapper
* class. That would cause another layer of indirection and memory waste. This solution
* is closer to 'lazy loading of chars', which actually turns out to be pretty cheap.
*
* 2. Reduced memory overhead. Standard java char types are 16 bits, but for the common case
* of all or mostly ASCII characters, this is twice (or near that) as much memory as needed.
*
* 3. More flexible char[] semantics similar to the difference between byte[]s and ByteBufs. E.g. decreasing
* the number of readable values is possible as a constant time operation without creating a new array.
* String itself is also really, deeply, into making char[] copies. See AsciiSequence.toString() for
* an example of how easy it can be to accidentally make lots of array copies, and how hard it is to avoid even
* when you are trying to. (In hydra, AbstractBufferingHttpBundleEncoder ran into a similar issue where it
* was mistakenly creating an unnecessary copy.)
*
* * * *
* Secondary goals/ benefits:
* * * *
*
* - Specializing in one encoding with one backing structure allows for much more efficient
* encode and decode methods than those in the standard library due to abstraction limitations.
*
* - Gets around some of the other more egregious inefficiencies with JDK UTF-8 encoding/ decoding
* like decoding pre-allocating three times as much space as needed for the ASCII only case and
* then cutting down by re-allocating to the smaller char array. This implementation allows and
* encourages providing hints about how much to allocate, and should be able to more easily support
* correcting under-estimates (as far as I can tell, the JDK NIO coding library does support that --
* it just isn't actually used anywhere I can find. Possibly because benchmarks showed it wasn't worth
* it, but it is also possible that was due to limitations we do not have here).
*
* - Using CharSequence here and other places gives us more options with respect to optimizing
* things like sub-string semantics (shared/ unshared), and efficient streaming cache hit
* detection.
*
* - Using ByteBufs directly makes integration with other ByteBuf based IO easy and efficient.
*
* This interface combines several related ones and additionally imposes the following contracts:
*
* - all backing data should be stored in UTF-8 format only. UTF-8 is the one
* true format, and heretics will be persecuted without remorse.
*
* - hashCode and equals should return consistent values across implementations
* for the same underlying logical character sequence.
* -- for lack of other motivations, but for possibly no actual benefit, this
* will be the same values that an equivalent String representation would return.
*
* - compareTo should perform lexicographical string comparison.
* -- Note that while such comparisons are likely to be consistent with other
* CharSequence implementations, we cannot actually guarantee that to be the
* case because CharSequence does not require it. Accordingly, we do not derive
* much benefit from declaring Comparable of type CharSequence because, e.g.,
* native Strings declare Comparable only for other Strings.
* -- Also note that the UTF-8 format (which you are required to implement)
* should be able to do lexicographical comparisons without converting to chars
* (unsigned byte-wise comparison should suffice; it orders by code point).
*
* Component reasoning
*
* CharSequence: to sub in for arbitrary String usages
*
* Appendable: Convenient for building CharSequences, and CharBufs are likely efficient at doing so
*
* Comparable: so that CharBuf-only CharSequence environments can use sorted data structures
*
* ByteBufHolder: subject to change, but helpful for resource management, and exposing
* the underlying data store for more efficient operations than per-char method calls.
* Possible replacements for ByteBufHolder might be directly extending ByteBuf with more/
* different char methods, or simply creating a whole char based equivalent with conversions.
*
* Maybe add Iterable of Character, or a primitive equivalent?
*/
@Beta
public interface CharBuf extends ReadableCharBuf, Appendable {
}
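
The Javadoc above asks implementations to compare lexicographically without decoding to chars and to return the same hashCode as an equivalent String. The following is a minimal sketch of how both could be done directly on UTF-8 bytes. It is not taken from the basis library: the class `Utf8ContractSketch` and its method names are hypothetical, plain `byte[]` stands in for a ByteBuf, and the input is assumed to be well-formed UTF-8. One caveat worth knowing: unsigned byte order equals code point order, which matches `String.compareTo` for text in the Basic Multilingual Plane but can differ when supplementary characters are compared against code points in U+E000..U+FFFF.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of com.addthis.basis) illustrating two CharBuf contracts. */
final class Utf8ContractSketch {

    private Utf8ContractSketch() {}

    /** Lexicographic comparison of two UTF-8 byte arrays, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Java bytes are signed, so mask to 0..255 before comparing.
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        // A strict prefix sorts before the longer sequence.
        return a.length - b.length;
    }

    /** Computes the value String.hashCode() would return; assumes well-formed UTF-8. */
    static int stringCompatibleHash(byte[] utf8) {
        int h = 0;
        int i = 0;
        while (i < utf8.length) {
            int b = utf8[i] & 0xFF;
            int cp;
            int len;
            if (b < 0x80) {          // 1-byte sequence (ASCII)
                cp = b;
                len = 1;
            } else if (b < 0xE0) {   // 2-byte sequence
                cp = b & 0x1F;
                len = 2;
            } else if (b < 0xF0) {   // 3-byte sequence
                cp = b & 0x0F;
                len = 3;
            } else {                 // 4-byte sequence
                cp = b & 0x07;
                len = 4;
            }
            for (int k = 1; k < len; k++) {
                cp = (cp << 6) | (utf8[i + k] & 0x3F);
            }
            i += len;
            if (cp < 0x10000) {
                h = 31 * h + cp;
            } else {
                // String hashes UTF-16 code units, so split into a surrogate pair.
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] apricot = "apricot".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareUnsigned(apple, apricot) < 0);
        System.out.println(stringCompatibleHash(apple) == "apple".hashCode());
    }
}
```

The ASCII branch of the hash never allocates or decodes, which is the common case the Javadoc is optimizing for; a real implementation would also need to decide how to treat malformed input, which this sketch ignores.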