Class CompressionHeaderEncodingMap


  • public class CompressionHeaderEncodingMap
    extends Object
    Maintains a map of DataSeries to EncodingDescriptor, and a second map that contains the compressor to use for each EncodingDescriptor that represents an EXTERNAL encoding. There are two constructors: one populates the map from scratch using the default encodings chosen by this (htsjdk) implementation and is used when writing a new CRAM; the other populates the map from a serialized CRAM stream, resulting in the encodings chosen by the implementation that wrote that CRAM. Although the CRAM spec defines a fixed list of data series, individual CRAM implementations may choose to use only a subset of these. Therefore, the actual set of encodings that are instantiated can vary depending on the source.

    Notes on the htsjdk CRAM write implementation: this implementation encodes ALL DataSeries to external blocks (although some of the external encodings split the data between core and external; see ByteArrayLenEncoding), and does not use the 'BB' or 'QQ' DataSeries when writing CRAM at all. It relies heavily on GZIP and RANS for compression.

    See EncodingFactory for details on how an EncodingDescriptor is mapped to the codec that actually transfers data to and from underlying Slice blocks.
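    The two-map arrangement described above can be pictured with a small, self-contained sketch. It is not htsjdk code: the Series, Kind, and Descriptor types, the method names, and the compressor names stored as strings are all hypothetical stand-ins, and only a few data series keys are modeled. It assumes that the list of external IDs is simply the key set of the second map, and it models an unused data series as an absent key.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in only; none of these types are htsjdk classes.
class EncodingMapSketch {

    // A few CRAM data series keys; the spec defines many more.
    enum Series { BF, CF, RL, QS }

    // Simplified encoding kinds for this sketch.
    enum Kind { EXTERNAL, HUFFMAN }

    // Simplified descriptor: the encoding kind plus, for EXTERNAL, a content ID.
    static final class Descriptor {
        final Kind kind;
        final Integer contentId; // null unless kind == EXTERNAL

        Descriptor(Kind kind, Integer contentId) {
            this.kind = kind;
            this.contentId = contentId;
        }
    }

    // First map: data series -> descriptor. Absent keys model unused series.
    private final Map<Series, Descriptor> encodings = new EnumMap<>(Series.class);

    // Second map: content ID -> compressor name, for EXTERNAL encodings only.
    private final Map<Integer, String> compressors = new TreeMap<>();

    void putExternal(Series series, int contentId, String compressorName) {
        encodings.put(series, new Descriptor(Kind.EXTERNAL, contentId));
        compressors.put(contentId, compressorName);
    }

    void putHuffman(Series series) {
        encodings.put(series, new Descriptor(Kind.HUFFMAN, null));
    }

    Descriptor descriptorFor(Series series) {
        return encodings.get(series); // null when the series is not used
    }

    List<Integer> externalIds() {
        return new ArrayList<>(compressors.keySet()); // ascending, from the TreeMap
    }

    public static void main(String[] args) {
        EncodingMapSketch map = new EncodingMapSketch();
        map.putExternal(Series.RL, 3, "gzip");
        map.putExternal(Series.QS, 5, "rans");
        map.putHuffman(Series.BF);
        System.out.println("external IDs: " + map.externalIds());
        System.out.println("CF unused: " + (map.descriptorFor(Series.CF) == null));
    }
}
```

    Keeping the compressors in a separate map keyed by content ID means that only EXTERNAL encodings carry a compressor, which mirrors the description above.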
    • Constructor Detail

      • CompressionHeaderEncodingMap

        public CompressionHeaderEncodingMap​(CRAMEncodingStrategy encodingStrategy)
        Constructor used to create the default encoding map for writing CRAMs. The parameter values in the encoding strategy are used to set compression levels, etc., but any encoding map embedded in the strategy is ignored, since this constructor always uses the default encodings.
        Parameters:
        encodingStrategy - CRAMEncodingStrategy containing parameter values to use when creating the encoding map
      • CompressionHeaderEncodingMap

        public CompressionHeaderEncodingMap​(InputStream inputStream)
        Constructor used to discover an encoding map from a serialized CRAM stream.
        Parameters:
        inputStream - the CRAM input stream to be consumed
    • Method Detail

      • putTagBlockCompression

        public void putTagBlockCompression​(int tagId,
                                           ExternalCompressor compressor)
        Add an external compressor for a tag block.
        Parameters:
        tagId - the tag as a content ID
        compressor - compressor to be used for this tag block
      • getEncodingDescriptorForDataSeries

        public EncodingDescriptor getEncodingDescriptorForDataSeries​(DataSeries dataSeries)
        Get the encoding params that should be used for a given DataSeries.
        Parameters:
        dataSeries - the DataSeries whose EncodingDescriptor is requested
        Returns:
        EncodingDescriptor for the DataSeries
      • getExternalIDs

        public List<Integer> getExternalIDs()
        Get a list of all external IDs for this encoding map.
        Returns:
        list of all external IDs for this encoding map
      • createCompressedBlockForStream

        public Block createCompressedBlockForStream​(Integer contentId,
                                                    ByteArrayOutputStream outputStream)
        Given a content ID, return a Block for that ID by obtaining the contents of the stream, compressing it using the compressor for that contentID, and converting the result to a Block.
        Parameters:
        contentId - content ID of the block to create; selects the compressor to use
        outputStream - stream to compress
        Returns:
        Block containing the compressed contents of the stream
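        The steps described for createCompressedBlockForStream (look up the compressor for the content ID, take the stream contents, compress, wrap the result) can be sketched with JDK classes only. This is a minimal illustration, not the htsjdk implementation: SketchBlock, putCompressor, and the gzip/gunzip helpers are hypothetical stand-ins for Block and ExternalCompressor, and throwing on an unknown content ID is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative stand-in only; SketchBlock is not htsjdk's Block.
class CompressedBlockSketch {

    // Minimal block: content ID, uncompressed size, and compressed payload.
    static final class SketchBlock {
        final int contentId;
        final int rawSize;
        final byte[] compressed;

        SketchBlock(int contentId, int rawSize, byte[] compressed) {
            this.contentId = contentId;
            this.rawSize = rawSize;
            this.compressed = compressed;
        }
    }

    // Content ID -> compressor, modeled here as a byte[] -> byte[] function.
    private final Map<Integer, UnaryOperator<byte[]>> compressors = new HashMap<>();

    void putCompressor(int contentId, UnaryOperator<byte[]> compressor) {
        compressors.put(contentId, compressor);
    }

    SketchBlock createCompressedBlockForStream(Integer contentId, ByteArrayOutputStream outputStream) {
        UnaryOperator<byte[]> compressor = compressors.get(contentId);
        if (compressor == null) {
            // Assumption of this sketch: fail fast on an unknown content ID.
            throw new IllegalArgumentException("No compressor for content ID " + contentId);
        }
        byte[] raw = outputStream.toByteArray();
        return new SketchBlock(contentId, raw.length, compressor.apply(raw));
    }

    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(data);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CompressedBlockSketch sketch = new CompressedBlockSketch();
        sketch.putCompressor(7, CompressedBlockSketch::gzip);

        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        byte[] payload = new byte[1000]; // highly compressible: all zero bytes
        stream.write(payload, 0, payload.length);

        SketchBlock block = sketch.createCompressedBlockForStream(7, stream);
        System.out.println(block.rawSize + " raw bytes -> " + block.compressed.length + " compressed bytes");
    }
}
```

        Recording the uncompressed size alongside the compressed payload lets a reader allocate the output buffer before decompressing.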
      • write

        public void write​(OutputStream outputStream)
                   throws IOException
        Write the encoding map out to a CRAM stream.
        Parameters:
        outputStream - stream to write
        Throws:
        IOException
      • getBestExternalCompressor

        public ExternalCompressor getBestExternalCompressor​(byte[] data,
                                                            CRAMEncodingStrategy encodingStrategy)
        Return the best external compressor to use for the provided byte array (the compressor that results in the smallest compressed size). Note that this is not necessarily the best compression for the source data series, since it does not consider the size of the alphabet (2-byte int, 4-byte int); it chooses only from EXTERNAL compressors.
        Parameters:
        data - byte array to compress
        encodingStrategy - encoding strategy parameters to use
        Returns:
        the best ExternalCompressor to use for this data
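        The "smallest compressed size wins" rule can be illustrated with a short, self-contained sketch. It is not the htsjdk implementation: the candidate set (uncompressed, GZIP, and DEFLATE at the highest level) is a hypothetical one built from java.util.zip because rANS is not available in the JDK, and compressors are modeled as plain functions and identified by name rather than as ExternalCompressor instances.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative stand-in only; candidates here are not htsjdk ExternalCompressors.
class BestCompressorSketch {

    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(data);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] deflateLevel9(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
                out.write(data);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // release native resources of the caller-supplied Deflater
        }
    }

    // Hypothetical candidate set, in a fixed iteration order.
    static Map<String, UnaryOperator<byte[]>> candidates() {
        Map<String, UnaryOperator<byte[]>> map = new LinkedHashMap<>();
        map.put("raw", data -> data);
        map.put("gzip", BestCompressorSketch::gzip);
        map.put("deflate-9", BestCompressorSketch::deflateLevel9);
        return map;
    }

    // Compress the data with every candidate and return the name of the smallest result.
    static String pickSmallest(byte[] data, Map<String, UnaryOperator<byte[]>> candidates) {
        String bestName = null;
        int bestSize = Integer.MAX_VALUE;
        for (Map.Entry<String, UnaryOperator<byte[]>> entry : candidates.entrySet()) {
            int size = entry.getValue().apply(data).length;
            if (size < bestSize) { // ties keep the earlier candidate
                bestSize = size;
                bestName = entry.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<byte[]>> candidates = candidates();
        System.out.println("1 byte:      " + pickSmallest(new byte[] {42}, candidates));
        System.out.println("10000 zeros: " + pickSmallest(new byte[10000], candidates));
    }
}
```

        Because the comparison looks only at the compressed bytes it is given, it cannot tell whether a different encoding of the source values would have compressed better, which is the caveat noted in the method description.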
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object