


This module contains code to support integration with OpenStack. Currently this consists of a filesystem client to read data from and write data to an OpenStack Swift object store.









Swift Filesystem Client for Apache Hadoop

Introduction

This package adds support in Apache Hadoop for the OpenStack Swift key-value store, allowing client applications, including MapReduce jobs, to read and write data in Swift.

Design Goals

  1. Give clients access to SwiftFS files, similar to S3n.
  2. Possibly support a Swift block store, at least until Swift's support for files larger than 5 GB has stabilized.
  3. Support data locality if the Swift filesystem provides file location information.
  4. Support access to multiple Swift filesystems in the same client/task.
  5. Authenticate using the Keystone APIs.
  6. Avoid dependencies on unmaintained libraries.

Supporting multiple Swift Filesystems

The goal of supporting multiple Swift filesystems simultaneously changes how clusters are named and authenticated. In Hadoop's S3 and S3N filesystems, the "bucket" in which objects are stored is named directly in the URL, such as s3n://bucket/object1. The Hadoop configuration contains a single set of login credentials for S3 (username and key), which are used to authenticate the HTTP operations.

For Swift, we need to know not only the name of the "container", but also which credentials to use to authenticate with it, and which URL to use for authentication. This has led to a different design pattern from S3: instead of a simple bucket name, the hostname of a Swift container is two-level, with the name of the service provider as the second part: swift://bucket.service/ The service portion of this hostname is used as a reference into the client settings, and so identifies the service provider of that container.
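
The two-level hostname can be illustrated with a short sketch. This is not the client's actual parsing code: the class SwiftUriSketch, its methods, and the "example.swift.service." settings prefix are hypothetical names invented for illustration, and splitting the hostname at the first dot is an assumption. It only shows how the service part of the hostname might select one set of settings among several.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the Hadoop Swift client.
final class SwiftUriSketch {

    /** The two parts of a hostname such as "bucket.service". */
    static final class Binding {
        final String container;
        final String service;

        Binding(String container, String service) {
            this.container = container;
            this.service = service;
        }
    }

    /** Splits the hostname of a swift:// URI at its first dot (an assumption). */
    static Binding parse(URI uri) {
        if (!"swift".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not a swift:// URI: " + uri);
        }
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No hostname in " + uri);
        }
        int dot = host.indexOf('.');
        if (dot <= 0 || dot == host.length() - 1) {
            throw new IllegalArgumentException(
                "Hostname must have the form container.service: " + host);
        }
        return new Binding(host.substring(0, dot), host.substring(dot + 1));
    }

    /** Looks up the authentication URL for a service under a made-up key prefix. */
    static String authUrlFor(Map<String, String> settings, Binding binding) {
        String key = "example.swift.service." + binding.service + ".auth.url";
        String url = settings.get(key);
        if (url == null) {
            throw new IllegalStateException("No settings for service " + binding.service);
        }
        return url;
    }

    public static void main(String[] args) {
        // Two services configured side by side; keys and URLs are placeholders.
        Map<String, String> settings = new HashMap<>();
        settings.put("example.swift.service.service.auth.url", "https://auth.one.example/");
        settings.put("example.swift.service.other.auth.url", "https://auth.two.example/");

        Binding binding = parse(URI.create("swift://bucket.service/object1"));
        System.out.println(binding.container + " @ " + authUrlFor(settings, binding));
    }
}
```

Because the service name is resolved per URI, a single client could hold settings for several providers and address containers on each of them in the same task.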

Testing

The client code can be tested against public or private Swift instances. At the time of writing (January 2013), the public services are Rackspace and HP Cloud. Testing against both services is how interoperability is verified.



