org.apache.hadoop.fs.swift.package.html
This module contains code to support integration with OpenStack.
Currently this consists of a filesystem client to read data from
and write data to an OpenStack Swift object store.
Swift Filesystem Client for Apache Hadoop
Introduction
This package provides support in Apache Hadoop for the OpenStack Swift
key-value store, allowing client applications, including MapReduce jobs, to
read and write data in Swift.
Design Goals
- Give clients access to SwiftFS files, similar to S3n.
- Maybe: support a Swift block store, at least until Swift's
support for files larger than 5 GB has stabilized.
- Support data locality if the Swift FS provides file location information.
- Support access to multiple Swift filesystems in the same client/task.
- Authenticate using the Keystone APIs.
- Avoid dependency on unmaintained libraries.
Supporting multiple Swift Filesystems
The goal of supporting multiple Swift filesystems simultaneously changes how
clusters are named and authenticated. In Hadoop's S3 and S3N filesystems, the "bucket" into
which objects are stored is named directly in the URL, such as
s3n://bucket/object1. The Hadoop configuration contains a
single set of login credentials for S3 (username and key), which are used to
authenticate the HTTP operations.
For Swift, we need to know not only the "container" name, but also which credentials
to use to authenticate with it, and which URL to use for authentication.
This has led to a different design pattern from S3: instead of a simple bucket name,
the hostname of a Swift container is two-level, with the name of the service provider
as the second component: swift://bucket.service/
The service portion of this domain name is used as a reference into
the client settings, and so identifies the service provider of that container.
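The two-level hostname can be illustrated with a minimal sketch. It is not the client's own code: the class and method names below are hypothetical, splitting the hostname at the first dot is an assumption, and the fs.swift.service. property prefix is assumed here as the naming convention for the per-service client settings.

```java
import java.net.URI;

// Hypothetical illustration of how a swift:// hostname could be split into
// a container name and a service name; not taken from the client itself.
public class SwiftUriSketch {

    /**
     * Splits the hostname of a swift:// URI into {container, service}.
     * Assumption: the first dot separates the container from the service.
     */
    static String[] splitHost(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No hostname in " + uri);
        }
        int dot = host.indexOf('.');
        if (dot <= 0 || dot == host.length() - 1) {
            throw new IllegalArgumentException(
                "Hostname is not of the form container.service: " + host);
        }
        return new String[] {host.substring(0, dot), host.substring(dot + 1)};
    }

    /**
     * Builds the prefix under which settings for a service would be looked up.
     * Assumption: settings are keyed as fs.swift.service.SERVICE.*
     */
    static String settingsPrefix(String service) {
        return "fs.swift.service." + service;
    }

    public static void main(String[] args) {
        URI uri = URI.create("swift://bucket.service/dir/object1");
        String[] parts = splitHost(uri);
        System.out.println("container = " + parts[0]);
        System.out.println("service   = " + parts[1]);
        System.out.println("settings  = " + settingsPrefix(parts[1]) + ".*");
    }
}
```

Under these assumptions, two URIs with different service names, for example swift://data.providerA/ and swift://data.providerB/ (both names invented for illustration), would resolve to separate sets of credentials and authentication URLs within the same client.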
Testing
The client code can be tested against public or private Swift instances; the
public services are (at the time of writing, January 2013) Rackspace and
HP Cloud. Testing against both services is how interoperability
can be verified.