<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="hbase" xmlns="http://docbook.org/ns/docbook"
	xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
	xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML"
	xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook">

	<title>HBase Engine</title>

	<section>
		<title>HBase</title>

		<para>
			<application>Apache HBase </application>
			is an open-source, distributed, versioned, column-oriented store.
		</para>

		<para>
			<emphasis role="bold">NoSQLUnit</emphasis>
			supports
			<emphasis>HBase</emphasis>
			through the following classes:
		</para>
		<para>
			<table border="1">
				<caption>Lifecycle Management Rules</caption>

				<tr>
					<td>Embedded</td>

					<td>
						<classname>com.lordofthejars.nosqlunit.hbase.EmbeddedHBase
						</classname>
					</td>
				</tr>

				<tr>
					<td>Managed</td>

					<td>
						<classname>com.lordofthejars.nosqlunit.hbase.ManagedHBase
						</classname>
					</td>
				</tr>
			</table>
		</para>
		<para>
			<table border="1">
				<caption>Manager Rule</caption>

				<tr>
					<td>NoSQLUnit Management</td>

					<td>
						<classname>com.lordofthejars.nosqlunit.hbase.HBaseRule
						</classname>
					</td>
				</tr>
			</table>
		</para>

		<section>
			<title>Maven Setup</title>

			<para>
				To use
				<emphasis role="bold">NoSQLUnit</emphasis>
				with
				<application>HBase</application>
				you only need to add the following
				dependency:
			</para>

			<example xml:id="conf.nosqlunit_hbase_dep">
				<title>NoSQLUnit Maven Dependency</title>

				<programlisting language="xml"><![CDATA[<dependency>
	<groupId>com.lordofthejars</groupId>
	<artifactId>nosqlunit-hbase</artifactId>
	<version>${version.nosqlunit}</version>
</dependency>]]></programlisting>
			</example>
		</section>

		<section>
			<title>Dataset Format</title>

			<para>
				The default dataset file format in the
				<emphasis>HBase</emphasis>
				module is JSON. The dataset format is the same one used by
				<application>
					<link xlink:href="https://github.com/jsevellec/cassandra-unit/">Cassandra-Unit</link>
				</application>
				but not all fields are supported. Only the fields available in the TSV
				HBase application can be set in a dataset.
			</para>

			<para>
				In summary, datasets must have the following
				<link linkend="ex.hbase_dataset">
					format
				</link>
				:
			</para>

			<example xml:id="ex.hbase_dataset">
				<title>Example of HBase Dataset</title>

				<programlisting language="json"><![CDATA[{
    "name" : "tablename",
    "columnFamilies" : [{
        "name" : "columnFamilyName",
        "rows" : [{
            "key" : "key1",
            "columns" : [{
                "name" : "columnName",
                "value" : "columnValue"
            },
            ...
            ]
        },
        ...
        ]
    },
    ...
    ]
}]]></programlisting>
			</example>
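			<para>
				The nesting above means that every value is addressed by table name,
				column family, row key and column name. The following is only a
				minimal sketch of that shape in plain Java. The
				<classname>DatasetShape</classname>
				class and its record types are hypothetical names chosen for
				illustration and are not part of
				<emphasis role="bold">NoSQLUnit</emphasis>
				. It assumes Java 16 or later for records.
			</para>

			<example>
				<title>Sketch of the dataset nesting (illustrative only)</title>

				<programlisting language="java"><![CDATA[import java.util.ArrayList;
import java.util.List;

public class DatasetShape {

    // Hypothetical types mirroring the JSON nesting; they are not NoSQLUnit classes.
    record Column(String name, String value) {}
    record Row(String key, List<Column> columns) {}
    record ColumnFamily(String name, List<Row> rows) {}
    record Table(String name, List<ColumnFamily> columnFamilies) {}

    // Flattens the nested dataset into "table/family/rowKey/column=value" cells.
    static List<String> cells(Table table) {
        List<String> cells = new ArrayList<>();
        for (ColumnFamily family : table.columnFamilies()) {
            for (Row row : family.rows()) {
                for (Column column : row.columns()) {
                    cells.add(table.name() + "/" + family.name() + "/" + row.key()
                            + "/" + column.name() + "=" + column.value());
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Same values as the dataset example above.
        Table table = new Table("tablename", List.of(
                new ColumnFamily("columnFamilyName", List.of(
                        new Row("key1", List.of(new Column("columnName", "columnValue")))))));
        cells(table).forEach(System.out::println);
    }
}]]></programlisting>
			</example>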
		</section>


		<section>
			<title>Getting Started</title>

			<section>
				<title>Lifecycle Management Strategy</title>

				<para>
					The first step is to define which lifecycle management strategy
					your tests require. Depending on the kind of test you are
					implementing (unit test, integration test, deployment test, ...),
					you will need an embedded, managed or remote
					approach.
				</para>

				<section>
					<title>Embedded Lifecycle</title>
					<para>
						To configure the
						<emphasis role="bold">embedded</emphasis>
						approach, you only need to instantiate the following
						<link linkend="program.hbase_embedded_conf">rule</link>
						:
					</para>

					<example xml:id="program.hbase_embedded_conf">
						<title>Embedded HBase</title>

						<programlisting language="java"><![CDATA[@ClassRule
public static EmbeddedHBase embeddedHBase = newEmbeddedHBaseRule().build();]]></programlisting>
					</example>

					<para>
						By default the
						<emphasis>Embedded</emphasis>
						rule uses
						<classname>HBaseTestingUtility</classname>
						default values:
					</para>

					<table>
						<caption>Default Embedded Values</caption>
						<tr>
							<td>
								Target path
							</td>
							<td>
								The directory where
								<emphasis>HBase</emphasis>
								stores its data. The default is
								<constant>target/data</constant>
								.
							</td>
						</tr>
						<tr>
							<td>
								Host
							</td>
							<td>
								localhost
							</td>
						</tr>
						<tr>
							<td>
								Port
							</td>
							<td>
								The default port is 60000.
							</td>
						</tr>
						<tr>
							<td>File Permissions</td>
							<td>
								Depending on your umask configuration,
								<classname>HBaseTestingUtility</classname>
								may create some directories that are not accessible at
								runtime. By default the permission value is set to 775, but
								depending on your OS you may require a different value.
							</td>
						</tr>
					</table>
				</section>
				<section>
					<title>Managed Lifecycle</title>
					<para>
						To configure the
						<emphasis role="bold">managed</emphasis>
						approach, you only need to instantiate the following
						<link linkend="program.hbase_managed_conf">rule</link>
						:
					</para>

					<example xml:id="program.hbase_managed_conf">
						<title>Managed HBase</title>

						<programlisting language="java"><![CDATA[@ClassRule
public static ManagedHBase managedHBase = newManagedHBaseServerRule().build();]]></programlisting>
					</example>

					<para>
						By default the managed
						<emphasis>HBase</emphasis>
						rule uses the following values, which can be
						changed programmatically:
					</para>

					<table>
						<caption>Default Managed Values</caption>
						<tr>
							<td>
								Target path
							</td>
							<td>
								The directory where the
								<emphasis>HBase</emphasis>
								server is started. The default is
								<constant>target/hbase-temp</constant>
								.
							</td>
						</tr>
						<tr>
							<td>
								HBasePath
							</td>
							<td>
								<emphasis>HBase</emphasis>
								installation directory, which by default is
								read from the
								<varname>HBASE_HOME</varname>
								environment
								variable.
							</td>
						</tr>
						<tr>
							<td>
								Port
							</td>
							<td>
								The default port is 60000. If the port is changed in the
								<emphasis>HBase</emphasis>
								configuration file, the same port must also be configured here.
							</td>
						</tr>
					</table>
					<warning>
						<para>
							To start
							<emphasis>HBase</emphasis>
							, the
							<varname>JAVA_HOME</varname>
							environment variable must be set. Normally this variable is
							already configured, so no action is needed.
						</para>
					</warning>
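					<para>
						Because the managed approach depends on environment variables,
						a quick check before running the tests can save a confusing
						failure. The following is only a sketch in plain Java. The
						<classname>ManagedHBaseEnvironmentCheck</classname>
						class is a hypothetical helper, not part of
						<emphasis role="bold">NoSQLUnit</emphasis>
						, and it assumes the installation directory is not set
						programmatically, so that
						<varname>HBASE_HOME</varname>
						is required.
					</para>

					<example>
						<title>Sketch of an environment check (illustrative only)</title>

						<programlisting language="java"><![CDATA[import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ManagedHBaseEnvironmentCheck {

    // Returns the names of the required variables that are missing or blank.
    static List<String> missingVariables(Map<String, String> environment) {
        List<String> missing = new ArrayList<>();
        for (String name : List.of("HBASE_HOME", "JAVA_HOME")) {
            String value = environment.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the environment of the current process.
        List<String> missing = missingVariables(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("HBASE_HOME and JAVA_HOME are set.");
        } else {
            System.out.println("Missing environment variables: " + missing);
        }
    }
}]]></programlisting>
					</example>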
				</section>
				<section>
					<title>Remote Lifecycle</title>
					<para>
						The
						<emphasis role="bold">remote</emphasis>
						approach
						does not require any special rule, because you (or a system
						such as
						<application>Maven</application>
						) are responsible for starting and
						stopping the server. This mode
						is used in deployment tests, where you
						are testing your application
						in a real environment.
					</para>
				</section>
			</section>

			<section>
				<title>Configuring HBase Connection</title>

				<para>
					The next step is configuring the
					<emphasis role="bold">HBase</emphasis>
					rule, which keeps
					<emphasis>HBase</emphasis>
					columns in a known state by inserting and deleting the defined
					datasets.
					You must register the
					<classname>HBaseRule</classname>
					<emphasis>JUnit</emphasis>
					rule class, which
					requires a configuration parameter describing the connection.
				</para>

				<para>To make the developer's life easier and the code more readable, a
					fluent
					interface can be used to create these configuration objects.
					Three
					different kinds of configuration builders exist.
				</para>
				<section>
					<title>Embedded Connection</title>
					<para>
						The first one is for configuring a connection to embedded
						<emphasis>HBase</emphasis>
						.
					</para>

					<example xml:id="program.embedded_connection_hbase_parameters">
						<title>HBase with embedded configuration</title>

						<programlisting language="java"><![CDATA[import static com.lordofthejars.nosqlunit.hbase.EmbeddedHBase.EmbeddedHBaseRuleBuilder.newEmbeddedHBaseRule;

@Rule
public HBaseRule hBaseRule = newHBaseRule().defaultEmbeddedHBase();]]></programlisting>
					</example>
					<para>
						Embedded HBase does not require any special parameter.
						The configuration object is copied directly from the Embedded rule
						to HBaseRule.
					</para>
				</section>

				<section>
					<title>Managed Connection</title>
					<para>
						This is for configuring a connection to managed
						<emphasis>HBase</emphasis>
						.
					</para>

					<example xml:id="program.managed_connection_hbase_parameters">
						<title>HBase with managed configuration</title>

						<programlisting language="java"><![CDATA[import static com.lordofthejars.nosqlunit.hbase.ManagedHBaseConfigurationBuilder.newManagedHBaseConfiguration;

@Rule
public HBaseRule hbaseRule = new HBaseRule(newManagedHBaseConfiguration().build());]]></programlisting>
					</example>
					<para>
						By default, the configuration used is the one returned by
						<link
							xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration.html#create()">HBaseConfiguration.create()</link>
						, which reads the hbase-site.xml and hbase-default.xml files from the classpath.
					</para>
					<para>
						A
						<function>setProperty</function>
						method is also provided to modify any parameter of the generated
						configuration object.
					</para>
				</section>

				<section>
					<title>Remote Connection</title>
					<para>
						Configuring a connection to remote
						<emphasis>HBase</emphasis>
						uses the same approach as the ManagedHBase configuration object, but
						with the
						<classname>com.lordofthejars.nosqlunit.hbase.RemoteHBaseConfigurationBuilder
						</classname>
						class instead of
						<classname>com.lordofthejars.nosqlunit.hbase.ManagedHBaseConfigurationBuilder
						</classname>
						.
					</para>

				</section>

				<warning>
					<para>
						Working with Apache HBase requires some knowledge of
						how it works. For example, your /etc/hosts file must not map
						your host name to the IP address 127.0.1.1.
					</para>
					<para>
						Moreover,
						<emphasis role="bold">NoSQLUnit</emphasis>
						uses
						<emphasis>HBase-0.94.1</emphasis>
						and this version should also be installed on your computer to work
						with the managed or remote approach. If you install another version, you should
						exclude the HBase artifacts from the
						<emphasis role="bold">NoSQLUnit</emphasis>
						dependencies and add the new ones manually to your pom file.
					</para>
				</warning>
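				<para>
					The /etc/hosts restriction mentioned above can be checked before
					running the tests. The following is only a sketch in plain Java.
					The
					<classname>HostsFileCheck</classname>
					class is a hypothetical helper, not part of
					<emphasis role="bold">NoSQLUnit</emphasis>
					or HBase, and it assumes a Unix-like system where the hosts file
					lives at /etc/hosts.
				</para>

				<example>
					<title>Sketch of an /etc/hosts check (illustrative only)</title>

					<programlisting language="java"><![CDATA[import java.io.IOException;
import java.net.InetAddress;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class HostsFileCheck {

    // True when any hosts-file line maps hostName to 127.0.1.1.
    static boolean mapsToLoopbackAlias(List<String> hostsLines, String hostName) {
        for (String line : hostsLines) {
            // Drop a trailing comment, then split "address name alias..." on whitespace.
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (entry.isEmpty()) {
                continue;
            }
            String[] fields = entry.split("\\s+");
            List<String> names = Arrays.asList(fields).subList(1, fields.length);
            if (fields[0].equals("127.0.1.1") && names.contains(hostName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a Unix-like system with the hosts file at /etc/hosts.
        String hostName = InetAddress.getLocalHost().getHostName();
        List<String> lines = Files.readAllLines(Paths.get("/etc/hosts"));
        if (mapsToLoopbackAlias(lines, hostName)) {
            System.out.println(hostName + " is mapped to 127.0.1.1; HBase may not work.");
        } else {
            System.out.println("No 127.0.1.1 mapping found for " + hostName + ".");
        }
    }
}]]></programlisting>
				</example>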


			</section>

			<section>
				<title>Verifying Data</title>
				<para>
					<classname>@ShouldMatchDataSet</classname>
					is also supported for
					<emphasis>HBase</emphasis>
					data, but some considerations apply.
				</para>
				<para>
					If you plan to verify data with
					<classname>@ShouldMatchDataSet</classname>
					in the Managed or Remote approach, you should enable the Aggregate
					coprocessor by editing the hbase-site.xml file and adding the following lines:
				</para>
				<example xml:id="program.coprocessor_hbase_parameters">
						<title>HBase with coprocessor</title>

						<programlisting language="xml"><![CDATA[<property>
    <name>hbase.coprocessor.user.region.classes</name>
    <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>]]></programlisting>
					</example>


			</section>

			<section>
				<title>Full Example</title>

				<para>
					To show how to use
					<emphasis role="bold">NoSQLUnit</emphasis>
					with
					<emphasis>HBase</emphasis>
					,
					we are going to create a
					very simple application.
				</para>
				<para>
					<link linkend="program.person_hbase_manager">PersonManager</link>
					is the business class responsible for getting and updating a person's
					car.
				</para>
				<example xml:id="program.person_hbase_manager">
					<title>PersonCar HBase manager</title>

					<programlisting language="java"><![CDATA[public class PersonManager {

	private Configuration configuration;
	
	public PersonManager(Configuration configuration) {
		this.configuration = configuration;		
	}
	
	public String getCarByPersonName(String personName) throws IOException {
		HTable table = new HTable(configuration, "person");
		try {
			// Look up the row whose key is the requested person's name.
			Get get = new Get(personName.getBytes());
			Result result = table.get(get);

			return new String(result.getValue(toByteArray().convert("personFamilyName"), toByteArray().convert("car")));
		} finally {
			// Release the resources held by the table.
			table.close();
		}
	}
	
	private Converter<String, byte[]> toByteArray() {
		return new Converter<String, byte[]>() {

			@Override
			public byte[] convert(String element) {
				return element.getBytes();
			}
		};
	}
	
}]]></programlisting>
				</example>

				<para>
					For the
					<link linkend="program.person_hbase_unit">unit</link>
					test we are going to use the embedded approach:
				</para>
				<example xml:id="program.person_hbase_unit">
					<title>Unit test with embedded HBase</title>

					<programlisting language="java"><![CDATA[public class WhenPersonWantsToKnowItsCar {

	@ClassRule
	public static EmbeddedHBase embeddedHBase = newEmbeddedHBaseRule().build();
	
	@Rule
	public HBaseRule hBaseRule = newHBaseRule().defaultEmbeddedHBase(this);
	
	@Inject
	private Configuration configuration;
	
	
	@Test
	@UsingDataSet(locations="persons.json", loadStrategy=LoadStrategyEnum.CLEAN_INSERT)
	public void car_should_be_returned() throws IOException {

		PersonManager personManager = new PersonManager(configuration);
		String car = personManager.getCarByPersonName("john");
		
		assertThat(car, is("toyota"));		
	}
	
}]]></programlisting>
				</example>

				<para>The dataset used is:
				</para>
				<example xml:id="program.expected_hbase_file">
					<title>persons.json HBase file</title>

					<programlisting language="json"><![CDATA[{
    "name" : "person",
    "columnFamilies" : [{
        "name" : "personFamilyName",
        "rows" : [{
            "key" : "john",
            "columns" : [{
                "name" : "age",
                "value" : "22"
            },
            {
                "name" : "car",
                "value" : "toyota"
            }]
        },
        {
            "key" : "mary",
            "columns" : [{
                "name" : "age",
                "value" : "33"
            },
            {
                "name" : "car",
                "value" : "ford"
            }]
        }]
    }]
}]]></programlisting>
				</example>

			</section>


		</section>

	</section>


</chapter>



