Archive for the ‘software’ Category
Cool Integration between GMail and Google Calendar
Well, I am not sure whether this has always existed but I just noticed it a couple of days back.
I was exchanging a couple of e-mails with one of my friends (via GMail / Google Mail) about meeting someplace for lunch, and I noticed links in the top right-hand corner to “add a meeting” to Google Calendar (see image below).
I clicked on one of those links, and it opened up Google Calendar with an appointment (to be created) that was exactly what I wanted – it was just unbelievable that Google had automatically parsed all this information from the e-mail exchanges (see image below).
I am a big fan of good integration and usability – and the way this integration worked so flawlessly was very refreshing ! So, what is next ? Google automatically figures out the location (I had just mentioned Karlsplatz with no mention of the city) from the IPs (of the sender and the receiver) and recommends a few restaurants !!
Is that too cool or too much invasion of privacy ? Or is it a bit of both 🙂
Web/App-server scalability with memcached (Part 1)
This is a two-part series in which I am going to explain how to use caching to improve the scalability of your web applications. Most of this can be found on the Internet as different articles or blogs, and I have tried to condense all the information from my experiences into these posts. In this post I will try to describe the problem and provide a brief introduction/tutorial on the caching framework (or is it a tool ?) of my choice – memcached.
A typical web-based application – unless it is really, really trivial – usually has a significant number of users, and hence at some point or other one has to address the issue of application scalability (i.e. handling large load – usually a large number of users simultaneously using the system). Let us walk through a couple of examples to provide a better description of what I am trying to say here.
Let us consider an e-commerce site like Target. When a user visits the main web page (www.target.com), the web server needs to figure out which specials are being advertised (as big flash ads, e.g. HDTV, Indoors, etc.), what the “New Products” at Target are, and so on. The web server looks up this information in a database and serves up these ads (with text, images, etc.). This content is pretty much static for every user who visits the main page of Target, and considering that target.com probably gets more than a million visitors per day, that is a lot of load on the database for the same kind (in fact, identical) of information. This usually implies slower response times, which in turn have disastrous consequences (customers who are ready to pay for something usually have very little patience with slow websites).
Now let us consider an online brokerage site where a user logs in and is presented with a summary of his/her net value (cash + stock), and can then click on various tabs to view cash deposits/withdrawals, recent stock transactions, place trades, and so on. Once again all this information is stored in the database, but unlike the Target example, the information will be different for every user that is logged in. And once again, when there are a lot of users logged in, the database will see a lot of load (i.e. a lot of queries executed against it) as these users go about clicking on different tabs to look at their total assets or trade history.
The most common solution to both of the above problems is caching, and that is what I am going to talk about in this series of posts – specifically about implementing a distributed caching solution using memcached. The memcached project was started at Danga but is currently being actively hosted on Google Code. I haven’t seen any binary downloads offered (yet) on the downloads page, but one can play around with a Win32 binary (version 1.2.6) from here. Just go to the middle of the page where it says memcached-1.2.6 and click on the memcached-1.2.6-win32-bin.zip link. Just unzip the download and place the memcached.exe at an appropriate location on your computer (see image below).
Now you can start memcached by bringing up a command prompt (cmd window) and typing “memcached -m 1024 -p 11211”. This starts memcached on your local machine on port 11211 and assigns it 1024 MB (1 GB) of memory (this is the maximum amount of memory it uses for storing objects in its cache). You can start storing objects in memcached by using various client APIs (PHP, Perl, Java, etc.) – I prefer coding in Java, so I decided to use the Java client API (version 2.0.1) from here. This how-to is really useful, and since I don’t have access to multiple machines (on which I can run memcached), I basically start three different instances of memcached on the same machine but on different ports (see image below). Then I use a Java class (see code snippet below – slightly modified from the how-to) to add a couple of objects (key: ”foo”, value: ”This is test String foo”, and key: ”bar”, value: ”This is test String bar”) and retrieve them from the cache.
Sample client code that adds and retrieves objects from the memcached cluster.
package org.karticks.memcache;

import com.danga.MemCached.MemCachedClient;
import com.danga.MemCached.SockIOPool;

// Modified version of the original example at
// http://www.whalin.com/memcached/HOWTO.txt
public class ClientExample
{
    // create a static client as most installs only need a single instance
    protected static MemCachedClient mcc = new MemCachedClient();

    // set up connection pool once at class load
    static
    {
        // server list and weights
        String[] servers = { "localhost:11211", "localhost:11212", "localhost:11213" };
        Integer[] weights = { 3, 3, 2 };

        // grab an instance of our connection pool
        SockIOPool pool = SockIOPool.getInstance();

        // set the servers and the weights
        pool.setServers(servers);
        pool.setWeights(weights);

        // set some basic pool settings
        // 5 initial, 5 min, and 250 max conns
        // and set the max idle time for a conn
        // to 6 hours
        pool.setInitConn(5);
        pool.setMinConn(5);
        pool.setMaxConn(250);
        pool.setMaxIdle(1000 * 60 * 60 * 6);

        // set the sleep for the maint thread
        // it will wake up every x seconds and
        // maintain the pool size
        pool.setMaintSleep(30);

        // set some TCP settings
        // disable nagle
        // set the read timeout to 3 secs
        // and don't set a connect timeout
        pool.setNagle(false);
        pool.setSocketTO(3000);
        pool.setSocketConnectTO(0);

        // initialize the connection pool
        pool.initialize();

        // lets set some compression on for the client
        // compress anything larger than 64k
        mcc.setCompressEnable(true);
        mcc.setCompressThreshold(64 * 1024);
    }

    // from here on down, you can call any of the client calls
    public static void main(String[] args)
    {
        try
        {
            String input = args[0];
            if (input.equalsIgnoreCase("one"))
            {
                mcc.set("foo", "This is test String foo");
                String foo = (String) mcc.get("foo");
                System.out.println("Value of foo : " + foo + ".");
                Thread.sleep(10000);
                String bar = (String) mcc.get("bar");
                System.out.println("Value of bar : " + bar + ".");
            }
            else if (input.equalsIgnoreCase("two"))
            {
                mcc.set("bar", "This is test String bar");
                String bar = (String) mcc.get("bar");
                System.out.println("Value of bar : " + bar + ".");
                Thread.sleep(10000);
                String foo = (String) mcc.get("foo");
                System.out.println("Value of foo : " + foo + ".");
            }
            else
            {
                System.out.println("Invalid input parameter (" + input + ").");
            }
        }
        catch (Throwable t)
        {
            System.out.println("Caught an exception. Error message : " + t.getMessage() + ".");
            t.printStackTrace();
        }
    }
}
The nice thing about memcached is that it has a telnet interface with which you can test the cache. One can telnet to a memcached instance (e.g. telnet localhost 11211) and execute various commands. In my case, I telnet-ed to each of my memcached instances and typed “get foo” and “get bar”. One of my memcached instances had cached these objects and printed out their values. It is interesting to note that only one of the instances had cached these values and not all the instances. So if the instance that is holding your cached object goes down and you cannot get back a value, you basically treat it as a cache miss, get the object from the real persistence layer, and store it back in memcached. Note : As a best practice (regardless of whether you are using consistent hashing or not), you will need to detect that one of your memcached servers went down, and you will need to have a hot standby (with the same IP / hostname).
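To make that last point concrete, here is a minimal sketch of the read-through pattern described above – assuming the same static mcc client from the listing earlier, and a hypothetical loadProductFromDatabase() helper standing in for your real persistence layer :

public static String getProductDescription(String productId)
{
    String cacheKey = "product:" + productId;
    // try the cache first
    String description = (String) mcc.get(cacheKey);
    if (description == null)
    {
        // cache miss (or the instance holding this key is down) -
        // fall back to the database and repopulate the cache
        description = loadProductFromDatabase(productId);
        mcc.set(cacheKey, description);
    }
    return description;
}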
Memcached is so simple and easy to set up and use that one can install it on commodity machines and keep adding more machines to the memcached cluster (as load goes up) – and you can pretty much have a very cost-effective and scalable solution for handling large amounts of load.
More on this in the next post. Until next time, stay tuned.
Using Map/Reduce for Sorting
In my previous post – Demystifying Map/Reduce – I talked about what Map/Reduce is and a couple of its applications : word counting and PageRank. In this post I will try to go over a couple of sorting applications of Map/Reduce.
Let us imagine that we have a huge dataset (i.e. 100s of files, and each file itself is also quite big) of integers that we have to sort. One can use any number of sorting algorithms from the literature, including external sort (see previous post), to sort these files without assuming anything about the data. Now if the data itself is not widely distributed – i.e. the integers lie within a certain range and this range is quite small compared to the size of the data – then we can use Map/Reduce. Let us see why with the help of an example.
Let us assume that our data set (integers) is constrained between 100 and 200 and we have 5 files each containing 1000 random integers between 100 and 200 (so a total of 5000 integers between 100 and 200). We read each file into a Map, and then in the Reduce phase we produce a final Map which contains the count of every integer. Now if we sort all the integers from the final Map and output them into a list data structure in the form of <Integer, Count>, then we have sorted all the data (see figure below). Aside : In Java, you don’t even have to come up with the data structure that I am talking about – if you just use a TreeMap in the final Reduce phase, then all the keys (i.e. the data) are already sorted, as long as the key type (e.g. String, Integer, etc.) implements the Comparable interface (Hadoop has something similar called WritableComparable, and I am using a TreeMap that takes Strings as keys in Reducer.java).
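As a rough illustration of that aside (plain Java, outside of any framework, with the integers simply handed in as a List – my own sketch, not the code from the project) :

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortByCounting
{
    // Count the occurrences of each integer. Because TreeMap keeps its keys
    // sorted, iterating over the result yields the data in sorted order.
    public static Map<Integer, Integer> countAndSort(List<Integer> data)
    {
        Map<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (Integer i : data)
        {
            Integer count = counts.get(i);
            counts.put(i, (count == null) ? 1 : count + 1);
        }
        return counts;
    }
}

Printing each entry as “Integer (Count)” and expanding the counts reproduces the sorted data set.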
What is the complexity of the above sorting algorithm ? The Map phase is an order “n” algorithm (where n is the size of the data). The Reduce phase is an order “m” algorithm, where “m” is the number of unique integers in our data set. The sort phase after the Reduce phase will be an order “mlogm” operation (if we use a sorting algorithm like heap sort). Now if “m” is small compared to “n” (e.g. the size of the data set is 100,000 and the actual number of unique integers is only 100), then the complexity of the Reduce phase and the final sort phase is actually quite small compared to that of the Map phase. So the total complexity of a Map/Reduce run is of order “n” if the number of unique integers is quite small compared to the total number of integers to be sorted. However, if the number of unique integers is comparable to the size of the data, then the complexity of the Reduce phase and the final sort phase is no longer small (compared to the complexity of the Map phase), and hence it is better to use a traditional sort algorithm instead of Map/Reduce (to avoid the overhead of the additional order “n” Map phase).
The Map/Reduce project has an example that reads integers from five files (each containing 5000 integers) and sorts them. The total number of unique integers is 20, and the figure below shows the output of the result in “Integer (Count)” format. As one can see, the output is sorted and the sum of the counts adds up to 25,000 (the size of the data set – 5000 integers in each of the 5 files). It is a small and trivial example, but I hope you find it useful for understanding the application of Map/Reduce to sorting.
Until next time. Cheers !!
The Map/Reduce design pattern demystified
Over the last 5-6 years there has been quite a lot of buzz about the Map/Reduce design pattern (yes, it is a design pattern), especially after the famous paper from Google. Now there is an open-source project – Hadoop – that helps you implement Map/Reduce on a cluster, Amazon’s EC2 offers Map/Reduce, Cloudera offers commercial support for Hadoop, and so on.
But what really is Map/Reduce ? That is what I will try to answer in this post.
Basic Concept
Let us start with an example of external sorting. Let us say you have a large file (say about 100 GB) of integers and you have been given the job of sorting those integers, but you only have access to 1 GB of memory. The general solution is that you read 1 GB of data into memory, do an in-place sort (using any sorting algorithm of your choice), write the sorted data out into a file, and then read the next 1 GB of data from the master file, sort it, write it out, and so on and so forth. After this process is finished, you will end up with 100 files (1 GB each), and the data in each file is sorted. Now you do a merge sort, i.e. read an integer from each of the 100 files, find out which is the minimum integer, write that integer to a new file, read the next integer from the file that supplied the minimum, and keep doing this until all the data from the 100 files has been read. The new file will contain integers that are all sorted. The image below tries to provide an overview of the above algorithm (for more details take a look at Knuth’s fascinating discussion of external sorts in his classic : The Art of Computer Programming, Volume 3, Sorting and Searching).
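To make the two phases a bit more concrete, here is a rough sketch in Java (my own illustration, not code from any of the projects mentioned here – the chunk size, file handling, and the use of a PriorityQueue for the merge are all assumptions) :

import java.io.*;
import java.util.*;

public class ExternalSortSketch
{
    // Phase 1 : read the big input in chunks that fit in memory, sort each
    // chunk, and write it out as a sorted "run" file.
    static List<File> createSortedRuns(File input, int chunkSize) throws IOException
    {
        List<File> runs = new ArrayList<File>();
        BufferedReader in = new BufferedReader(new FileReader(input));
        List<Integer> chunk = new ArrayList<Integer>(chunkSize);
        String line;
        while ((line = in.readLine()) != null)
        {
            chunk.add(Integer.valueOf(line.trim()));
            if (chunk.size() == chunkSize)
            {
                runs.add(writeSortedRun(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty())
        {
            runs.add(writeSortedRun(chunk));
        }
        in.close();
        return runs;
    }

    static File writeSortedRun(List<Integer> chunk) throws IOException
    {
        Collections.sort(chunk);
        File run = File.createTempFile("sorted-run", ".txt");
        PrintWriter out = new PrintWriter(new FileWriter(run));
        for (Integer value : chunk)
        {
            out.println(value);
        }
        out.close();
        return run;
    }

    // Phase 2 : merge the sorted runs by repeatedly writing out the smallest
    // value among the current "heads" of all the runs (a priority queue of
    // {value, runIndex} pairs keeps track of those heads).
    static void mergeRuns(List<File> runs, File output) throws IOException
    {
        PriorityQueue<int[]> heads = new PriorityQueue<int[]>(Math.max(1, runs.size()), new Comparator<int[]>()
        {
            public int compare(int[] a, int[] b)
            {
                return (a[0] < b[0]) ? -1 : ((a[0] == b[0]) ? 0 : 1);
            }
        });
        List<BufferedReader> readers = new ArrayList<BufferedReader>();
        for (int i = 0; i < runs.size(); i++)
        {
            BufferedReader reader = new BufferedReader(new FileReader(runs.get(i)));
            readers.add(reader);
            String line = reader.readLine();
            if (line != null)
            {
                heads.add(new int[] { Integer.parseInt(line.trim()), i });
            }
        }
        PrintWriter out = new PrintWriter(new FileWriter(output));
        while (!heads.isEmpty())
        {
            int[] smallest = heads.poll();
            out.println(smallest[0]);
            // read the next integer from the run that supplied the smallest value
            String line = readers.get(smallest[1]).readLine();
            if (line != null)
            {
                heads.add(new int[] { Integer.parseInt(line.trim()), smallest[1] });
            }
        }
        out.close();
        for (BufferedReader reader : readers)
        {
            reader.close();
        }
    }
}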
So what was the purpose of the above example ? The key pattern in the above example is that a huge task is broken down into smaller tasks, each small task produces an intermediate result when it finishes, and these intermediate results are combined to produce the final result. That is the core of the Map/Reduce design pattern. The processing of small tasks to produce intermediate results is referred to as the Map phase, and the processing of the intermediate results to produce the final result is referred to as the Reduce phase. The key difference is that the Map/Reduce design pattern handles data as key-value pairs, and the intermediate results are also produced as key-value pairs. Let us go over a couple of examples to understand what I mean by key-value pairs.
Examples
Word Counting :
You have been tasked to find out how many times each word occurs in a document. Very simple : you create a hashtable, then you read each word from the document and check if the word exists in the hashtable. If the word does not exist in the hashtable, you insert it (using it as the key) along with a counter that is initialized to 1 (this is the value). If the word exists in the hashtable, you get its counter, increment it by one, and insert the word back into the hashtable with the new counter value. After you have finished reading the document, you iterate over the keys in the hashtable, and for every key (i.e. each word), you look up its value (the number of times it has occurred) and you have the word count for each word in the document. Now, let us say, you have to find out how many times each word occurs in a set of 100 books. The above process will be extremely slow and time consuming. An efficient solution would be to go through the above process for each book, producing the word count for each book (the Map phase), and then to process the results (the Reduce phase) from all the 100 hashtables – one for each book – to produce the overall word count for all the 100 books (for details look at the video in the Implementation section below).
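A minimal sketch of the single-document version of that process (using a HashMap as the hashtable; the words are assumed to have already been extracted from the document) :

import java.util.HashMap;
import java.util.Map;

public class SingleDocumentWordCount
{
    // Count how many times each word occurs, exactly as described above :
    // look the word up, insert it with a count of 1 if it is new,
    // otherwise increment its counter.
    public static Map<String, Integer> countWords(Iterable<String> words)
    {
        Map<String, Integer> wordCounts = new HashMap<String, Integer>();
        for (String word : words)
        {
            Integer counter = wordCounts.get(word);
            if (counter == null)
            {
                wordCounts.put(word, 1);
            }
            else
            {
                wordCounts.put(word, counter + 1);
            }
        }
        return wordCounts;
    }
}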
Page Rank :
Imagine that we have to parse about a million web pages (which is quite a small number considering the size of the World Wide Web) and we have to calculate how many times every URL occurs. For example, an article on Hibernate might contain a link to Java 6, a link to the Hibernate home page, and a link to the MySQL website. For every such link, our job is to find out how many times that link appears in these million web pages (I will call it the URL-Count, similar to word count). The higher the URL-Count of a specific URL, the more popular that URL is (this is the foundation of Google’s PageRank algorithm). Once again, we will divide this task into 100 Map phases, where every Map phase will look at 10,000 web pages. Every time a Map phase sees a URL in a web page, it will insert it into its hashtable and increment the counter associated with that URL (just like the above word count example). After all the Map phases are finished, each hashtable contains the URL-Count of all the URLs that occurred in its 10,000-webpage set. The Reduce phase iterates over each hashtable and combines the results (counters) of URLs that occur in multiple hashtables to produce the final URL-Count of each URL that occurred in our million web pages.
Implementation
A simple implementation of the Map-Reduce design pattern consists of a Mapper interface that takes an InputStream as an input and returns a Map as an output. The actual implementation of this interface will know how to read and handle the contents of the stream – e.g. extracting words, removing punctuation, parsing URLs, etc. The Reducer class just aggregates the results from all the Map phases and produces the final result Map.
The Mapper interface –
import java.io.InputStream;
import java.util.Map;

public interface Mapper
{
    /**
     * Parses the contents of the stream and updates the contents of the <code>Map</code>
     * with the relevant information. For example, an implementation to count
     * words will extract words from the stream (will have to handle punctuation,
     * line breaks, etc.), or an implementation to mine web-server log files
     * will have to parse URL patterns, etc. The resulting <code>Map</code> will contain
     * the relevant information (words, URLs, etc.) and their counts.
     *
     * @param is An <code>InputStream</code> that contains the content that needs to be parsed
     * @return A <code>Map</code> that contains relevant patterns (words, URLs, etc.) and their counts
     */
    public Map<String, Integer> doMap(InputStream is);
}
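A word-counting implementation of this interface might look roughly like the following (my own sketch, not the exact class from the project – it uses a Scanner to split the stream into words and strips basic punctuation) :

import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCountMapper implements Mapper
{
    // Reads the stream word by word, normalizes each word (lower-case,
    // punctuation stripped) and accumulates a count per word.
    public Map<String, Integer> doMap(InputStream is)
    {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        Scanner scanner = new Scanner(is);
        while (scanner.hasNext())
        {
            String word = scanner.next().toLowerCase().replaceAll("[^a-z0-9]", "");
            if (word.length() == 0)
            {
                continue;
            }
            Integer count = counts.get(word);
            counts.put(word, (count == null) ? 1 : count + 1);
        }
        scanner.close();
        return counts;
    }
}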
The Reducer class –
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Reducer
{
    /**
     * Executes the Reduce phase of the Map-Reduce pattern by iterating over
     * all the input Maps to find common keys and aggregating their results.
     * Stores and returns the final results in the output Map.
     *
     * @param inputMaps A list of results from the Map phase
     * @return A Map that contains the final result
     */
    // (the generics in the original listing were lost in formatting; the
    // signature, the method name doReduce, and the TreeMap are assumptions
    // reconstructed from the surrounding comments)
    public Map<String, Integer> doReduce(List<Map<String, Integer>> inputMaps)
    {
        Map<String, Integer> outputMap = new TreeMap<String, Integer>();
        int mapIndex = 0;
        // outer loop – iterate over all maps
        for (Map<String, Integer> map : inputMaps)
        {
            mapIndex++;
            Iterator<String> it = map.keySet().iterator();
            while (it.hasNext())
            {
                String key = it.next();
                // Get the value from the current map
                Integer value = map.get(key);
                // Now iterate over the rest of maps. The mapIndex variable starts
                // at 1 and keeps increasing because once we are done with all the
                // keys in the first map, we don’t need to inspect it any more, and
                // the same holds for the second map, third map, and so on.
                for (int j = mapIndex; j < inputMaps.size(); j++)
                {
                    Integer v = inputMaps.get(j).get(key);
                    // if you find a value for the key, add it to the current value
                    // and then remove that key from that map.
                    if (v != null)
                    {
                        value += v;
                        inputMaps.get(j).remove(key);
                    }
                }
                // finished aggregating all the values for this key, now store it
                // in the output map
                outputMap.put(key, value);
            }
        }
        return outputMap;
    }
}
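Wiring the two pieces together for, say, three books is then straightforward (again a sketch – the file names are made up, WordCountMapper is the sketch from above, and doReduce is the method name assumed in the reconstructed Reducer listing) :

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordCountDriver
{
    public static void main(String[] args) throws Exception
    {
        String[] books = { "book1.txt", "book2.txt", "book3.txt" };
        // Map phase : one intermediate Map per book
        List<Map<String, Integer>> intermediateResults = new ArrayList<Map<String, Integer>>();
        Mapper mapper = new WordCountMapper();
        for (String book : books)
        {
            FileInputStream in = new FileInputStream(book);
            intermediateResults.add(mapper.doMap(in));
            in.close();
        }
        // Reduce phase : aggregate the per-book counts into the final result
        Map<String, Integer> finalCounts = new Reducer().doReduce(intermediateResults);
        for (Map.Entry<String, Integer> entry : finalCounts.entrySet())
        {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}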
The following video (best viewed in HD mode) shows a simple word counting application using Map-Reduce that walks through the steps of implementing the Mapper interface and passing the results of the Map phase to the Reduce phase and finally validating that the results are correct.
[wpvideo 9w4ElCW9]
The above code and all the code discussed in the video can be found in the MapReduce sub-project of the DalalStreet open source project in Google Code. Here are the links to the files in case you want to take a detailed look at the code.
Complexity
So if Map/Reduce is really that simple, then why does Google consider its implementation as its intellectual property (IP), why is there an open-source project (Hadoop) around it, and why is there even a company (Cloudera) trying to commercialize this pattern ?
The answer lies in the application of Map/Reduce to huge data sets (URL-Count of the entire World Wide Web, log analysis, etc.) over a cluster of machines. When one runs Map/Reduce over a cluster of machines, one has to worry about getting notified when each Map phase or job finishes (either successfully or with an error), transferring the results – which can be huge – of the Map phase over to the Reduce phase (over the network), and other problems that are typically associated with a distributed application. Map/Reduce by itself is not complex, but the associated set of supporting services that enable Map/Reduce to be distributed is what makes a Map/Reduce “framework” (such as Hadoop) complex (and valuable). Google takes it a step further by running redundant Map phases (to account for common error conditions like disk failures, network failures, etc.), and its IP lies in how it manages these common failures, results from redundant jobs, etc.
Conclusion
Map/Reduce has definitely opened up new possibilities for companies that want to analyze their huge data sets, and if you want to give it a test drive, you might want to check out Amazon’s EC2 Map/Reduce harness (running Hadoop). You might want to try out the word count example by downloading a few books from Project Gutenberg.
Happy crunching !!
“Communications of the ACM” articles …
For the last one year or so, I have been an avid reader of the “Communications of the ACM” magazine (http://cacm.acm.org/). I find it quite refreshing that in every issue there are at least a few articles that are relevant to everyday software development, and in this post, I have made a short-list of articles that I found really interesting as well as useful.
Whither Sockets : A look at the Sockets API, its origins, how it has evolved, and its drawbacks (June 2009, Vol.52, No.6).
API Design Matters : An extremely well written article on how to design APIs (May 2009, Vol. 52, No.5).
Scalable Synchronous Queues : This article is not available in its entirety on the website (you need to be a member), so I have linked to a PDF from the author’s website (May 2009, Vol. 52, No.5).
ORM in Dynamic Languages : A fascinating article on how Hibernate is used in GORM (the persistence component of Grails). So many of these ideas can be quite easily transferred over to Java and make Hibernate usage a lot easier in Java (April 2009, Vol.52, No.4)
Concurrent Programming with Erlang : Once again, a member-only accessible article, but available via ACM Queue (March 2009, Vol.52, No.3).
Happy reading !!
Hibernate Bidirectional One-To-Many Mapping via Annotations
In one of my previous posts, I had talked about handling inheritance with Hibernate Annotations. We had talked about an AccountTransaction entity that had two sub-classes, MoneyTransaction and StockTransaction. In this post, I am going to talk about how we are going to link the AccountTransaction entity with the Customer entity.
As always, all the code mentioned here is available via the Google Code Project – DalalStreet.
Let us first start by asking the question – why would one want to link the AccountTransaction entity with the Customer entity ? Well, since we are building stock portfolio management software, it would be interesting to know the transactions (stock as well as money) for a specific customer. This naturally leads one to model this relationship as a one-to-many relationship, i.e. a Customer has many (more than zero) AccountTransactions. Is this the only way this relationship can be modeled ? What if I wanted to find out the Customer information from an AccountTransaction ? Why would anyone want to do that ?
Consider the following use case : let us say one day DalalStreet becomes quite a popular software package, and it is used by an Indian bank to handle the portfolios of its clients. Now, if a top-level manager at this bank wants to find out who the top-10 clients with the maximum amount (in terms of actual money traded) of transactions in the last 24 hours were, how would you go about finding that information ? You would get all the AccountTransactions in the last 24 hours, and for each AccountTransaction you would find the Customer, group all the AccountTransactions that belonged to a Customer, and then find the top-10 Customers. The step of finding the Customer for each AccountTransaction is possible only when you can access the Customer object from the AccountTransaction object. This can be modeled in Hibernate as a bi-directional one-to-many relationship.
So how do we go about doing this bi-directional one-to-many thing-a-majig ?
In the Customer class, you introduce a one-to-many relationship with the AccountTransaction class (see the code snippet below).
@OneToMany (cascade = {CascadeType.ALL}, fetch = FetchType.EAGER)
@JoinColumn (name = "customer_id")
@org.hibernate.annotations.Cascade(value = org.hibernate.annotations.CascadeType.DELETE_ORPHAN)
private Set<AccountTransaction> accountTransactions;
...
public void setAccountTransactions(Set<AccountTransaction> accountTransactions)
{
    this.accountTransactions = accountTransactions;
}
public void addAccountTransaction(AccountTransaction transaction)
{
    if (accountTransactions == null)
    {
        accountTransactions = new HashSet<AccountTransaction>();
    }
    accountTransactions.add(transaction);
}
public Set<AccountTransaction> getAccountTransactions()
{
    return accountTransactions;
}
And in the AccountTransaction class, you model the bi-directional relationship using the following annotations.
@ManyToOne
@JoinColumn (name = "customer_id", updatable = false, insertable = false)
private Customer customer;
....
public Customer getCustomer()
{
    return customer;
}
public void setCustomer(Customer customer)
{
    this.customer = customer;
}
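With the @ManyToOne side in place, the kind of navigation needed for the “top-10 clients” use case above becomes possible, because every transaction can hand you back its customer. A rough, purely illustrative sketch that just counts transactions per customer (the variable names here are made up) :

// Within a single Session the same customer row always maps to the same object,
// so using the Customer itself as a map key works here.
public Map<Customer, Integer> countTransactionsPerCustomer(List<AccountTransaction> recentTransactions)
{
    Map<Customer, Integer> perCustomer = new HashMap<Customer, Integer>();
    for (AccountTransaction transaction : recentTransactions)
    {
        Customer customer = transaction.getCustomer();
        Integer count = perCustomer.get(customer);
        perCustomer.put(customer, (count == null) ? 1 : count + 1);
    }
    return perCustomer;
}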
That is all, and you are done – at least with the annotations. There are a couple of things to keep in mind when you are actually persisting these objects into the database. Let us take a quick look at some persistence code :
Customer customer = setupSingleCustomer();
// save an object
Session session = HibernateUtil.getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Long custID = (Long) session.save(customer);
tx.commit();
MoneyTransaction mt1 = new MoneyTransaction();
...
MoneyTransaction mt2 = new MoneyTransaction();
...
StockTransaction st1 = new StockTransaction();
...
StockTransaction st2 = new StockTransaction();
...
StockTransaction st3 = new StockTransaction();
...
// need to do this - otherwise customer id shows up as null
customer.addAccountTransaction(mt1);
customer.addAccountTransaction(mt2);
customer.addAccountTransaction(st1);
customer.addAccountTransaction(st2);
customer.addAccountTransaction(st3);
// save the account transactions - need to use the same session
Transaction newtx = session.beginTransaction();
Long id1 = (Long) session.save(mt1);
Long id2 = (Long) session.save(mt2);
Long id3 = (Long) session.save(st1);
Long id4 = (Long) session.save(st2);
Long id5 = (Long) session.save(st3);
newtx.commit();
session.close();
System.out.println("IDs : " + id1 + ", " + id2 + ", " + id3 + ", " + id4 + ", " + id5 + ".");
System.out.println("Customer id : " + custID);
There are two things to keep in mind when trying to persist the AccountTransaction objects :
- One should always add the AccountTransaction objects to the Customer object (the addAccountTransaction calls in the above code snippet).
- One should always use the same session to persist the AccountTransaction objects – the same session that was used to save or retrieve the Customer object (the session used to begin the second transaction in the above snippet is the same one that was opened at the top and used to save the Customer). Otherwise there will be no association in the database between the related entities. To understand the relationship between Hibernate objects and sessions, I strongly encourage you to read pages 42-46 of James Elliott’s classic : “Hibernate – A Developer’s Notebook”.
Finally, here are the links to the files in case you want to take a detailed look at the code.
Handling Inheritance via Hibernate Annotations
Getting back to blogging after a long pause (I guess about a month or so). In this post I am going to discuss how we can handle Java inheritance using Hibernate Annotations. In the process of doing this I discovered a flaw (or I guess the right phrase should be “missing documentation”) in the Hibernate documentation, which is discussed below.
So I guess the first question is – why should Hibernate even bother trying to support inheritance ? Because Hibernate is an Object Relational Mapping (ORM) tool, and it is quite common to come up with a data model in which classes inherit from one another. For example, a Car and a Truck can be modelled as subclasses of a Vehicle class, a NetworkDevice and a Server can be modelled as subclasses of an ITAsset class, and in this post we discuss a model (for handling a portfolio of stocks) where a parent class – AccountTransaction – has two subclasses, MoneyTransaction and StockTransaction.
As always, all the code mentioned here is available via the Google Code Project – DalalStreet.
So this is what the object model looks like.
There are multiple ways in which such a relationship can be handled by Hibernate, and I opted to use the “single table per class hierarchy” approach. This translates into the following annotations; the key ones are @Inheritance, @DiscriminatorColumn, and @DiscriminatorValue.
Source code for “AccountTransaction” entity :
@Entity
@Table (name="entity_transaction")
@Inheritance (strategy=InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn (name="transaction_type", discriminatorType=DiscriminatorType.STRING)
public abstract class AccountTransaction
{
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column (name = "transaction_id")
    private Long id = null;
    @Column (name = "transaction_type", insertable = false, updatable = false)
    private String txType = null;
    @Column (name = "transaction_date")
    private Date txDate = null;
    @Column (name = "transaction_desc")
    private String txDescription = null;
    @Column (name = "transaction_fee")
    private Double txFee = null;
    public AccountTransaction(String type)
    {
        txType = type;
        txDate = new Date();
    }
...
Source code for “StockTransaction” (DiscriminatorValue is “stock”) entity :
@Entity
@DiscriminatorValue("stock")
public class StockTransaction extends AccountTransaction
{
    @Column (name = "is_sale")
    private boolean isSale = false;
    @Column (name = "stock_symbol")
    private String stockSymbol = null;
    @Column (name = "company_name")
    private String companyName = null;
    @Column (name = "num_shares")
    private Integer numShares = 0;
    @Column (name = "price_per_share")
    private Double pricePerShare = 0.0;
    public StockTransaction()
    {
        super("stock");
    }
...
Source code for “MoneyTransaction” (DiscriminatorValue is “money”) entity :
@Entity
@DiscriminatorValue("money")
public class MoneyTransaction extends AccountTransaction
{
    @Column (name = "is_deposit")
    private boolean isDeposit = false;
    @Column (name = "money_amount")
    private Double moneyAmount = 0.0;
    public MoneyTransaction()
    {
        super("money");
    }
...
Pay special attention to the annotations that describe “transaction_type” – the @Column annotation on the txType field in AccountTransaction.java – the “insertable” and “updatable” attributes need to be set to “false”, otherwise you will get an error like this (this is the part that is missing from the Hibernate documentation that I mentioned at the beginning of this post) :
Caught an exception. Error message : null
java.lang.ExceptionInInitializerError
    at org.ds.biz.user.HibernateUtil.<clinit>(HibernateUtil.java:17)
    at org.ds.biz.user.TestInheritance.main(TestInheritance.java:60)
Caused by: org.hibernate.MappingException: Repeated column in mapping for entity: org.ds.biz.user.StockTransaction column: transaction_type (should be mapped with insert="false" update="false")
    at org.hibernate.mapping.PersistentClass.checkColumnDuplication(PersistentClass.java:652)
    at org.hibernate.mapping.PersistentClass.checkPropertyColumnDuplication(PersistentClass.java:674)
    at org.hibernate.mapping.PersistentClass.checkColumnDuplication(PersistentClass.java:696)
    at org.hibernate.mapping.PersistentClass.validate(PersistentClass.java:450)
    at org.hibernate.mapping.SingleTableSubclass.validate(SingleTableSubclass.java:43)
    at org.hibernate.cfg.Configuration.validate(Configuration.java:1108)
    at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1293)
    at org.ds.biz.user.HibernateUtil.<clinit>(HibernateUtil.java:13)
    ... 1 more
The output schema from “ant schemaexport” looks something like this :
The values in the database look like this :
Sample code using the Criteria API to retrieve the Transaction objects (AccountTransaction, MoneyTransaction or StockTransaction) can be found in the TestInheritance class. The test class uses the different classes to retrieve the appropriate objects (see code snippet below) :
Criteria criteria1 = session.createCriteria(AccountTransaction.class);
List list1 = criteria1.list();
int size1 = list1.size();
// should print out size as 5
System.out.println("size of list (all transactions) : " + size1);
...
Criteria criteria2 = session.createCriteria(MoneyTransaction.class);
List list2 = criteria2.list();
int size2 = list2.size();
// should print out size as 2
System.out.println("size of list (money transactions) : " + size2);
...
Criteria criteria3 = session.createCriteria(StockTransaction.class);
List list3 = criteria3.list();
int size3 = list3.size();
// should print out size as 3
System.out.println("size of list (stock transactions) : " + size3);
Here are the links to the files in case you want to take a detailed look at the code.
Happy coding, until next time … Take care.
Hibernate One-to-One Mapping using Annotations
In an earlier post I had written about getting your development environment setup to start using Hibernate. I had talked about generating a schema, and now it makes sense to proceed to the next logical step, which is persisting data.
As usual, all the code discussed in this post is available at the Google Code Project – DalalStreet.
The objective of this post is to successfully persist a hierarchy of objects using Hibernate. The model consists of a Customer entity; this Customer entity contains an Address entity and a ContactInformation entity. The Address or ContactInformation entity cannot exist independently of a Customer – or in other words – Customer has a one-to-one relationship with Address and ContactInformation.
The Hibernate documentation, along with a couple of blogs (1, 2), provides insufficient information on how to model one-to-one relationships in Hibernate.
Unfortunately, modeling one-to-one relationships in Hibernate is non-trivial, and the correct way to model them is illustrated in the following code snippets.
Source code for “Customer” entity :
@Entity
@Table(name = "entity_customer")
public class Customer
{
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column (name = "customer_id")
    private Long id = null;
    @Column(name = "first_name")
    private String firstName = null;
    @Column(name = "middle_name")
    private String middleName = null;
    @Column(name = "last_name")
    private String lastName = null;
    @Column(name = "salutation")
    private String salutation = null;
    @Column(name = "account_number")
    private String accountNumber = null;
    @OneToOne(cascade=CascadeType.ALL)
    @JoinColumn (name = "customer_id")
    private Address address = null;
    @OneToOne(cascade=CascadeType.ALL)
    @JoinColumn (name = "customer_id")
    private ContactInformation contactInfo = null;
Source code for “Address” entity :
@Entity
@Table(name = "entity_address")
public class Address
{
    @Column(name = "street_address1")
    private String streetAddress1 = null;
    @Column(name = "street_address2")
    private String streetAddress2 = null;
    @Column(name = "city")
    private String city = null;
    @Column(name = "state")
    private String state = null;
    @Column(name = "postal_code")
    private String postalCode = null;
    @Column(name = "country")
    private String country = null;
    @Id
    @GeneratedValue(generator = "foreign")
    @GenericGenerator(name = "foreign", strategy = "foreign", parameters = { @Parameter(value = "customer", name = "property") })
    @Column(name = "customer_id")
    // this is id of the customer - as an address is always associated with a
    // customer (it cannot exist independent of a customer)
    private Long customerID = null;
    @OneToOne
    @JoinColumn(name = "customer_id")
    // reference to the customer object. hibernate requires two-way object
    // references even though we are modeling a one-to-one relationship.
    private Customer customer = null;
In plain-speak this translates into the following :
Address does not have its own “ID” attribute. It will use the “ID” of the Customer, and the Customer table and the Address table are joined via this “ID” attribute (i.e. “customer_id”).
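The ContactInformation entity is mapped in exactly the same way as Address (a sketch – the table and column names below are my assumptions, only the pattern matters) :

@Entity
@Table(name = "entity_contact_information")
public class ContactInformation
{
    @Column(name = "email_address")
    private String emailAddress = null;
    @Column(name = "work_phone")
    private String workPhone = null;
    @Id
    @GeneratedValue(generator = "foreign")
    @GenericGenerator(name = "foreign", strategy = "foreign", parameters = { @Parameter(value = "customer", name = "property") })
    @Column(name = "customer_id")
    // shares the customer's id, exactly like Address
    private Long customerID = null;
    @OneToOne
    @JoinColumn(name = "customer_id")
    private Customer customer = null;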
Test code :
private void testSingleObjectPersistence()
{
    Customer customer = setupSingleCustomer();
    // save an object
    Session session = HibernateUtil.getSessionFactory().openSession();
    Transaction tx = session.beginTransaction();
    Long custID = (Long) session.save(customer);
    tx.commit();
    session.close();
    System.out.println("Customer id : " + custID);
}
private Customer setupSingleCustomer()
{
    Address address = new Address();
    address.setCity("Austin");
    address.setCountry("U.S.A");
    address.setPostalCode("78701");
    address.setState("Texas");
    address.setStreetAddress1("301 Lavaca Street");
    ContactInformation ci = new ContactInformation();
    ci.setEmailAddress("info@gingermanpub.com");
    ci.setWorkPhone("512-473-8801");
    Customer customer = new Customer();
    customer.setAccountNumber("90000200901");
    customer.setFirstName("Gingerman");
    customer.setLastName("Pub");
    customer.setMiddleName("Beer");
    customer.setSalutation("Sir");
    customer.setAddress(address);
    customer.setContactInfo(ci);
    address.setCustomer(customer);
    ci.setCustomer(customer);
    return customer;
}
Once all the above changes have been made, Hibernate is actually able to persist the objects.
Here are the links to the files in case you want to take a detailed look at the code.
Hopefully you found this post helpful, and as always, please feel free to leave your feedback …
Your first cup of Hibernate …
Hibernate is pretty much the de facto tool/library/framework for implementing Object Relational Mapping (ORM) in Java nowadays, and it has definitely become a much larger project than when I first started using it in 2004. It has so many downloads (Core, Annotations, Shards, etc.) and so many jars and dependencies that it is quite difficult to decide what is required and what is not (and what is important and what is optional).
Here I try to provide some clarity by starting with a basic Hibernate project and slowly working up to more advanced features, discovering the different pieces of Hibernate (and their dependencies) in the process.
All the code mentioned here is available via the Google Code Project – DalalStreet.
So let us start from scratch – which means no downloads from hibernate.org, unless we absolutely need it – and proceed step by step :
- I downloaded Eclipse 3.4.1 and created a Java Project – DalalStreet.
- I decided to use Java annotations and Eclipse immediately gave me an error (see screenshot below)
- This can be resolved by adding ejb3-persistence.jar to the classpath of your Eclipse project. For this you need to download the hibernate annotations (I decided to use version 3.2.1 – pay special attention to the compatibility matrix) and the ejb3-persistence.jar is located in the lib folder.
- After finishing all the annotations, you will want to export the schema, but before you can do that you need to define a hibernate config file.
- To export the schema, I decided to use the Ant HibernateToolTask. Of course Ant didn’t know where to find this class. For this, you need to download the hibernate tools (I am using version 3.2.4), unzip it, and navigate to the plugins folder. Now navigate to the lib/tools folder inside the org.hibernate.eclipse_<version_number> folder and you will find hibernate-tools.jar (add this to the classpath of ANT).
- After you have added the hibernate-tools.jar to the classpath you will require the following jars (to be added to the classpath of ANT)
- hibernate3.jar from hibernate core (for obvious reasons, download hibernate core and add hibernate3.jar – found in the top level folder)
- commons-logging-1.0.4.jar (in the above hibernate core download, lib folder)
- hibernate-annotations.jar (in hibernate annotations download, root folder)
- dom4j-1.6.1.jar (in hibernate core download, lib folder)
- commons-collections-2.1.1.jar (in hibernate core download, lib folder)
- freemarker.jar (in the hibernate tools download, inside the lib/tools folder of the org.hibernate.eclipse_<version_number> folder)
- and finally the jdbc driver jar which is of course dependent on the database you are using (I am using the MySQL database and the following driver jar : mysql-connector-java-5.1.7-bin.jar)
Once you have all this in place, your ANT task should complete successfully and you should have a valid schema in your database.
A final screenshot with all the jars in your lib folder :
Hope you found this post helpful. The next post will discuss the next logical step – actually persisting some data into the database using Hibernate.