Servlets


Table of Contents

1. Introduction
2. Java Servlet concept
3. Webapp Directory structure
4. Recommended Source Directory Structure
5. Using ANT
6. Handling Requests
7. Sample Server
8. Java Server Pages
9. Handling Request Parameters
9.1. Problems
9.2. Get requests
9.3. Accessing files from a servlet
10. Servlet Life Cycle
11. Servlet Exercises
12. Servlet Topics

Java Servlets (and the associated Java Server Pages) are a server side web application service technology. They require no special client side support beyond access to a web browser although they can take advantage of a number of client side technologies. Java Servlet technology is becoming the method of database access most favoured by major database management system vendors: Oracle have announced that they are dropping their command line and client-server (Oracle forms) interfaces in favour of servlet based web interfaces. Servlets are fast, relatively easy to develop once the initial hurdles are overcome, and scalable to the largest of web application needs.

Servlets are Java objects that implement the javax.servlet.Servlet interface. They are loaded by a servlet server such as Tomcat, Jetty, Resin, or BEA's WebLogic (to name but a few), which maps them to some location in the web server's web space. Any web access to that address is picked up by the servlet server and directed to the corresponding servlet. The servlet analyses the request and responds, usually with an HTML page containing the answer to the request, although more sophisticated response mechanisms are possible.

While most servlet servers can act as full HTTP daemons on their own, serving both servlets and standard web pages, they rarely have the range of web serving facilities and tuned efficiency of a full web server like Apache, Netscape Server or Microsoft's Internet Information Server. On the other hand, these web servers cannot provide the full servlet serving facilities that the servlet servers are dedicated to. Hence the usual configuration is to run both a web server and a servlet server, and to configure the web server to forward servlet requests to the servlet server. The servlet server then does not act as an HTTP daemon itself.

Typically, rather than directly implementing the javax.servlet.Servlet interface, servlets extend javax.servlet.http.HttpServlet, which itself implements the Servlet interface. Much of the "boilerplate" work of extracting information from a request and putting together a response is already implemented for you in HttpServlet.

A Web Application directory structure is a single directory that contains all the Java class files, libraries, HTML pages, JSP pages, data files and configuration information in the correct structure and formats. If your web application is maintained in this structure, any standards-conforming servlet server can load and run the servlets. Furthermore, a web application in this structure can be packaged up into a single ".war" file, which is all that is needed to install the application on a servlet server. You can drop such a war file into the appropriate server's webapps directory and the server will unpack it, load the servlets and start serving them without any further interaction. This is a very convenient way to distribute and install binary web applications. The convenience comes at a slight cost: you have to be able to compile and build this structure. However, tool support, particularly ANT, can make this easy.

The directory structure required is simply one which matches the web address structure of the application, but with one extra directory at the top level called WEB-INF. The WEB-INF directory is itself protected from web access but contains the servlet configuration information (in web.xml), the servlet classes (under classes), the libraries that must be loaded and available for the servlets to work (under lib), and any other data files or information required for the execution of the servlets. The structure is as follows:

In order to maintain the sources and corresponding data files, I recommend the following source directory structure:

Given the directory structure above, to add an extra servlet to the application, you would typically need to:

It should not really be necessary to stop and restart the servlet server, but complexities with caching and the dynamic loading and unloading of classes mean that you can get unexpected behaviour if you don't do so.

You may already be familiar with the program make. Just like make, ant is a program that reads a set of rules on how to build some software system from its sources, figures out the least amount of work it has to do in order to achieve that (basing its decisions on file modification dates) and builds the system requested. Ant's advantage over make is that it is fully platform independent, so it works as well on Windows as on Unix/Linux. It is written entirely in Java and its rule file is an XML file, conventionally called build.xml.

It is typically run with a single argument: the name of the target to build. The build.xml file in the sample servlet web application directory has the following targets:

A servlet receives a request from a web browser over the HTTP protocol and must construct some kind of response. The incoming request is usually either a Get, in which any parameters of the request are encoded as part of the URL used to access the servlet, or a Post, in which the parameters are carried in the body of the message and are not visible or obvious to the end user. The servlet has to extract the parameters, whichever way they were encoded, deal with them, and generate a response by writing out and sending back the HTML text that should be displayed in the browser.
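To make the Get encoding concrete, here is a minimal sketch of how name/value pairs are packed into the query part of a URL and unpacked again. It uses only the standard library (Java 10 or later for this URLDecoder overload). The class name QueryStringDemo, the example address and the max parameter are invented for illustration; in a real servlet the server does this decoding for you.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringDemo {
    /** Decodes "a=1&b=x+y" style text into a name -> value map (first value wins). */
    static Map<String, String> parse(String encoded) {
        Map<String, String> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.putIfAbsent(decode(name), decode(value));
        }
        return params;
    }

    /** Reverses URL encoding: "%27" becomes an apostrophe, "+" becomes a space. */
    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical address: in a Get, the parameters follow the '?'.
        String url = "http://localhost/FindCustomers?prefix=D%27Arc&max=10";
        String query = url.substring(url.indexOf('?') + 1);
        System.out.println(parse(query)); // prints {prefix=D'Arc, max=10}
    }
}
```

A Post from an ordinary HTML form carries the same encoded text, only in the body of the request rather than in the URL.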

In fact, the HttpServlet class that we (usually) derive our servlet from does most of the hard work for us: it encapsulates the request and response mechanisms into objects. The request object provides get methods to conveniently extract parameters, and the response object provides a getWriter method that returns a PrintWriter object to which we can write the HTML response.

The servlet application is viewable when you have the servlet server (in this case Resin) running, have set up the RHCShop ODBC connection to Shop.mdb, and have built and installed the application at Basic_Server. Note that we are using the loopback web address to view the web application: you must have Resin running on the physical machine you run the browser on for this to work. Obviously, you cannot run two different instances of Resin on the same machine as they would clash on port numbers. You can, of course, run the servlet server on any machine and change the web address to point at that machine. However, as we'll see, when developing servlets you need the ability to start and stop the server easily, so I strongly recommend developing with your own personal server instance.

Source code for the sample servlet, stored in the source directory structure described above, is here.

Looking first at the DateTime servlet, we see the basic structure of a servlet. To get a working servlet, the only code we really need to provide is the derivation from HttpServlet and a doGet method (note that the API documentation for servlets is in the J2EE installation and not the J2SDK). Try running it and refreshing the browser page multiple times.

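import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Timestamp;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public final class DateTimeServlet extends HttpServlet
{
	public void doGet(HttpServletRequest request, HttpServletResponse response)
		throws IOException, ServletException
	{
		// Set the content type before obtaining the writer.
		response.setContentType("text/html");
		PrintWriter writer = response.getWriter();

		writer.println("<html>");
		writer.println("<head>");
		writer.println();
		writer.println("</head>");
		writer.println("<body>");
		writer.println("<table border=\"0\">");
		//  ... many more writer.println statements

		writer.println("<p>");
		// The timestamp is recomputed on every request, so each refresh shows a new time.
		long millisecs = System.currentTimeMillis();
		Timestamp ts = new Timestamp(millisecs);
		writer.println("The date and time is now: " + ts);
		writer.println("</p>");
		writer.println("</body>");
		writer.println("</html>");
	}
}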

Servlets often have to generate a great deal of HTML output to include in the response. This means that the servlet code often contains a little bit of program logic and a large number of output statements that, because they produce HTML, have to encode it with lots of escape characters. This makes the necessary page design work unpleasant, and means that the artistic designers either have to work on Java code themselves or have to request changes to the code from the Java programmers.

Java Server Pages (or JSP) is an attempt to overcome this problem. Rather than embedding HTML output statements in Java code, JSPs embed Java statements in HTML code. The first time a servlet server is asked to load a JSP page, it automatically "inverts" the JSP page, translating it into a standard servlet with embedded HTML output statements and all the necessary escaping of characters required. It then compiles the resulting servlet and serves it as if it were a normal servlet.
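The "inversion" can be pictured with a toy translator. The sketch below is not how any real servlet server implements JSP; it only assumes a template made of static HTML plus `<%= expression %>` markers, and turns each line into the kind of escaped output statements a hand-written servlet would contain. The class name ToyJspInverter and the variable name writer in the generated code are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyJspInverter {
    /** Wraps static HTML in a Java string literal, escaping backslashes and quotes. */
    static String toJavaLiteral(String html) {
        return "\"" + html.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Translates one template line into Java output statements. */
    static List<String> invertLine(String line) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < line.length()) {
            int open = line.indexOf("<%=", pos);
            if (open < 0) {
                // No more expressions: the rest of the line is static HTML.
                out.add("writer.print(" + toJavaLiteral(line.substring(pos)) + ");");
                break;
            }
            int close = line.indexOf("%>", open);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated <%= in: " + line);
            }
            if (open > pos) {
                out.add("writer.print(" + toJavaLiteral(line.substring(pos, open)) + ");");
            }
            // The embedded Java expression is copied through unquoted.
            out.add("writer.print(" + line.substring(open + 3, close).trim() + ");");
            pos = close + 2;
        }
        out.add("writer.println();");
        return out;
    }

    public static void main(String[] args) {
        for (String stmt : invertLine("<td align=\"left\"><%= name %></td>")) {
            System.out.println(stmt);
        }
    }
}
```

The designer edits only the template line; the escaped string literals exist solely in the generated code.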

This means that the artistic designers can now work on the presentation of the web pages directly, using standard web page editing tools, while the Java programmers only need to embed their logic in the JSP.

While this has many advantages, it has disadvantages as well: cluttering up the HTML page with Java tags means that the artistic designers can make changes that damage the program logic without realising it. A whole host of technologies have been developed to assist here (e.g. Tag libraries, Bean libraries, Web Macros, Templates etc.), either by minimising the amount of Java code necessary in a JSP page or by putting in a layer of protection so that it becomes difficult to break the embedded Java code. However, no single technology in this area has yet become the accepted best practice. In this course we will concentrate on Servlets because, even if you wish to use JSPs, you still need a basic mastery of Servlets before moving on to JSPs.

The sample web application provides an example of a JSP page that prints out the Customers from the Shop database.

The two servlets above have not taken any input from the user other than the request for a response. The FindCustomers servlet handles a text input parameter from a form in the query HTML page, and uses it in a "LIKE" condition of an SQL query to find the customers whose last names begin with the string that was input. Getting a parameter is easy given the support provided by the HttpServlet and HttpServletRequest classes:

String prefix = request.getParameter("prefix") ;
writer.println("<p>Prefix specified was: " + prefix + "</p>") ;

Warning

Actually, here I have oversimplified a significant problem (and ignored it completely previously as well): the user could have entered any string at all into the browser. If we simply echo it, as we do above, then it could break the page display if it contained characters like '<', or even full HTML tags. The problem is actually worse: the table cells being output are essentially being dumped into the HTML response page with lines like:

while (rs.next())
{
    writer.println("<tr>") ;
    for (int i = 1 ; i <= colCount ; i++)
    {
        writer.println("<td>" + rs.getObject(i) + "</td>") ;
    }
    writer.println("</tr>") ;
}

What if one of the fields in the database had been entered with a '<' or with something that happened to be an HTML tag?

For this reason, all output to a web page that comes from a source not explicitly coded by the servlet programmer should first pass through a filter that sanitises the text for HTML display. A number of such class libraries are freely available, and it is also easy to write your own.
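As a rough idea of what such a filter looks like, here is a minimal hand-written sketch. The class and method names (HtmlEscaper, escapeHtml) are made up for this example, and it covers only the five characters that matter most in HTML text and attribute values; a well-tested library is the safer choice in production.

```java
final class HtmlEscaper {
    /** Returns the value as text that is safe to place inside HTML; null becomes "". */
    static String escapeHtml(Object value) {
        if (value == null) {
            return "";
        }
        String text = value.toString();
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<b>D'Arc</b>")); // prints &lt;b&gt;D&#39;Arc&lt;/b&gt;
    }
}
```

With this in place, the table cell line would become writer.println("<td>" + HtmlEscaper.escapeHtml(rs.getObject(i)) + "</td>"), and the echoed prefix would be passed through the same method.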

Warning

There is a further related problem: in the section where I put the string prefix into the SQL query, I made no check or transformation on the string:

rs = stmt.executeQuery("SELECT * FROM Customers WHERE CustomerLastName LIKE '" + prefix + "%'") ;

Here again problems could occur innocently (e.g. the user enters the string "D'Arc") or less innocently (can you come up with a scenario where some string that the user enters could modify the query in such a way that the resulting query actually causes the database to be modified?). The solution is to program much more careful control over what happens to strings before they are appended to queries for execution.
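One common way to get that control in JDBC is to stop building the query by string concatenation and bind the user's text as a parameter of a java.sql.PreparedStatement instead. The sketch below assumes an already open Connection; the class CustomerQueries and its members are hypothetical names, while the table and column names are taken from the query above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CustomerQueries {
    // The user's text never becomes part of this SQL string; '?' marks where it is bound.
    static final String FIND_BY_PREFIX =
        "SELECT * FROM Customers WHERE CustomerLastName LIKE ?";

    /** Builds the LIKE pattern; quotes in the prefix need no special treatment. */
    static String likePattern(String prefix) {
        return prefix + "%";
    }

    /** Returns the last names that start with the given prefix. */
    static List<String> findLastNames(Connection con, String prefix) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = con.prepareStatement(FIND_BY_PREFIX)) {
            // The driver passes the value as data, so "D'Arc" cannot end the string early.
            stmt.setString(1, likePattern(prefix));
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("CustomerLastName"));
                }
            }
        }
        return names;
    }
}
```

Note that a '%' or '_' typed by the user still acts as a wildcard inside the LIKE pattern; how to escape those is database specific and is not handled here.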

All the servlets shown do their work entirely within a doGet or doPost method. They create a database connection, do any queries, generate the response and tear down the database connection for every page access to the servlet. Since creating a database connection is very expensive, this slows down execution and reduces the page serving rate very considerably. We would like to avoid all this unnecessary setup work by setting up the connection only once and reusing it for every page access. In order to understand how to do this, we have to understand the servlet life cycle.

When the servlet server starts, it waits for page accesses that request servlets. When one comes in, the server loads the servlet class into memory and initialises it if it was not already loaded, and creates an object (an instance) of the servlet class. This servlet instance then stays alive and in memory for as long as the servlet server likes: typically the server will kill it off if it has remained idle for a certain amount of time. This servlet instance will typically handle all requests to the servlet's web address. [Actually, you get one servlet instance created for each different registered servlet name in the web.xml files, even if the same servlet class file is registered under the different servlet names.] As far as the servlet instance is concerned, each page request to it comes from a different thread. This immediately means that synchronisation issues need to be taken into account.
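The effect of many request threads sharing one instance can be simulated without a servlet server. In the sketch below, SharedInstanceDemo stands in for a servlet and handleRequest for doGet (both names are invented); a thread pool plays the part of the server's request threads. The plain int counter may lose updates, while the AtomicInteger does not.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedInstanceDemo {
    private int unsafeHits = 0;                                   // unsynchronised instance state
    private final AtomicInteger safeHits = new AtomicInteger();   // thread-safe instance state

    /** Stand-in for doGet: called concurrently on the one shared instance. */
    void handleRequest() {
        unsafeHits++;                // read-modify-write, not atomic
        safeHits.incrementAndGet();  // atomic
    }

    int unsafeHits() { return unsafeHits; }
    int safeHits() { return safeHits.get(); }

    /** Runs the given number of threads against a single instance and returns it. */
    static SharedInstanceDemo simulate(int threads, int requestsPerThread)
            throws InterruptedException {
        SharedInstanceDemo servlet = new SharedInstanceDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    servlet.handleRequest();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("simulation timed out");
        }
        return servlet;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedInstanceDemo s = simulate(8, 100_000);
        System.out.println("safe counter:   " + s.safeHits());
        System.out.println("unsafe counter: " + s.unsafeHits() + " (may be lower)");
    }
}
```

The same reasoning applies to any object held in a servlet's fields, including a shared database connection.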

Consider what kind of variable you could use for holding the database connection object. There are three cases:

  1. Local variable of doGet/doPost:

  2. Instance variable of servlet class:

  3. Class (i.e. static) variable of servlet class:
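The difference between the three cases can be sketched by counting how many connections each one opens. This is only an illustration under stated assumptions: FakeConnection stands in for a real database connection, the three small classes stand in for servlets, and their init and doGet methods merely mimic the servlet methods of the same names.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ConnectionScopeDemo {
    /** Counts how many (fake) connections have been opened so far. */
    static final AtomicInteger opened = new AtomicInteger();

    /** Stand-in for an expensive database connection. */
    static final class FakeConnection {
        FakeConnection() { opened.incrementAndGet(); }
    }

    /** Case 1: local variable. Nothing is shared, but every request opens a connection. */
    static final class LocalScope {
        void doGet() {
            FakeConnection con = new FakeConnection();
            // ... query using con, then close it ...
        }
    }

    /** Case 2: instance variable. One connection per servlet instance, shared by its threads. */
    static final class InstanceScope {
        private FakeConnection con;
        void init() { con = new FakeConnection(); }
        void doGet() {
            synchronized (this) { /* ... query using con ... */ }
        }
    }

    /** Case 3: static variable. One connection shared by every instance of the class. */
    static final class StaticScope {
        private static FakeConnection con;
        void init() {
            synchronized (StaticScope.class) {
                if (con == null) { con = new FakeConnection(); }
            }
        }
        void doGet() {
            synchronized (StaticScope.class) { /* ... query using con ... */ }
        }
    }

    public static void main(String[] args) {
        LocalScope local = new LocalScope();
        for (int i = 0; i < 3; i++) { local.doGet(); }
        System.out.println("local:    " + opened.getAndSet(0)); // prints 3

        InstanceScope a = new InstanceScope(), b = new InstanceScope();
        a.init(); b.init();
        for (int i = 0; i < 3; i++) { a.doGet(); b.doGet(); }
        System.out.println("instance: " + opened.getAndSet(0)); // prints 2

        StaticScope c = new StaticScope(), d = new StaticScope();
        c.init(); d.init();
        for (int i = 0; i < 3; i++) { c.doGet(); d.doGet(); }
        System.out.println("static:   " + opened.getAndSet(0)); // prints 1
    }
}
```

In cases 2 and 3 the connection is shared between request threads, so access to it has to be synchronised, which serialises the database work of concurrent requests.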

There are many topics on servlets that we could discuss but did not have sufficient time to cover.