May 16, 2013

How to Handle Character Encoding in JSP and Servlets

When writing a simple web application, you might not want to bother with a web framework and instead use plain JSP and Servlets. This has been the case for me recently, but as with everything else in life, there are pitfalls. One of them is handling character encoding.

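In your JSP, make sure you use the encoding settings below. The charset in contentType is the encoding of the response sent to the browser, while pageEncoding is the encoding of the JSP source file itself: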

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>
<%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Insert title here</title>
</head>
<body>

    <h1>Empty Page</h1>

</body>
</html>

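These encoding settings are important if you plan to pass GET parameters in the URL and those parameters might contain characters not covered by the ISO-8859-1 character set. The browser percent-encodes such parameters using the encoding of the page they come from, which is why the page must be served as UTF-8. You should also be aware of how HTTP works: it is stateless by design, and the request carries no information about which encoding was used for the URL-encoded GET parameters. The server therefore has no way of knowing how to interpret them, so it typically falls back to ISO-8859-1.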
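To see what goes wrong, the following minimal Java sketch percent-encodes a value the way a browser on a UTF-8 page would, then decodes it once as ISO-8859-1 and once as UTF-8. It assumes Java 10 or later (for the Charset overloads of URLEncoder and URLDecoder); the class name, method names and the example value are illustrative only.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {

    // Percent-encodes a value the way a browser does on a page served as UTF-8.
    static String encodeAsUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decodes a percent-encoded value with the given charset, as the server would.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String original = "Malm\u00f6"; // "Malmö"

        String encoded = encodeAsUtf8(original);
        System.out.println(encoded); // Malm%C3%B6

        // ISO-8859-1 turns each of the two UTF-8 bytes of "ö" into its own character.
        String wrong = decode(encoded, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the browser actually used restores the value.
        String right = decode(encoded, StandardCharsets.UTF_8);

        System.out.println(wrong.equals("Malm\u00c3\u00b6")); // true, i.e. "MalmÃ¶"
        System.out.println(right.equals(original));            // true
    }
}
```

The garbled "MalmÃ¶" is the typical symptom of this mismatch: the two UTF-8 bytes of "ö" are read as two separate ISO-8859-1 characters.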

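The next gotcha is when a JSP calls a Servlet. Here again the server has no way of knowing how the passed parameters were encoded, so you must explicitly tell it how to decode them, and you must do so before reading any parameter. Note that the request encoding only applies to the request body, that is, to POST parameters; how the query string of a GET request is decoded is configured in the container, for example with the URIEncoding attribute of Tomcat's connector. The request and response encodings are set with these methods: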

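req.setCharacterEncoding("UTF-8");  // decode the request body (POST parameters) as UTF-8
resp.setCharacterEncoding("UTF-8"); // encode the response written via getWriter() as UTF-8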
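The Servlet specification defines req.setCharacterEncoding as applying to the request body, so it is not guaranteed to affect GET parameters. When the container's URI encoding cannot be changed, a workaround that is often used is to turn the wrongly decoded string back into its raw bytes and decode those as UTF-8. Below is a minimal sketch of that idea; ParameterFix and reinterpretAsUtf8 are hypothetical names, and it assumes the container decoded the query string as ISO-8859-1 while the browser sent UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ParameterFix {

    // Hypothetical helper: re-interprets a parameter that the container decoded
    // as ISO-8859-1 although the browser percent-encoded it as UTF-8.
    // ISO-8859-1 maps bytes 0-255 one-to-one to chars U+0000-U+00FF, so getBytes
    // recovers the original UTF-8 bytes, which are then decoded correctly.
    static String reinterpretAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null; // the parameter was absent from the request
        }
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What req.getParameter("city") might return for ?city=Malm%C3%B6
        // when the container assumes ISO-8859-1 for the query string.
        String fromContainer = "Malm\u00c3\u00b6";
        System.out.println(reinterpretAsUtf8(fromContainer).equals("Malm\u00f6")); // true
    }
}
```

Fixing the container configuration is usually the cleaner option: if the container already decodes the query string as UTF-8, this helper will corrupt every non-ASCII value instead of repairing it.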

If you plan to send an HTML response directly from the Servlet, do not forget to set the response content type as well, before calling getWriter():

resp.setContentType("text/html; charset=UTF-8");
