OR058: High-Performance Dynamic Pages with Templates, XML, and mod_perl
Day: Fri
Time: 3:45pm
Session chair: None assigned
Duration: 90 minutes
Style: Presentation
Categories: Performance, Perl, XML
Speaker: Sander van Zoest
--
Introduction:
------------
In every group project you would like to divide the tasks among the
members of your team and have each person be responsible for the task
that best fits their experience and motivation. This is just as desirable
when producing a website, but it has always been hard to split up the
work involved in developing a web site, since the content, style, user
interface, character sets, delivery and navigational elements are all
tightly integrated.
The objective of this session is to provide you with ideas and ways to
combine tools to separate all these elements as much as possible, so that
each of them can be developed simultaneously with minimal dependencies
between the groups.
When working on a large website, you usually have different groups that
handle different parts of the site. A rough idea of how the work could be
broken up is as follows:
o A Design Group
  That does the layout, graphics and navigational elements to
  accommodate each browser.
o A Content Group
  That generates and provides the text and content.
o An Internationalization Group
  That translates all the navigational elements and content
  into foreign languages.
o A team of programmers
  That massages the content to handle dynamic changes and
  potentially gives the design crew the means to provide
  dynamic views to the end user.
o A Network and System Administration Group
  That does all the network and system administration tasks, such
  as load balancing, network architecture, switching and system
  configuration.
The Network and System Administration group can work independently
without too much trouble when you have a working CM system and a documented
process for building and installing each type of machine. The group of
programmers can then write specific installation scripts for each type of
system. Things such as Domain Name Service, load balancing and monitoring
can then be managed by the Administrative group without the intervention
of any other group. Since our topic does not cover any of these tasks, we
will skip over them and assume you have a good enough idea of how that
work would be split up.
How Did We Get Here
-------------------
Now a little bit of history that explains how we got where we are today.
After the early days of the web, people wanted a consistent style throughout
a website to signify a particular look and feel. This was usually
accomplished with static page generators or Server Side Includes that would
include a standard header and footer file.
With the influx of e-commerce, catalogs of merchandise were brought onto
the web, providing a place for people to shop without ever leaving their
homes. Most catalogs were maintained and stored in easy-to-manipulate,
SQL-compliant databases. The e-commerce wave created the need for
dynamic user tracking of potential purchases and for an easy way to
browse a site's catalog that would lead to a virtual checkout line.
Most of these online catalogs were dynamically generated by custom
CGI scripts written specifically for each site in C, Perl or Java.
Development with these tools usually meant that people would embed
stylistic elements within the source code, which made it hard for
the design team to make changes to dynamically generated content without
bugging the engineering team.
Specifically designed development tools such as Cold Fusion, Microsoft
Active Server Pages (ASP) and PHP/FI made this a little easier, but
did not provide the flexibility of the more programmer-oriented
languages, which had access to a wealth of packages already
written to do just about anything you ever dreamed of doing. Also,
the learning curve on these tools was usually higher, because they
had their own specific syntax.
The specifically designed development tools were usually logical extensions
of the static page generators and Server Side Include modules, and
were built into the web server. The more people found out about the
web, the more overloaded the servers became. The specifically designed
development tools usually outperformed the custom CGI scripts, because
the CGI gateway fork()ed and exec()ed the custom script for every
request. This limitation brought about server modules such as
mod_fastcgi, mod_jserv and mod_perl, which each, in their own way, avoided
the fork() and exec() performance hit and allowed developers to continue
development in their programming language of choice.
Things such as co-branding and affiliate marketing became the next
hot topics on the net. They required sharing either the content or
the style with other sites that were not controlled by the same
group of individuals, or even by the same company. Remote virtual
Server Side Includes (SSI), JavaScript and nightly pulls of content and
brands were some of the ways this was accomplished.
A Quick Overview
----------------
A lot of tools have been created to meet the many different
requirements set forth by the way we use the web. Now we will combine
these tools into a robust web publishing infrastructure
that is focused on just-in-time page assembly, performance and
scalability, and that re-uses as many of the tools already on the
market today as possible.
To create a web page visible in a browser, you need to combine content,
design and the particular natural language you want to publish in, such
as English.
To create the HTML document, you take an English template/stylesheet
and apply the English XML content to it. This can be done on the fly by
the Apache web server, using a built-in stylesheet parser module and a
URI translation module to link up the appropriate stylesheet with
the requested content. This method allows the Design Team to develop
the stylesheet template while the Content Group develops the content. The
Internationalization Group can then take each and provide translations
without needing to work closely with either group.
Perl can then be used by the Engineering Team to provide dynamic views
of the XML content, which the template picks up for display to the
end user.
Because style is separated from content, there is no
reason the template needs to generate HTML; it could generate HDML,
WML for mobile handheld devices, or just about any other format. Any
component of the request can determine which template to use and
which branded design to display.
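As a rough sketch of that idea, the Perl below picks a template directory from a few request attributes: a WAP-looking User-Agent selects WML, a partner host name selects a co-branded design, and Accept-Language selects the natural language. The rule table, the /templates directory layout and the hash keys are all hypothetical and are not the MP3.com configuration.

```perl
use strict;
use warnings;

# Hypothetical rules: [ test on the request => output format / brand ].
# The first matching rule wins; anything else gets plain HTML.
my @FORMAT_RULES = (
    [ sub { ($_[0]{user_agent} // '') =~ /UP\.Browser|Nokia/i } => 'wml' ],
    [ sub { ($_[0]{host} // '') =~ /^partner\./i } => 'partner-html' ],
);

# Natural languages we have translations for (hypothetical).
my %SUPPORTED_LANG = map { $_ => 1 } qw(en nl de);

# Return a template directory such as "/templates/wml/nl".
sub choose_template {
    my ($req) = @_;

    my $format = 'html';
    for my $rule (@FORMAT_RULES) {
        my ($test, $name) = @$rule;
        if ($test->($req)) {
            $format = $name;
            last;
        }
    }

    # First supported primary language tag in Accept-Language;
    # quality values (";q=...") are ignored in this sketch.
    my $lang = 'en';
    for my $tag (split /\s*,\s*/, $req->{accept_language} // '') {
        my ($primary) = $tag =~ /^([a-z]{2})/i;
        if (defined $primary && $SUPPORTED_LANG{ lc $primary }) {
            $lang = lc $primary;
            last;
        }
    }

    return "/templates/$format/$lang";
}

print choose_template({
    user_agent      => 'Nokia7110/1.0',
    accept_language => 'nl, en;q=0.5',
}), "\n";    # /templates/wml/nl
```

In a mod_perl handler the hash would be filled from the request, for example with $r->header_in('User-Agent'), $r->header_in('Host') and $r->header_in('Accept-Language').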
Note:
This concept was implemented by MP3.com, Inc. using Apache, C and mod_perl,
favoring performance and just-in-time assembly over a large feature set. For
an implementation of a similar concept using Java and Java Servlets with
a full XSL implementation see The Cocoon Project
http://xml.apache.org/cocoon/ and Stefano Mazzocchi's Nightschool Session
"Adding XML Capabilities with Cocoon".
The Details
-----------
Content
-------
As described above, XML provides us with a standardized way to
manage and deliver content. Sometimes the XML simply contains pointers
to the content, and other times the content is embedded directly.
Some of the other sessions such as Pier Paolo Fumagalli's "XML Publishing
Fundamentals", Philip Grabowski's "XML: An Intensive Introduction" and
Ted Leung's "Everything You Always Wanted to Know About XML Parsing" should
give you a good background in XML concepts, so we won't go into much
detail here.
Generation of XML can be done in several ways. If you already have all your
content in a SQL database, you can use Perl's DBI and XML::Writer modules
to generate the XML directly from the database.
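Neither DBI nor XML::Writer ships with core Perl, so the self-contained sketch below fakes the database side with a hard-coded list of row hashes (the shape DBI's fetchrow_hashref returns) and does its own escaping. The table, column and element names are made up for illustration; with the real modules you would loop over $sth->fetchrow_hashref and call XML::Writer's startTag, dataElement and endTag instead of concatenating strings.

```perl
use strict;
use warnings;

# Escape the XML special characters; "&" must be handled first.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build an XML document from row hashes. Column names become element
# names, so this assumes they are already valid XML names.
sub rows_to_xml {
    my ($root, $row_tag, @rows) = @_;
    my $xml = qq{<?xml version="1.0"?>\n<$root>\n};
    for my $row (@rows) {
        $xml .= "  <$row_tag>\n";
        for my $col (sort keys %$row) {
            my $val = defined $row->{$col} ? xml_escape($row->{$col}) : '';
            $xml .= "    <$col>$val</$col>\n";
        }
        $xml .= "  </$row_tag>\n";
    }
    return $xml . "</$root>\n";
}

# Stand-in for: while (my $row = $sth->fetchrow_hashref) { push @rows, $row }
my @rows = (
    { artist => 'Example Band', title => 'Rock & Roll' },
    { artist => 'Another Act',  title => '<Untitled>' },
);
print rows_to_xml('catalog', 'song', @rows);
```

Sorting the column names keeps the output stable from run to run, which makes generated files easy to diff.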
Style
-----
At MP3.com, Inc. we developed our own style template language, called
Yet Another Style Language (YASL), that behaves similarly to XSL.
There is no real reason we cannot use XSL or a similar technology
here, but at development time there was no C-based XSL implementation
that suited our needs: we required just-in-time generation, performance
and scalability.
For more information about XSL visit http://www.w3.org/ or the sessions
"XML Publishing Fundamentals" by Pier Paolo Fumagalli and "Practical XSLT
Transformations for Fun and Profit" by Scott Boag.
We have been looking at the XML Apache projects for possible future
integration, so that we can use Xerces as our XML parser and the more
full-featured XSLT stylesheet language, as implemented by Xalan, instead.
I have not had much time lately to devote to the Apache XML projects, but
I am definitely interested in helping the development effort and look
forward to implementing a similar solution with these open source tools,
so that everyone can benefit from what we have learned.
A Quick Example
---------------
Here is an XML file:
    Yasl Template Example
    ApacheCon 2000
    http://www.ApacheCon.com/
    Orlando, Florida
    OR058
    High-Performance Dynamic Pages with Templates, XML and mod_perl
    Friday
    Sander van Zoest
    MP3.com, Inc.
    High Geek, Founders Group
    sander@mp3.com
    Web Technology Forum
    http://www.sdsc.edu/WTF/
    San Diego, California
    Sander van Zoest
    MP3.com, Inc.
    High Geek, Founders Group
    sander@mp3.com
Combined with a Yasl Template written as
This will be displayed in a web browser as:
Notice the __DEFAULT__ value used for the tag: it tells YASL to use the
path information from the URI to find the appropriate XML data file, and
a regex-match configuration directive defined for the directory tells it
which template to use. This technique allows you to have a single style
template for any number of XML content files.
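The lookup that __DEFAULT__ triggers can be sketched in Perl as follows. This is not mod_yasl's C code. It assumes a hypothetical per-directory configuration with an XML data root and an ordered list of regex-to-template pairs standing in for the regex-match directive, and it maps a .html request path onto a .xml data file.

```perl
use strict;
use warnings;

# Hypothetical per-directory configuration: where the XML data lives and
# an ordered list of [ regex => template ] pairs (first match wins).
my %dir_config = (
    xml_root  => '/data/xml',
    templates => [
        [ qr{^/artist/} => '/templates/artist.yasl' ],
        [ qr{^/song/}   => '/templates/song.yasl' ],
        [ qr{.}         => '/templates/default.yasl' ],
    ],
);

# Map the request's path info to (XML data file, template), or return
# an empty list if the path is unsafe or nothing matches.
sub resolve_default {
    my ($path_info, $conf) = @_;
    return if $path_info =~ m{(?:^|/)\.\.(?:/|\z)};    # refuse ".." segments
    (my $xml_file = $conf->{xml_root} . $path_info) =~ s{\.html\z}{.xml};
    for my $pair (@{ $conf->{templates} }) {
        my ($re, $template) = @$pair;
        return ($xml_file, $template) if $path_info =~ $re;
    }
    return;
}

my ($xml_file, $template) = resolve_default('/artist/1234.html', \%dir_config);
print "$xml_file via $template\n";
# /data/xml/artist/1234.xml via /templates/artist.yasl
```

Because one catch-all pattern can sit at the end of the list, a whole directory tree of XML content can share a single template while a few subtrees get their own.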
Mod_Perl
--------
How does mod_perl interact with the style template?
For more information about mod_perl visit http://perl.apache.org/ or the
sessions "Getting Started with Mod Perl (Part I & II)" and "Improving
Script Performance Under mod_perl" by Stas Bekman.
To generate XML content on the fly, we use Perl through the mod_perl
Apache module. To hand the generated XML to the Yasl stylesheet language
and the Expat-lite XML parser (built into Apache since version 1.3.9), the
Perl code makes an internal redirect to the Yasl template, which renders
the XML for display in the browser.
To provide Yasl with access to the dynamically generated XML, we use a
modified version of XML::Writer that uses XS glue to talk to Expat-lite
and mod_yasl.
The constructor of this version of XML::Writer calls mod_yasl to
allocate a pool of memory that Expat uses to store the XML content,
and returns an integer to Perl that it can use to reference the newly
created pool.
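To make the handle mechanism concrete, here is a pure-Perl stand-in. It keeps a registry that maps integer handles to accumulated XML strings, whereas the real module allocates an Apache memory pool through XS and builds an Expat data structure. The class name HandleWriter and the fetch_and_free function are hypothetical; the starttag, element, endtag and handle method names mirror the ones used by the handler later in these notes.

```perl
use strict;
use warnings;

package HandleWriter;

# Hypothetical registry standing in for mod_yasl's memory pools:
# integer handle => accumulated XML text.
my %POOLS;
my $next_handle = 1;

sub _escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# "Allocate a pool" and remember its integer handle.
sub new {
    my ($class) = @_;
    my $handle = $next_handle++;
    $POOLS{$handle} = '';
    return bless { handle => $handle, open => [] }, $class;
}

sub handle { return $_[0]{handle} }

sub starttag {
    my ($self, $tag) = @_;
    push @{ $self->{open} }, $tag;
    $POOLS{ $self->{handle} } .= "<$tag>";
}

sub endtag {
    my ($self) = @_;
    my $tag = pop @{ $self->{open} };
    die "endtag() without matching starttag()\n" unless defined $tag;
    $POOLS{ $self->{handle} } .= "</$tag>";
}

sub element {
    my ($self, $tag, $text) = @_;
    $self->starttag($tag);
    $POOLS{ $self->{handle} } .= _escape($text // '');
    $self->endtag();
}

# What the template side would do with the handle it finds in the
# request notes: take the XML and release the "pool".
sub fetch_and_free {
    my ($handle) = @_;
    return delete $POOLS{$handle};
}

package main;

my $xw = HandleWriter->new();
$xw->starttag('Yasl');
$xw->element('title', 'Fish & Chips');
$xw->endtag();
my $handle = $xw->handle();    # the integer a handler would store in $r->notes
print HandleWriter::fetch_and_free($handle), "\n";
# <Yasl><title>Fish &amp; Chips</title></Yasl>
```

Passing a plain integer rather than a Perl object is what lets the C side find the data: notes fields only carry strings, and an integer survives that round trip unchanged.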
With this new version of XML::Writer, the developer makes the usual
calls to XML::Writer to generate XML content, except that the content
is now stored in a data structure in the memory pool allocated by mod_yasl.
After writing out the appropriate XML content, the developer puts the
integer that references the memory pool in the notes field of the
request. This tells Yasl which memory pool to use for the XML content
after the Perl code makes an internal redirect to the appropriate Yasl
template.
# This mod_perl module implements an example content handler for
# ApacheCon 2000 - http://www.ApacheCon.com/
#
# It operates within the New Publishing Model, so it generates XML
# and lets Yasl do the actual rendering.
package ApacheCon::Example;

use strict;
use lib qw(/mp3/tools/yasl /mp3/tools/MP3Com/lib);
use CGI;
use Apache::Constants qw(:common);
use Apache::Log;
use XMLapi::XMLWriter;
use MP3Com::XMLWriter;

sub handler
{
    my $r = shift;

    # When RawXMLMode is set for this location, show the generated XML
    # as text/plain instead of handing it to a Yasl template.
    my $raw_xml = $r->dir_config('RawXMLMode');

    # Check the configuration before allocating anything. (The original
    # tested the concatenated path, which is always defined.)
    my $template_dir = $r->dir_config('TemplateDir');
    if (!defined $template_dir) {
        $r->log->crit("No template configured.");
        return SERVER_ERROR;
    }

    my $cgi = CGI->new;
    my $template = $template_dir . $cgi->path_info();

    my $xw;    # XML writer object
    if (!$raw_xml) {
        # Writer backed by a mod_yasl memory pool.
        $xw = XMLapi::XMLWriter->new();
        if (!defined $xw) {
            $r->log->crit("Can't allocate an XML handle.");
            return SERVER_ERROR;
        }
    } else {
        # Writer that prints the XML straight to the client.
        $xw = MP3Com::XMLWriter->new(\*STDOUT);
        $r->content_type("text/plain");
        $r->send_http_header();
    }

    # Write the page XML data to the XML handle.
    $xw->starttag("Yasl");
    $xw->element("date", scalar(localtime));
    $xw->element("title", "Dynamic XML Demo: Pid " . $$);
    my @names = $cgi->param();
    foreach my $name (sort @names) {
        $xw->starttag("Params");
        $xw->element("name", $name);
        # scalar(): a multi-valued parameter would otherwise pass a list.
        $xw->element("value", scalar $cgi->param($name));
        $xw->endtag();
    }
    $xw->endtag();

    return OK if $raw_xml;

    # Tell mod_yasl which memory pool holds the XML, then let the
    # template render it.
    $r->notes("XML_HANDLE", $xw->handle());
    $r->internal_redirect($template);
    return OK;
}

1;
Sample Apache Configuration Directives
# Sample set of Apache Config Directives.
#
# - Sander van Zoest
# MP3.com, Inc.
#
## Static XML and templates
AddHandler yasl-parse-html .html
Yasl On
## Dynamic XML Generation using mod_perl
PerlFreshRestart On
PerlRequire /usr/local/www/lib/ApacheCon/Example.pm
PerlSetVar TemplateDir /apachecon/templates
SetHandler perl-script
PerlHandler ApacheCon::Example
PerlSetVar RawXMLMode 0
SetHandler perl-script
PerlHandler ApacheCon::Example
PerlSetVar RawXMLMode 1
## mod_perl Yasl template
Current Date:
Skip is not working
bgcolor="#CCCCCC"bgcolor="#FFFFFF">
Name:
bgcolor="#CCCCCC"bgcolor="#FFFFFF">
Value:
Gotchas and Problems
---------------------
- One of the problems with converting XML to HTML is that both use a
  similar set of entities, such as &amp;, that need to be encoded
  when putting content in XML and HTML.
  When the content is requested by mod_yasl, it does not know whether
  the expected output should be an &amp; or simply an &.
  You would think that you could just put it in the XML the way you
  would like to see it on the HTML page, but what if you wanted to
  convert the same XML document to PostScript instead?
  To solve this problem we added some HTML entity encoding arguments
  to , so that the templates can decide whether they want
  the entity or the actual value.
- A similar issue is that of quotation marks. Because the XML content
  is not known ahead of time, putting a yasl:value inside a form input
  field could be problematic. Consider the following Yasl template
  segment.
  ">
  If the XML contents of Message do not contain a double quote,
  everything works without any problems. If they do contain one, the
  quote ends the attribute value early and the browser would have a fit.
  This situation does not come up very often, since it usually only
  arises with FORM tags, which are only useful in HTML. We currently
  solve it by adding another argument to that will escape any single
  and double quotes to the appropriate HTML entities.
- As shown by the recent CERT advisory, URI and entity encoding is
  very important for security reasons. It can also be somewhat confusing
  in template work, since you are never really sure whether the text is
  going to be used in an anchor tag or is simply a piece of text
  that needs to be displayed.
  In anchor and image tags you want URI encoding, while just about
  anywhere else you want HTML entity encoding. This is another option
  we added to that can tell yasl to use URI encoding instead of the
  default HTML entity encoding.
  This could probably have been determined programmatically by having
  yasl pay more attention to the template it is working with, but
  we would rather specify this ourselves than pay the performance
  overhead of having yasl keep track of this scenario and handle it
  appropriately.
- Navigational elements in the content XML come up every so often,
  for pages whose navigation depends on the amount of content and on
  which content is currently being displayed.
  This is somewhat of a bummer, because it blurs the clear line
  between content on one side and design elements and layout on the
  other. I would suggest using a second, smaller XML file that
  dictates these kinds of things, especially because the real estate
  available differs dramatically depending on whether you are on a
  WAP-capable phone, a web browser or a legal-size piece of paper.
- Performance: in some cases parsing XML is too much overhead, and
  generating DBM files instead (or from the XML) solves much of this
  problem.
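The three escaping choices discussed above (HTML entities by default, additionally escaping quotes inside attribute values, and URI encoding for anchor and image links) can be sketched as one Perl function. The mode names none, html, attr and uri are hypothetical and do not reflect the actual Yasl argument names.

```perl
use strict;
use warnings;

# Hypothetical encoding modes for a template value:
#   none - insert the text as-is
#   html - HTML entity encoding (the default)
#   attr - HTML entity encoding plus quotes, for attribute values
#   uri  - percent-encoding, for links in anchor and image tags
sub encode_value {
    my ($text, $mode) = @_;
    $mode //= 'html';
    return $text if $mode eq 'none';

    if ($mode eq 'uri') {
        # Percent-encode every byte outside the unreserved set
        # (letters, digits, "-", ".", "_", "~"); assumes a byte string.
        (my $uri = $text) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
        return $uri;
    }

    die "unknown encoding mode '$mode'\n" unless $mode eq 'html' || $mode eq 'attr';
    # Handle "&" first so the entities added below are not re-encoded.
    (my $html = $text) =~ s/&/&amp;/g;
    $html =~ s/</&lt;/g;
    $html =~ s/>/&gt;/g;
    if ($mode eq 'attr') {
        $html =~ s/"/&quot;/g;
        $html =~ s/'/&#39;/g;
    }
    return $html;
}

my $message = q{He said "hi" & left};
print qq{<input type="text" name="message" value="}, encode_value($message, 'attr'), qq{">\n};
# <input type="text" name="message" value="He said &quot;hi&quot; &amp; left">
```

Making the template author pick the mode, rather than inferring it from the surrounding markup, matches the trade-off described above: no extra parsing work per request, at the cost of one more argument in the template.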
Credits and Acknowledgements
----------------------------
At MP3.com, Inc. the development of this project was led by John DeRose
and executed by David Story, Matt DiMeo, Thomas Tarka and myself. We would
like to thank the Apache Software Foundation and all open source developers
for this great pluggable web server, and Dirk-Willem van Gulik for starting
the XML Apache Project, to which we hope to contribute a great deal.
$Id: notes_apachecon.txt,v 1.1.1.1 2002/10/08 22:38:42 sander Exp $