OR058: High-Performance Dynamic Pages with Templates, XML, and mod_perl

Day: Fri
Time: 3:45pm
Session chair: None assigned
Duration: 90 minutes
Style: Presentation
Categories: Performance, Perl, XML
Speaker: Sander van Zoest

--

Introduction:
------------

In every group project you would like to split the tasks associated
with the project among the members of your team, and have each person
be responsible for the task that best fits their experience and
motivation. Although this is also desirable when producing a website,
it has always been hard to split up the tasks involved in developing a
web site, since the content, style, user interface, character sets,
delivery and navigational elements are all tightly integrated.

The objective of this session is to provide you with ideas and ways to
combine tools to separate all these elements as much as possible, so
that each of them can be developed simultaneously with minimal
dependencies between the groups.

When working on a large website you usually have different groups that
handle different parts of the website. A rough idea of what the
breakup could be is as follows:

 o A Design Group
   That does layout, graphics and navigational elements to
   accommodate each browser.

 o A Content Group
   That generates and provides the text and content.

 o An Internationalization Group
   That translates all the navigational elements and content into
   foreign languages.

 o A team of programmers
   That works on massaging the content to handle dynamic changes in
   content, and potentially on providing the design crew with the
   means to present dynamic views to the end user.

 o And a Network and System Administration Group
   That does all the network and system administrative tasks, such as
   load balancing, network architecture, switching and system
   configuration.

The work of the Network and System Administration group can be split
off without too much trouble when you have a working CM System and a
documented process on how to build and install each type of machine.
The group of programmers can then make specific installation scripts
for each type of system. Things such as Domain Name Service, Load
Balancing and Monitoring can then be managed by the Administrative
group without intervention from any other group. Since our topic does
not cover any of the above tasks, we will skip over this and assume
you have a good enough idea of how that would be split up.

How Did We Get Here
-------------------

Now a little bit of history that explains how we got where we are
today.

After the early days of the web, people wanted a consistent style
throughout the website to signify a particular look and feel. This was
usually accomplished with static page generators or Server Side
Includes that would include a standard header and footer file.

The influx of e-commerce brought catalogs of merchandise onto the web,
providing sites where people could shop without ever leaving their
home. Most catalogs were maintained and stored in easy to manipulate
SQL compliant databases. The e-commerce wave created the need for
dynamic user tracking of potential purchases and for an easy method to
browse the site's catalog that would lead to a virtual checkout line.

Most of these online catalogs were dynamically generated by custom CGI
scripts written specifically for each site in C, Perl or Java.
Development with these tools usually meant that people would embed
stylistic elements within the source code, which made it hard for the
design team to make changes to dynamically generated content without
bugging the engineering team.

Specifically designed development tools such as Cold Fusion, Microsoft
Active Server Pages (ASP) and PHP/FI made this a little easier, but
did not provide the flexibility of the more programmer-oriented
languages, which had access to a wealth of packages already written to
do just about anything you ever dreamed of doing. Also, the learning
curve on these tools was usually higher, because they had their own
specific syntax. The specifically designed development tools usually
were logical extensions to the static page generators and the Server
Side Include modules, and were built into the web server.

The more people found out about the web, the more overloaded the
servers became. The specifically designed development tools usually
outperformed the custom CGI scripts, because the CGI gateway had to
fork() and exec() the custom script for each request. This limitation
brought about server modules such as mod_fastcgi, mod_jserv and
mod_perl, which each in their own way solved the fork() and exec()
performance hit and allowed developers to continue development in
their programming language of choice.

Things such as co-branding and affiliate marketing became the next hot
topics on the net. These new topics required sharing either the
content or the style with other sites that were not controlled by the
same group of individuals or even the same company. Remote Virtual
Server Includes (SSI), JavaScript and nightly pulls of content and
brands were some of the ways this was accomplished.

A Quick Overview
----------------

A lot of tools have been created to help meet the many different
requirements that follow from the way we use the web. Now we will
combine these tools into a robust web publishing infrastructure that
is focused on just-in-time page assembly, performance and scalability,
and that re-uses as many of the tools already on the market as
possible.

To create a web page visible in a browser, you need to combine
content, design and the particular natural language you want to
publish in, such as English.

[PICTURE]

To create the HTML document, you take an English Template/StyleSheet
and apply the English XML content. This can be done on-the-fly by the
Apache web server, using a built-in stylesheet parser module and a URI
translation module to link up the appropriate stylesheet with the
requested content.

This method allows the Design Team to develop the stylesheet template
and the Content Group to develop the content. The Internationalization
Group can then take each and provide translations without needing to
work closely with either group. Perl can then be used by the
Engineering Team to provide dynamic views of the XML content that will
be picked up by the Template for display to the End User.

Because the style is separated from the content, there is no reason
the Template needs to generate HTML. It could generate HDML or WML for
mobile handheld devices, or just about any other format. Any component
of the request can determine which template to use and which branded
design to display.

Note: This concept was implemented by MP3.com, Inc. using Apache, C
and mod_perl, with performance and just-in-time assembly favored over
feature set. For an implementation of a similar concept using Java and
Java Servlets with a full XSL implementation, see The Cocoon Project
http://xml.apache.org/cocoon/ and Stefano Mazzocchi's Nightschool
Session "Adding XML Capabilities with Cocoon".

The Details
-----------

Content
-------

As described above, XML provided us with the most standardized way to
manage and deliver content. Sometimes the XML simply contains pointers
to where the content can be found, and other times the content is
embedded.
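
The URI translation step described in the overview, where components
of the request pick the template and the XML content, can be sketched
in Perl as follows. This is only an illustration under stated
assumptions: the rule table, the directory layout and the
resolve_request function are hypothetical and are not part of mod_yasl.

```perl
use strict;
use warnings;

# Hypothetical rule table: the first pattern that matches the request
# path picks the template name. None of these names come from mod_yasl.
my @TEMPLATE_RULES = (
    [ qr{^/events/},   'event'   ],
    [ qr{^/speakers/}, 'speaker' ],
    [ qr{^/},          'default' ],   # fallback for every other path
);

# Map one request onto a template file and an XML content file.
# $lang and $device stand in for "any component of the request".
sub resolve_request {
    my ($path, $lang, $device) = @_;

    my $name;
    for my $rule (@TEMPLATE_RULES) {
        my ($pattern, $template) = @$rule;
        if ($path =~ $pattern) {
            $name = $template;
            last;
        }
    }

    # Strip a trailing .html so /events/apachecon.html finds apachecon.xml.
    (my $base = $path) =~ s{\.html$}{};

    return {
        template => "templates/$lang/$device/$name.tmpl",
        content  => "content/$lang$base.xml",
    };
}

my $hit = resolve_request('/events/apachecon.html', 'en', 'html');
print "$hit->{template}\n";   # templates/en/html/event.tmpl
print "$hit->{content}\n";    # content/en/events/apachecon.xml
```

Because the language and the device are just two more inputs to the
lookup, one content file can be paired with an HTML template for a
browser and a WML template for a phone without touching the content.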

Some of the other sessions, such as Pier Paolo Fumagalli's "XML
Publishing Fundamentals", Philip Grabowski's "XML: An Intensive
Introduction" and Ted Leung's "Everything You Always Wanted to Know
About XML Parsing", should give you a good background in XML, so we
won't go into much detail here.

Generation of XML can be done in several ways. If you already have all
your content in a SQL database, you can use Perl's DBI and XML::Writer
modules to generate the XML from your SQL databases.

Style
-----

At MP3.com, Inc. we developed our own Style Template Language that
behaves similarly to XSL, called Yet Another Style Language (YASL).
There is no real reason we could not use XSL or a similar technology
here, but at development time there was no C-based XSL implementation
that suited our needs. We required just-in-time generation,
performance and scalability.

For more information about XSL visit http://www.w3.org/ or the
sessions "XML Publishing Fundamentals" by Pier Paolo Fumagalli and
"Practical XSLT Transformations for Fun and Profit" by Scott Boag.

We have been looking at the XML Apache Projects for possible future
integration, so that we can use Xerces as our XML parser and the more
full-featured stylesheet language provided by Xalan. I have not had
much time lately to devote to the Apache XML projects, but I am
definitely interested in helping the development effort, and I am
looking forward to implementing a similar solution using these open
source tools so that everyone can benefit from what we have learned.

A Quick Example
---------------

Here is an XML file:

    [XML markup missing; element values only]

    Yasl Template Example

    ApacheCon 2000
    http://www.ApacheCon.com/
    Orlando, Florida
    OR058
    High-Performance Dynamic Pages with Templates, XML and mod_perl
    Friday
    Sander van Zoest
    MP3.com, Inc.
    High Geek, Founders Group
    sander@mp3.com

    Web Technology Forum
    http://www.sdsc.edu/WTF/
    San Diego, California
    Sander van Zoest
    MP3.com, Inc.
    High Geek, Founders Group
    sander@mp3.com

Combined with a Yasl Template written as

    [Template markup missing; surviving fragments only]

    <yasl:value "title"/>
    Skip is not working
    bgcolor="#CCCCCC" bgcolor="#FFFFFF"
    Event:
    Location:
    Session:
    Speaker:
    Speaker:

This will be displayed through a web browser as:

Notice the __DEFAULT__ value used for the tag. This tells YASL to use
path information from the URI to find the appropriate XML data file,
and a regex match configuration directive defined for the directory
tells it which template to use. This technique allows you to have a
single style template for an unlimited number of XML content files.

Mod_Perl
--------

How does mod_perl interact with the Style Template?

For more information about mod_perl visit http://perl.apache.org/ or
the sessions "Getting Started with Mod Perl (Part I & II)" and
"Improving Script Performance Under mod_perl" by Stas Bekman.

To generate XML content on the fly we use the Perl scripting language
through the mod_perl Apache module. To talk to the Yasl Stylesheet
Language and the XML parser Expat-lite (built into Apache since
version 1.3.9), we use an internal redirect to the Yasl template so
that the generated XML is displayed in the browser.

To provide Yasl with access to the dynamically generated XML, we use a
modified version of XML::Writer that uses XS glue to talk to
Expat-lite and mod_yasl. The constructor of this version of
XML::Writer makes a call to mod_yasl to allocate a pool of memory to
be used by Expat to store the XML content, and returns an integer to
Perl that it can use to reference this new pool of memory.

With this new version of XML::Writer the developer can make the usual
calls to XML::Writer to generate XML content, except that it will now
be stored in a data structure in the memory pool allocated by
mod_yasl.

After the developer has written out the appropriate XML content, the
developer puts the integer that references the memory pool in the
notes field of the request. This tells Yasl which memory pool to use
for the XML content after the Perl code makes an internal redirect to
the appropriate Yasl template.

    # This mod_perl module implements an example content handler for
    # ApacheCon 2000 - http://www.ApacheCon.com/
    #
    # It operates within the New Publishing Model, so it generates xml
    # and lets yasl do the actual rendering.

    package ApacheCon::Example;

    use strict;
    use lib qw(/mp3/tools/yasl /mp3/tools/MP3Com/lib);

    use CGI;
    use Apache::Constants qw(:common);
    use Apache::Log;
    use XMLapi::XMLWriter;
    use MP3Com::XMLWriter;

    sub handler {
      my $r = shift;

      # if RawXMLMode is set for this location, show the xml as text/plain
      my $dir_config = $r->dir_config();
      my $raw_xml    = $dir_config->{RawXMLMode};
      my $xw;  # xmlwriter object.

      if (!$raw_xml) {
        # writes into a memory pool allocated by mod_yasl
        $xw = XMLapi::XMLWriter->new();
        if (!defined $xw) {
          $r->log->crit("Can't allocate an XML handle.");
          return SERVER_ERROR;
        }
      } else {
        # writes the xml straight to the client
        $xw = MP3Com::XMLWriter->new(\*STDOUT);
      }

      my $cgi = new CGI;

      # check the directive itself; the concatenated path is always defined
      my $template_dir = $dir_config->{TemplateDir};
      if (!defined $template_dir) {
        $r->log->crit("No template configured.\n");
        return SERVER_ERROR;
      }
      my $template = $template_dir . $cgi->path_info();

      if ($raw_xml) {
        $r->content_type("text/plain");
        $r->send_http_header();
      }

      $xw->starttag("Yasl");

      # write the page xml data to the xml handle.
      $xw->element("date", scalar( localtime ) );
      $xw->element("title", "Dynamic XML Demo: Pid " . $$);

      foreach (sort $cgi->param) {
        $xw->starttag("Params");
        $xw->element("name", $_);
        # scalar() keeps a multi-valued parameter from adding arguments
        $xw->element("value", scalar $cgi->param($_));
        $xw->endtag();
      }

      $xw->endtag();

      return OK if ($raw_xml);

      # hand the memory pool reference to yasl and let it render
      $r->notes("XML_HANDLE", $xw->handle());
      $r->internal_redirect($template);

      return OK;
    }

    1;

Sample Apache Configuration Directives

    # Sample set of Apache Config Directives.
    #
    # - Sander van Zoest
    #   MP3.com, Inc.
    #

    ## Static XML and templates
    AddHandler yasl-parse-html .html
    Yasl On

    ## Dynamic XML Generation using mod_perl
    PerlFreshRestart On
    PerlRequire /usr/local/www/lib/ApacheCon/Example.pm
    PerlSetVar TemplateDir /apachecon/templates

    SetHandler perl-script
    PerlHandler ApacheCon::Example
    PerlSetVar RawXMLMode 0

    SetHandler perl-script
    PerlHandler ApacheCon::Example
    PerlSetVar RawXMLMode 1

    ## mod_perl Yasl template

    [Template markup missing; surviving fragments only]

    <yasl:value "title"/>
    Current Date:
    Skip is not working
    bgcolor="#CCCCCC" bgcolor="#FFFFFF"  Name:
    bgcolor="#CCCCCC" bgcolor="#FFFFFF"  Value:

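
To show the shape of the XML that the handler above builds, here is a
minimal sketch that runs without Apache. TinyXMLWriter is a
hypothetical stand-in written for this illustration. It only mimics
the starttag/element/endtag calls used in the handler and builds a
string, whereas the real modified XML::Writer stores the data in a
mod_yasl memory pool. The build_page function and its parameters are
likewise assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the writer used in the handler above.
package TinyXMLWriter;

sub new {
    my ($class) = @_;
    return bless { out => '', stack => [] }, $class;
}

# Escape the characters that XML reserves in character data.
sub _escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Open an element and remember its name for the matching endtag.
sub starttag {
    my ($self, $name) = @_;
    $self->{out} .= "<$name>";
    push @{ $self->{stack} }, $name;
    return;
}

# Close the most recently opened element.
sub endtag {
    my ($self) = @_;
    my $name = pop @{ $self->{stack} };
    $self->{out} .= "</$name>";
    return;
}

# Write a complete element with escaped text content.
sub element {
    my ($self, $name, $text) = @_;
    $self->{out} .= "<$name>" . _escape($text) . "</$name>";
    return;
}

sub as_string { return $_[0]{out} }

package main;

# Same call sequence as the handler, minus the date and pid.
sub build_page {
    my (%params) = @_;
    my $xw = TinyXMLWriter->new;
    $xw->starttag('Yasl');
    $xw->element('title', 'Dynamic XML Demo');
    for my $key (sort keys %params) {
        $xw->starttag('Params');
        $xw->element('name',  $key);
        $xw->element('value', $params{$key});
        $xw->endtag;
    }
    $xw->endtag;
    return $xw->as_string;
}

print build_page(q => 'rock & roll'), "\n";
# <Yasl><title>Dynamic XML Demo</title><Params><name>q</name><value>rock &amp; roll</value></Params></Yasl>
```

Note that the ampersand is already stored as an entity at this stage,
which is what makes the output encoding questions in the next section
come up.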
Gotchas and Problems.
---------------------

 - One of the problems with converting XML to HTML is that both use a
   similar set of entities, such as &amp;, for characters that need to
   be encoded when putting content in XML and HTML. When the content
   is requested by mod_yasl, it does not know if the expected output
   should be an &amp; or simply just an &. You would think that you
   could just put it in the XML the way you would like to see it on
   the HTML page, but what if you wanted to convert the same XML
   document to PostScript instead?

   To solve this problem we added some HTML entity encoding arguments
   to yasl:value, so that the templates can decide if they want the
   entity or the actual value.

 - A similar issue is that of quotation marks. Because the XML content
   is not known ahead of time, putting a yasl:value inside a form
   input field could be problematic. Consider the following Yasl
   template segment.

   [template segment missing: a form input field whose value
   attribute contains a yasl:value of "Message"]

   If the XML contents of Message did not contain a double quote,
   everything would work without any problems. If they did contain a
   double quote, the browser would have a fit.

   This situation does not come up very often, and it usually does not
   come up unless we use FORM tags, which are only useful in HTML. We
   currently solve this by adding another argument to yasl:value that
   will escape any single and double quote to the appropriate HTML
   entities.

 - As shown by the recent CERT advisory, URI and entity encoding is
   very important for security reasons. It can also be somewhat
   confusing in template work, since you are never really sure if the
   text is going to be used in an anchor tag or is simply a piece of
   text that needs to be displayed. In the case of an anchor or image
   tag you would like to see URI encoding, while just about any other
   time you would expect HTML entity encoding.

   This is another option we added to yasl:value that can tell yasl to
   use URI encoding instead of the default HTML entity encoding. This
   could probably have been determined programmatically by having yasl
   pay more attention to the template it is working with, but we would
   rather specify this ourselves than have the performance overhead of
   yasl keeping track of this scenario and handling it appropriately.

 - Navigational elements in content XML seem like something that would
   come up every so often, for pages that have navigational elements
   that depend on the amount of content and on which content is
   currently being displayed. This is somewhat of a bummer, because it
   blurs the clear line between content on one side and design
   elements and layout on the other. I would probably suggest using a
   second, smaller XML file that would dictate these kinds of things,
   especially because the real estate available differs dramatically
   between a WAP capable phone, a Web browser and a Legal size piece
   of paper.

 - Performance. In some cases parsing XML is too much overhead, and
   generating DBM files instead (or from the XML) solves much of this
   problem.

Credits and Acknowledgements
----------------------------

At MP3.com, Inc. the development of this project was led by John
DeRose and executed by David Story, Matt DiMeo, Thomas Tarka and
myself.

We would like to thank the Apache Foundation and all open source
developers for this great pluggable web server, and Dirk-Willem van
Gulik for starting the XML Apache Project, where we hope to contribute
immensely.

$Id: notes_apachecon.txt,v 1.1.1.1 2002/10/08 22:38:42 sander Exp $
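
The encoding choices discussed under Gotchas and Problems can be
sketched in Perl as follows. This is an illustration only: the
encode_value function and its mode names (raw, html, attr, uri) are
hypothetical and are not the actual yasl:value arguments. The uri mode
assumes byte strings and percent-encodes everything outside the
unreserved characters, which suits a single query or path component
rather than a whole URI.

```perl
use strict;
use warnings;

# Hypothetical lookup tables for the two kinds of entity escaping.
my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
my %QUOTE  = ('"' => '&quot;', "'" => '&#39;');

# Render one decoded value under the encoding the template asks for.
#   raw  - the actual value, e.g. for non-HTML output
#   html - HTML entity encoding for text that is displayed
#   attr - html plus quote escaping, for use inside a form field value
#   uri  - percent-encoding, for use inside an anchor or image tag
sub encode_value {
    my ($text, $mode) = @_;

    return $text if $mode eq 'raw';

    if ($mode eq 'uri') {
        # Percent-encode every byte outside the unreserved set.
        $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
        return $text;
    }

    # Both html and attr escape the markup characters.
    $text =~ s/([&<>])/$ENTITY{$1}/g;

    # Only attr also escapes single and double quotes.
    $text =~ s/(["'])/$QUOTE{$1}/g if $mode eq 'attr';

    return $text;
}

my $message = 'Tom & "Jerry"';
print encode_value($message, 'html'), "\n";   # Tom &amp; "Jerry"
print encode_value($message, 'attr'), "\n";   # Tom &amp; &quot;Jerry&quot;
print encode_value($message, 'uri'),  "\n";   # Tom%20%26%20%22Jerry%22
```

Letting the template name the mode keeps the decision with the person
who knows where the value will end up, which is the trade-off
described above: a little more work in the template in exchange for no
context tracking at render time.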