Pub Protocol v3, a.k.a. Pub3

The OKWS publishing system is a way for programmers to separate their C++ code and their HTML, and to keep the HTML on the file system where it's most convenient to manage. There are two sides to this system, then:

  • C++ bindings - in your OKWS services, you'll use these to read HTML templates off the file system, parse them, and publish them.
  • The templates themselves - OKWS templates look a lot like HTML, JS, CSS, or whatever else you're trying to publish, but with some important differences and enhancements.

The first half of this document covers the template language (its enhancements), and the second half covers the C++ bindings.

Template Language

Understanding HTML Mode and Pub Mode

All the text in your pub documents can be classified into two regions. We say those regions are in HTML Mode and Pub Mode. HTML mode is where you find normal HTML and variable substitution. Pub mode is where you find all advanced pub logic.

Pub Mode is activated with the enclosing {% .. %} markers. For example:

This is some normal text
{% 
     include("somefile1.html")
     include("somefile2.html")
%}

Inside pub mode, some commands – covered later in detail – switch back into HTML mode, using the double curly brace markers.

{%
   if(x==1) {{ This is some normal text }}
   include("somefile.html")
%}

The HTML mode inside the double curly braces is just as powerful as the top-level HTML mode. So it, too, can switch back into Pub Mode, with yet another {% .. %} region. You can nest arbitrarily deep:

{%
    if(x==1) {{
         This text will be output if x == 1.
         {%  if (y==1) {{ This text will be output if x == 1 and y == 1. }} %}         
    }}
    include("somefile.html")
%}

Variable Substitution

Inside HTML Mode,

 %{X}

is interpreted as a variable that should be resolved at runtime. A practical example:

Welcome home, %{screenname}. You're %{age} years old.

Inside Pub mode, variables should not be wrapped in "%{}".

{%
    if (age > 18) {{ You're an adult because you're %{age} years old. }}
%}

Variables can be set either in the C++ side of things – passed along in calls to pub – or elsewhere in the HTML templates, which we'll get to in a little while.

Variables as objects

In earlier versions of pub, a template var could only be resolved to a string, an integer, a float, or NULL (when unset). However, in pub3 a var may be any of those things, or an array of other vars, or a dictionary of key-value pairs, where the key is a string, and the value is another var. Those familiar with JavaScript Object Notation or Python should be at home with this kind of object representation:

user : {
     name : "maxwell",
     age : 22,
     emails : ["supermax@maxk.org", "submin@maxk.org"],
     pi : 3.14
}

In pub3, %{user} might very well be an object such as the one above. If you try to print %{user} in your HTML, Pub will output it in the object notation shown above. However, most likely you'll want to use its components, like so:

Hi, %{user.name}, welcome back.  
Your primary email address is %{user.emails[0]}.  
On your world, pi is %{user.pi}.

With associative arrays, you can use either dot notation or array notation:

Hi, %{user["name"]}, a.k.a., %{user.name}.

Later in this doc we'll cover looping (iterating) over arrays.

Including Other Templates

Templates can include each other much like the old Server-Side Includes (SSIs) of yore. The syntax is:

{% include (<filename> [, dictionary assignments ]) %}

Here's a simple example:

{% include("header.html") %}

And here's an example that sets a var in the included file:

{% include("header.html", { bodystyle : "old-fashioned" }) %}

Inside header.html, we might see something like this:

<body class="%{bodystyle}">

What's going on here? The include command asks the runtime system to pull in the file header.html (in the same directory as the current template) and, while doing so, to replace all instances of %{bodystyle} in the included template with old-fashioned. Of course, templates can be nested, and there are checks at runtime to make sure there are no circular inclusions. Note that it's possible to have assignments of the form:

{% include("subfile.html", { "X" : "some%{foo}bar" }) %}

That is, the value half of a name-value pair can have interesting resolutions in it too. Also useful, the filename of the include can also be dynamic:

{% include("subfile.%{LANG}.html", { "X" : "some%{foo}bar" }) %}

This is useful, for instance, when displaying pages in different languages based on user preferences, etc.
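
To make the resolution step concrete, here is a rough C++ sketch of how %{...} references in a string (such as the include filename above) might be expanded against a table of variables. This is not pub's implementation: the function name expand, the string-only variable table, and the choice to expand unknown names to nothing are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replace each %{name} in tmpl with vars[name].
// Assumption: names that are not in the table expand to the empty string.
std::string expand(const std::string& tmpl,
                   const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        std::size_t open = tmpl.find("%{", pos);
        if (open == std::string::npos) { out += tmpl.substr(pos); break; }
        std::size_t close = tmpl.find('}', open + 2);
        if (close == std::string::npos) { out += tmpl.substr(pos); break; }
        out += tmpl.substr(pos, open - pos);          // literal text before %{
        auto it = vars.find(tmpl.substr(open + 2, close - open - 2));
        if (it != vars.end()) out += it->second;      // substitute the value
        pos = close + 1;                              // continue after }
    }
    return out;
}
```

With LANG bound to "en", expand("subfile.%{LANG}.html", vars) gives "subfile.en.html", which is the file the include would then look for.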

Conditionals

The ''If'' Statement

Pub3 supports the if statement, which takes a series of conditions, each paired with output. It's best thought of as a chain of if, elif, ..., else branches:

   {% if (cond1) {{ output1 }}
      elif   (cond2) {{ output2 }}
      elif   (cond3) {{ output3 }}
      ...etc. 
      else {{ output4 }}
     %}

As soon as a condition is met, the appropriate pub code is executed, and the if statement ends. Here's an example:

   {% if   (user.age < 18) {{ Major burns, minor! }}
      elif (user.age < 22) {{ Welcome to college. }}
      elif (user.age < 30) {{ Get a job! }}
      elif (user.age > 65) {{ Relax... }}
      else {{ You're working for the man. }} %}
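
For readers who think in C++, the example above behaves like an ordinary if / else-if chain: conditions are tried top to bottom and only the first one that holds produces output. A minimal sketch of the same logic follows; the function name age_message is made up for illustration and is not part of OKWS.

```cpp
#include <string>

// Hypothetical C++ equivalent of the pub3 if/elif/else example above.
// The first condition that holds decides the output; later ones are skipped.
std::string age_message(int age) {
    if (age < 18)      return "Major burns, minor!";
    else if (age < 22) return "Welcome to college.";
    else if (age < 30) return "Get a job!";
    else if (age > 65) return "Relax...";
    else               return "You're working for the man.";
}
```

Note that an age of 20 satisfies both the second and third conditions, but only the second branch fires.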

A default case (the else branch above) is not necessary.

Boolean logic is supported in conditionals:

{% if (user.age < 18 && user.gender == "female") {{ Jailbait! }} %}

Boolean operators can be strung together, and order of operation can be controlled through parentheses:

{% if ( (a < 12 && b >= 10) || c == "foo") {{ Awesome. }} %}

Vars that are NULL (i.e., they haven't been set or failed a lookup) will fail comparisons and generate messy warnings. You can test whether a variable is null without generating a warning:

{% if (isnull(A)) {{ "A" is not set. }} %}

Double Curly Braces or Single Curly Braces in Conditionals

Double curly braces switch you back to HTML mode:

{% if (is_logged_in) 
   {{ You are logged in!!! }} %}

Single curly braces keep you in Pub Mode:

{% if (is_logged_in) {
      include("logged_in_header.html") 
   } %}

Of course you can still switch modes inside either:

{% if (is_logged_in) 
   {{ You are logged in!!! 
         {% include ("logged_in_header.html") %}
   }} %}

Set Statements

You can update or create a var within a template with the set command.

{% set { <assignment1> [, <assignment2>, <assignment3>, ...] } %}

For example, to set a site_url var:

{% set { site_url : "http://www.okcupid.com" } %}

This sets two different vars, age and screenname, in one call:

{% set { age : 10, screenname : "Dr. Who" } %}

And, since variables can be much more complex than just floats, integers, and strings, you might be wondering how to set a big object. It looks like the object notation at the top of this page, with a set around it:

{% set { 
     user : {
        name : "maxwell",
        age : 22,
        emails : ["supermax@maxk.org", "submin@maxk.org"],
        pi : 3.14
     }
   }
%}

The right-hand side of an assignment can refer to another object, too.

{% set {
      old-fashioned : {
         color : "cornsilk",
         font-size : "2.0em"
      }
   }
   set {
      user : {
         name : "maxwell",
         age : 22,
         template : old-fashioned
      }
   }
%}

Strings can be expanded at run-time:

{% set { 
      user : {
         name : "Maxwell",
         usa_address : "%{street}\n%{city}, %{state} %{zip}"
      }
   }
%}

Looping

The most powerful addition to pub is the shift from simple scalar vars to full-blown objects. An OKWS service can provide a complicated object or array of objects as a var, and pub can handle it nicely. For example, let's say a var %{buddies} has been populated with the following data, in object notation:

buddies : [
    {name : "Adam", age : "20", gender : "m" },
    {name : "Bill", age : "24", gender : "m" },
    {name : "Caty", age : "25", gender : "f" },
    {name : "Debb", age : "26", gender : "f" }
]

We can print a nicely formatted list of them like so:

{% for (buddy, buddies) {{
      <li>%{buddy.name} is online. 
          {% if (buddy.gender == "m") {{ <a href="#">Send him a message</a> }}
             else                     {{ <a href="#">Send her a message</a> }}
          %}
      </li>
   }} 
%}

Don't forget you can include another HTML file inside this loop. You can also pass vars to it, even ones that are objects and local to the loop:

{% for (buddy, buddies) {
       include("person_display.html", { person_to_display : buddy } ) 
   } 
%}

The keywords last, first, even, odd, iter, and count are filled automatically. For example, this generates a comma separated list of items with an “and” between the last two:

{% for (item, list) {
       if   (!item.first && item.last) {{ and }}
       elif (!item.first)              {{, }}
       print item.whatever
    } 
%}
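
The first/last logic in that loop is easier to check when written out in C++. The sketch below mirrors the same decisions per item; it is only a model of the loop's behavior, and the function name join_with_and and the exact spacing around the separators are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the loop above: emit " and " before the last item,
// ", " before every other non-first item, then the item itself.
std::string join_with_and(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        const bool first = (i == 0);
        const bool last  = (i + 1 == items.size());
        if (!first && last) out += " and ";
        else if (!first)    out += ", ";
        out += items[i];
    }
    return out;
}
```

A single-item list is both first and last, so neither separator is emitted for it.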

Scope

When you set a var, by default that variable is scoped globally. The key here is that a parent document will get the variable too, as well as any brother or cousin documents, after the set happens.

For example, imagine a parent file:

{%
   set { x : 10 }
   include ("child.html")
   print "2:%{x}"
   include ("brother.html")
%}

And the child file (“child.html”) is:

{%
   set { x : 20 }
   include ("grandchild.html")
%}

And the brother file (“brother.html”) is:

{%
   print "3:%{x}"
%}

And the grandchild file (“grandchild.html”) is:

{%
   print "1:%{x}"
%}

The above example will print “1:20 2:20 3:20”, since the setting of x to 20 in the child document will persist into the grandchild, and also persist past the end of the child document, into the parent's scope, and subsequently into any files included by the parent (like “brother.html”).

Now imagine we changed the child file to use setl rather than set. Then, the effects of setting x will be available in the file that does the set and all of its descendants, but won't propagate to parents and brothers. The effect of publishing the parent file would be “1:20 2:10 3:10”.
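
One way to picture the difference is a single global table plus a stack of per-file local tables, where lookups check the innermost local table first. The C++ sketch below is a toy model built on that assumption, not pub's actual data structures; Env and publish_parent are hypothetical names. It replays the parent/child/grandchild/brother example with either set or setl in the child.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model (an assumption, not pub's internals): one global binding table
// plus a stack of local tables, one per file currently being published.
struct Env {
    std::map<std::string, int> globals;
    std::vector<std::map<std::string, int>> locals;

    void enter() { locals.emplace_back(); }   // start publishing a file
    void leave() { locals.pop_back(); }       // finish publishing it

    void set(const std::string& k, int v)  { globals[k] = v; }
    void setl(const std::string& k, int v) { locals.back()[k] = v; }

    // Innermost local table wins; otherwise fall back to the globals.
    int get(const std::string& k) const {
        for (auto it = locals.rbegin(); it != locals.rend(); ++it) {
            auto found = it->find(k);
            if (found != it->end()) return found->second;
        }
        return globals.at(k);
    }
};

// Replays the example; child_uses_setl picks the statement in child.html.
std::string publish_parent(bool child_uses_setl) {
    Env e;
    std::string out;
    e.enter();                                   // parent file
    e.set("x", 10);
    e.enter();                                   // child.html
    if (child_uses_setl) e.setl("x", 20); else e.set("x", 20);
    e.enter();                                   // grandchild.html
    out += "1:" + std::to_string(e.get("x"));
    e.leave();
    e.leave();
    out += " 2:" + std::to_string(e.get("x"));   // back in the parent
    e.enter();                                   // brother.html
    out += " 3:" + std::to_string(e.get("x"));
    e.leave();
    e.leave();
    return out;
}
```

In this model publish_parent(false) yields "1:20 2:20 3:20" and publish_parent(true) yields "1:20 2:10 3:10", matching the two outcomes described above.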

Passing vars inside an include is equivalent to putting a setl statement at the top of the include:

{%
   // Locally set age to 20 *inside* foo.html
   include("foo.html", {age : 20})
%}

For obvious reasons, the loop variable of a for loop is scoped like a setl assignment and is therefore only available inside the loop:

{% for(buddy,buddies) {{
      %{buddy.name}
}} %}
<!-- the following will be null -->
%{buddy.name}

Dictionary files and the load command

Because assignments are global, it works to include files that have a series of set statements inside them.

{% include("country_facts.html") %}
The center of mass of the USA is %{usa.center_of_mass}.

By convention, you might like to name such a file with the .dict extension, although that's optional.

To suppress all output from the include – such as unwanted whitespace (especially important before a doctype tag in HTML) – use the load command, which is identical to include sans output:

{% load("site_vars.dict") %}<!DOCTYPE ...

It's good style to use load and .dict if you know you're simply setting global vars inside an include.

Comments and language tagging

Anything inside triple square brackets in pub is stripped. Gone are the days of HTML comments your users can read.

 <div>
 [[[ Here's something no one will ever see. ]]]
 </div>

Comments inside commands are not currently allowed:

<!-- ERROR -->
{% set { foo1 : "bar",
         foo2 : "bar2", [[[ this is a comment (ERROR!)]]]
         foo3 : "bar3" } %}

Double square brackets are removed, leaving their contents in place. This can be helpful for tagging plain-language content inside HTML for translators to find:

 <h1>[[Welcome back, %{screenname}.]]</h1>

Double square brackets nested inside double square brackets are stripped along with their contents. OkCupid.com uses this feature in its translation software; the inner comments are shown to translators.

 <a href="#">[[ [[ Here "log" means a document, not a felled tree. ]] View the log]]</a>

The above simply outputs:

 <a href="#">View the log</a>

Arithmetic operators

   {% if (1 < 20)         {{ 1 < 20 }}
      if (-5 == 1 - 6)    {{ -5 == 1 - 6 }} else {{ bad 2}}
      if (!(1 >= 20))     {{ ! 1 >= 20 }}
      if (20 + 30 > 40)   {{ 20 + 30 > 40 }}
      if (30 != -30)      {{ 30 != -30}} %}

Caveat: due to a parsing difficulty, len(str)-1 currently gives a syntax error. Put a space before the 1 instead: both len(str)- 1 and len(str) - 1 work.

Regular expressions

Pub3 supports Perl-compatible regular expressions, including most of the advanced features you're likely to use. Regular expressions can be specified with either simple string syntax:

 {% setl { regex : "a+b?c+" } %}

or special regular-expression syntax if that floats your boat:

 {% setl { regex1 : r{a+b?c+},
           regex2 : r/a+b?c+/i,
           regex3 : r#^a+b?c+$#g,
           regex4 : r[a+b?c+] } %}

and so forth. As in Perl, pub3 allows delimiting regular expressions with many symbols. The characters you use as delimiters don't really matter; they're only used in the parsing step. Also as in Perl, you can give flags to the regex after the closing delimiter (such as the i and g above). Once you have a regular expression, you can feed it to the match or search functions:

{% if (match (regex, "aabccc")) {{ "should print!" }} %}
{% if (match (regex1, "aabcc")) {{ "me too!" }} %}

match and search are largely equivalent, except that match requires the whole string to match, while search is happy to find your regex anywhere in the given string. There's no need for a variable assignment; you can call match directly with a regular expression:

 {% if (match (r/a+b?c+/i, "aabccc")) {{ "should print!" }} %}
 {% if (search ("a+b?c+", "XaabccY")) {{ "me too!" }} %}
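
The whole-string versus anywhere distinction is the same one the C++ standard library draws between std::regex_match and std::regex_search, so a small sketch may help. Note the assumptions: pub_match and pub_search are hypothetical stand-ins, and std::regex uses ECMAScript syntax by default rather than PCRE, so this only models the matching semantics for simple patterns like the ones above.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for pub3's match(): the pattern must cover the
// entire text. Uses std::regex (ECMAScript syntax), not PCRE.
bool pub_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical stand-in for pub3's search(): the pattern may occur
// anywhere inside the text.
bool pub_search(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```

Under this model "XaabccY" passes the search for a+b?c+ but fails the match, because the leading X and trailing Y are not covered by the pattern.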

Finally, match and search each come in two forms. The first, which we've already seen, takes two arguments:

  • match(regex, text)
  • search(regex, text)

The other takes three arguments, the middle argument being the options to feed to regular expression matcher:

  • match(regex, options, text)
  • search(regex, options, text)

Filters and Function Calls

All filters are available via the standard syntax:

 toupper (html_escape ("<tag>"))

or via a Django-inspired “filter” syntax:

 "<tag>"|html_escape()|toupper()

or more succinctly:

  "<tag>"|html_escape|toupper

When a function is called with the Django-style filter syntax, the value coming through the filter is passed as the function's first argument (its "this" parameter).

You can call the following functions and filters from Pub3 templates.

rand - random number generator

 <!-- Prints an integer from 2 to 9 -->
 %{rand(2,10)}
 
 <!-- Prints an integer from 0 to 9 -->
 %{rand(10)}
 
 <!-- Prints a random element inside an array -->
 {% set{animals : ["bat", "mouse", "clam" ] } %}
 %{animals[rand(0,3)]}
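
The comments above imply a half-open interval: the low bound is included and the high bound is not, which is why rand(0,3) is a safe index into a three-element array. A C++ sketch of that convention follows; pub_rand is a hypothetical name, and the choice of generator is an assumption that says nothing about how pub actually draws its numbers.

```cpp
#include <random>

// Hypothetical model of rand(low, high): uniform integer in [low, high).
// Precondition (assumed): low < high.
int pub_rand(int low, int high) {
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> dist(low, high - 1);  // inclusive bounds
    return dist(gen);
}

// Hypothetical model of rand(high): uniform integer in [0, high).
int pub_rand(int high) { return pub_rand(0, high); }
```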

len(v)

Length of string or array

range(high), range(low,high), range (low,high,step)

Returns array of int, from low (inclusive) to high (exclusive), using step
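
As a sketch of the inclusive-low, exclusive-high convention described here, the following C++ function builds the same kind of array. The name pub_range is hypothetical, and returning an empty array for a non-positive step is an assumption; the source does not say how pub3 treats that case.

```cpp
#include <vector>

// Hypothetical model of range(low, high, step): low is included, high is not.
// Assumption: a step of zero or less yields an empty array.
std::vector<int> pub_range(int low, int high, int step = 1) {
    std::vector<int> out;
    if (step <= 0) return out;
    for (int i = low; i < high; i += step) out.push_back(i);
    return out;
}
```

Here pub_range(2, 10, 3) produces 2, 5, 8, and the one-argument form range(high) corresponds to pub_range(0, high).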

isnull(v)

Checks if variable v is null or not

substr(str,start,len)

join(delim,vec)

split(regex,s)

Splits s using regex, returns array/list/vector

strip(s)

Strips leading and trailing spaces, and reduces in-between runs of spaces to a single space
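
A short C++ sketch of this kind of whitespace normalization is shown below. It is only an approximation: strip_sketch is a hypothetical name, and it treats tabs and newlines as separators too, which may or may not be what pub's strip does.

```cpp
#include <sstream>
#include <string>

// Hypothetical approximation of strip(s): drop leading and trailing
// whitespace and collapse inner runs to one space.
// Assumption: tabs and newlines count as whitespace here as well.
std::string strip_sketch(const std::string& s) {
    std::istringstream in(s);
    std::string word, out;
    while (in >> word) {            // operator>> skips any run of whitespace
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}
```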

match, search

See section on regex

tolower(s), toupper(s)

Self explanatory: take an input of a string, and output the same string with all alphabetic characters converted to upper (or lower) case.

html_escape(s), html(s)

Take the input text s and escape it so that it's safe to output in HTML without being interpreted as markup, for instance when outputting forum posts or personal essays. Mainly, this escapes the '<', '>' and '&' characters. Output the escaped string.
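
The following C++ sketch shows the usual way those three characters are escaped. It is an illustration of the idea rather than pub's code: html_escape_sketch is a hypothetical name, and the real filter may escape additional characters.

```cpp
#include <string>

// Hypothetical sketch of html_escape(s): replace the three characters named
// above with their HTML entities and copy everything else unchanged.
std::string html_escape_sketch(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            case '&': out += "&amp;"; break;
            default:  out += c;
        }
    }
    return out;
}
```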

json_escape(s), json(s)

Take the input text s and escape all double-quotes, backslashes, newlines and tabs. Output the escaped string.
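
A C++ sketch covering exactly the four cases listed above might look like this. The name json_escape_sketch is hypothetical, and whether pub's filter also handles other control characters or adds surrounding quotes is not stated in this document, so the sketch does neither.

```cpp
#include <string>

// Hypothetical sketch of json_escape(s): backslash-escape double quotes,
// backslashes, newlines and tabs; copy every other character unchanged.
std::string json_escape_sketch(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    return out;
}
```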

default(in, val = "")

Given input value in, check to see if it's NULL. If it is null, output val; otherwise, output in. If the default value val is not specified, assume a default of the empty string "".

Example:
  {%setl { d : { x : "hi" } } %}
  %{d.y|default("bye")} will output "bye"
  %{d.z|default} will output "" and will not produce any warnings.
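
In C++ terms the filter is a null-coalescing helper with an optional fallback. The sketch below models a NULL pub value as a null pointer, which is purely an assumption for illustration; pub_default is a hypothetical name.

```cpp
#include <string>

// Hypothetical model of default(in, val): a null pointer stands in for a
// NULL pub value. Returns *in when set, otherwise val (empty by default).
std::string pub_default(const std::string* in, const std::string& val = "") {
    return in ? *in : val;
}
```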

append(v,x)

Appends x to list v

map

???

tag_escape

???

url_escape(s), url_unescape(s)

Produce a URL-encoded string from s, or decode the URL-encoded string s

sha1(s)

Produce SHA1 hash of s

type(v)

Type of variable v. Possible values are undef, list, str, dict, pub2obj

items(dict)

Returns list of items in dictionary, each item is represented as a list of 2 elements, first being the key, second being the value

keys(dict)

Returns list of keys in dictionary

values(dict)

Returns list of values in dictionary

json2pub (dict)

   locals { s : "{ 'a' : [0 ,1 , 2, { 'b': 4, 'cow' : [ 0, 'boy' ] } ] }" }
   locals { obj : json2pub (s) }
   print (obj.a[3].cow[1]); // prints 'boy'

JSON

If ever you want to use OKWS to output JSON, don't handroll your own:

if (do_json) {{
  {
     "my_foo" : %{foo|json},
     "my_bar" : %{bar|json},
  }
}}

This is what I call “hand-rolling JSON” and there's a much better way in pub3:

if (do_json) {
    locals { tmp : { my_foo : foo, my_bar : bar } }
    print tmp
}

There are also nice techniques for putting the results of including a file into your JSON object. Instead of this:

if (do_json) {{
    {
         "file_out" : {% include ("x.html") %}
    }
}}

You can do this:

if (do_json) {
     locals { file_out : {} }
     load ("x.html", { ret : file_out })
     print ( { file_out : file_out })
}

Then in x.html, you store whatever values into the local variable 'ret':

ret["dog"] = 10;
ret["cat"] = [1,2,3];

And the output json will give you:

{ "file_out" : { "dog" : 10, "cat" : [1,2,3] } }

C++ Interface

The base classes oksrvc_t and okclnt_t both have access to a pub2 object via the method ::pub2(). This method returns an object of type ptr<pub2::remote_publisher_t>. If you read libpub/pub2.h, you'll see that the remote_publisher_t class inherits from abstract_publisher_t, and in that superclass is the interface to the new publishing system. The pub2 API on this object is specified in libpub/pub2.h and is summarized as follows:

run_full

Method Name: pub2::abstract_publisher_t::run_full

The run_full command runs the publisher on the given filename, with output to the given zbuf. It's called run_full since it gives you a full status report, not just a success/failure result:

class pub2::abstract_publisher_t {
  void run_full (zbuf *b, pfnm_t fn, status_cb_t cb,
                 aarr_t *a = NULL, u_int opt = 0, penv_t *e = NULL,
                 CLOSURE);
  typedef callback<void, xpub_status_t>::ptr status_cb_t;
}

This function is the primary access point to the publishing system in the second version. Notice that it's asynchronous and non-blocking: you call it, and it later calls you back. Using this interface along with the tame tool is suggested, though clearly not required.

Input Parameters

  • zbuf *b — Output all data to this zbuf.
  • pfnm_t fn — the name of the input file. If specified in absolute terms (with a leading /) then the filename is relative to pubd's root directory. If specified in relative terms, then relative to the parent's directory or relative to the root if this file is at the top level.
  • status_cb_t cb — Call this callback after the operation completes, with a status flag indicating the outcome of the operation. See libpub/xpub.x for some possible values here.
  • aarr_t *a – An associative array of name/value pairs that will be used to fill in keys for switch statements and variables in HTML templates (like ${X}). Defaults to NULL.
  • u_int opt — Publishing options to use when publishing this template. Possibilities include one or more (via bitwise OR) of the following flags:
    • P_DEBUG — Output debug info output w/ text
    • P_IINFO — Include in HTML comments details about the include hierarchy.
    • P_VERBOSE — Debug messages, etc.
    • P_WSS — Turn on 'white space stripping' to save bytes.
    • P_VISERR — Make include errors visible in HTML.
  • penv_t *e — Specify a publishing environment to use when publishing the given file. Off by default and rarely used.

Return Value

This function has no actual return in the standard C++ sense, but does signal the outcome of the operation via its callback. Expect XPUB_STATUS_OK in the case of success.

run

Method Name: pub2::abstract_publisher_t::run

Like run_full but returns an abbreviated status field — true for success and false for failure:

class pub2::abstract_publisher_t {
  void run (zbuf *b, pfnm_t fn, cbb::ptr cb, aarr_t *a = NULL,
            u_int opt = 0, penv_t *e = NULL,
            CLOSURE);
}

That is, the only difference is in the type of callback given.

run_cfg_full

Method Name: pub2::abstract_publisher_t::run_cfg_full

Run a configuration file through the publishing system. Output to an associative array rather than a zbuf, signifying that the output of reading a configuration file is a series of name-value pairs:

void run_cfg_full (pfnm_t nm, status_cb_t cb, aarr_t *dest = NULL,
                   CLOSURE);

Parameters

  • pfnm_t nm — The name of the configuration file to publish. Specified either relative to the parent file, or absolute, as in run_full and run.
  • status_cb_t cb — The callback to call after the reading and parsing operation completes.
  • aarr_t *dest — Read results into this associative array, accumulating all name/value pairs in the given configuration file and its included children. If no dest parameter is given, then name/value pairs are read into the publishing object used to call this method, and will persist there indefinitely.

Return

No return value, but signals its status asynchronously via status_cb_t cb. See libpub/xpub.x for possible values for the status field. Normal operation is signaled by XPUB_STATUS_OK

run_cfg

Method Name: pub2::abstract_publisher_t::run_cfg

Same as run_cfg_full but returns an abbreviated status flag: true for success and false for failure. The signature is as follows:

void run_cfg (pfnm_t nm, cbb::ptr cb, aarr_t *dest = NULL, CLOSURE);

cfg_clear

Method Name: pub2::abstract_publisher_t::cfg_clear

Clear all of the name/value bindings read into the publishing object as a result of calling run_cfg or run_cfg_full without passing an associative array argument:

void cfg_clear ();

Passing Pub3 objects to template

Include pub3obj.h in your code. Use pub3::obj_t objects to generate Pub3 objects.

For example

pub3::obj_t obj;  // null
obj = 3;  // int
obj = "string"; // string
 
pub3::obj_dict_t dict;  // declare an object. obj_dict_t is subclass of obj_t.
dict ("field") = ...; 
 
pub3::obj_list_t array; // declare an array. obj_list_t is subclass of obj_t.
array[0] = ...;

You can also convert a pub3::obj_t to pub3::obj_list_t, if you use the [] operators. But beware, this does change the internal representation so what you had before might be corrupted/lost.

Then, you can get an aarr_t ptr from the dict () method of the pub3::obj_t. For example:

  pub3::obj_t obj;
  ...
  ptr<aarr_t> a = obj.dict ();

An Example

An example of calling into pub2 using C++ bindings can be seen in this tutorial page.

Configuration and Setup

Configuration Files

To run Pub version 2, you need to add the following line to your okws_config file:

 Pubd2ExecPath  pubd -2

okld looks for your pubd executable relative to the top OKWS directory specified with TopDir. The -2 flag tells pubd to run with protocol 2, since it runs with protocol 1 by default. You can also specify, in this line, a configuration file for pubd to use once it starts up. By default, it looks for pub_config in /usr/local/etc/okws or /etc/okws.

As mentioned earlier, the pub_config file is specified in the same syntax as pub's set syntax. For example:

 <!--#set ({
     JailDir     =>  "/path/to/htdocs",   // comments here
     RunAsUser   =>  "www",               // will setuid
     RunAsGroup  =>  ${RunAsUser}         // will setgid
   }) -->

The JailDir field is required; if RunAsUser and RunAsGroup are left out, pubd makes a guess (see libpub/okconst.h). When okld runs as root, it launches a pubd that chroots into the given jail directory and setuid/setgids to the given user/group pair. Otherwise, pubd just chdirs into that directory. All files to publish, whether from C++ or from other templates, are taken relative to the given jail directory.

Though not shown in this example, pubd configuration files are free to include other configuration files with the same syntax. The user can also specify name/value pairs that pub doesn't necessarily require, such as LANG => 'en'. Variables defined this way are used as default name/value resolutions for all templates used in the system. That is, if I include from C++ a template that has the variable ${LANG} but does not have a value for that variable explicitly mentioned in the aarr field, then it defaults to the value specified in pub_config.

Optimizations

A typical deployment is to run OKWS on a series of Web servers, but to have all machines share an htdocs directory (or a pubd root in OKWS-speak) via NFS. This is a great setup, but there is a performance caveat. The publishing system is pretty aggressive about caching your HTML templates but also only knows if your template changes by polling the file system. To poll a whole bunch of files over NFS can be slow, and wasteful since the template files probably change infrequently.

The solution packaged with OKWS is the treestat program. The idea of treestat is that it's a very simple program that should run on the NFS server, that touches a little dotfile whenever any file in the htdocs tree changes. This way, OKWS running over NFS just needs to check the little dotfile. If it's changed, then it needs to do a more thorough check, but since you will be modifying your templates pretty infrequently, the more thorough check happens pretty infrequently.

If you don't run treestat everything will still work, but it will make for more network traffic, and OKWS will check your htdocs for template file changes less frequently. If running treestat, OKWS can know within a second of a template change.

treestat uses the same configuration file as pubd and in particular reads the JailDir setting in that file to determine the top level of the HTML documents tree. Thus, to run on the NFS server, make sure the pub_config file is available as usual and then:

 treestat /path/to/my/html/dir

treestat goes into daemon mode and you won't have to worry about it until after reboot. Of course, you are free to use whatever rc mechanisms are at your disposal to run this command automatically on startup. Without an argument, treestat looks for pub_config file, and reads a jaildir out of there.

 
okws/pub3.txt · Last modified: 2010/08/19 20:07 (external edit)
 