Wednesday, 11 November 2015

How is resource resolution done in Sling?

The images below show how a URL is resolved and mapped to a resource.

Consider the URL
GET – www.mywebsite.com/products/product1.printable.a4.html/a/b?x=12
Here the request type is HTTP GET.
We can break it down into its composite parts:
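As a rough sketch, the decomposition can be written in a few lines of Python. This is not Sling code: the function name `decompose` is made up for illustration, the `http://` scheme is added so the URL can be parsed, and the sketch assumes the resource path is everything before the first dot. Real Sling checks the repository to decide where the resource path ends.

```python
from urllib.parse import urlsplit, parse_qs


def decompose(url):
    """Split a Sling-style URL into its parts (simplified, no repository lookup)."""
    parts = urlsplit(url)
    path = parts.path
    result = {
        "host": parts.netloc,
        "resource_path": path,
        "selectors": [],
        "extension": None,
        "suffix": None,
        "params": parse_qs(parts.query),
    }
    # Assumption: the resource path ends at the first dot in the path.
    first_dot = path.find(".")
    if first_dot == -1:
        return result
    result["resource_path"] = path[:first_dot]
    rest = path[first_dot + 1:]
    # Everything from the first slash after the extension is the suffix.
    slash = rest.find("/")
    if slash != -1:
        rest, result["suffix"] = rest[:slash], rest[slash:]
    # The last dot-separated token is the extension, the ones before it are selectors.
    *result["selectors"], result["extension"] = rest.split(".")
    return result


info = decompose("http://www.mywebsite.com/products/product1.printable.a4.html/a/b?x=12")
for name, value in info.items():
    print(name, "=", value)
```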

Sling's Request Processing Revisited

In Apache Sling, each HTTP request is mapped onto a JCR resource, i.e. a repository node. This is very different from other web frameworks you might be familiar with, say, Struts or Rails. In these frameworks a request is mapped onto a controller, i.e. a URL really addresses application code, so the application developer usually implements some application logic that retrieves model data and passes it on to the view.
Just like Rails or Struts, Sling implements a model-view-controller architecture. However, in Sling a request addresses a piece of content. The mapping between request and model (data, content) is accomplished through the URL, so there is no need for further custom mapping logic.


Node selection
So, how does this work in detail? Consider an HTTP GET request for the URL:

/content/corporate/jobs/developer.html

First, Sling will look in the repository for a file located at exactly this location. If such a file is found, it will be streamed into the response as is. This behavior allows you to use Sling as a web server and store your web application's binary data in the repository as well.

However, if no such file is found, Sling will look for a repository node located at:

/content/corporate/jobs/developer

(i.e. it drops the file extension). If this node cannot be found, Sling will return the HTTP status code 404.
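The lookup order described above can be sketched as follows. This is only an illustration: the repository is a plain Python dict standing in for the JCR, and `select_node` and the action names are hypothetical. The sketch cuts the path at the first dot of the last segment, which drops the extension (and any selectors, which are discussed further down).

```python
def select_node(repository, request_path):
    """Return (action, path) for a request path, following the order described above.

    repository: dict mapping a path to "file" or "node" (a toy stand-in for the JCR).
    """
    # 1. An exact match on a file is streamed as is.
    if repository.get(request_path) == "file":
        return ("stream", request_path)

    # 2. Otherwise drop everything from the first dot of the last path segment.
    last_segment_start = request_path.rfind("/") + 1
    dot = request_path.find(".", last_segment_start)
    node_path = request_path if dot == -1 else request_path[:dot]
    if node_path in repository:
        return ("render", node_path)

    # 3. Nothing found: the caller would answer with HTTP 404.
    return ("not_found", None)


repo = {
    "/content/logo.png": "file",
    "/content/corporate/jobs/developer": "node",
}
print(select_node(repo, "/content/corporate/jobs/developer.html"))
```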

Script folders
The scripts that Sling uses to process HTTP requests are stored in subfolders of "/apps". Those subfolders are usually of type nt:folder, but that's not a requirement.

Script selection
Nodes can have a special property named "sling:resourceType" that determines the resource type. Let us consider the simplest case (using the example request URL from above) and assume that the resource type is, say, "hr/job". The selected script will then be "/apps/hr/job/job.esp" (the last part of the resource type will have to be the file name). This works for GET requests and URLs ending in ".html".
Requests using other request methods, say POST, will cause Sling to look for the script at "/apps/hr/job/job.POST.esp". Request URLs ending in something other than ".html", say ".pdf", will make Sling look at "/apps/hr/job/job.pdf.esp". The convention to distinguish the two cases is that HTTP methods are all uppercase and the extension of the request is all lowercase.
In a content-centric application the same content (aka nodes) must often be displayed in different variations, e.g. as a teaser view and as a detail view. In Sling this is achieved through selectors. The selector is specified in the URL, for example:

/content/corporate/jobs/developer.detail.html

For this URL Sling would locate the script at "/apps/hr/job/job.detail.esp".
If the selected resource has no special resource type, a script will be looked up based on the content path. For example, the script for /content/corporate/jobs.html will be searched in /apps/corporate.
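The naming convention from this section can be summed up in a small helper. It is a sketch that reproduces only the four cases spelled out above (plain GET/html, another method, another extension, one selector); `script_path` is a hypothetical name, and combinations of these cases are not covered by this simplified description.

```python
def script_path(resource_type, method="GET", extension="html",
                selector=None, script_ext="esp"):
    """Build the script path for the simple cases described in this section."""
    # The last part of the resource type doubles as the script's file name.
    label = resource_type.rsplit("/", 1)[-1]
    base = f"/apps/{resource_type}/{label}"
    if method != "GET":
        return f"{base}.{method}.{script_ext}"      # e.g. job.POST.esp
    if selector:
        return f"{base}.{selector}.{script_ext}"    # e.g. job.detail.esp
    if extension != "html":
        return f"{base}.{extension}.{script_ext}"   # e.g. job.pdf.esp
    return f"{base}.{script_ext}"                   # e.g. job.esp


print(script_path("hr/job"))
print(script_path("hr/job", method="POST"))
print(script_path("hr/job", extension="pdf"))
print(script_path("hr/job", selector="detail"))
```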

Script engine
The ".esp" extension of the scripts used in the examples above indicates the script engine to use. ".esp" stands for ECMAScript and internally uses Rhino, Mozilla's JavaScript engine. Other supported extensions are ".rb" for JRuby scripts, ".jsp" for JSPs or ".jst" for client-side execution (".jst" denotes JavaScript template).
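Put as a lookup table, the mapping from script extension to engine looks roughly like this. The table only lists the extensions named above, and `engine_for` is a hypothetical helper, not a Sling API.

```python
# Extensions and engines as listed in the paragraph above.
SCRIPT_ENGINES = {
    "esp": "ECMAScript (Rhino)",
    "rb": "JRuby",
    "jsp": "JSP",
    "jst": "JavaScript template (client-side)",
}


def engine_for(script_name):
    """Pick the engine from the last dot-separated part of the script name."""
    extension = script_name.rsplit(".", 1)[-1]
    return SCRIPT_ENGINES.get(extension)  # None if the extension is unknown


print(engine_for("job.detail.esp"))
```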

Some interesting special cases
The examples above describe rendering nodes as html or as a pdf. However, there are also some built-in renderers for json and txt. The corresponding node presentations are located at (using the example from above):

/content/corporate/jobs/developer.json
and
/content/corporate/jobs/developer.txt

respectively.
For HTTP error handling (404 or 500), Sling will look for a script at "/apps/sling/servlet/errorhandler/404.esp" or "/apps/sling/servlet/errorhandler/500.esp", respectively.


More on script selection
If you need to find out more about the details of script selection in Sling, have a look at Sling ticket 387, where developer Felix Meschberger explains a lot more about the script resolution process.

SLING 387:

According to the findings in the dev list thread "Simplifying script paths and names?" at [1] I would now like to propose the implementation of this change in script/servlet path resolution:

Note: This issue talks about scripts. But as servlets are mirrored into the virtual Resource Tree accessible through the ResourceResolver, servlets are treated exactly the same as scripts (or vice-versa actually). So the discussion applies to servlets as well as to scripts.

(1) Script Location

Scripts to handle the processing of a resource are looked up in a single location:

     {scriptPathPrefix}/{resourceTypePath}

Where {scriptPathPrefix} is an absolute path prefix (as per ResourceResolver.getSearchPath()) to get absolute paths and {resourceTypePath} is the resource type converted to a path. If the {resourceTypePath} is actually an absolute path, the {scriptPathPrefix} is not used.

Example: Given the search path [ "/apps", "/libs" ] and a resource type of sling:sample, the following locations will be searched for scripts:

     * /aps/sling/script
     * /libs/sling/script


(2) Within the location(s) found through above mechanism a script is searched whose script name matches the pattern
     {resourceTypeLabel}.{selectorString}.{requestMethod}.{requestExtension}.{scriptExtension}

where the fields have the following meaning:

     {resourceTypeLabel} - the last segment of the {resourceTypePath} (see above)
                    This part is required. Only scripts whose name starts with this name are considered
     {selectorString} - the selector string as per RequestPathInfo.getSelectorString
                    This part is optional. The more selectors of the selector string match, the
                    better.
     {requestMethod}
                    The request method name. This is optional for GET or HEAD requests
                    and is required for non-GET/non-HEAD requests
     {requestExtension}
                    The extension of the request. This is optional.
     {scriptExtension}
                     The extension indicating the script language. Not used for selecting
                     the script but for selecting the ScriptEngine. This is of course not existing
                     for servlets.

If multiple scripts would apply for a given request, the script with the best match is selected. Generally speaking a match is better if it is more specific. More in detail, a match with more selector matches is better than a match with less selector matches, regardless of any request extension or method name match.

For example, consider a request to resource /foo/bar.print.a4.html of type sling:sample. Assuming we have the following list of scripts in the correct location:

   (1) sample.esp
   (2) sample.GET.esp
   (3) sample.GET.html.esp
   (4) sample.html.esp
   (5) sample.print.esp
   (6) sample.print.a4.esp
   (7) sample.print.html.esp
   (8) sample.print.GET.html.esp
   (9) sample.print.a4.html.esp
   (10) sample.print.a4.GET.html.esp

The order of preference would probably be (10) - (9) - (6) - (8) - (7) - (5) - (3) - (4) - (2) - (1). Note that (6) is a better match than (8) because it matches more selectors even though (8) has a method name and extension match where (6) does not.

If there is a tie, e.g. between print.esp and print.jsp, the first script in the listing would be selected (of course, there should not be a tie...)
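The ranking can be tried out with a short Python sketch. The ticket only states that more selector matches always win; ranking an extension match above a method match is an assumption made here because it reproduces the order given in the example. The function names are hypothetical, and scripts that score equally keep their listing order.

```python
def score_script(name, label, selectors, method, extension):
    """Return a (selectors, extension, method) score, or None if the script does not apply."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != label:
        return None
    middle = parts[1:-1]  # drop the label and the script-language extension
    i = 0
    # Leading parts must match the request selectors in order.
    while i < len(middle) and i < len(selectors) and middle[i] == selectors[i]:
        i += 1
    selector_matches = i
    method_match = 0
    if i < len(middle) and middle[i] == method:
        method_match, i = 1, i + 1
    extension_match = 0
    if i < len(middle) and middle[i] == extension:
        extension_match, i = 1, i + 1
    if i != len(middle):
        return None  # leftover parts that match nothing in the request
    if method not in ("GET", "HEAD") and not method_match:
        return None  # the method name is required for non-GET/non-HEAD requests
    # Assumed tie-break order: selectors first, then extension, then method.
    return (selector_matches, extension_match, method_match)


def rank_scripts(names, label, selectors, method, extension):
    """Sort the applicable scripts from best to worst match (stable for ties)."""
    scored = [(score_script(n, label, selectors, method, extension), n) for n in names]
    applicable = [(s, n) for s, n in scored if s is not None]
    applicable.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in applicable]


scripts = [
    "sample.esp", "sample.GET.esp", "sample.GET.html.esp", "sample.html.esp",
    "sample.print.esp", "sample.print.a4.esp", "sample.print.html.esp",
    "sample.print.GET.html.esp", "sample.print.a4.html.esp",
    "sample.print.a4.GET.html.esp",
]
for name in rank_scripts(scripts, "sample", ["print", "a4"], "GET", "html"):
    print(name)
```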

1. Update Servlet Resolution Description (Resolved)
To be clear: This new mechanism replaces the script resolution mechanism of today. As such this change is not backwards compatible and existing applications will have to be adapted.
Bertrand Delacretaz added a comment - 18/Apr/08 13:41
> Given the search path [ "/apps", "/libs" ] and a resource type of sling:sample, the following locations will be
> searched for scripts:

> * /aps/sling/script
> * /libs/sling/script

I think this should be

/apps/sling/sample
/lib/sling/sample

And I'm not sure what the initial */ means in your example, I thought the search paths were absolute.

Felix Meschberger added a comment - 18/Apr/08 13:53
the initial "*" in the lines is just a numbering symbol. It has no code significance. And yes, your fixes are correct.
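
With the corrected paths, the location rule from section (1) of the ticket can be sketched like this. The sketch assumes that a resource type such as sling:sample becomes a path by replacing the colon with a slash, which is what the corrected example implies; `script_locations` is a hypothetical name.

```python
def script_locations(search_path, resource_type):
    """List the folders searched for scripts of the given resource type."""
    # Assumption: "sling:sample" is converted to the relative path "sling/sample".
    type_path = resource_type.replace(":", "/")
    if type_path.startswith("/"):
        # An absolute resource type path is used as is, without a prefix.
        return [type_path]
    return [f"{prefix.rstrip('/')}/{type_path}" for prefix in search_path]


print(script_locations(["/apps", "/libs"], "sling:sample"))
```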

David Nuescheler added a comment - 25/Apr/08 15:38
i think it is important that this change was originally suggested to
make the simple cases as simple and intuitive as possible for
the user of sling and not to come up with something that is really
easy and consistent to map for the sling implementation.

let me try to explain with an example:
as a user of sling i would like to have my app in /apps/myapp and lets say i have a node of resourceType "myapp/homepage" at "/content/myapp".

i would like to be able to structure my applications as follows:

(1) /apps/myapp/homepage/homepage.esp (or html.esp or GET.esp)
(2) /apps/myapp/homepage/edit.esp (or edit.html.esp)
(3) /apps/myapp/homepage/header/highlight.jpg.esp
(4) /apps/myapp/homepage/header/selected.jpg.esp
(5) /apps/myapp/homepage/header/small.jpg.esp

where

/content/myapp.html -> (1)
/content/myapp.edit.html -> (2)
/content/myapp.header.highlight.jpg -> (3)
/content/myapp.header.selected.jpg -> (4)
/content/myapp.header.small.jpg -> (5)

i think it is important that we avoid unnecessary repetition at any point
and we would allow for enough flexibility in the /apps directory to allow
the user to come up with something short, distinct and meaningful.

I haven't found out how to hook a script to POST with a "delete" selector, tried POST.delete.html.esp and various other options but that didn't work.

Looking at the code, I think that might not work at all, and I didn't find tests related to this.
The correct script path would be .../delete/POST.esp

Request Selectors are ignored for non-GET/HEAD requests. Hence the POST-related test is expected to fail. This is related to issue (III) in [1] stating that not all request methods are equal.

Hence, I suggest that for the moment we comment out these tests, with a comment stating this situation, and do not implement that support. The reason for this is that a resource URL should address the resource (and at most give some hint for a specific representation, such as html or txt) but not include an operation such as delete.
                            
   * Refactored resolution of error handler servlets/scripts to use new mechanism
   * Removed unused methods and classes
   * Renamed helper classes to reflect their functionality

Closing this issue for now. Errors in the new implementation should be reported in new issues.

Currently, working with selectors requires you to put scripts in
subfolders, for example

/apps/foo/html.esp
/apps/foo/someselector/html.esp

and worse, all GET scripts which produce html are named html.esp,
which can be confusing when editing them.

We talked about this with David and Felix, here's a proposal for
simplifying those names in the "happy case", while keeping the current
conventions to resolve potential name conflicts where needed. Comments
welcome.

= Proposal =

The following variants should be accepted for script names, examples:

a) sling:resourceType=foo, request = bar.html

Sling searches for the following scripts and uses the first one found:

  /apps/foo/html.esp
  /apps/foo/foo.esp

The only change is that the script used for html rendering can
optionally be named foo.esp, to avoid having many scripts called
"html.esp" which is not practical when opening many of them in an
editor or IDE.

b) sling:resourceType=foo, request = bar.selector.html

The following scripts can be used to process this request, the first
one found being used:

/apps/foo/selector/html.esp
/apps/foo/selector.html.esp (same but with dots instead of a subfolder)
/apps/foo/selector.esp
/apps/foo/html.esp (not specific to the selector)
/apps/foo/foo.esp (not specific either)

In the "happy case" people would then just have those two scripts to
handle the above cases:

/apps/foo/foo.esp
/apps/foo/selector.esp
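
The lookup order from the proposal can be written down as a small candidate generator. This is only a sketch of the proposal as quoted above, not of what Sling finally implemented; the function names and the set of existing scripts are made up for illustration.

```python
def candidate_scripts(resource_type, extension, selector=None, script_ext="esp"):
    """List script paths in the order the proposal would try them."""
    base = f"/apps/{resource_type}"
    label = resource_type.rsplit("/", 1)[-1]
    candidates = []
    if selector:
        candidates += [
            f"{base}/{selector}/{extension}.{script_ext}",  # subfolder variant
            f"{base}/{selector}.{extension}.{script_ext}",  # dots instead of a subfolder
            f"{base}/{selector}.{script_ext}",
        ]
    candidates += [
        f"{base}/{extension}.{script_ext}",  # not specific to the selector
        f"{base}/{label}.{script_ext}",      # script named after the resource type
    ]
    return candidates


def first_existing(candidates, existing):
    """Return the first candidate that exists, or None."""
    return next((c for c in candidates if c in existing), None)


happy_case = {"/apps/foo/foo.esp", "/apps/foo/selector.esp"}
print(first_existing(candidate_scripts("foo", "html"), happy_case))
print(first_existing(candidate_scripts("foo", "html", "selector"), happy_case))
```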

Protocol   Host     Content path         Selector(s)     Extension   Suffix   Param(s)
http://    myhost   products/product1    printable.a4    html        /a/b     x=12
