|
Access Method Hub
This hub houses a list of access methods. A visitor may obtain access using a browser, an editor such as FrontPage, or other methods, including a "spider" (more commonly known as a "robot"). The data about the access methods is derived from the user agent field of the web log. The data can include items such as the operating system used, the version of the operating system, and the hardware platform the operating system runs on. Data about an access method is recorded once for each kind of access method. Since the data about each access method is unique, there is no history to track. If the access method is not a robot or a spider, the robot ID is set to "-1" (negative one), even though the ID is considered text. If the access method is a robot or a spider, the key is populated with a real ID string. The robot hub and the robot detail must therefore house a "-1" keyed robot with a name of none.
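The robot ID rule described here can be sketched in a few lines. This is only an illustration: the `KNOWN_ROBOTS` mapping, its entries, and the substring match against the user agent are assumptions, not part of the model described in this table.

```python
# Text key reserved for browsers, editors, and other non-robot access methods.
NON_ROBOT_ID = "-1"

# Hypothetical robot list: lower-case user agent marker -> robot ID string.
KNOWN_ROBOTS = {
    "examplebot": "examplebot",
    "samplespider": "sample-spider",
}


def robot_id_for(user_agent: str) -> str:
    """Return the robot ID for a user agent, or "-1" when it is not a known robot."""
    agent = user_agent.lower()
    for marker, robot_id in KNOWN_ROBOTS.items():
        if marker in agent:
            return robot_id
    return NON_ROBOT_ID
```

Because the fallback is the text value "-1" rather than an empty key, every access method row can always be joined to a robot row.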
|
Cookie Key Pair Hub
This hub houses a key-value pair for each variable specified in a cookie. Generally, each visitor has their own cookie, assuming the program is properly written. Most browsers have the cookie feature turned on, which allows visitors to be tracked. As the visitor logs in, data is captured, including the username of each visitor, thereby tying the visitor back to an actual person. Additional data, such as how long the visitor stayed on the outside before logging in and when the visitor actually logged in, can also be tracked. Since the keys and values cannot be tracked with respect to changes over time, this is a hub table and not a satellite.
|
Cookie Visitor Link
This table ties each visitor to a specific set of cookie keys and values. The sequence ID identifies the position of a particular cookie on the web log line. There is one row for each visitor and key-value pair on the cookie line. The delimiter of the cookie is also housed here.
|
Directory Hub
This hub houses a list of unique paths to objects. Each resource path that is unique receives a new directory ID. To avoid recursive relationships (because directories are hierarchical), directory names are separated, and sequence ordering is accomplished in a child satellite.
|
Directory Structure Hub
This hub includes the structure breakdown of the directory. Each directory is broken down into a series of directory names. The order of each directory is provided by a sequence ID. The base directory is always considered to be structure sequence 1. Typically, directory names change, thereby resulting in new entries to the structure. There really is no good way to track the change of old directory names to new directory names that ensures that each directory name change is captured. However, by using a hub table, the old directory that an object was in can be tracked along with the new directory that the object is now in, by looking to see when activity stops on the old object and starts on the new one.
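The breakdown of a directory into sequenced names might look like the following sketch. It assumes that paths use "/" as the separator and that the first named directory is the base directory (sequence 1); the function name is hypothetical.

```python
def directory_structure(path: str) -> list[tuple[int, str]]:
    """Split a directory path into (sequence ID, directory name) pairs.

    Assumption: the first named directory is the base directory and
    receives structure sequence 1.
    """
    names = [name for name in path.split("/") if name]
    return list(enumerate(names, start=1))
```

For example, `directory_structure("/products/images/2004/")` yields one entry per directory name, ordered from the base directory down. A renamed directory simply produces new entries, which matches the hub behavior described above.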
|
Domain Hub
This hub provides a list of domains organized by web server. One web server may serve many domains. However, a single domain must exist on only one web server. Domains are considered to be virtual by nature.
|
Dynamic Key Pair Hub
This table links the dynamic request (a single web log line) to a specific dynamic key-value pair set. The dynamic requests can be search conditions, clauses entered on a form, or data that needs to be passed to server objects. The sequence ID in this table indicates the order in which the dynamic requests appear on the web log line. A delimiter is also stored here. The delimiter is usually consistent across key-value pairs. This table is a hub because a new log line with a different order, or with different keys, generates new surrogate keys in the child hub tables.
|
Geo Location Hub
This table holds state, province, region, country, and continent data. Typically, states do not change names once assigned, and the geographical location of states is static. This is a hub of data because the geography is consistent over time.
|
IP Hub
This table houses a list of all the IP addresses. The IP addresses are decoded to be integer based. Any IP address used by any server, or by any client, is recorded in this table. The first time an IP address is recorded, it is date and time stamped. The string representation of the IP address is also available for clarity and ease of use.
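As an illustration of storing an IP address as an integer alongside its string form, the sketch below uses Python's standard `ipaddress` module. It assumes IPv4 addresses; the function names are hypothetical.

```python
import ipaddress


def ip_to_int(address: str) -> int:
    """Decode a dotted IPv4 string into its integer form."""
    return int(ipaddress.IPv4Address(address))


def int_to_ip(value: int) -> str:
    """Recover the string representation from the integer form."""
    return str(ipaddress.IPv4Address(value))
```

For example, `ip_to_int("192.168.0.1")` returns `3232235521`, and `int_to_ip(3232235521)` returns the original string.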
|
IP Location
This table links an IP address to a geographical location. Because of the way IP addresses work, the geographical location of an IP address does not move outside of state boundaries over time. Even with DHCP and dynamic assignment, an IP address is confined to a specific city or building. Therefore, this is a hub of IP addresses linked to geographical locations. This table includes the domain name as well, which could change over time. However, tracking history data about domain name changes is not required in all implementations.
|
Key Pair Hub
This hub holds the key side of the key-value pair. In a dynamic line issued to the server, or in a cookie, the format is usually: key=value<delimiter>key=value, etc. This hub is a list of all of the keys found in a request or in a cookie. The key name is the business key, so changes to the name result in a new entry. Because changes to the name cannot be tracked over time, this is a hub table and cannot be a satellite.
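Splitting such a line into sequenced keys and values could be sketched as follows. The "=" separator between key and value, the default "&" delimiter, and the function name are assumptions made for illustration.

```python
def split_pairs(line: str, delimiter: str = "&") -> list[tuple[int, str, str]]:
    """Split a key-value line into (sequence ID, key, value) tuples.

    Assumption: each pair is written as key=value and pairs are
    separated by a single delimiter string.
    """
    items = [item.strip() for item in line.split(delimiter) if item.strip()]
    pairs = []
    for sequence, item in enumerate(items, start=1):
        # partition leaves the value empty when no "=" is present
        key, _, value = item.partition("=")
        pairs.append((sequence, key, value))
    return pairs
```

The key of each tuple would feed the Key Pair Hub, the value would feed the Value Pair Hub, and the sequence ID preserves the order on the line. A cookie line can be handled the same way by passing `delimiter=";"`.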
|
Object Context
This table houses the context of local and overall objects. If a local object is housed, the context could be defined as a sub-web (if sub-webs have been identified); if an overall object is housed, it may be available to everyone.
|
Object Custom Attributes Hub
This table houses custom attributes that the loader of the data vault wishes to include. The business key is the attribute code, followed by the attribute name or description. These attributes are content about an object and are preferably loaded by the loader ahead of time. The loaded attributes are used to describe objects. The user must load the object table from a list created on their web server, and link it to custom attribute codes.
|
Object Flags
This table houses computed flags for each object. The business rules for each object are deterministic. An entry page is any page that does not require a login and can be bookmarked. An internal page is any page that requires a login to access. A search engine page is any page that feeds the search engine on the site. A private page is one used by internal access only, requiring access to the server and not accessible through the web site. A secured page is one sitting on an HTTPS or SSL layer, and a dynamic page is any page with key-value pairs attached.
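Some of these flags can be computed directly from a request URL, as in the sketch below. The `requires_login` input is assumed to come from elsewhere (for example, a preloaded object list), the bookmarking condition for entry pages is not checked, and the search engine and private flags are omitted because they cannot be derived from the URL alone.

```python
from urllib.parse import urlsplit


def object_flags(url: str, requires_login: bool) -> dict:
    """Compute a subset of the object flags from a URL.

    Assumption: requires_login is supplied by the caller; it cannot be
    derived from the URL itself.
    """
    parts = urlsplit(url)
    return {
        "entry": not requires_login,         # no login needed
        "internal": requires_login,          # login needed
        "secured": parts.scheme == "https",  # served over HTTPS
        "dynamic": bool(parts.query),        # key-value pairs attached
    }
```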
|
Object Hub
This table holds the actual object itself. The object could be a web page, a picture, a movie, or anything else that is referenced. If the object has a web server ID of zero, it is considered to be on an external or unknown web server (coming from a referring page, for instance). This table is created dynamically for each object on the web log line, including referring objects. As mentioned in the Object Custom Attributes Hub section, this object table can be preloaded from a web-server list of objects if the loader wants to specify their own attribute codes and names to describe the object.
|
Object Picture
This table houses the history of object details, such as flags and context. The latest picture and the past pictures of each object are kept here. The most recent or current picture is available by performing a max function on the table's load date time stamp, then directly matching the child tables that house the corresponding history or deltas.
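The "max function on the load date time stamp" lookup can be sketched over in-memory rows. The row layout (dictionaries with `object_id` and `load_dts` keys) and the function name are assumptions for illustration; in a database this would be a query against the picture table.

```python
from datetime import datetime


def current_picture(picture_rows, object_id):
    """Return the picture row with the latest load date time stamp, or None."""
    rows = [row for row in picture_rows if row["object_id"] == object_id]
    if not rows:
        return None
    return max(rows, key=lambda row: row["load_dts"])


# Hypothetical picture rows for one object.
pictures = [
    {"object_id": 10, "load_dts": datetime(2003, 1, 5, 8, 0)},
    {"object_id": 10, "load_dts": datetime(2003, 2, 9, 8, 0)},
]
latest = current_picture(pictures, 10)
```

The row returned carries the load date time stamp that is then matched against the child tables to pull the corresponding history or deltas.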
|
Object Text Hub
This table holds a series of user-defined text. This data is preloaded like the Object Custom Attributes table. These items allow further extension or definition of the object itself. Since the text is the business key, and nothing indicates which old text a new text replaces, tracking changes over time is difficult.
|
Object Type Hub
This table holds the object extension, for instance: .jpg, .gif, .html, .xml, etc.
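Deriving the object type from a requested resource could look like this sketch, which uses Python's standard library. Lower-casing the extension and ignoring any key-value pairs after "?" are assumptions; the function name is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def object_type(resource: str) -> str:
    """Return the lower-cased extension of a requested resource, or "" if none."""
    path = urlsplit(resource).path  # drop any ?key=value part
    return posixpath.splitext(path)[1].lower()
```

For example, `object_type("/search.html?q=vault")` returns `".html"`, and a directory request such as `"/docs/"` returns an empty string.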
|
Request Dynamic Link
This link table links a series of dynamic key-value pairs to a requesting object in the request table above it. The sequence number orders the key-value pairs in the order they are seen on the request line. If the order changes, or there is a new request, new link records are generated. However, the duplication of key-value pair data is alleviated.
|
Request Link
Each web log line is an actual request of an object by a visitor who may or may not have a cookie to identify themselves. Each web log line has a potential referring object (where it came from), and potentially a dynamic set of key-value pairs requested or referred from. With each web log line, a new request record is built. This table grows rapidly, and quite possibly records duplicate data (outside of the date time stamp). The request link date time stamp is the field generated by the web server itself to indicate when the request was made against the server. Each request is filled with data by the server, such as status, time taken, method, and bytes sent and received. These statistics are the foundation for aggregates such as session, total time, number of visits versus number of hits, etc.
|
Request Referrer Dynamic Link
This non-recursive table links the request line (which may have a referring object) to the referring object. If the referring object has a dynamic set of key-value pairs, then they are linked here. Each web log line has one and only one requested object, and one referring object. However, if there is no referring object, the ID will be zero for the key-value pair, which links to text that indicates NA values.
|
Robot Detail
This table houses a predefined list of robots or spiders. The source for a robot is external and defined by the W3C on its web site. The data is massaged and pre-loaded. The robot key is the actual robot ID provided by the list of robots and is a text string in all cases.
|
Robots Hub
This is the hub or list of robot keys.
|
Robots Picture
This table holds past and current historical pictures of each of the robots.
|
Server Custom Attributes
This table holds a list of sequenced attributes that are customized by the user to house additional data about the server. There can be as many attributes as desired by the user.
|
Server Hardware Vendor
This table holds the server hardware description, including data about the amount of RAM, the number of CPUs, the vendor, and the model of the hardware.
|
Server Hub
This table holds the list of web servers by IP address. The IP address is the only consistent attribute that (usually) does not change once assigned.
|
Server Operating System
This table houses operating system data for the web server.
|
Server Picture
This table holds both past and current historical pictures of each of the satellite tables. The current picture is located by obtaining the most recent date (i.e., the max date) from this picture table, and then directly linking to the satellite tables desired.
|
Server Web Software
This table houses historical data about the web server software, including the version, make, and vendor.
|
Status Code Hub
This table houses a list of status codes and descriptions that can be fed back by the server for each request. The list typically does not change over time, thereby allowing the table to be built as a hub. If the list does change, however, it does not matter because the history of this table does not need to be tracked.
|
User Hub
This hub links users to visitors. If a cookie is provided with a user login ID, then the visitor can be identified. This is a list of user surrogate keys, typically pre-generated from another system.
|
User Data
This table houses data about the user. If the surrogate keys from another system have been used, this table need not necessarily be implemented. When the surrogate keys from another system are used, all that is necessary to identify each user is their respective login ID. This table can also be used to link the user data to geographical locations (if available). Users can then be grouped across IP addresses according to their geographical location, which in turn shows which domains and servers the users are associated with.
|
User Picture
This table holds the current picture of the user data. This table is not necessary unless there is more than one satellite hooked to the user hub. This table is included for demonstrative purposes of the current picture, and holds all the same necessities as described in the other picture tables.
|
Value Pair Hub
This hub holds the value side of the key-value pairs mentioned in the Key Pair Hub table description. The value side is either entered into the form by a CGI script, or assigned to a cookie key. Since the value side is itself a business key, the Value Pair Hub is a hub table, and not a satellite.
|
Visitor Hub
This table houses visitor objects. Each IP address is a visitor across a specific time period of requests. Without cookies, it is difficult to identify visitors. With cookies, each visitor becomes unique and distinct, as long as there is a cookie per visitor. Where a user login ID is available, it will be matched up to pull in user data. This table also links each visitor to the cookie key-value pairs that they own.
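When no cookie is available, treating "each IP address across a specific time period" as a visitor might be sketched as below. The 30-minute window, the input layout (time-ordered pairs of IP address and time stamp), and the function name are assumptions chosen for illustration, not values given in this table.

```python
from datetime import datetime, timedelta


def assign_visitors(requests, window=timedelta(minutes=30)):
    """Assign a visitor number to each (ip, timestamp) request.

    Assumption: requests are sorted by timestamp, and a gap longer than
    `window` between requests from the same IP starts a new visitor.
    """
    last_seen = {}  # ip -> (visitor number, time stamp of last request)
    next_id = 1
    result = []
    for ip, ts in requests:
        entry = last_seen.get(ip)
        if entry is None or ts - entry[1] > window:
            visitor_id = next_id
            next_id += 1
        else:
            visitor_id = entry[0]
        last_seen[ip] = (visitor_id, ts)
        result.append(visitor_id)
    return result
```

With cookies, the cookie value would replace the IP address as the grouping key, which is what makes each visitor unique and distinct.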
|