|
Access Method Hub
This hub houses a list of access methods. A visitor may obtain access using a browser, an editor like FrontPage, and/or other methods including a "spider" (more commonly known as a "robot"). The data about the access methods is derived from the user agent field of the web log. The data can include items like the operating system used, the version of the operating system, and the hardware platform the operating system is running on. Data about an access method is recorded once for each kind of access method. Since the data about each access method is unique, there is no history to track. If the access method is not a robot or a spider, the robot ID is set to "-1" (negative one), even though that is considered text. If the access method is a robot or a spider, the key is populated with a real ID string. The robot hub and the robot detail table are therefore defined to house a "-1" keyed robot with a name of none.
|
Cookie Key Pair Hub
This hub houses a key-value pair for each variable specified in a cookie. Generally, each visitor has their own cookie, assuming the program is properly written. Most browsers have the cookie feature turned on, which allows visitors to be tracked. As the visitor logs in, data is captured including the username of each visitor, thereby tying the visitor back to an actual person. Additional data, such as how long the visitor stayed on the outside before logging in and when the visitor actually logged in, can also be tracked. Since the keys and values cannot be tracked with respect to changes over time, this is a hub table and not a satellite.
|
Cookie Visitor Link
This table ties each visitor to a specific set of cookie keys and values. The sequence ID identifies the position a particular cookie occupied on the web log line. There is one row for each visitor and key-value pair on the cookie line. The delimiter of the cookie is also housed here.
|
Directory Hub
This hub houses a list of unique paths to objects. Each unique resource path receives a new directory ID. To avoid recursive relationships (because directories are hierarchical), directory names are separated, and sequence ordering is accomplished in a child satellite.
|
Directory Structure Hub
This hub includes the structural breakdown of the directory. Each directory is broken down into a series of directory names. The order of each directory name is provided by a sequence ID. The base directory is always considered to be structure sequence 1. Directory names typically change over time, resulting in new entries in the structure. There really is no good way to track the change from old directory names to new directory names that ensures every directory name change is captured. However, by using a hub table, the old directory that an object was in can be tracked along with the new directory that the object is now in, by looking to see when activity stops on the old object and starts on the new one.
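The breakdown of a directory into sequenced names can be sketched as follows. This is a minimal illustration, not the loader itself: the function name `directory_structure` is hypothetical, the `/` separator is assumed, and the sketch assumes that the first name in the path is the base directory that receives sequence 1.

```python
def directory_structure(directory_path):
    """Split a directory path into (sequence_id, directory_name) rows.

    Assumption: names are separated by "/" and the first name in the
    path is treated as the base directory (structure sequence 1).
    """
    # Empty strings come from leading, trailing, or doubled slashes.
    names = [name for name in directory_path.split("/") if name]
    return [(sequence_id, name) for sequence_id, name in enumerate(names, start=1)]


rows = directory_structure("/products/widgets/images/")
# rows == [(1, "products"), (2, "widgets"), (3, "images")]
```

Because each name is stored as its own row with a sequence ID, the hierarchy is expressed without a self-referencing (recursive) key.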
|
Domain Hub
This hub provides a list of domains organized by web server. One web server may serve many domains. However, a single domain must exist on only one web server. Domains are considered to be virtual by nature.
|
Dynamic Key Pair Hub
This table links the dynamic request (a single web log line) to a specific dynamic key-value pair set. The dynamic requests can be search conditions, clauses entered on a form, or data to be passed to server objects. The sequence ID in this table indicates the order in which the dynamic requests appear on the web log line. A delimiter is also stored here. The delimiter is usually consistent across key-value pairs. This table is a hub because a new log line with a different order, or different keys, generates new surrogate keys in the child hub tables.
|
Geo Location Hub
This table holds state, province, region, country, and continent data. Typically, states do not change names once assigned, and the geographical location of states is static. This is a hub of data because the geography is consistent over time.
|
IP Hub
This table houses a list of all the IP addresses. The IP addresses are decoded to be integer based. Any IP address used by any server, or by any client, is recorded in this table. The first time an IP address is recorded, it is date and time stamped. The string representation of the IP address is also available for clarity and ease of use.
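One way to hold both the integer form and the string form of an IP address is sketched below, using Python's standard `ipaddress` module. The helper names (`ip_to_int`, `int_to_ip`, `record_ip`) and the dictionary used as a stand-in for the hub table are assumptions made for illustration only.

```python
import ipaddress
from datetime import datetime


def ip_to_int(ip_text):
    """Convert a dotted IP string to its integer form."""
    return int(ipaddress.ip_address(ip_text))


def int_to_ip(ip_number):
    """Convert the integer form back to the string representation."""
    return str(ipaddress.ip_address(ip_number))


def record_ip(hub, ip_text, seen_at):
    """Add the address to the hub (a plain dict here) on first sight only."""
    key = ip_to_int(ip_text)
    # setdefault keeps the existing row, so the first-seen stamp never moves.
    return hub.setdefault(key, {"ip_text": ip_text, "first_seen": seen_at})


hub = {}
record_ip(hub, "192.168.1.10", datetime(2001, 1, 1, 9, 0))
record_ip(hub, "192.168.1.10", datetime(2001, 6, 1, 9, 0))  # stamp is unchanged
```

Here `ip_to_int("192.168.1.10")` yields 3232235786, and the second call to `record_ip` leaves the original date and time stamp in place.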
|
IP Location
This table links an IP address to a geographical location. Because of the way IP addresses work, the geographical location of an IP address does not change over time beyond state boundaries. Even with DHCP and dynamic assignment, an IP address is confined to a specific city or building. Therefore, this is a hub of IP addresses linked to geographical locations. This table includes the domain name as well, which could change over time. However, tracking history data about domain name changes is not required in all implementations.
|
Key Pair Hub
This hub holds the key side of the key-value pair. In a dynamic line issued to the server, or in a cookie, the format is usually: key=value<delimiter>key=value, etc. This hub is a list of all of the keys found in a request or in a cookie. The key name is the business key, so a change to the name results in a new entry. Because changes to the name cannot be tracked over time, this is a hub table and cannot be a satellite.
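Splitting a dynamic line or a cookie into sequenced keys and values can be sketched as follows. The function name `split_key_value_pairs` is hypothetical, and the sketch assumes that `=` separates each key from its value and that the delimiter is supplied by the caller (`&` by default).

```python
def split_key_value_pairs(line, delimiter="&"):
    """Return (sequence_id, key, value) rows in the order seen on the line.

    Assumption: "=" separates key from value; the delimiter between
    pairs is passed in because it can differ between requests and cookies.
    """
    pairs = [pair for pair in line.split(delimiter) if pair.strip()]
    rows = []
    for sequence_id, pair in enumerate(pairs, start=1):
        # partition splits on the first "=" only, so values may contain "=".
        key, _, value = pair.partition("=")
        rows.append((sequence_id, key.strip(), value.strip()))
    return rows


request_rows = split_key_value_pairs("q=data+vault&page=2")
cookie_rows = split_key_value_pairs("user=jsmith; theme=dark", delimiter=";")
```

The keys from these rows would feed the key hub, the values would feed the value hub, and the sequence IDs and delimiter would be kept on the linking table.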
|
Object Context
This table houses the context of local and overall objects. If a local object is housed, the context could be defined as a sub-web (if sub-webs have been identified); if an overall object is housed, it may be available to everyone.
|
Object Custom Attributes Hub
This table houses custom attributes that the loader of the data vault wishes to include. The business key is the attribute code, followed by the attribute name or description. These attributes are content about an object and are preferably loaded by the loader ahead of time. The loaded attributes are used to describe objects. The user must load the object table from a list created on their web server, and link it to custom attribute codes.
|
Object Flags
This table houses computed flags for each object. The business rules for each object are determinant. An entry page is any page that does not require a login and can be bookmarked. An internal page is any page that requires a login to access. A search engine page is any page that feeds the search engine on the site. A private page is one used by internal access only, requiring access to the server and not accessible through the web site. A secured page is one sitting on an HTTPS or SSL layer. A dynamic page is any page with key-value pairs attached.
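Some of these flags can be computed from the request itself, as the sketch below suggests. It is illustrative only: the function name `object_flags` and the `requires_login` input are hypothetical, the entry flag is simplified to "no login required" (whether a page can be bookmarked is not checked), and the search engine and private flags are omitted because they depend on site knowledge not present in a URL.

```python
from urllib.parse import urlsplit


def object_flags(url, requires_login):
    """Compute a subset of the object flags for one URL.

    Assumptions: "secured" means the https scheme, "dynamic" means a
    non-empty query string, and login requirements are known up front.
    """
    parts = urlsplit(url)
    return {
        "secured": parts.scheme == "https",
        "dynamic": bool(parts.query),
        "internal": requires_login,
        "entry": not requires_login,
    }


flags = object_flags("https://example.com/search?q=hub", requires_login=False)
```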
|
Object Hub
This table holds the actual object itself. The object could be a web page, a picture, a movie, or anything else that is referenced. If the object has a web server ID of zero, it is considered to be on an external or unknown web server (coming from a referring page, for instance). This table is populated dynamically for each object on the web log line, including referring objects. As mentioned in the Object Custom Attributes Hub section, this object table can be preloaded from a web-server list of objects if the loader wants to specify their own attribute codes and names to describe the object.
|
Object Picture
This table houses the history of object details, such as flags and context. The latest picture and past pictures of each are kept here. The most recent or current picture is available by performing a max function on the table's load date time stamp, then directly matching the child tables that house the corresponding history or deltas.
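The max-then-match lookup can be sketched with an in-memory SQLite database. The table names (`object_picture`, `object_flags`), the column names (`object_id`, `load_dts`, `is_secured`), and the sample rows are all assumptions made for illustration; the sketch also assumes that the picture row and the child row share the same load date time stamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_picture (object_id INTEGER, load_dts TEXT);
CREATE TABLE object_flags (object_id INTEGER, load_dts TEXT, is_secured INTEGER);
INSERT INTO object_picture VALUES
    (1, '2001-01-01 00:00:00'), (1, '2001-06-01 00:00:00'), (2, '2001-03-01 00:00:00');
INSERT INTO object_flags VALUES
    (1, '2001-01-01 00:00:00', 0), (1, '2001-06-01 00:00:00', 1), (2, '2001-03-01 00:00:00', 0);
""")

# Step 1: take the max load date time stamp per object from the picture table.
# Step 2: match the child table directly on object and load date time stamp.
CURRENT_PICTURE_SQL = """
SELECT f.object_id, f.load_dts, f.is_secured
FROM object_flags AS f
JOIN (
    SELECT object_id, MAX(load_dts) AS load_dts
    FROM object_picture
    GROUP BY object_id
) AS p
  ON p.object_id = f.object_id AND p.load_dts = f.load_dts
ORDER BY f.object_id
"""

current_rows = conn.execute(CURRENT_PICTURE_SQL).fetchall()
```

With the sample rows above, only the June row is returned for object 1, while the earlier row remains available as history.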
|
Object Text Hub
This table holds a series of user-defined text. This data is preloaded like the Object Custom Attributes table. These items allow further extension or definition of the object itself. Since the text is the business key, and nothing indicates which text is old and which is new or when the business key changed, tracking this text over time is difficult.
|
Object Type Hub
This table holds the object extension, for instance: .jpg, .gif, .html, .xml, etc.
|
Request Dynamic Link
This link table links a series of dynamic key-value pairs to a requesting object in the request table above it. The sequence number orders the key-value pairs in the order they are seen on the request line. If the order changes, or there is a new request, new link records are generated. However, the duplication of key-value pair data is reduced.
|
Request
Each web log line is an actual request of an object by a visitor
|
Link
that may or may not have a cookie to identify themselves.
|
Each web log line has a potential referring object (where
|
it came from), and potentially a dynamic set of key-value
|
pairs requested, or referred from. With each web log line,
|
a new request record is built. This table grows rapidly, and
|
quite possibly records duplicate data (outside of the date time
|
stamp). The request link date time stamp is the field that is
|
generated from the web server itself to indicate when this
|
request was made against the server. Each request is filled
|
with data by the server such as status, time taken, method,
|
bytes sent and received. These statistics are the foundation
|
for aggregates such as session, total time, number of visits
|
versus number of hits, etc.
|
Request Referrer Dynamic Link
This non-recursive table links the request line (which may have a referring object) to the referring object. If the referring object has a dynamic set of key-value pairs, then they are linked here. Each web log line has one and only one requested object, and one referring object. However, if there is no referring object, the ID will be zero for the key-value pair, which links to text indicating NA values.
|
Robot Detail
This table houses a predefined list of robots or spiders. The source for a robot is external and defined by the W3C on its web site. The data is massaged and pre-loaded. The robot key is the actual robot ID provided by the list of robots and is a text string in all cases.
|
Robots Hub
This is the hub or list of robot keys.
|
Robots Picture
This table holds past and current historical pictures of each of the robots.
|
Server Custom Attributes
This table holds a list of sequenced attributes that are customized by the user to house additional data about the server. There can be as many attributes as desired by the user.
|
Server Hardware Vendor
This table holds the server hardware description, including data about the amount of RAM, the number of CPUs, the vendor, and the model of the hardware.
|
Server Hub
This table holds the list of web servers by IP address. The IP address is the only consistent attribute that (usually) does not change once assigned.
|
Server Operating System
This table houses operating system data for the web server.
|
Server Picture
This table holds both past and current historical pictures of each of the satellite tables. The current picture is located by obtaining the most recent date (i.e., the max date) from this picture table, and then directly linking to the satellite tables desired.
|
Server Web Software
This table houses historical data about the web server software, including the version, make, and vendor.
|
Status Code Hub
This table houses a list of status codes and descriptions that can be fed back by the server for each request. The list typically does not change over time, thereby allowing the table to be built as a hub. If the list does change, however, it does not matter because the history of this table does not need to be tracked.
|
User Hub
This hub links users to visitors. If a cookie is provided with a user login ID, then the visitor can be identified. This is a list of user surrogate keys, typically pre-generated from another system.
|
User Data
This table houses data about the user. If the surrogate keys from another system have been used, this table need not necessarily be implemented. When the surrogate keys from another system are used, all that is necessary to identify each user is their respective login ID. This table can also be used to link the user data to geographical locations (if available). Users can thereby be grouped across IP addresses according to their geographical location, which in turn shows which domains and servers the users are associated with.
|
User Picture
This table holds the current picture of the user data. This table is not necessary unless more than one satellite is attached to the user hub. It is included to demonstrate the current picture, and carries the same requirements as described for the other picture tables.
|
Value Pair Hub
This hub holds the value side of the key-value pairs mentioned in the Key Pair Hub table description. The value side is either entered into the form by a CGI script, or assigned to a cookie key. Since the value side is itself a business key, the Value Pair Hub is a hub table, and not a satellite.
|
Visitor Hub
This table houses visitor objects. Each IP address is a visitor, across a specific time period of requests. Without cookies it is difficult to identify visitors. With cookies, each visitor becomes unique and distinct, as long as there is a cookie per visitor. Where a user login ID is available, it will be matched up to pull in user data. The table also links each visitor to the cookie key-value pairs that they own.
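When no cookie is present, treating an IP address as a visitor across a time period can be sketched as follows. The description does not define the time period, so the 30-minute inactivity window is purely an assumption, as are the function name `assign_visitors` and the input layout of (IP address, timestamp) pairs sorted by time.

```python
from datetime import datetime, timedelta


def assign_visitors(requests, window=timedelta(minutes=30)):
    """Return (visitor_id, ip, timestamp) rows for time-sorted requests.

    Assumption: a gap longer than `window` between two requests from
    the same IP address starts a new visitor.
    """
    last_seen = {}  # ip -> (visitor_id, timestamp of the latest request)
    next_id = 1
    rows = []
    for ip, stamp in requests:
        current = last_seen.get(ip)
        if current is None or stamp - current[1] > window:
            visitor_id = next_id
            next_id += 1
        else:
            visitor_id = current[0]
        last_seen[ip] = (visitor_id, stamp)
        rows.append((visitor_id, ip, stamp))
    return rows


log = [
    ("10.0.0.1", datetime(2001, 1, 1, 9, 0)),
    ("10.0.0.2", datetime(2001, 1, 1, 9, 5)),
    ("10.0.0.1", datetime(2001, 1, 1, 9, 10)),
    ("10.0.0.1", datetime(2001, 1, 1, 10, 30)),
]
visitor_ids = [row[0] for row in assign_visitors(log)]
```

In this sample the first and third requests share a visitor, while the fourth request from the same address arrives after the window and is treated as a new visitor.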
|