Analysis settings


Here you can specify settings that extract useful information from the logs or ignore information that would otherwise complicate your analysis.

 

Analysis settings list:

 

Tracking

Extract ASP.NET session ID

If your web site is based on ASP.NET, it is possible that ASP.NET sessions are used to keep the state of the web application for the user. If they are not enabled for the whole web site, sessions are usually used on specific sections where the internet user needs to interact with the web site (registration forms, download pages, purchase pages...).

To make a session persist for an internet user, ASP.NET stores a tracking number in a cookie.

Here is an example of a cookie field in a log line:

ARRAffinity=9290170f700b8dbc18a118f16c339292cfbf19b429769082d756ff7acd3f5145;+__RequestVerificationToken=lHOFPAsZn0KutZIbVftRdGxeUbpxKF5NzNhj6VoS8k8w-TvoEbK01wAxsE-JGGhkgKl40hF70xJw31pV3oQGPL0KG1SNrGusg2rLiEc0zas1;+ASP.NET_SessionId=od351je4glt4c00z0slgx2an;+.ASPXAUTH=893537475E417E2E9C502193408A75EE1EFD9E88D596C9573368D91422D5C3FC4D63CB2DE69B48994A759EF0A47A7ED74D7E8E8B0D3B3386A2C1DE1AB672FF96415A2241E4647C62F1F949824CFE5D4A54169D9A06290F6FB5E791E486CD6D1AAAAD97F175CEA1298F88D30EA2E5F173AC1C23367331A1D2E7206BD01DFDA7BE

In this example you can see the cookie ASP.NET_SessionId with its value. If this setting is enabled, the value is automatically extracted into a new field named ASPSessionId.

The result looks like this:

ASP.NET session ID extracted as a new field in the HttpLogBrowser
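
To illustrate the idea, here is a minimal Python sketch of this kind of extraction. It is not the HttpLogBrowser implementation: the function name and the regular expression are assumptions made for the example, and the shortened cookie string only mimics the format of the log line shown above.

```python
import re

# Assumption: cookies in the log field are separated by ";" and the space
# after the separator is logged as "+", as in the example log line.
SESSION_COOKIE = re.compile(r"(?:^|;\+?)ASP\.NET_SessionId=([^;]+)")

def extract_asp_session_id(cookie_field):
    """Return the ASP.NET_SessionId value, or None if the cookie is absent."""
    match = SESSION_COOKIE.search(cookie_field)
    return match.group(1) if match else None

# Shortened version of the cookie field shown above.
cookie = "ARRAffinity=9290;+ASP.NET_SessionId=od351je4glt4c00z0slgx2an;+.ASPXAUTH=8935"
session_id = extract_asp_session_id(cookie)
```

A PHP session ID could be extracted the same way by searching for the PHP session cookie name instead.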

 

Extract PHP session ID

This is the equivalent of the ASP.NET session ID setting, but for PHP. The name of the extracted field is PHPSessionId.

 

Extract Google Ads gclid

If you pay for Google Ads and enable the auto-tagging feature for your ad campaigns, a variable named gclid with a unique identifier is added to the web request when the internet user lands on your web site.
Example of a web request when the internet user lands on your web site after clicking on a Google ad:

http://www.yourwebsite.com/YourLandingPage?gclid=CPvGz7Tx8bkCFY6Y4Aod6zAAnQ

 

When this option is selected, the value of the gclid query variable is extracted into a new field. You can then get statistics on how many internet users are sent to your web site by the ads. It is not a replacement for Google Analytics, but it allows you to do some independent verification.
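
As an illustration only, the extraction of a query variable such as gclid can be sketched in Python with the standard library. The function name is hypothetical and this is not the program's own code.

```python
from urllib.parse import urlsplit, parse_qs

def extract_gclid(url):
    """Return the value of the gclid query variable, or None if it is missing."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("gclid")
    return values[0] if values else None

landing = "http://www.yourwebsite.com/YourLandingPage?gclid=CPvGz7Tx8bkCFY6Y4Aod6zAAnQ"
gclid = extract_gclid(landing)
```

Counting the distinct values obtained this way gives a rough, independent estimate of the number of ad clicks that reached the site.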

 

Extract Google Ads keywords

For some kinds of Google ads you may see keywords in the referer when the internet user lands on your web site. These keywords are not entered by the internet user; they are the keywords that triggered your ad to be displayed on a specific web site.

 

Extract search engines keywords

Some search engines keep the keywords entered by the internet user in the referer.

Example for Bing:

http://www.bing.com/search?q=keyword

If this setting is selected, the query variable q in the referer is extracted into a new field named SearchKeyWords.

Google doesn't provide this information. For now, Bing is the only supported search engine.
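
The following Python sketch shows one possible way to perform this extraction for Bing referers. The function name and the host check are assumptions made for the example, not the program's actual logic.

```python
from urllib.parse import urlsplit, parse_qs

def extract_search_keywords(referer):
    """Return the q variable of a Bing referer, or None for any other referer."""
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    # Assumption: only bing.com and its subdomains are treated as Bing.
    if host != "bing.com" and not host.endswith(".bing.com"):
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None

keywords = extract_search_keywords("http://www.bing.com/search?q=keyword")
```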

 

Group Google domains

Google uses a different domain extension for every country. This makes it difficult to quantify the web activity generated by Google landings with the RefererSite field. If you enable this setting, google.com, google.fr, google.de, google.co.uk, … will all be seen as google.com in the RefererSite field.
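
A simple way to picture this grouping is the Python sketch below. The pattern is deliberately loose and is an assumption made for the example (it accepts any two or three letter extension, optionally followed by a two letter country code); the program may use a different rule or an explicit list of domains.

```python
import re

# Assumption: a Google host is "google." followed by a short extension,
# optionally followed by a two letter country code (e.g. co.uk, com.br).
GOOGLE_HOST = re.compile(r"^(?:www\.)?google\.[a-z]{2,3}(?:\.[a-z]{2})?$")

def group_google_domain(referer_site):
    """Map any country-specific Google host to google.com, leave others as-is."""
    if GOOGLE_HOST.match(referer_site.lower()):
        return "google.com"
    return referer_site

grouped = [group_google_domain(h) for h in ("google.fr", "google.de", "google.co.uk", "bing.com")]
```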

 

Azure

When you host your web site in an Azure Web App, you will see some Azure-specific information in the logs.

Remove X-ARR-LOG-ID query variable

In every request, a variable named X-ARR-LOG-ID is systematically included in the HTTP request, each time with a different value (a GUID).

Example of web request:

http://www.finalanalytics.com/tags/httplogbrowser?X-ARR-LOG-ID=357d26be-e5ee-4b20-a707-adaf48e39a07

The problem is that this variable is neither generated nor used by your web site. If you want statistics on specific HTTP requests, this random value prevents identical requests from being counted together.

This setting removes the variable from the HTTP request to eliminate this random noise.
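
As a rough illustration, removing a query variable from a request URL could look like the Python sketch below. The function name is hypothetical, and note that this version re-encodes the remaining query variables, which the program itself may not do.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def remove_query_variable(url, name="X-ARR-LOG-ID"):
    """Return the URL without the given query variable (name is case-insensitive)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != name.lower()
    ]
    # An empty query makes urlunsplit drop the "?" as well.
    return urlunsplit(parts._replace(query=urlencode(kept)))

cleaned = remove_query_variable(
    "http://www.finalanalytics.com/tags/httplogbrowser"
    "?X-ARR-LOG-ID=357d26be-e5ee-4b20-a707-adaf48e39a07"
)
```

After this step, all requests for the same page produce the same URL and can be counted together.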

 

Ignore always on request

In an Azure Web App, the application pool shuts down after a period without activity. Once this has happened, the next web request is very slow because the web app has to start up again. To avoid this, Azure provides a setting named Always On that keeps the web application up by sending it a request every 5 minutes. These requests appear in the logs but are of no interest for analysis, so this option allows you to discard the corresponding log lines.

 

URL normalization

In this section you can fix URLs that are not canonical, meaning that the same page can be accessed with several different URLs. If the web site is already optimized for SEO you should not need these settings, as it should not be possible to access the same content with different URLs.

Remove the trailing slash

Removes the slash at the end of the URL so that the URL with the slash and the URL without it are considered the same URL (http://www.mywebsite.com/myfolder/ will be considered as http://www.mywebsite.com/myfolder).

Remove default documents

If a default document is defined for folder access, you can use this setting to remove the default document name at the end of the URL, so that the URL with the default document is considered the same as the URL to the folder (e.g. http://www.mywebsite.com/myfolder/index.htm will be considered as http://www.mywebsite.com/myfolder/).
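
The two normalizations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical and the list of default document names is an example, since the actual list depends on the web server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: example list of default document names.
DEFAULT_DOCUMENTS = ("index.htm", "index.html", "default.aspx")

def remove_default_document(url):
    """Strip a default document name at the end of the path, keeping the folder."""
    parts = urlsplit(url)
    folder, _, last = parts.path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        parts = parts._replace(path=folder + "/")
    return urlunsplit(parts)

def remove_trailing_slash(url):
    """Strip the slash at the end of the path, except for the site root."""
    parts = urlsplit(url)
    if len(parts.path) > 1 and parts.path.endswith("/"):
        parts = parts._replace(path=parts.path.rstrip("/") or "/")
    return urlunsplit(parts)

folder_url = remove_default_document("http://www.mywebsite.com/myfolder/index.htm")
canonical = remove_trailing_slash(folder_url)
```

Applying both functions in this order maps the URL with the default document, the URL with the trailing slash and the bare folder URL to the same value.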
 

Log analysis settings in the HttpLogBrowser

 

Exchange

When you access a mailbox in MS Exchange, the access goes through IIS in many cases. IIS logs are therefore very useful for analyzing the activity on Exchange servers.

Extract Active sync command and device

This is particularly true for mailbox accesses made by smartphones with the ActiveSync protocol. An ActiveSync HTTP request always carries some interesting variables in its query.

Example of ActiveSync http request:

/Microsoft-Server-ActiveSync/default.eas?User=UserName&DeviceId=8OVUHDT0G12LF5CMUDJR576RKC&DeviceType=iPhone&Cmd=Ping&CorrelationID=<empty>;&ClientId=PLGDHGKDWYBECDQ&cafeReqId=48a0d69d-a932-48c9-bc16-41a4e15d9ed1;

This information is useful if, for example, you are analyzing the IIS logs to see from where and with which devices a specific user is accessing their mailbox. This setting extracts the information into new fields.
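
The Python sketch below gives an idea of how the query variables of an ActiveSync request can be read. The function name and the keys of the returned dictionary are hypothetical and do not necessarily match the field names created by the program.

```python
from urllib.parse import parse_qs

def extract_activesync_fields(request_uri):
    """Return the user, device and command variables of an ActiveSync request."""
    _, _, query = request_uri.partition("?")
    params = parse_qs(query)

    def first(name):
        values = params.get(name)
        return values[0] if values else None

    # Hypothetical output names, chosen for this example only.
    return {
        "User": first("User"),
        "DeviceId": first("DeviceId"),
        "DeviceType": first("DeviceType"),
        "Command": first("Cmd"),
    }

fields = extract_activesync_fields(
    "/Microsoft-Server-ActiveSync/default.eas"
    "?User=UserName&DeviceId=8OVUHDT0G12LF5CMUDJR576RKC&DeviceType=iPhone&Cmd=Ping"
)
```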

 

General

Extract browser and OS information from user agent

With each web request, a web browser sends the web server a string named the user agent string. This string contains information that identifies the browser used by the internet user, the operating system on which the browser is running and, in some cases, the kind of device.

Example of user agent string:

Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+rv:11.0)+like+Gecko

If this setting is selected, the web browser (fields BrowserFamily and Browser), the operating system (fields OSFamily and OS) and, in some cases, the kind of device (field Device) are extracted from this string. One important benefit is the ability to differentiate web traffic generated by bots from traffic generated by humans. See the use case Remove bot entries to learn how to do that.

Extract day of week and hour of day

This adds the field TimeOfWeek, which holds the position in the week at which the web request took place. The unit is the day. With US regional settings Sunday is the first day of the week (e.g. 0.5 means Sunday 12 PM and 6.25 means Saturday 6 AM); for many other countries it is Monday (e.g. 0.5 means Monday 12 PM and 6.25 means Sunday 6 AM).

The setting also adds the field TimeOfDay with the time in the day when the web request took place. The unit is also the day (e.g. 0.5 means 12 PM).

In histogram mode these fields will show a time span X axis (Days:Hour:min:Sec) in the distribution chart.

In Pie chart mode you will see the day in the week for the TimeOfWeek field and the hour in the day for the TimeOfDay field.
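
To make the unit of these two fields concrete, here is a small Python sketch that computes comparable values from a timestamp. The function names and the first_day_sunday parameter are assumptions made for the example; the program derives the first day of the week from the regional settings.

```python
from datetime import datetime

def time_of_day(ts):
    """Fraction of the day elapsed at ts (0.5 is noon)."""
    return (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400

def time_of_week(ts, first_day_sunday=True):
    """Position in the week, in days, counted from the first day of the week."""
    # datetime.weekday() returns 0 for Monday and 6 for Sunday.
    day_index = (ts.weekday() + 1) % 7 if first_day_sunday else ts.weekday()
    return day_index + time_of_day(ts)

# 13 January 2024 was a Saturday.
saturday_6am = datetime(2024, 1, 13, 6, 0, 0)
us_value = time_of_week(saturday_6am, first_day_sunday=True)      # 6.25
other_value = time_of_week(saturday_6am, first_day_sunday=False)  # 5.25
```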

Resolve the host name of the client IP address

If this setting is enabled, the program does a reverse DNS lookup for every IP address to find its host name. The host name is then verified by looking up its IP address. If the IP address matches, the host name is added in the field ClientHostName and the domain of the host name is added in the field ClientHostDomain.

The ClientHostName field is very useful for verifying that a bot identified by its user agent string really is what it claims to be (e.g. that a Googlebot is really coming from Google). See the use case Analyze the bot traffic for more on this.
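
The verification described above (reverse lookup followed by a confirming forward lookup) can be sketched in Python with the standard socket module. The function name and the injectable reverse and forward parameters are assumptions made for the example, and socket.gethostbyname_ex only handles IPv4 addresses; this is not the program's own resolver.

```python
import socket

def resolve_client_host(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the host name of ip if a forward lookup confirms it, else None."""
    try:
        host_name = reverse(ip)[0]         # reverse lookup: IP address to host name
        addresses = forward(host_name)[2]  # forward lookup: host name to IP addresses
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return None
    return host_name if ip in addresses else None
```

A host name is only returned when the forward lookup lists the original IP address, which prevents a client from claiming an arbitrary host name through a reverse DNS record it controls.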

Warning! This setting slows down the log loading process because remote DNS servers need to be queried. If you want to use it intensively on big log files, the professional edition is recommended because it can query many DNS servers concurrently to speed up the process. Additional DNS servers can be configured in Preferences. The professional edition can also store parsed events with the cache mode or the database mode, so only new log rows require a DNS operation.

Lookup the country of the IP address

If you select this option a new field Country will be added with the country where the client IP address is located.

Translate cryptographic fields

If you enable the extended log fields mentioned in the article New IIS functionality to help identify weak TLS usage and you enable this option, numeric values in these fields are converted to their textual representation to facilitate the detection of weak TLS usage. Warning! You need to configure these extended log fields with exactly the same names as in the article, otherwise the program will not detect them. The blog article Identify and forbid weak TLS usage in IIS explains how to do it in detail.

Use UTC time

Displays the time of HTTP requests in UTC instead of local time. If you change this setting you need to reload the profile, and if you use the cache mode or the database mode you first need to clear the database or cache in the profile configuration.

Limit period to the last N days

Suppose the selected log files contain more than one year of logs. Loading all the data may take some time, and you may only be interested in the last 30 days. If you select this setting and specify 30 days, only the last 30 days of logs are loaded even though more log files were selected.

Maximum number of log rows to load

The program will not load more than the specified number of rows, to avoid consuming too much RAM. The default value is 1 million log rows. You can increase this number to 2 million rows with the free edition. For the professional edition there is no limit, but you need to have enough free RAM. About 1 GB of free RAM is needed for 1 million log rows.

When the limit is reached, the older log rows are left out.

Web site host name

You can optionally provide the host name of your web site. This enables the Navigate to web page action in the Log rows grid context menu. If the Host field is already included in the logs, the value specified here is not used because the logs already provide the information.

 

Log analysis settings general section in the HttpLogBrowser