Here you can configure settings that extract useful information from the logs or discard information that would complicate your analysis.
Analysis settings list:
If your web site is based on ASP.NET, ASP.NET sessions may be used to keep the state of the web application for each user. When sessions are not enabled for the whole web site, they are usually used in specific sections where the internet user needs to interact with the web site (registration forms, download pages, purchase pages...).
To make a session persist for an internet user, ASP.NET stores a tracking number in a cookie.
Here is an example of a cookie field in a log line:
In this example you can see the cookie ASP.NET_SessionId with its value. If this setting is enabled, the value is automatically extracted into a new field named ASPSessionId.
You can see hereafter the result:
This is the equivalent of the ASP.NET sessions setting, but for PHP. The name of the extracted field is PHPSessionId.
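As an illustration of the session extraction described above, here is a minimal Python sketch. It is not the program's actual implementation. The function name `extract_session_ids` and the sample cookie values are hypothetical; the cookie name PHPSESSID is PHP's default session cookie name and is an assumption about your site, and the `;+` separator reflects how IIS W3C logs usually write the cookie field.

```python
import re

# Maps a session cookie name to the new field it would populate.
# ASP.NET_SessionId comes from the text above; PHPSESSID is PHP's default
# session cookie name (an assumption: a site can rename it).
SESSION_COOKIES = {
    "ASP.NET_SessionId": "ASPSessionId",
    "PHPSESSID": "PHPSessionId",
}

def extract_session_ids(cookie_field: str) -> dict:
    """Return {new_field_name: value} for each session cookie found."""
    fields = {}
    # IIS W3C logs usually write cookies as "a=1;+b=2" (spaces become "+").
    # The pattern also accepts the plain "a=1; b=2" form.
    for pair in re.split(r";\+?\s*", cookie_field):
        name, sep, value = pair.partition("=")
        if sep and name in SESSION_COOKIES:
            fields[SESSION_COOKIES[name]] = value
    return fields

# Made-up cookie field, for illustration only.
sample = "lang=en;+ASP.NET_SessionId=abc123;+theme=dark"
print(extract_session_ids(sample))
```

Counting the distinct values of such a field gives an estimate of the number of sessions in the loaded logs.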
If you pay for Google Ads and enable the auto-tagging feature for your ad campaigns, a variable named gclid with a unique value is added to the web request when the internet user lands on your web site.
Example of a web request when the internet user lands on your web site after clicking on a Google ad:
When this option is selected, the value of the gclid query variable is extracted into a new field. You can then get statistics on how many internet users the ads send to your web site. It is not a replacement for Google Analytics, but it lets you perform independent verifications.
For some kinds of Google ads you may see keywords in the referer when the internet user lands on your web site. These keywords are not entered by the internet user; they are the keywords that triggered your ad to be displayed on a specific web site.
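The gclid extraction can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `extract_gclid` and the sample query strings are made up, and the program may handle edge cases differently.

```python
from urllib.parse import parse_qs

def extract_gclid(query: str):
    """Return the gclid value from a query string, or None if absent."""
    values = parse_qs(query.lstrip("?")).get("gclid")
    return values[0] if values else None

# Made-up query strings, for illustration only.
queries = ["utm_source=x&gclid=ABC-123_def", "page=2", "gclid=XYZ789"]

# Number of requests that arrived through an auto-tagged ad click.
ad_landings = sum(1 for q in queries if extract_gclid(q) is not None)
print(ad_landings)
```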
Some search engines keep the keywords entered by the internet user in the referer.
Example for Bing:
So if this setting is selected, the query variable q in the referer is extracted into a new field named SearchKeyWords.
Google doesn't provide this information. For now, Bing is the only supported search engine.
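The keyword extraction for Bing referers can be sketched as follows. This is a minimal illustration, not the program's code: the function name `extract_search_keywords`, the host check, and the sample referers are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

def extract_search_keywords(referer: str):
    """Return the q variable of a Bing referer, or None for anything else."""
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    # Assumption: only bing.com and its subdomains count as Bing.
    if host != "bing.com" and not host.endswith(".bing.com"):
        return None
    values = parse_qs(parts.query).get("q")
    # parse_qs decodes "+" and percent-escapes back to plain text.
    return values[0] if values else None

# Made-up referer, for illustration only.
print(extract_search_keywords("https://www.bing.com/search?q=iis+log+analyzer&form=QBLH"))
```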
When you host your web site in an Azure Web App you will see some specific information in the logs.
Every request systematically includes a variable named X-ARR-LOG-ID in the HTTP request, with a different value each time (a GUID).
Example of web request:
The problem is that this variable is neither generated nor used by your web site. If you want to compute statistics on specific HTTP requests, this random value prevents otherwise identical requests from being counted together.
This setting removes the variable from the HTTP request to eliminate this random noise.
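To show why removing the variable matters, here is a minimal Python sketch that strips X-ARR-LOG-ID from a query string. The function name `strip_arr_log_id` and the GUID values are made up for illustration, and the sketch assumes the variable appears as an ordinary `name=value` pair in the query.

```python
from collections import Counter

def strip_arr_log_id(query: str) -> str:
    """Remove the X-ARR-LOG-ID variable from a query string."""
    kept = [
        pair for pair in query.split("&")
        if pair and pair.split("=", 1)[0].upper() != "X-ARR-LOG-ID"
    ]
    return "&".join(kept)

# Two requests for the same page that differ only by the random GUID.
queries = [
    "id=5&X-ARR-LOG-ID=11111111-1111-1111-1111-111111111111",
    "id=5&X-ARR-LOG-ID=22222222-2222-2222-2222-222222222222",
]

# Once the variable is removed, both requests fall into the same group.
counts = Counter(strip_arr_log_id(q) for q in queries)
print(counts)
```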
In an Azure Web App, the application pool shuts down after a period of inactivity. Once this has happened, the next web request is very slow because the web app has to start up again. To avoid this, Azure provides a setting named Always On that keeps the web application running by sending it a request every 5 minutes. These requests appear in the logs but are of no interest, so this option lets you discard the corresponding log lines.
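Discarding these log lines amounts to a simple filter. The sketch below assumes that Always On requests identify themselves with the user agent string AlwaysOn; the text above does not state how the program recognizes them, so verify this against your own logs. The row layout and the function names are hypothetical.

```python
def is_always_on_ping(row: dict) -> bool:
    """Assumption: Always On requests carry the user agent 'AlwaysOn'."""
    return row.get("cs(User-Agent)", "") == "AlwaysOn"

def discard_always_on(rows: list) -> list:
    """Keep only the log rows that are not Always On pings."""
    return [row for row in rows if not is_always_on_ping(row)]

# Made-up log rows, for illustration only.
rows = [
    {"cs-uri-stem": "/", "cs(User-Agent)": "AlwaysOn"},
    {"cs-uri-stem": "/products", "cs(User-Agent)": "Mozilla/5.0"},
]
print(discard_always_on(rows))
```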
In this section you can fix URLs that are not canonical, meaning that the same page can be accessed through several different URLs. If the web site is already optimized for SEO you should not need these settings, as it should not be possible to access the same content with different URLs.
Remove the trailing slash
Removes the slash at the end of the URL, so that the URL with the slash and the URL without it are considered the same URL (http://www.mywebsite.com/myfolder/ will be considered as http://www.mywebsite.com/myfolder).
Remove default documents
If a default document is defined for folder access, you can use this setting to remove the default document name from the end of the URL, so that the URL with the default document is considered the same as the URL of the folder (e.g. http://www.mywebsite.com/myfolder/index.htm will be considered as http://www.mywebsite.com/myfolder/).
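The two URL fixes can be sketched as small path transformations. This is an illustration only: the function names and the list of default documents are hypothetical, and the real list depends on how your web server is configured.

```python
# Hypothetical list; use the default documents configured on your server.
DEFAULT_DOCUMENTS = ("index.htm", "index.html", "default.aspx")

def remove_default_document(path: str, defaults=DEFAULT_DOCUMENTS) -> str:
    """Turn /myfolder/index.htm into /myfolder/."""
    head, sep, last = path.rpartition("/")
    if sep and last.lower() in defaults:
        return head + "/"
    return path

def remove_trailing_slash(path: str) -> str:
    """Turn /myfolder/ into /myfolder, leaving the site root "/" untouched."""
    if len(path) > 1 and path.endswith("/"):
        return path[:-1]
    return path

# With both fixes applied, three spellings collapse into one URL.
for path in ("/myfolder", "/myfolder/", "/myfolder/index.htm"):
    print(remove_trailing_slash(remove_default_document(path)))
```

Applying the default document fix first matters in this sketch: it leaves a trailing slash that the second fix then removes.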
When you access a mailbox in MS Exchange, it is in many cases done through IIS. IIS logs are therefore very useful for analyzing the activity on Exchange servers.
This is particularly true for mailbox accesses from smartphones using the ActiveSync protocol. An ActiveSync HTTP request always contains some interesting variables in the query.
Example of ActiveSync http request:
If, for example, you are analyzing the IIS logs to see from where a specific user accesses their mailbox and with which devices, this information is useful, so this setting lets you extract it into new fields.
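As a rough sketch of this extraction, the Python below pulls a few variables out of an ActiveSync query string. The variable names User, DeviceId, DeviceType and Cmd are an assumption based on what plain-text ActiveSync queries commonly contain; the text above does not list the variables or the names of the fields the program creates. The function name and the sample values are made up.

```python
from urllib.parse import parse_qs

# Assumed variable names; check them against your own Exchange logs.
ACTIVESYNC_VARIABLES = ("User", "DeviceId", "DeviceType", "Cmd")

def extract_activesync_fields(query: str) -> dict:
    """Return the ActiveSync variables present in the query string."""
    params = parse_qs(query)
    return {
        name: params[name][0]
        for name in ACTIVESYNC_VARIABLES
        if name in params
    }

# Made-up query string, for illustration only.
print(extract_activesync_fields("User=jdoe&DeviceId=ABC123&DeviceType=iPhone&Cmd=Sync"))
```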
In each web request, a web browser sends the web server a string called the user agent string. This string contains information that makes it possible to determine the browser used by the internet user, the operating system on which the browser is running and, in some cases, the kind of device.
Example of user agent string:
If this setting is selected, the web browser (fields BrowserFamily and Browser), the operating system (fields OSFamily and OS) and, in some cases, the kind of device (field Device) are extracted from this string. One important benefit is the ability to differentiate web traffic generated by bots from traffic generated by humans. You can read the use case Remove bot entries to learn how to do that.
Extract day of week and hour of day
This adds the field DayOfWeek with an integer between 0 and 6 indicating the day of the week on which the web request took place. 0 indicates the first day of the week. With US regional settings Sunday is the first day of the week; in many other countries it is Monday.
The setting also adds the field HourOfDay with an integer between 0 and 23 indicating the hour of the day at which the web request took place.
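The effect of the regional setting on DayOfWeek can be illustrated with a short Python sketch. The function name `day_and_hour` and the `first_day` parameter are hypothetical; the program takes the first day of the week from the regional settings rather than from a parameter.

```python
from datetime import datetime

def day_and_hour(timestamp: datetime, first_day: str = "sunday"):
    """Return (DayOfWeek, HourOfDay) for a request timestamp."""
    # datetime.weekday() counts from Monday=0 to Sunday=6.
    monday_based = timestamp.weekday()
    if first_day == "sunday":
        day = (monday_based + 1) % 7  # shift so that Sunday becomes 0
    else:
        day = monday_based
    return day, timestamp.hour

request_time = datetime(2024, 1, 7, 14, 30)  # a Sunday afternoon
print(day_and_hour(request_time, "sunday"))  # Sunday is day 0
print(day_and_hour(request_time, "monday"))  # Sunday is day 6
```

The same log row therefore gets a different DayOfWeek value depending on the regional settings, which is worth keeping in mind when comparing results between machines.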
If this setting is enabled, the program performs a reverse DNS lookup for every IP address to find its host name. The host name is then verified by looking up its IP address. If the IP address matches, the host name is added to the field ClientHostName and the domain of the host name is added to the field ClientHostDomain.
The ClientHostName field is very useful to verify that a bot identified by the user agent string really is what it claims to be (e.g. that a Googlebot really comes from Google). You can read the use case Analyze the bot traffic about this.
Warning! This setting slows down the log loading process because remote DNS servers need to be queried. If you want to use it intensively on big log files, the professional edition is recommended because it can query many DNS servers concurrently to speed up the process. Additional DNS servers can be configured in Preferences. The professional edition can also store parsed events with the cache mode or the database mode, so that only new log rows require a DNS operation.
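The reverse lookup followed by a forward verification can be sketched with Python's standard socket module. This is an illustration of the technique, not the program's implementation. The function names are hypothetical, `socket.gethostbyname_ex` only covers IPv4, and `host_domain` naively keeps the last two labels, which is wrong for suffixes such as co.uk.

```python
import socket

def verified_host_name(ip, reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Return the host name of ip if the forward lookup confirms it, else None.

    The two resolver functions are parameters so they can be replaced,
    for example by cached lookups or by fakes in a test.
    """
    try:
        host = reverse(ip)[0]           # reverse lookup: IP -> host name
        addresses = forward(host)[2]    # forward lookup: host name -> IPs
    except OSError:                     # covers socket.herror and socket.gaierror
        return None
    return host if ip in addresses else None

def host_domain(host: str) -> str:
    """Naive domain extraction: keep the last two labels of the host name."""
    return ".".join(host.split(".")[-2:])

# Example with fake resolvers (documentation address, made-up host name).
fake_reverse = lambda ip: ("crawl-1.example.com", [], [ip])
fake_forward = lambda host: (host, [], ["192.0.2.10"])
print(verified_host_name("192.0.2.10", fake_reverse, fake_forward))
```

The forward check is what makes the result trustworthy: whoever controls the reverse DNS zone of an IP address can put any name in it, but cannot make that name resolve back to the address unless they also control the domain.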
Lookup the country of the IP address
If you select this option a new field Country will be added with the country where the client IP address is located.
If you enable the extended log fields mentioned in the article New IIS functionality to help identify weak TLS usage and you enable this option, numeric values in these fields are converted to their textual representation to make weak TLS usage easier to detect. Warning! You need to configure these extended log fields with exactly the same names as in the article, otherwise the program will not detect them. The blog article Identify and forbid weak TLS usage in IIS explains how to do it in detail.
Suppose the selected log files contain more than one year of logs: loading all the data may take some time, and you may only be interested in the last 30 days. If you select this setting and specify 30 days, only the last 30 days of logs are loaded, even if more log files were selected.
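The date filter can be pictured as a cutoff applied to each row's timestamp. The sketch below is an assumption about the behavior: it counts the days back from a reference time passed as `now`, whereas the program may use another reference point. The function name and the row layout are hypothetical.

```python
from datetime import datetime, timedelta

def keep_last_days(rows, days, now):
    """Keep the rows whose timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [row for row in rows if row["timestamp"] >= cutoff]

# Made-up rows, for illustration only.
now = datetime(2024, 6, 30, 12, 0)
rows = [
    {"timestamp": datetime(2023, 5, 1, 8, 0), "url": "/old"},
    {"timestamp": datetime(2024, 6, 15, 9, 0), "url": "/recent"},
]
print(keep_last_days(rows, 30, now))
```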
The program will not load more than the specified number of rows, to avoid consuming too much RAM. The default value is 1 million log rows. You can increase this number to 2 million rows with the free edition. With the professional edition there is no limit, but you need enough free RAM: about 1 GB of free RAM is needed per 1 million log rows.
When the limit is reached the older log rows will be missing.
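A row limit that drops the older rows behaves like a fixed-size buffer. The sketch below illustrates this with a `deque`, assuming the rows arrive in chronological order; the function names are hypothetical and the RAM estimate simply applies the figure of about 1 GB per 1 million rows given above.

```python
from collections import deque

def load_capped(rows, max_rows=1_000_000):
    """Keep at most max_rows rows, dropping the oldest ones first.

    Assumes rows are supplied in chronological order.
    """
    return list(deque(rows, maxlen=max_rows))

def estimated_ram_gb(row_count: int) -> float:
    """Rough estimate based on about 1 GB per 1 million log rows."""
    return row_count / 1_000_000

print(load_capped(range(10), max_rows=3))  # only the 3 most recent rows remain
print(estimated_ram_gb(2_000_000))
```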
You can optionally provide the host name of your web site. This enables the Navigate to web page action in the context menu of the Log rows grid. If the Host field is already included in the logs, the value specified here is not used, because the logs already provide this information.