Web Pages Not Databases – Part 2: Fail2ban, Apache, IP Addresses, Linux, SELinux


August 23, 2015 (Modified August 31, 2015)

(Back to the Previous Article in this Series)

I started using Linux in 1999, specifically Red Hat Linux 6.0, and I recall upgrading to Red Hat Linux 6.1 after downloading the files over a 56k modem – the good old days.  I was a little wiser when I upgraded to another release a couple of months later – I found a site on the Internet that offered Red Hat Linux CD sets for a couple of dollars.  In late 2001/early 2002 I picked up a very good book about creating Linux-based IPTables firewalls, so I set up a dual firewall setup (with a DMZ in between) using a couple of spare computers.  That setup worked great in a corporate environment for several years – I even upgraded the hardware in 2006 to inexpensive Dell PowerEdge servers and installed the latest version of Red Hat Linux (I believe Fedora 5).  I was excited about the potential capabilities of this free operating system, even going so far in 2004 as to use it as the operating system for the primary file servers (Red Hat Enterprise Linux 3, if I remember correctly) in an effort to save a few thousand dollars in Microsoft licensing fees (it almost worked too).

F.A.I.L.S.?  I must have put those keywords in the blog article title for a reason, or maybe not.  In 2003 I tried setting up the FreeS/WAN VPN server on a spare Linux computer as an alternative to having to use a 28k/33k dial-up modem connection.  It was around that time that I learned the dark side of Linux and the “free” software that could be installed.  I found an old message thread that I posted in 2003 related to FreeS/WAN where I mentioned that I spent in excess of 2.5 months trying to make this free VPN solution work correctly.  There were several how-to articles returned by a Google search, some of which were written for other Linux variants, others did not use X.509 certificates, and others almost worked.  Making matters worse, the Red Hat Linux kernel at the time did not support X.509 certificates, so I eventually ended up installing the Working Overloaded Linux Kernel.  I recall desperately looking for a program called Setup.exe that would just take care of the problem, but no such program was found.  A couple of months after I had FreeS/WAN working, a security compromise was reported in all products like FreeS/WAN, and the FreeS/WAN development had been abandoned.  I learned a very important lesson that “free” software may not be free software when you consider the time that it takes to implement and maintain the free software.  I also learned another important lesson – Linux how-to articles that are more than a couple of months old may be misleading or nearly useless; Linux articles that are written for one of the other 790 Linux Distributions may be just as misleading or useless; and not everything on the Internet in a how-to article is true/correct (this article is no exception).

With that long introduction out of the way, I thought that I would share a couple of notes that I collected along the way when I set up Fedora 22 Linux as a server for a website that uses Apache and WordPress.  I have the headache-inspiring SELinux enabled on the server, as well as the latest version of Fail2ban to temporarily block IP addresses used by the clowns on the Internet that want to make the Linux server running WordPress their new best friend.  So far, Fail2ban is working great, once the how-to articles that apply to Fedora 21 or Fedora 20 are ignored, although the current version does output apparently incorrect error messages when certain commands are executed:

[fedora 22]# fail2ban-client reload wordpress-login
ERROR  NOK: ('Cannot change database when there are jails present',)

Protecting Fedora 22 Linux with a Firewall

In one of the recent 17 Fedora releases, there was a transition from directly calling iptables commands in a script to using a command called firewall-cmd to accomplish the same task.  So, on Fedora 22 you should no longer execute commands like this:

iptables -t nat -A PREROUTING -i $INET_INTERFACE -p esp -j DNAT --to $VPN_IPADDR
 
iptables -A FORWARD -i $INET_INTERFACE -o $DMZ_INTERFACE -p udp --sport 4500 --dport 4500 -d $VPN_IPADDR -j ACCEPT
 
iptables -A FORWARD -i $INET_INTERFACE -o $DMZ_INTERFACE -p esp -j ACCEPT

Instead, with Fedora 22 the commands that are used to control the firewall have an entirely different syntax (allow access to http port 80, https port 443, ssh port 22, and ftp ports 20/21, remove access to FTP ports 20/21, and then reload and activate the changed rules):

firewall-cmd --set-default-zone=public 
 
firewall-cmd --permanent --zone=public --add-service=http 
 
firewall-cmd --permanent --zone=public --add-service=https 
 
firewall-cmd --permanent --zone=public --add-service=ssh
 
firewall-cmd --permanent --zone=public --add-service=ftp
 
firewall-cmd --permanent --zone=public --remove-service=ftp
 
firewall-cmd --reload

The changes do not take effect until the reload command is executed.  If you are planning to set up a publicly accessible website, and you do not want the server to respond to ping requests and similar ICMP requests, you might add a couple of additional firewall rules:

firewall-cmd --permanent --zone=public --add-icmp-block=destination-unreachable
firewall-cmd --permanent --zone=public --add-icmp-block=echo-reply
firewall-cmd --permanent --zone=public --add-icmp-block=echo-request
firewall-cmd --permanent --zone=public --add-icmp-block=parameter-problem
firewall-cmd --permanent --zone=public --add-icmp-block=redirect
firewall-cmd --permanent --zone=public --add-icmp-block=router-advertisement
firewall-cmd --permanent --zone=public --add-icmp-block=router-solicitation
firewall-cmd --permanent --zone=public --add-icmp-block=source-quench
firewall-cmd --permanent --zone=public --add-icmp-block=time-exceeded
firewall-cmd --reload
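
Because --permanent changes sit idle until a reload, it can be handy to see which services are defined permanently but are not yet active.  The following is only a rough sketch: pending_services is a made-up helper name (not part of firewalld), and it simply compares two space-separated lists, such as those printed by firewall-cmd --list-services with and without --permanent:

```shell
# pending_services PERMANENT_LIST RUNTIME_LIST
# Hypothetical helper: prints each service that appears in the permanent
# list but not in the runtime list, one per line.
pending_services() {
    permanent=$1
    runtime=$2
    for svc in $permanent; do
        case " $runtime " in
            *" $svc "*) ;;                  # already active, nothing to print
            *) printf '%s\n' "$svc" ;;      # permanent only, waiting on a reload
        esac
    done
}

# Example use on the server (requires firewalld to be running):
#   pending_services "$(firewall-cmd --permanent --zone=public --list-services)" \
#                    "$(firewall-cmd --zone=public --list-services)"
```

If the function prints nothing, the permanent and active service lists agree for every permanently defined service.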

You might also decide to block certain web content spiders that mercilessly drain your server’s Internet bandwidth without returning any benefit to your website.  I noticed that the Baiduspider web crawler is a frequent offender, using several ranges of IP addresses.  I put an end to a large portion of the bandwidth drain from this web content spider with a simple firewall rule that blocks the IP address range 180.76.15.1 through 180.76.15.254 (don’t forget to reload after):

firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='180.76.15.0/24' reject"
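
The /24 in the rule above is shorthand for a 24-bit network mask, so the rule covers every address from 180.76.15.0 through 180.76.15.255, which includes the .1 through .254 range mentioned earlier.  As a rough sketch (the function names ip_to_int and ip_in_cidr are invented for illustration, and only IPv4 addresses without leading zeros are handled), the arithmetic can be checked in the shell before committing to a rule:

```shell
# ip_to_int ADDRESS
# Converts a dotted-quad IPv4 address into a single 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr ADDRESS NETWORK/BITS
# Succeeds (exit status 0) when ADDRESS falls inside the CIDR block.
ip_in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")      # part before the slash
    bits=${2#*/}                    # part after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# Example:
#   ip_in_cidr 180.76.15.254 180.76.15.0/24 && echo "covered by the rule"
```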

Note that you may see a message similar to the following when attempting to execute the reload command:

Error: 'NoneType' object has no attribute 'query_rule'

If you see the above error message when trying to reload the firewall rules, just shout “free Linux software” five times and execute this command to restart the firewall – this command should have the same end effect as the reload command, except that this command works:

systemctl restart firewalld

Now, assume that you have set up Fail2ban’s ssh jail.  After a couple of hours you have received over 200 emails from Fail2ban telling you that it has blocked 200+ computers wanting to be best ssh friends with your server.  Obviously, you skipped the step of setting up a different port for ssh.  Modify the sshd config file (if you forgot the basic vi commands: press i to be able to make changes in the file, Esc ZZ to save the changes and exit, Esc :q! to quit without saving changes):

vi /etc/ssh/sshd_config

Assume that you want to change the ssh port from 22 to 1492 (something about sailing the ocean blue?).  Below the commented-out #Port 22 line, add:

Port 1492

Then save the file and exit vi.  Since SELinux is enabled, we need to instruct SELinux to behave correctly when an ssh client attaches to port 1492:

semanage port -a -t ssh_port_t -p tcp 1492

Note: Using the semanage command requires another package to be installed first:

dnf install policycoreutils-python

Note 2: If you think that SELinux is blocking something that should not be blocked, SELinux may be temporarily switched to permissive mode (violations are logged but no longer blocked) with this command:

setenforce 0

To return SELinux to enforcing mode, either reboot the server or execute this command:

setenforce 1

Next, we need to add a firewall rule to permit connections on port 1492, and reload the firewall rules (note that I am using the command to restart the firewall daemon instead due to the error that appeared with the reload command):

firewall-cmd --permanent --zone=public  --add-port=1492/tcp
systemctl restart firewalld

As a final verification, make certain that the Linux firewall and SELinux recognize the new port:

firewall-cmd --list-ports
semanage port -l | grep ssh

If there are no apparent problems with the above output, have the ssh daemon reload its configuration:

systemctl reload sshd.service
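
Since the original Port line in the file is commented out, it is easy to edit the wrong line or to leave the new line commented.  A small sketch for listing the Port values that are actually active in the file (active_ssh_ports is a hypothetical helper name; it only reads the one file and does not attempt to handle Match blocks):

```shell
# active_ssh_ports [FILE]
# Prints the value of every uncommented Port line in an sshd_config-style
# file.  The keyword comparison is case-insensitive, as in sshd_config.
active_ssh_ports() {
    awk 'tolower($1) == "port" { print $2 }' "${1:-/etc/ssh/sshd_config}"
}
```

Run with no argument, the function reads /etc/ssh/sshd_config; after the edit described above it should print 1492.  The sshd -t command may also be used to check the file for syntax errors.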

You may also wish to confirm which services are enabled for the Linux firewall:

firewall-cmd --list-services

Beating on a Linux box that lacks a monitor and keyboard is only so much fun (that old reboot joke, I guess).  If you have a Windows computer handy, the free PuTTY program will allow access to the ssh interface on the Linux server.  WinSCP is a helpful utility that provides Windows Explorer-like views through the ssh interface on the Linux server.

Protecting Fedora 22 Linux with Fail2ban

Fail2ban is a utility that monitors various log files on the server, looking for unexpected activity that typically originates from another computer on the network or on the Internet.  Fail2ban may be set up to take various actions when a problem is noticed, such as the same IP address failing to connect to SSH 10 times in 15 minutes.  The action may be to send an email to an administrator and/or to configure a firewall rule that temporarily blocks the offender’s IP address.  There are a few how-to articles found through Google searches that describe how to install and configure Fail2ban.  Shockingly (not really), some of those articles are more than a couple of months old (so the articles may not work with Fedora 22) and/or instruct people to modify files that explicitly state in the header:

# YOU SHOULD NOT MODIFY THIS FILE.

What to do?  What to do?

If you have not done so recently, make certain that the installed Fedora packages are up to date (dnf… another new command, what happened to the yum command?):

dnf update

If the Apache web server is running on the server, there is a good chance that you executed commands similar to the following at some time in the past:

dnf install httpd
systemctl start httpd.service
systemctl enable httpd.service

Fail2ban is able to send emails using Sendmail, so if Sendmail is not installed, consider installing it:

dnf install sendmail
systemctl start sendmail
systemctl enable sendmail

While not directly applying to Fail2ban, SELinux, by default, blocks Apache from using Sendmail.  It is possible to verify that this is the case, and remove the restriction with these two commands:

sestatus -b | grep -i sendmail
setsebool -P httpd_can_sendmail 1

With Sendmail installed and running, we are able to proceed with the Fail2ban installation and configuration:

dnf install fail2ban ipset
dnf install whois fail2ban-sendmail
systemctl start fail2ban
systemctl enable fail2ban

The configuration file for Fail2ban that should be modified is /etc/fail2ban/jail.d/local.conf – but that file does not exist after installation.  The local.conf file references files in the /etc/fail2ban/filter.d/ directory that tell Fail2ban how to read the various log files and recognize problems using regular expressions (they look pretty irregular to me, but then I have not done much with regular expressions since that Turbo Pascal programming class years ago).  A starting point for the local.conf file with Fedora 22 and Sendmail, blocking ssh connection requests after a few incorrect login attempts from the same IP address within an hour, would look like the following (replace my.IP.address.here with your IP address so that Fail2ban will ignore your incorrect login attempts):

[DEFAULT]
bantime = 2592000
banaction = firewallcmd-ipset
backend = systemd
sender = emailaddress1@mydomain.com
destemail = emailaddress2@mydomain.com
action = %(action_mwl)s
ignoreip = 127.0.0.1 my.IP.address.here
 
[sshd]
enabled = true
findtime = 3600

The settings listed under the [DEFAULT] heading apply to all of the other sections in this file, unless those settings are also mentioned in the other sections of the file.  For example, the bantime (number of seconds to block an IP address) applies to the [sshd] section, as does the backend = systemd setting.  If we want Fail2ban to help protect WordPress, we will want Fail2ban to monitor a variety of log files, which cannot be done with the backend = systemd setting, so that setting will need to be modified in other sections of the file.  [sshd] describes the sshd jail, so we will need to select logical names for the sections of the file that will be added later.  The sshd jail was not defined (actually, not enabled – it is defined in another configuration file) when Fail2ban was first started, so we need to let Fail2ban know that it should load/reload the sshd jail configuration, and then verify that the jail is functional:

fail2ban-client reload sshd
fail2ban-client status sshd

If you wait a couple of minutes between executing the first and second of the above commands, you may see output similar to this, which indicates that some candidates for blocking were identified and blocked, and a notification email was sent to the email address specified by the destemail setting:

Status for the jail: sshd
|- Filter
|  |- Currently failed: 0
|  |- Total failed:     0
|  `- Journal matches:  _SYSTEMD_UNIT=sshd.service + _COMM=sshd
`- Actions
   |- Currently banned: 307
   |- Total banned:     307
   `- Banned IP list:   1.215.253.186 101.78.2.106 103.15.61.138 103.224.105.7 103.248.234.3 103.253.211.244 ...
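
The fallback from a jail section to [DEFAULT] described earlier can be illustrated with a short awk sketch.  This is a deliberate simplification, not how Fail2ban itself reads its configuration: jail_setting is a made-up helper name, it reads a single file, and it ignores the other configuration files and any %(...)s interpolation:

```shell
# jail_setting FILE JAIL KEY
# Prints the value of KEY for JAIL, falling back to the [DEFAULT] section
# when the jail section does not set KEY itself.
jail_setting() {
    awk -v jail="$2" -v key="$3" '
        /^\[.*\]$/ { section = substr($0, 2, length($0) - 2); next }
        {
            eq = index($0, "=")
            if (eq == 0) next                      # not a key = value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)         # trim the value
            if (k != key) next
            if (section == "DEFAULT") def = v
            if (section == jail) { own = v; have_own = 1 }
        }
        END { print (have_own ? own : def) }
    ' "$1"
}

# Example:
#   jail_setting /etc/fail2ban/jail.d/local.conf sshd bantime
```

With the local.conf shown earlier, the sshd jail has no bantime of its own, so the example should print the [DEFAULT] value of 2592000 seconds (30 days), while findtime comes from the [sshd] section.  On a running server, fail2ban-client get sshd bantime should report the value that is actually in effect.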

Protecting WordPress running on Fedora 22 with Fail2ban

When an attempt is made to access the password protected /wp-admin section of a WordPress site, and a bad password is entered, by default WordPress silently destroys that failed connection attempt, so Fail2ban is not able to help by blocking repeat offenders.  A partial solution that I found on several websites is to add the following code near the start of the WordPress theme’s functions.php file:

add_action('wp_login_failed', 'log_wp_login_fail'); // hook failed login
function log_wp_login_fail($username) {
        // Write the failed username to the PHP/Apache error log so that Fail2ban can see it
        error_log("WP login failed for username: $username");
}

Once that code is in place, some of the bad login attempts will be written to either the /var/log/httpd/error_log or /var/log/httpd/ssl_error_log file.  You might then start seeing errors such as these buried in those files:

[Thu Aug 13 10:17:43.578391 2015] [auth_basic:error] [pid 30933] [client 75.145.nnn.nnn:50683] AH01618: user admin not found: /wp-admin/css/login.min.css, referer: http://www.websitehere.com/wp-login.php
[Thu Aug 13 19:12:53.054913 2015] [:error] [pid 2060] [client 50.62.136.183:33789] WP login failed for username: k-mm
[Thu Aug 13 20:13:02.316777 2015] [:error] [pid 1873] [client 50.62.136.183:42677] WP login failed for username: k-mm
[Thu Aug 13 21:13:12.012160 2015] [:error] [pid 15701] [client 50.62.136.183:52432] WP login failed for username: k-mm.com
[Thu Aug 13 21:28:32.073261 2015] [:error] [pid 15697] [client 50.62.136.183:58571] WP login failed for username: k-mm.com
[Thu Aug 13 21:58:43.118303 2015] [:error] [pid 21245] [client 50.62.136.183:52059] WP login failed for username: k-mm.com
[Thu Aug 13 22:03:49.150456 2015] [:error] [pid 21244] [client 50.62.136.183:60540] WP login failed for username: k-mm.com
[Thu Aug 13 22:23:28.348351 2015] [:error] [pid 15688] [client 50.62.136.183:52911] WP login failed for username: k-mm.com
[Thu Aug 13 23:14:14.453002 2015] [:error] [pid 19632] [client 50.62.136.183:37700] WP login failed for username: admin
[Fri Aug 14 01:14:15.455095 2015] [:error] [pid 5085] [client 50.62.136.183:45656] WP login failed for username: administrator
[Fri Aug 14 02:14:16.478660 2015] [:error] [pid 4114] [client 50.62.136.183:53068] WP login failed for username: administrator

In the above, note the behavior of the computer at IP address 50.62.136.183 – that computer is slowly hitting the server with different username and password combinations – slow so as not to set off blocking utilities like Fail2ban that might be configured to start blocking when there have been, for instance, five bad password attempts in an hour.  Note that I stated that the addition to the theme’s functions.php file would help to identify some of the bad login attempts – to see the others, the /var/log/httpd/access_log and /var/log/httpd/ssl_access_log files must also be monitored.  In those files you may see patterns such as these where a single IP address will try to rapidly and repeatedly post to the /wp-login.php file for more than eight hours straight:

85.97.41.164 - - [12/Aug/2015:17:17:34 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:35 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:36 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:37 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:38 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:38 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:40 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:42 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
85.97.41.164 - - [12/Aug/2015:17:17:43 -0400] "POST /wp-login.php HTTP/1.1" 200 1628 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
...
109.228.0.250 - - [13/Aug/2015:01:42:43 -0400] "POST /wp-login.php HTTP/1.0" 403 3030 "-" "-"
109.228.0.250 - - [13/Aug/2015:01:42:48 -0400] "POST /wp-login.php HTTP/1.0" 403 3030 "-" "-"
109.228.0.250 - - [13/Aug/2015:01:42:49 -0400] "POST /wp-login.php HTTP/1.0" 403 3030 "-" "-"
109.228.0.250 - - [13/Aug/2015:01:42:50 -0400] "POST /wp-login.php HTTP/1.0" 403 3030 "-" "-"
109.228.0.250 - - [13/Aug/2015:01:42:56 -0400] "POST /wp-login.php HTTP/1.0" 403 3030 "-" "-"
109.228.0.250 - - [13/Aug/2015:01:42:56 -0400] "POST /wp-login.php HTTP/1.0" 403 3030 "-" "-"

Obviously, the computers at those IP addresses were up to no good, and should also be blocked.  Another interesting pattern that might be seen in the access_log or ssl_access_log files is an attacker trying to retrieve the username of the first WordPress author, then working slowly to try logging into the website so as not to trip protection utilities like Fail2ban that identify multiple failed logins from the same IP address in a short period of time:

185.93.187.69 - - [20/Aug/2015:00:38:16 -0400] "GET /?author=1 HTTP/1.1" 302 -
185.93.187.69 - - [20/Aug/2015:00:38:20 -0400] "GET /wp-login.php HTTP/1.1" 403 221
185.93.187.69 - - [20/Aug/2015:00:58:35 -0400] "GET /?author=1 HTTP/1.1" 302 -
185.93.187.69 - - [20/Aug/2015:00:58:37 -0400] "GET /wp-login.php HTTP/1.1" 403 221
185.93.187.69 - - [20/Aug/2015:01:19:20 -0400] "GET /?author=1 HTTP/1.1" 302 -
185.93.187.69 - - [20/Aug/2015:01:19:22 -0400] "GET /wp-login.php HTTP/1.1" 403 221
185.93.187.69 - - [20/Aug/2015:01:39:45 -0400] "GET /?author=1 HTTP/1.1" 302 -
185.93.187.69 - - [20/Aug/2015:01:39:46 -0400] "GET /wp-login.php HTTP/1.1" 403 221
185.93.187.69 - - [20/Aug/2015:01:59:59 -0400] "GET /?author=1 HTTP/1.1" 302 -
185.93.187.69 - - [20/Aug/2015:02:00:00 -0400] "GET /wp-login.php HTTP/1.1" 403 221
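
Patterns like these are easier to spot with a per-address tally than by scrolling through the log.  A minimal sketch, assuming the Apache access log layout shown above, where the client address is the first whitespace-separated field and the request path is the seventh (wp_login_hits is a hypothetical helper name):

```shell
# wp_login_hits [FILE...]
# Counts requests whose path contains wp-login.php, grouped by client
# address, and prints "count address" with the busiest address first.
wp_login_hits() {
    awk '$7 ~ /wp-login\.php/ { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$@" | sort -rn
}

# Example:
#   wp_login_hits /var/log/httpd/access_log /var/log/httpd/ssl_access_log
```

A slow attacker will not stand out in any single hour, but should still float toward the top of the list when a whole week of log entries is tallied.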

You might also see something like this in the access_log or ssl_access_log file:

220.163.10.250 - - [17/Aug/2015:21:03:43 -0400] "DELETE / HTTP/1.1" 400 226

I strongly suspect that the computer at IP address 220.163.10.250 had other uses in mind for my website.  From the HTTP/1.1 specification (RFC 2616, section 9.7):

“The DELETE method requests that the origin server delete the resource identified by the Request-URI. This method MAY be overridden by human intervention (or other means) on the origin server. The client cannot be guaranteed that the operation has been carried out, even if the status code returned from the origin server indicates that the action has been completed successfully. However, the server SHOULD NOT indicate success unless, at the time the response is given, it intends to delete the resource or move it to an inaccessible location.”

A quick method to determine if a potential attacker tried to use the above DELETE request is to use the grep command to search within the ssl_access_log and access_log files:

grep "DELETE" /var/log/httpd/ssl_access_log*
grep "DELETE" /var/log/httpd/access_log*

Another set of attempted compromises that is not directed at WordPress sites is also visible in the ssl_access_log and access_log files:

162.246.61.20 - - [29/Jul/2015:02:13:11 -0400] "GET /cgi-bin/php HTTP/1.1" 404 209 "-" "-"
162.246.61.20 - - [29/Jul/2015:02:13:11 -0400] "GET /cgi-bin/php5 HTTP/1.1" 404 210 "-" "-"
162.246.61.20 - - [29/Jul/2015:02:13:11 -0400] "GET /cgi-bin/php-cgi HTTP/1.1" 404 213 "-" "-"
162.246.61.20 - - [29/Jul/2015:02:13:11 -0400] "GET /cgi-bin/php.cgi HTTP/1.1" 404 213 "-" "-"
162.246.61.20 - - [29/Jul/2015:02:13:11 -0400] "GET /cgi-bin/php4 HTTP/1.1" 404 210 "-" "-"
195.145.157.189 - - [30/Jul/2015:12:07:38 -0400] "GET /cgi-bin/test-cgi HTTP/1.1" 404 214 "-" "the beast"
37.144.20.31 - - [01/Aug/2015:09:34:10 -0400] "GET /tmUnblock.cgi HTTP/1.1" 400 226 "-" "-"
69.64.46.86 - - [03/Aug/2015:01:48:28 -0400] "GET /cgi-bin/rtpd.cgi HTTP/1.0" 404 214 "-" "-"
69.64.46.86 - - [14/Aug/2015:01:24:35 -0400] "GET /cgi-bin/rtpd.cgi HTTP/1.0" 404 214 "-" "-"
23:46.148.18.122 - - [16/Aug/2015:20:30:17 -0400] "GET /tmUnblock.cgi HTTP/1.1" 403 - "-" "-"
23:46.148.18.122 - - [16/Aug/2015:20:30:17 -0400] "GET /hndUnblock.cgi HTTP/1.1" 403 - "-" "-"
88.202.224.162 - - [23/Aug/2015:07:05:15 -0400] "GET //cgi-bin/webcm?getpage=../html/menus/menu2.html&var:lang=%26%20allcfgconv%20-C%20voip%20-c%20-o%20-%20../../../../../var/tmp/voip.cfg%20%2 HTTP/1.1" 404 211
80.82.65.186 - - [01/Aug/2015:08:42:51 -0400] "GET //cgi-bin/webcm?getpage=../html/menus/menu2.html&var:lang=%26%20allcfgconv%20-C%20voip%20-c%20-o%20-%20../../../../../var/tmp/voip.cfg%20%26 HTTP/1.1" 404 211
46.165.220.215 - - [16/Aug/2015:20:51:51 -0400] "GET /cgi-bin/webcm?getpage=../html/menus/menu2.html&var:lang=%26%20allcfgconv%20-C%20voip%20-c%20-o%20-%20../../../../../var/tmp/voip.cfg%20%26 HTTP/1.1" 404 211
46.165.220.215 - - [17/Aug/2015:03:09:59 -0400] "GET /cgi-bin/webcm?getpage=../html/menus/menu2.html&var:lang=%26%20allcfgconv%20-C%20voip%20-c%20-o%20-%20../../../../../var/tmp/voip.cfg%20%26 HTTP/1.1" 404 211

If any of the above appear to be interesting, you might try a Google search to see what the remote computers were attempting to compromise.

Far less obnoxious are entries that show your Nagios monitoring utility checking the website availability:

50.196.nnn.nnn - - [19/Aug/2015:09:30:54 -0400] "GET / HTTP/1.1" 200 57465 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"
50.196.nnn.nnn - - [19/Aug/2015:09:31:07 -0400] "GET / HTTP/1.1" 200 57465 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"
50.196.nnn.nnn - - [19/Aug/2015:09:31:42 -0400] "GET / HTTP/1.1" 200 57465 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"
50.196.nnn.nnn - - [19/Aug/2015:09:31:47 -0400] "GET / HTTP/1.1" 200 57465 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"

As well as random computers trying to download a file named wpad.dat (in the webserver root directory execute touch wpad.dat to create a zero byte file with that name – this is important if your client computers should not be trying to retrieve such a file and you have a custom error page configured for the website that is a feature rich web page).  There is a chance that your client computers could be searching for this file due to a specific configuration setting:

[Image: WebNotDatabaseWPAD]

Example output, showing repeated requests, is shown below:

76.29.115.160 - - [20/Aug/2015:02:07:40 -0400] "GET /wpad.dat HTTP/1.1" 200 - "-" "-"
76.29.115.160 - - [20/Aug/2015:02:07:46 -0400] "GET /wpad.dat HTTP/1.1" 200 - "-" "-"
76.29.115.160 - - [20/Aug/2015:02:08:03 -0400] "GET /wpad.dat HTTP/1.1" 200 - "-" "-"
76.29.115.160 - - [20/Aug/2015:02:08:14 -0400] "GET /wpad.dat HTTP/1.1" 200 - "-" "-"

Regular Expression Building Assistance:

If we intend to have Fail2ban help protect WordPress running on Apache on Fedora 22 Linux, we need to first create “filter” files that contain the regular expressions needed to recognize bad guy attempted access.  The filter files are located in the /etc/fail2ban/filter.d/ directory and all end with .conf, although the .conf portion of the filename is not specified in the /etc/fail2ban/jail.d/local.conf file that we created earlier.  I will create separate filter files for ssl and non-ssl log files, although that is not required.  The first filter file is apache-wp-login.conf:

vi /etc/fail2ban/filter.d/apache-wp-login.conf

I set that file to have four regular expressions to recognize a bad guy’s attempted access (one or two of the regular expressions below may be incorrect because I have not had enough recent practice at writing regular expressions):

[Definition]
failregex = [[]client <HOST>[]] WP login failed.*
            [[]client <HOST>[]] client denied.*wp-login.php
            .*\[auth_basic:error\] \[pid.*\] \[client <HOST>.*?
            .*\[:error\] \[pid.*\] \[client .*?(?P<host>\S+):\d+\] WP login failed.*
ignoreregex =

Save the file and exit vi.  Verification of the regular expression syntax is important.  The fail2ban-regex utility will process a Linux log file of your choice using one of the regular expression filters that you create in the /etc/fail2ban/filter.d/ directory.  For example, to test the filter that was created above, execute the following command:

fail2ban-regex --print-all-matched /var/log/httpd/error_log /etc/fail2ban/filter.d/apache-wp-login.conf

Your output may be similar to what appears below (note that I processed an error_log from a previous week):

Running tests
=============
 
Use   failregex filter file : apache-wp-login, basedir: /etc/fail2ban
Use         log file : /var/log/httpd/error_log-20150816
Use         encoding : UTF-8
 
 
Results
=======
 
Failregex: 40 total
|-  #) [# of hits] regular expression
|   3) [26] .*\[auth_basic:error\] \[pid.*\] \[client <HOST>.*?
|   4) [14] .*\[:error\] \[pid.*\] \[client .*?(?P<host>\S+):\d+\] WP login failed.*
`-
 
Ignoreregex: 0 total
 
Date template hits:
|- [# of hits] date format
|  [140] (?:DAY )?MON Day 24hour:Minute:Second(?:\.Microseconds)?(?: Year)?
`-
 
Lines: 144 lines, 0 ignored, 40 matched, 104 missed [processed in 0.24 sec]
|- Matched line(s):
...
|  [Thu Aug 13 22:03:49.150456 2015] [:error] [pid 21244] [client 50.62.136.183:60540] WP login failed for username: k-mm.com
|  [Thu Aug 13 22:23:28.348351 2015] [:error] [pid 15688] [client 50.62.136.183:52911] WP login failed for username: k-mm.com
|  [Thu Aug 13 23:14:14.453002 2015] [:error] [pid 19632] [client 50.62.136.183:37700] WP login failed for username: admin
|  [Fri Aug 14 01:14:15.455095 2015] [:error] [pid 5085] [client 50.62.136.183:45656] WP login failed for username: administrator
|  [Fri Aug 14 02:14:16.478660 2015] [:error] [pid 4114] [client 50.62.136.183:53068] WP login failed for username: administrator
|  [Fri Aug 14 13:02:10.181252 2015] [auth_basic:error] [pid 30239] [client 75.145.nnn.nnn:54787] AH01618: user test not found: /wp-admin/css/login.min.css, referer: http://www.mydomain.com/wp-login.php
|  [Fri Aug 14 13:02:12.819515 2015] [auth_basic:error] [pid 30239] [client 75.145.nnn.nnn:54787] AH01618: user test not found: /wp-admin/css/login.min.css, referer: http://www.mydomain.com/wp-login.php
|  [Fri Aug 14 13:02:14.880515 2015] [auth_basic:error] [pid 30239] [client 75.145.nnn.nnn:54787] AH01618: user test not found: /wp-admin/css/login.min.css, referer: http://www.mydomain.com/wp-login.php
|  [Fri Aug 14 13:02:29.497034 2015] [:error] [pid 3357] [client 75.145.nnn.nnn:54798] WP login failed for username: k-mm, referer: http://www.mydomain.com/wp-login.php
|  [Fri Aug 14 13:02:29.531482 2015] [auth_basic:error] [pid 3357] [client 75.145.nnn.nnn:54798] AH01618: user test not found: /wp-admin/css/login.min.css, referer: http://www.mydomain.com/wp-login.php
...
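
The fail2ban-regex output shows which lines match, but not how the matches are spread across addresses.  As a rough companion check (failed_logins_by_client is a made-up helper name, and the sed expression is a simplified stand-in for the filter that only handles IPv4 addresses followed by a port number), the WP login failed entries can be tallied per client:

```shell
# failed_logins_by_client [FILE...]
# Extracts the client address from each "WP login failed" line and prints
# "count address", with the most frequent address first.
failed_logins_by_client() {
    sed -n 's/.*\[client \([0-9.]*\):[0-9]*] WP login failed.*/\1/p' "$@" |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example:
#   failed_logins_by_client /var/log/httpd/error_log /var/log/httpd/ssl_error_log
```

An address with a high count spread over many hours is a sign that a short findtime would let that attacker slip past the jail.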

The /etc/fail2ban/filter.d/apache-wp-login-ssl.conf filter file that I created is identical to the /etc/fail2ban/filter.d/apache-wp-login.conf file:

[Definition]
failregex = [[]client <HOST>[]] WP login failed.*
            [[]client <HOST>[]] client denied.*wp-login.php
            .*\[auth_basic:error\] \[pid.*\] \[client <HOST>.*?
            .*\[:error\] \[pid.*\] \[client .*?(?P<host>\S+):\d+\] WP login failed.*
ignoreregex =

After saving the file and exiting vi, we are able to test the filter:

fail2ban-regex --print-all-matched /var/log/httpd/ssl_error_log /etc/fail2ban/filter.d/apache-wp-login-ssl.conf

The wordpress-login.conf and wordpress-login-ssl.conf filter files will be used to examine the /var/log/httpd/access_log and /var/log/httpd/ssl_access_log files, respectively.

The /etc/fail2ban/filter.d/wordpress-login.conf file (note once again that one or two of the regular expressions used for matching may need to be adjusted):

[Definition]
failregex = ^<HOST> .* "POST .*wp-login.php HTTP.1.*" 403
            ^<HOST> .* "POST .*wp-login.php HTTP.1.*" 200
            ^<HOST> .* "GET .*wp-login.php HTTP/1.*" 403 221
            ^<HOST> .* "GET ..author=1 HTTP/1.*" 302 -
ignoreregex =

The /etc/fail2ban/filter.d/wordpress-login-ssl.conf file:

[Definition]
failregex = ^<HOST> .* "POST .*wp-login.php HTTP.1.*" 403
            ^<HOST> .* "POST .*wp-login.php HTTP.1.*" 200
            ^<HOST> .* "GET .*wp-login.php HTTP/1.*" 403 221
            ^<HOST> .* "GET ..author=1 HTTP/1.*" 302 -

To test those two filters, use these commands:

fail2ban-regex --print-all-matched /var/log/httpd/access_log /etc/fail2ban/filter.d/wordpress-login.conf
fail2ban-regex --print-all-matched /var/log/httpd/ssl_access_log /etc/fail2ban/filter.d/wordpress-login-ssl.conf
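
On a computer where Fail2ban is not installed, a crude sanity check of the simpler access log patterns is possible with grep.  This is only an approximation and an assumption on my part: grep -E is not the regular expression engine that Fail2ban uses, the <HOST> tag is replaced here with a plain [0-9.]+ rather than whatever Fail2ban expands it to, and patterns that use Python-only syntax such as (?P<host>...) will not work.  The approx_failregex_count name is hypothetical:

```shell
# approx_failregex_count FAILREGEX [FILE...]
# Replaces <HOST> with a simple IPv4-like pattern and prints the number of
# matching lines (reads standard input when no file is given).
approx_failregex_count() {
    pattern=$(printf '%s' "$1" | sed 's/<HOST>/[0-9.]+/')
    shift
    grep -cE -- "$pattern" "$@"
}

# Example:
#   approx_failregex_count '^<HOST> .* "POST .*wp-login.php HTTP.1.*" 403' /var/log/httpd/access_log
```

The fail2ban-regex utility remains the real test; treat a disagreement between the two as a reason to look more closely at the pattern.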

Added August 31, 2015:

I have found that a couple of computers on the Internet are trying to access a variety of *.cgi files in rapid fashion, resulting in entries such as these being written to the /var/log/httpd/error_log file:

[Sun Aug 30 20:38:08.187093 2015] [cgi:error] [pid 6426] [client 64.15.155.177:53122] AH02811: script not found or unable to stat: /var/www/cgi-bin/webmap.cgi
[Sun Aug 30 20:38:08.271430 2015] [cgi:error] [pid 6230] [client 64.15.155.177:53316] AH02811: script not found or unable to stat: /var/www/cgi-bin/whois.cgi
[Sun Aug 30 20:38:08.599455 2015] [cgi:error] [pid 6094] [client 64.15.155.177:54035] AH02811: script not found or unable to stat: /var/www/cgi-bin/register.cgi
[Sun Aug 30 20:38:08.733852 2015] [cgi:error] [pid 6453] [client 64.15.155.177:54213] AH02811: script not found or unable to stat: /var/www/cgi-bin/download.cgi
[Sun Aug 30 20:38:09.048479 2015] [cgi:error] [pid 5353] [client 64.15.155.177:54516] AH02811: script not found or unable to stat: /var/www/cgi-bin/shop.cgi
[Sun Aug 30 20:38:09.533326 2015] [cgi:error] [pid 5673] [client 64.15.155.177:56107] AH02811: script not found or unable to stat: /var/www/cgi-bin/profile.cgi
[Sun Aug 30 20:38:09.736446 2015] [cgi:error] [pid 6455] [client 64.15.155.177:56274] AH02811: script not found or unable to stat: /var/www/cgi-bin/about_us.cgi
[Sun Aug 30 20:38:09.830315 2015] [cgi:error] [pid 6456] [client 64.15.155.177:56734] AH02811: script not found or unable to stat: /var/www/cgi-bin/php.fcgi
[Sun Aug 30 20:38:09.918823 2015] [cgi:error] [pid 4232] [client 64.15.155.177:56923] AH02811: script not found or unable to stat: /var/www/cgi-bin/calendar.cgi
[Sun Aug 30 20:38:10.013162 2015] [cgi:error] [pid 6423] [client 64.15.155.177:57115] AH02811: script not found or unable to stat: /var/www/cgi-bin/download.cgi
[Sun Aug 30 20:38:10.106597 2015] [cgi:error] [pid 6425] [client 64.15.155.177:57399] AH02811: script not found or unable to stat: /var/www/cgi-bin/light_board.cgi
[Sun Aug 30 20:38:10.193901 2015] [cgi:error] [pid 6426] [client 64.15.155.177:57574] AH02811: script not found or unable to stat: /var/www/cgi-bin/main.cgi
[Sun Aug 30 20:38:10.288724 2015] [cgi:error] [pid 6230] [client 64.15.155.177:57754] AH02811: script not found or unable to stat: /var/www/cgi-bin/search.cgi
[Sun Aug 30 20:38:10.516842 2015] [cgi:error] [pid 5349] [client 64.15.155.177:57949] AH02811: script not found or unable to stat: /var/www/cgi-bin/test.cgi
[Sun Aug 30 20:38:10.601953 2015] [cgi:error] [pid 6094] [client 64.15.155.177:58409] AH02811: script not found or unable to stat: /var/www/cgi-bin/file_up.cgi

If you have Fail2ban running on the webserver, and you are seeing entries like the above in the error_log file, consider creating a file named /etc/fail2ban/filter.d/apache-cgi-bin.conf with the following contents:

[Definition]
failregex   = ^.*\[cgi:error\] \[pid.*\] \[client .*?(?P<host>\S+):\d+\] AH02811: script not found or unable to stat: \/var\/www\/cgi-bin.*$
ignoreregex =

To test the above filter definition, execute this command:

fail2ban-regex --print-all-matched /var/log/httpd/error_log /etc/fail2ban/filter.d/apache-cgi-bin.conf
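
Because the failregex already uses a Python-style named group, it can also be checked directly in Python.  The sketch below is only an illustration under a few assumptions: it reuses two of the error_log lines shown above, adds one made-up line that should not match, and tallies the matches per client address (the names CGI_FAILREGEX and hits_per_host are hypothetical).

```python
import re
from collections import Counter

# The failregex from apache-cgi-bin.conf, split across two strings for readability.
CGI_FAILREGEX = re.compile(
    r"^.*\[cgi:error\] \[pid.*\] \[client .*?(?P<host>\S+):\d+\] "
    r"AH02811: script not found or unable to stat: \/var\/www\/cgi-bin.*$"
)

LOG_LINES = [
    # Two lines copied from the error_log excerpt above.
    "[Sun Aug 30 20:38:08.187093 2015] [cgi:error] [pid 6426] [client 64.15.155.177:53122] "
    "AH02811: script not found or unable to stat: /var/www/cgi-bin/webmap.cgi",
    "[Sun Aug 30 20:38:08.271430 2015] [cgi:error] [pid 6230] [client 64.15.155.177:53316] "
    "AH02811: script not found or unable to stat: /var/www/cgi-bin/whois.cgi",
    # Made-up line that the filter should ignore.
    "[Sun Aug 30 20:40:00.000000 2015] [core:notice] [pid 1234] "
    "AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'",
]


def hits_per_host(lines):
    """Count how many lines match the failregex for each client address."""
    counts = Counter()
    for line in lines:
        match = CGI_FAILREGEX.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


print(hits_per_host(LOG_LINES))
```

The captured host is the client IP address without the port number, which is the value that Fail2ban needs for banning.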

(Note that the steps that follow assume that the /etc/fail2ban/jail.d/local.conf file has already been created; see the steps below.)  To set up the jail that uses the above filter, add the following lines to the /etc/fail2ban/jail.d/local.conf file.  These lines set up blocking for 30 days (bantime, in seconds) when five or more matching entries (maxretry) from the same IP address are found within two days (findtime, in seconds):

[apache-cgi-bin]
enabled  = true
filter   = apache-cgi-bin
logpath  = /var/log/httpd/error_log
bantime  = 2592000
findtime = 172800
port     = http,https
maxretry = 5
backend  = polling
journalmatch =
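
To make the interaction between maxretry and findtime more concrete, here is a simplified Python model of the counting.  It is not Fail2ban's actual implementation, just a sliding-window sketch under the assumption that a ban is triggered by the match that brings the count within the findtime window up to maxretry.  The function name first_ban_time is hypothetical, and the timestamps are the first five entries from the error_log excerpt above.

```python
from collections import deque
from datetime import datetime, timedelta


def first_ban_time(timestamps, maxretry, findtime_seconds):
    """Return the time of the match that reaches maxretry within findtime, else None."""
    window = deque()
    limit = timedelta(seconds=findtime_seconds)
    for ts in sorted(timestamps):
        window.append(ts)
        # Forget matches that are older than findtime relative to this match.
        while ts - window[0] > limit:
            window.popleft()
        if len(window) >= maxretry:
            return ts
    return None


# First five matches for 64.15.155.177 from the error_log excerpt.
HITS = [
    datetime(2015, 8, 30, 20, 38, 8, 187093),
    datetime(2015, 8, 30, 20, 38, 8, 271430),
    datetime(2015, 8, 30, 20, 38, 8, 599455),
    datetime(2015, 8, 30, 20, 38, 8, 733852),
    datetime(2015, 8, 30, 20, 38, 9, 48479),
]

# Made-up counter-example: five matches spaced three days apart never share a window.
SPREAD_OUT = [datetime(2015, 8, 1) + timedelta(days=3 * i) for i in range(5)]

print(first_ban_time(HITS, maxretry=5, findtime_seconds=172800))
print(first_ban_time(SPREAD_OUT, maxretry=5, findtime_seconds=172800))
```

In this model the scanner shown above would be banned less than one second after its first request, while a visitor who mistypes a script name once every few days would never reach the threshold.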

To activate the jail, execute:

fail2ban-client reload apache-cgi-bin

To see the jail status, execute:

fail2ban-client status apache-cgi-bin

Below is sample output for the above command:

Status for the jail: apache-cgi-bin
|- Filter
|  |- Currently failed: 1
|  |- Total failed:     111
|  `- File list:        /var/log/httpd/error_log
`- Actions
   |- Currently banned: 4
   |- Total banned:     4
   `- Banned IP list:   118.219.233.133 27.254.67.157 118.163.223.214 64.15.155.177

For Fail2ban to use the filters that were just created, we must add additional lines (jail descriptions) to the /etc/fail2ban/jail.d/local.conf file:

vi /etc/fail2ban/jail.d/local.conf

At the end of the file add the following four jail definitions (note that without the backend and journalmatch lines the jails will not work due to the settings in the [DEFAULT] section of this file):

[apache-wp-login]
enabled  = true
filter   = apache-wp-login
logpath  = /var/log/httpd/error_log
bantime  = 2592000
findtime = 3600
port     = http,https
maxretry = 5
backend  = polling
journalmatch =

[apache-wp-login-ssl]
enabled  = true
filter   = apache-wp-login-ssl
logpath  = /var/log/httpd/ssl_error_log
bantime  = 2592000
findtime = 3600
port     = http,https
maxretry = 5
backend  = polling
journalmatch =

[wordpress-login]
enabled  = true
filter   = wordpress-login
logpath  = /var/log/httpd/access_log
bantime  = 345600
findtime = 86400
port     = http,https
maxretry = 6
backend  = polling
journalmatch =

[wordpress-login-ssl]
enabled  = true
filter   = wordpress-login-ssl
logpath  = /var/log/httpd/ssl_access_log
bantime  = 345600
findtime = 86400
port     = http,https
maxretry = 6
backend  = polling
journalmatch =
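
The bantime and findtime values in the jail definitions are expressed in seconds, which makes them hard to read at a glance.  The short Python sketch below simply converts the values used in this article into days and hours; the dictionary layout and the function name readable are my own choices for illustration.

```python
from datetime import timedelta

# bantime / findtime pairs copied from the jail definitions in this article.
JAILS = {
    "apache-cgi-bin": {"bantime": 2592000, "findtime": 172800},
    "apache-wp-login": {"bantime": 2592000, "findtime": 3600},
    "apache-wp-login-ssl": {"bantime": 2592000, "findtime": 3600},
    "wordpress-login": {"bantime": 345600, "findtime": 86400},
    "wordpress-login-ssl": {"bantime": 345600, "findtime": 86400},
}


def readable(seconds):
    """Format a number of seconds as days plus hours:minutes:seconds."""
    return str(timedelta(seconds=seconds))


for name, settings in JAILS.items():
    print(f"{name}: ban for {readable(settings['bantime'])}, "
          f"counting matches over {readable(settings['findtime'])}")
```

In other words, the error_log based jails ban for 30 days, and the access_log based jails ban for four days after six matches within one day.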

Save the file and exit vi.  Next we need to instruct Fail2ban to recognize the four new jails:

fail2ban-client reload apache-wp-login
fail2ban-client reload apache-wp-login-ssl
fail2ban-client reload wordpress-login
fail2ban-client reload wordpress-login-ssl

As an alternative to the above, we could just restart Fail2ban, which will restart all of the jails, and potentially spam your inbox with ssh blocking notifications:

systemctl restart fail2ban.service

Checking the status of the jails is quite simple to accomplish:

fail2ban-client status apache-wp-login
fail2ban-client status apache-wp-login-ssl
fail2ban-client status wordpress-login
fail2ban-client status wordpress-login-ssl

You might be curious about the emails that Fail2ban sends.  Below is a portion of an actual email that I received from Fail2ban recently:

Hi,

The IP 46.119.117.47 has just been banned by Fail2Ban after
12 attempts against wordpress-login.

Here is more information about 46.119.117.47:

[Querying whois.ripe.net]
[whois.ripe.net]
% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

% Note: this output has been filtered.
%       To receive output for a database update, use the "-B" flag.

% Information related to '46.118.0.0 - 46.119.255.255'

% Abuse contact for '46.118.0.0 - 46.119.255.255' is 'abuse@kyivstar.net'

inetnum:        46.118.0.0 - 46.119.255.255
descr:          Golden Telecom LLC
netname:        UA-SVITONLINE-20100517
org:            ORG-SOGT1-RIPE
country:        UA
admin-c:        GTUA-RIPE
tech-c:         GTUA-RIPE
status:         ALLOCATED PA
mnt-by:         RIPE-NCC-HM-MNT
mnt-lower:      GTUA-MNT
mnt-lower:      GTUA-WO-MNT
mnt-domains:    GTUA-ZONE-MNT
mnt-domains:    GTUA-MNT
mnt-routes:     GTUA-RT-MNT
mnt-routes:     GTUA-MNT
created:        2010-05-17T08:47:45Z
last-modified:  2011-08-04T15:58:57Z
source:         RIPE # Filtered

organisation:   ORG-SOGT1-RIPE
org-name:       Golden Telecom LLC
org-type:       LIR
address:        15/15/6 V. Khvojki str.
address:        04080
address:        Kiev
address:        UKRAINE
phone:          +380444900000
fax-no:         +380444900048
admin-c:        AEL17-RIPE
admin-c:        NP1533-RIPE
mnt-ref:        RIPE-NCC-HM-MNT
mnt-ref:        GTUA-MNT
mnt-by:         RIPE-NCC-HM-MNT
abuse-c:        GTL6-RIPE
created:        2004-04-17T12:09:58Z
last-modified:  2015-07-17T13:48:48Z
source:         RIPE # Filtered

role:           Golden Telecom Ukraine NOC
address:        Golden Telecom
address:        4 Lepse blvr
address:        Kiev, 03067, Ukraine
phone:          +380 44 4900000
fax-no:         +380 44 4900048
remarks:        All abuse notifications have to be sent on:
abuse-mailbox:  abuse@kyivstar.net
admin-c:        AEL17-RIPE
admin-c:        NP1533-RIPE
nic-hdl:        GTUA-RIPE
mnt-by:         GTUA-MNT
created:        2007-07-25T09:02:04Z
last-modified:  2014-06-17T08:24:26Z
source:         RIPE # Filtered

% Information related to '46.119.112.0/20AS15895'

route:          46.119.112.0/20
descr:          Kyivstar GSM, Kiev, Ukraine
origin:         AS15895
mnt-by:         GTUA-MNT
created:        2012-03-21T09:29:14Z
last-modified:  2012-03-21T09:29:14Z
source:         RIPE # Filtered

% This query was served by the RIPE Database Query Service version 1.80.1 (DB-2)
Lines containing IP:46.119.117.47 in /var/log/httpd/access_log

I am not sure why, but this particular email did not list the lines from the access_log that matched the filter rule.

Protecting WordPress running on Fedora 22 with .htaccess Files

One step that you may want to take is to password protect the /wp-admin directory on your web server.  To do that, you would create a new Linux user with a username and password that are difficult to guess based on your website name and WordPress users – the password should be at least eight characters long with upper and lower case letters, numbers, and punctuation marks.  Then, using tips from the last post in this message thread, create a file named .htaccess in the /wp-admin directory.  Inside that file, add the following lines (replace /full/path/to/your/wp-admin with the directory where you will later create a .htpasswd file):

AuthName "Admin Area"
AuthType Basic
AuthUserFile /full/path/to/your/wp-admin/.htpasswd
require valid-user
 
<Files admin-ajax.php>
    Order allow,deny
    Allow from all
    Satisfy any
</Files>

Next use the htpasswd generator website to create an encrypted version of the password for the Linux username.  For example, if you created the Linux user hillbillyforpresident with a password of GreatScott1TrumpIsAhead? the htpasswd website would instruct you to create a .htpasswd file with the following contents:

hillbillyforpresident:$apr1$gAgbX0SU$YjtXg5pAvXrD6i.F2lh6z1

Make certain that the .htaccess file (and possibly the .htpasswd file also) has read/write access for the owner and read access for the group in which Apache runs (the Apache user should not own the files), and that the files are not world readable.  For example:

chmod 640 /var/www/html/wp-admin/.htaccess

The wp-config.php file should also be protected with similar file permissions:

chmod 640 /var/www/html/wp-config.php
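
A quick way to confirm that the chmod commands had the intended effect is to check that none of the "other" permission bits are set.  The following Python sketch is only an illustration: the helper name is_world_accessible is hypothetical, and a temporary file stands in for the real .htaccess and wp-config.php files so that the example can be run anywhere.

```python
import os
import stat
import tempfile


def is_world_accessible(path):
    """Return True if users outside the owner and group have any access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & stat.S_IRWXO)


# Temporary stand-in for a file such as /var/www/html/wp-config.php.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

try:
    os.chmod(demo_path, 0o640)  # same mode as the chmod 640 commands above
    after_640 = is_world_accessible(demo_path)
    os.chmod(demo_path, 0o644)  # a common default that is world readable
    after_644 = is_world_accessible(demo_path)
finally:
    os.remove(demo_path)

print("640 world accessible:", after_640)
print("644 world accessible:", after_644)
```

Calling is_world_accessible with the real file paths, as a user permitted to read the directory, should return False once the permissions are set as described.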

The .htaccess file in the web server’s root directory should also be adjusted to control which files may be accessed.  Below the # END WordPress line in the file, consider adding the following (once you understand what the lines accomplish – note that the entry containing 123\.123\.123\.123 should allow the IP address 123.123.123.123 to access the wp-login.php file):

# Block access to files.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^wp-admin/includes/ - [F,L]
RewriteRule !^wp-includes/ - [S=3]
RewriteRule ^wp-includes/[^/]+\.php$ - [F,L]
RewriteRule ^wp-includes/js/tinymce/langs/.+\.php - [F,L]
RewriteRule ^wp-includes/theme-compat/ - [F,L]
 
RewriteCond %{REQUEST_URI} ^(.*)?wp-login\.php(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*)?wp-admin$
RewriteCond %{REMOTE_ADDR} !^123\.123\.123\.123$
RewriteRule ^(.*)$ - [R=403,L]
</IfModule>
 
<files wp-config.php>
order allow,deny
deny from all
</files>
 
<Files .htaccess>
 order allow,deny
 deny from all
</Files>
 
# Stop Apache from serving .ht* files
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
 
Options -Indexes
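
The last group of RewriteCond lines is the easiest one to get wrong, so it may help to see the logic written out.  The Python sketch below models only those three conditions (wp-login.php OR wp-admin, AND not the allowed address); it does not model the rest of mod_rewrite, the function name is_forbidden is hypothetical, and the test addresses other than 123.123.123.123 are documentation addresses.

```python
import re

# The three RewriteCond patterns, copied from the .htaccess lines above.
LOGIN_URI = re.compile(r"^(.*)?wp-login\.php(.*)$")
ADMIN_URI = re.compile(r"^(.*)?wp-admin$")
ALLOWED_ADDR = re.compile(r"^123\.123\.123\.123$")


def is_forbidden(request_uri, remote_addr):
    """Model of the rule: (login OR admin URI) AND address not allowed -> 403."""
    protected = bool(LOGIN_URI.search(request_uri) or ADMIN_URI.search(request_uri))
    return protected and not ALLOWED_ADDR.search(remote_addr)


for uri, addr in [
    ("/wp-login.php", "203.0.113.7"),
    ("/wp-login.php", "123.123.123.123"),
    ("/wp-admin", "203.0.113.7"),
    ("/wp-admin/", "203.0.113.7"),
    ("/index.php", "203.0.113.7"),
]:
    print(uri, addr, "->", "403" if is_forbidden(uri, addr) else "allowed")
```

One detail that this model makes visible: a request for /wp-admin/ (with the trailing slash) does not match the second condition as written, so that path relies on the password protection of the /wp-admin directory described earlier.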

WordPress and SELinux – a Headache Waiting to Attack

From what I understand, everything in the webserver’s root directory is set by default to the httpd_sys_content_t SELinux context – and sometimes that context is not present when files are copied into various subdirectories that are accessible to Apache.  The following command resets the SELinux context to the default context:

chcon -R -v -t httpd_sys_content_t /var/www/

Using FTP integrated with WordPress to install updated plugins or new WordPress versions is a bit of a nightmare because different SELinux contexts are required for the different directories – I never did find a combination that worked.  As a result, I added the following line to the wp-config.php file so that FTP integration is not necessary:

define( 'FS_METHOD', 'direct');

Of course the WordPress upload directory must have the httpd_sys_rw_content_t SELinux context, so at some point the following command would need to be executed:

chcon -R -v -t httpd_sys_rw_content_t /var/www/html/wp-content/uploads/

The same command may also need to be executed for the WordPress plugins and upgrade directories (and probably a tempfiles directory) so that it is possible to install and update plugins using the WordPress interface.  Right now I do not permit WordPress to auto-update when a new version is released (this is due to the file system permissions that I use that only allow the apache user to read the files, not change the files).  I previously created a downloads directory in the /var directory (/var/downloads).  Whenever I need to upgrade WordPress to a new version I use a script with the following contents (note that the script was pieced together based on what the WordPress release notes stated needed to be updated):

# Start from a clean download area (stop if the directory is missing)
cd /var/downloads || exit 1
rm -rf /var/downloads/wordpress
rm -f /var/downloads/wordpress.tar.gz
# Download and unpack the latest WordPress release
wget https://wordpress.org/latest.tar.gz
mv latest.tar.gz wordpress.tar.gz
tar -xzf wordpress.tar.gz
# Set the SELinux context, ownership, and permissions before copying
chcon -R -v -t httpd_sys_content_t /var/downloads/wordpress/
chown -R FileOwnerHere:ApacheGroupHere /var/downloads/wordpress/
find /var/downloads/wordpress/ -type d -exec chmod 2755 {} +
# The setgid bit is only useful on directories, so regular files get plain 644
find /var/downloads/wordpress/ -type f -exec chmod 644 {} +
# Copy the new release over the live site (cp without -r skips the wp-content subdirectories)
cp -av /var/downloads/wordpress/wp-admin/* /var/www/html/wp-admin/
cp -av /var/downloads/wordpress/wp-includes/* /var/www/html/wp-includes/
cp -v /var/downloads/wordpress/wp-content/* /var/www/html/wp-content/
cp /var/downloads/wordpress/*.php /var/www/html/
cp /var/downloads/wordpress/*.txt /var/www/html/
cp /var/downloads/wordpress/*.html /var/www/html/

The above information is consolidated from weeks, maybe months, of hammering on a seemingly simple problem – 12 years later (OK, maybe 16 years later) and I am still in search of the Linux program named Setup.exe that configures everything that needs to be configured to get a job done quickly.  Oh, going out on a limb here, let’s ask for a GUI interface too that works with Putty.  Or, even further out on a limb, let’s ask for consistency of file paths, filenames, and commands across the 790+ Linux distributions and versions within each distribution so that a how-to article created two years ago is still valid today.  Stepping off the soap box… or SOAP box.

If any readers have comments or suggestions that improve upon the above information (or gently correct), please feel free to add a comment below.  Maybe someone else will find some of the above information useful to avoid putting a couple of extra dents in the top surface of their desk.





Hyper-Extended Oracle Performance Monitor 7.0 Beta

16 08 2015

August 16, 2015

Three and a half years ago I offered the Hyper-Extended Oracle Performance Monitor 6.0 Beta for download – that Beta version expired a year later.  This program has been somewhat of a pet project for the last 13 years (almost 10 years at the time of the previous Beta offering), so I was excited to read some of the feedback about the 6.0 Beta.  The minimal feedback had me wondering for a while, then the pet project was pushed to a dark corner for roughly three years.  I taught the pet a few new tricks on Windows 10, so I thought that I would make the latest version available.  Unlike the previous Beta, Beta 7.0 does not have an expiration date.

NOTE August 17, 2015: The first two download links are working now – WordPress was blocking the downloads.  Note that the first two downloads are actually compressed .zip files with a .doc extension – inside those .zip files are the files that are of interest.  Right-click the files, save the files to your computer, then rename the files to end with .zip.

  1. Program download (right-click Save As, then change the .doc extension to .zip): Hyper-ExtendedOraclePerformanceMonitor7.zip
  2. Documentation download (right-click Save As, then change the .doc extension to .zip – still incomplete, see old docs): Hyper-Extended Oracle Performance Monitor7.doc
  3. Old Documentation download: Hyper-Extended Oracle Performance Monitor3.doc
  4. Update August 20, 2015: If you have an old computer, you may need to put the MSCOMCTL.OCX file in your C:\Windows\SysWOW64 folder (on 64 bit Windows, or C:\Windows\System32 on 32 bit Windows).  You will then need to register (process) the file with REGSVR32 – see the command in the instructions below.  A recent version of MSCOMCTL.OCX may be downloaded here: MSCOMCTL.OCX (save the file, then rename the file as MSCOMCTL.OCX).

Requirements:

Windows 2000 through Windows 10, 32 bit Oracle Client, MDAC/ADO 2.8 (or greater – preinstalled starting with Windows XP), connection to the Oracle database by database SID (tnsnames.ora) using Oracle’s Oracle Provider for OLE DB (Oracle’s Oracle ODBC Driver is required for connections that are initiated by SYS – primarily for access to the Advanced Init Parameter functionality), Excel 2000-2013, Microsoft Grid control (provided in the download – put into the program’s folder), TIMED_STATISTICS set to TRUE. Most features require DBA permissions (SYSADM or SYS), or specific grants to views or packages. For example, sessions that use the DBMS Xplan and Trace functionality must have permission to execute ALTER SESSION, have execute permission on the DBMS_XPLAN package, and select permissions on V$SQL, V$SQL_PLAN, and V$SQL_PLAN_STATISTICS.  Configuring session tracing requires execute permission on the DBMS_SYSTEM package and/or the DBMS_MONITOR package.

Description:

The Hyper-Extended Oracle Performance Monitor provides a variety of functions to simplify working with and performance monitoring Oracle databases. Various types of simple data dictionary and database instance related reports may be generated in Excel by simply putting a check in a box on the program’s main window, and clicking the Report button. Generating DBMS Xplan output (using the raw information from a 10046 level 4/12 trace) with optional 10046/10053/other traces, creating data change logging triggers using the data dictionary, peeking at the hidden and non-hidden initialization parameters, and access to a quick Oracle keyword search are also a click away on the program’s main window. Performance monitoring and tracing tools for checking a cross-reference of the time model and wait event statistics (at the system and session levels), checking a cross-reference of the system statistics and wait event statistics (with drill-down to the session level), reviewing potentially high-load SQL statements and their execution plans, cross-referencing the current session waits with enqueues, and enabling various types of Oracle traces are also a click away on the program’s main window. Permanent recording of performance monitoring statistics in a Microsoft Access compatible database (Microsoft Access is not required) is also a click away on the program’s main window.

General Command Line Parameters:

  • -D   The Database instance SID to which the program should connect.
  • -U   The user name to be used for connecting to the database instance.
  • -P   The password to be used for connecting to the database instance.

Logging-Specific Command Line Parameters:

  • -LC 20   Sets the Force a Log Capture when CPU Usage Exceeds value to 20%
  • -LI 30   Sets the Force a Log Capture if No Log Captured in Minutes value to 30 minutes
  • -LB   Sets the Force a Log Capture when a Blocking Lock is Detected value to checked
  • -LW  Sets the Force a Log Capture when a Wait Reason is Detected value to checked
  • -LR  Sets the Capture SQL Execution Statistics for Wait Reasons value to checked
  • -LD   Sets the Capture Segment Change Statistics value to checked
  • -LO   Sets the Capture Operating System and Time Model Statistics value to checked
  • -LH   Sets the Capture High Load SQL Statement Statistics value to checked
  • -LT   Sets the Capture High Load SQL Statement Text value to checked
  • -LP   Sets the Capture High Load SQL Statement Plan value to checked
  • -LHC 60   Sets the minimum CPU time that is considered high load to 60 seconds (accumulated)
  • -LHE 90   Sets the minimum elapsed time that is considered high load to 90 seconds (accumulated)
  • -LS   Specifies that Smart Logging should begin as soon as the login completes
  • -LE 240   Specifies that Smart Logging should end after 240 minutes
  • -LQ   Specifies that the program should quit (end) when logging ends

Using the above command line parameters, you could create a program shortcut for each of the databases that you administer.  For example, create a shortcut, and set its Target to (assuming that the program is on the root of the F: drive, the Oracle SID is OR1122P, the username is myuser, the password for that account is pword, and the smart logging should default to logging when 10% of CPU utilization is observed):

"F:\Hyper-Extended Oracle Performance Monitor.exe" -D OR1122P -U myuser -P pword -LC 10

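If you administer several databases, the shortcut Target strings can be generated rather than typed by hand.  The Python sketch below is only an illustration of assembling the command line parameters described above: the second database entry (TESTDB) is made up, the function name shortcut_target is hypothetical, and the script only prints the strings (it does not create the shortcuts).

```python
# Program location assumed in the example above (root of the F: drive).
EXE_PATH = r"F:\Hyper-Extended Oracle Performance Monitor.exe"

# Only the OR1122P / myuser / pword values come from the article; TESTDB is made up.
DATABASES = [
    {"sid": "OR1122P", "user": "myuser", "password": "pword", "cpu_percent": 10},
    {"sid": "TESTDB", "user": "myuser", "password": "pword", "cpu_percent": 20},
]


def shortcut_target(db):
    """Build the shortcut Target text using the -D, -U, -P, and -LC parameters."""
    return (f'"{EXE_PATH}" -D {db["sid"]} -U {db["user"]} '
            f'-P {db["password"]} -LC {db["cpu_percent"]}')


for database in DATABASES:
    print(shortcut_target(database))
```

Keep in mind that a password stored in a shortcut is visible to anyone who can view the shortcut's properties.
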
What’s New:

The Hyper-Extended Oracle Performance Monitor runs on Windows 10, even though the touchpad on the Sony laptop sometimes stops working until the computer is put to sleep and then awoken.  The program should also work with Oracle Database 12c (and might still be compatible with Oracle Database 8i, although the Time Model Viewer will not work).

Hyper7HyperExtendedMainScreen

The Time Model Viewer main window is still essentially the same, showing the hierarchy of the time model statistics, with color-coded session-level time model statistics (color ranges from yellow to red to indicate the percentage of the total represented by the session).  The bottom of the window shows the system-wide wait events, and the sessions that contributed to those system level wait events:

Hyper7TimeModelViewer

When the Time Model Viewer is active, an Excel spreadsheet is created that shows the graphic history of the statistics for the 20 most recent time capture periods – if you find this view helpful, you might want to widen the DB Time and CPU chart:

Hyper7TimeModelViewerExcelCharts

The statistics also appear in numerical form on the Statistics tab in the Excel spreadsheet:

Hyper7TimeModelViewerExcelStatistics

The Real-Time Monitor is also essentially the same as in previous Beta versions:

Hyper7RealTimeMonitor

Like the Time Model Viewer, an Excel spreadsheet is created showing the graphical history of the statistics.  Which statistics?  Click a wait event or one of the white statistics boxes to add that statistic to the list for which charts are created:

Hyper7RealTimeMonitorExcelCharts

Previous Beta versions of the program wrote out a tab delimited file containing the statistic deltas after every 30 time capture periods (this version also creates such a file).  The Beta for version 7 writes those statistics to Excel after each time capture period to facilitate the chart creation:

Hyper7RealTimeMonitorExcelStatistics

Oracle’s OLEDB provider is used for connectivity; see this article for a description of how to fix the bug in the Oracle Database 11.2.0.3 Client installer that is related to the OLEDB provider (the problem may have been corrected in later releases).  I have not yet found a method for SYS to log in using Oracle’s OLEDB provider when “AS SYSDBA” must be specified for the login to complete.  As such, Oracle’s ODBC driver is used when SYS tries to log in and the login attempt using the OLEDB provider fails.  Oracle’s ODBC driver is also used when viewing the normally hidden parameters using the SYS login.  Because there could be multiple Oracle homes, you must create a 32 bit ODBC System DSN using the Oracle ODBC driver – name that DSN HYPEREXTEND:

Hyper7Login

Hyper7ODBC1

Hyper7ODBC2

Hyper7ODBC3

The ODBC connection driver information is used to log in as the SYS user (AS SYSDBA) to view the Advanced Initialization Parameters:

Hyper7AdvancedInitParameters

Hyper-Extended Oracle Performance Monitor 7.0 Beta is also more compliant with restrictions placed on the computer by User Account Control (UAC), which prohibits programs from creating folders in the root of the C:\ drive and writing files into those folders.  Version 7 instead creates the OracleLog folder in the current user’s profile (typically C:\Users\username\OracleLog\), and separates the files created by database.  The picture below shows a couple of Microsoft Access compatible logging databases that were created, as well as some of the tab-delimited statistic files, and various other generated files:

Hyper7LoggingFolder

As mentioned above, the documentation is still incomplete, with several pages of the documentation containing nothing but pictures.  It takes a lot of effort and time to put the documentation together, so I will try to piece it together over the next couple of months.  You might be able to refer to the older documentation if the pictures are not self-explanatory.  If you find the program useful, please leave a note.  If you need help understanding a program window or the generated output, leave a comment attached to this blog article and I will try to help.





On the Topic of Technology… 8 – First 48 hours with Windows 10

3 08 2015

August 3, 2015 (Updated August 5, 2015, August 10, 2015, August 29, 2015)

(Back to the Previous Post in the Series)

I have been testing Windows 10 under the Windows Insider Program for several months using an old Dell Precision Core 2 Extreme computer (with NVidia graphics card) that was retired from engineering service a couple of years ago. I had some issues upgrading from one preview version to the next, such as the new Start menu failing to display – forcing a format and reinstall of the operating system. One version that automatically installed over the top of build 10074 caused the computer to continuously reboot. Odd, I thought, that a Windows update could cause that problem. I traced that issue back to disabling data execution protection (DEP) several years earlier on the engineering computer to keep some of the software from spontaneously crashing – turning DEP back on in the BIOS immediately resolved the reboot loop. I was still seeing occasional Start menu display glitches a week or so before the official release date for Windows 10 (July 29, 2015), but those problems were diminished with the application of Windows updates as the official release date neared.

—–

Quick note on August 12, 2015: I came across an article on Spiceworks that led me down an interesting path since I am fighting driver issues on a Sony Vaio laptop.  In short, if you have a Sony computer, don’t expect Windows 10 compatible drivers for a while.  How long?  October or November 2015.

Windows10SonyWarning

In a related article, the following statement appears, apparently quoting an unconfirmed Sony source:

“In a message to Sony laptop owners, the firm has pleaded for patience when it comes to installing the new operating system (OS) due to the real risk of software or driver corruption that could result in a catastrophic data loss.

Sony’s advice goes on to say that if customers don’t wait for the test results then there is a chance that their computer “may no longer work as intended.” This can mean anything from the system crashing, becoming unresponsive or suffering from hardware damage. The bottom line is that tapping Sony to service your PC following any of these issues could result in “losing all of the data” on your computer.”

—–

A couple of years ago I bought a Sony SVE14AE13L (SVE14A27CXH) touch screen laptop as an incentive to learn how to use and troubleshoot Windows 8. With a Core i7 processor and 8GB of memory, I anticipated having an easy time working through Windows 8, and after watching several hours of how-to videos I was reasonably comfortable with Windows 8 – for someone who has worked with computers since the early 1980s it was a bit of a struggle, but not as bad as some people claimed (I also thought that Windows Vista was reasonably good before the release of Windows 7). That said, this computer was updated to Windows 8.1 on the day that the new version became available. This Sony laptop computer, however, had a couple of annoying behaviors. It would randomly wake up in the middle of the night, the touchpad would occasionally stop working, Internet Explorer on the computer was occasionally slow, and the touchpad would send random zoom in/out messages to the current application.

Windows 8.1 worked reasonably well on the Sony laptop, but I went ahead and reserved a copy of Windows 10 for the computer to get some experience with the release version of Windows 10. The first 48 hours of Windows 10 began this past Saturday. I grew tired of waiting for Microsoft to tell me that my copy of Windows 10 was available for installation, so I opened a Windows command line and executed this command: wuauclt.exe /updatenow – that seemed to kick start the upgrade, but it failed to install once roughly 3.5GB of installer files were downloaded. Another search of the Internet indicated that I should have first deleted everything in the C:\Windows\SoftwareDistribution\Download folder. After downloading another 3.5GB copy of the Windows 10 installer, I found that this attempt also failed after I clicked the Continue (or whatever that button showed) button in the Windows Update window. I started wondering what I was doing wrong, so I downloaded the ISO DVD image for Windows 10 from Microsoft.  Progress – writing the ISO DVD image to a DVD permitted the Windows 10 installer to start with a simple double-click.

Five or six hours in now, roughly 10.5GB downloaded from the Internet (I sure am happy that I no longer have the Internet connection with the 6GB monthly limit), and the Windows 10 installer is showing a message “Your PC will restart several times. Sit back and relax.” Don’t worry, be happy (click the picture for a larger view).

Windows10-95PercentComputerMayRestart

The install completed… with an error message, “The installation failed in the FIRST_BOOT phase with an error during MIGRATE_DATA operation.” Fantastic, at least the installer brought back the previous Windows 8.1 (click the picture for a larger view).

Windows10CouldNotInstallErrorFailedFirstBootMigrateData

That sure is an obscure error message. Google searches seemed to indicate that the problem could be caused by a corrupt profile, or a half dozen other issues.

I made another attempt at the upgrade, this time telling the installer NOT to install any Windows updates during the initial install. Perhaps eight or nine hours in, success – the Windows 10 Pro logon window is a keyboard tap away (note that the picture below was taken the following day).

Windows10FirstLogin

Great, Windows updates are automatically downloading and installing. Five minutes later, the laptop is apparently unhappy with Windows 10, a frown on a blue screen of death with the message “If you’d like to know more, you can search online later for this error: INTERNAL_POWER_ERROR”.

Windows10BlueScreenInternalPowerError

E-Moe-Gee! (emoji). Sure, collect your error info and restart. Recalling the message from the earlier picture: “Your PC will restart several times. Sit back and relax.” And thus started an automatic reboot loop, sometimes lasting just long enough to log into Windows 10 and display the Device Manager. In this case, that INTERNAL_POWER_ERROR message means that Windows 10 does not like the AMD Radeon HD 7600M series graphics card. Congratulations Microsoft and AMD, I have not had this level of difficulty with video card drivers since 1999 when I set up a triple boot on a computer with the latest NVidia graphics card, booting into Windows 98 (drivers easy to find), Windows NT 4.0 (a little more challenging), and Red Hat Linux with the Gnome X-Window desktop (on par with Windows 10 on this Sony laptop).

A couple of tips at this point. Pressing F8 during the initial Windows boot apparently does not display the old Windows 95 style boot menu that allows the computer to start in Safe Mode with limited functionality. If the computer is able to boot to the logon screen, there is a power button at the bottom-right of that screen. Hold down the Shift key on the keyboard, click the power button, and select Restart – if you continue to hold down the Shift key, you will have the option to Troubleshoot the computer. If you select Troubleshoot, then Advanced options, then Startup Settings, you will be able to select to start the computer in Safe Mode. Shutting off the computer when it is booting into Windows will often result in the next boot attempt taking the computer to the screen that allows selecting Troubleshoot, and eventually the option to start in Safe Mode. Once in Windows 10, holding the Windows flag key and pressing the X key on the keyboard has the same effect as right-clicking the Start button. A menu will appear permitting quick access to the Device Manager, Computer Management, Task Manager, and a variety of other computer administration tasks.

One of the tips that I found online for dealing with the INTERNAL_POWER_ERROR message was to quickly navigate to the Device Manager, and delete the Windows installed AMD Radeon HD 7600M item under the Display adapters heading. I tried that without success, and even tried deleting the Intel HD Graphics 4000 item under the Display adapters heading without success. Instructing Windows to scan for new devices resulted in two “Microsoft Basic Display Adapter” items being added under the Display adapters heading, leading to another blue screen after a couple of minutes.

Windows10DeviceManager1

Booting the computer to Safe Mode with Networking to do some troubleshooting, Windows 10 was stable. A web search using a different computer suggested that I needed to download the latest video card drivers for the AMD video card to fix the blue screen. I found that the new Edge web browser, which is the new default web browser on Windows 10, cannot start in Safe Mode, resulting in a message stating “This app can’t open. Microsoft Edge can’t be opened using the Built-in Administrator account. Sign in with a different account and try again.”  I was logged into the computer using my personal account, not the Built-In Administrator account, but I guess that detail did not matter.

Windows10CannotStartEdgeInSafeMode

Where did Microsoft hide Internet Explorer on Windows 10? Hold down the Windows key and press R. Type iexplore.exe and press the Enter key. Now, if only the wireless network worked in Safe Mode with Network Support! Trying again with the laptop connected by Cat 5e cable to a network switch, I managed to download the correct drivers. Sorry, cannot install AMD video card drivers in Safe Mode, “Failed to load detection driver.” Clicking OK caused the installer to hang at 100% complete. The previously registered Microsoft Word 2010 claimed that it could not verify that I was running an authentic version of Word 2010 while in Safe Mode, so the problem is not just with AMD when in Safe Mode.

Windows10CannotInstallATIDriversInSafeMode

Now what? Tell the computer to try reinstalling Windows 10 without installing updates – tried that, only to be greeted with a near instantaneous frowny face (a blue screen of death) upon completion. It took roughly 10 attempts to make it into the BIOS setup on the laptop (F2 did not work at the start of the boot) – I eventually found that applying about 200 pounds of pressure to the pink Assist button while powering on the Sony laptop allowed access to the BIOS setup. I found an option titled “Discrete Graphics Adapter” that was set to Enabled, so I changed that to Disabled, saved the changes, and managed to log into a stable copy of Windows 10… about 30 to 36 hours after the first download of Windows 10 initiated.

Windows10BIOSDisableDiscrete

After verifying that the computer worked fine with the Discrete Graphics Adapter disabled, I re-enabled that setting and made another attempt with the AMD graphics card.  The AMD graphics drivers installed without issue in Windows 10, but again resulted in a continuous reboot blue screen loop. I found that by quickly navigating to Device Manager after logon, I was able to set the AMD graphics device to Disabled before the computer would blue screen – once again Windows 10 was stable (and Microsoft Word 2010 worked OK too). Windows 10 installed various Windows updates, but simply refused to install the video driver for the Intel HD Graphics 4000 device.

Windows10WindowsUpdateCannotInstallIntelDrivers

As a result, Device Manager still shows a “Microsoft Basic Display Adapter” item. Instructing Device Manager to update the drivers also failed.

Windows10DeviceManagerCannotInstallVideo

Even though the correct video card drivers are not installed, Windows 10 seems to work OK. Unfortunately, the problem where the touchpad would send random zoom in/out messages is still present – note the size of the zoomed icons in the desktop background in the following picture. I have yet to find a logical way to send the zoom back to normal on the desktop. The right side of the below picture shows the notification Action Center that is displayed by clicking the icon near the clock at the bottom-right of the screen.

Windows10VideoPlaybackNotifications2

The following picture shows that the Windows 8 new style apps are now able to appear on the desktop with regular Windows applications. I was shocked to see that Windows 8 would not permit overlapped windows for the new style apps – I think that the original release of Windows had the same problem, even though the Commodore Amiga from the same time period (mid to late 1980s) supported windows arranged on top of other windows. Cortana is shown at the left of the window in the picture below – apparently Cortana lost an argument with the Internet shortly before this screen capture was saved (Cortana is a bit evasive when asked “who are your programmers”). The Windows 10 scientific calculator appears at top-center, Microsoft Edge at bottom-center, Paint.Net at top-right, and the Microsoft Solitaire Collection app at the bottom right.

Windows10NewOldAppsNoInternet

As I stated above, overlapping Windows 8 apps are supported, and Cortana sent me to a web page about Easter Eggs – I thought that maybe that would be a good way to learn more about Cortana’s programmers. The speech input recognition seems to work very well in Cortana – as long as the Internet connection is not down.

Windows10OverlappingWindows

The picture below shows the new Start menu at the left – it is possible to stretch or narrow the Start menu to show more or less of the Windows 8 style live tiles on the Start menu. Changing settings is still a little confusing – for some settings it is necessary to click All Settings in the Action Center (the gray background window at the top-right), and in other cases it is necessary to use the Windows Vista style Control Panel (the white background window at the bottom-right).

Windows10StartMenuControlPanelAllSettings

The task view trick that was introduced with Windows Vista (Windows key and Tab key) still works, but has changed for the better. The Task View may also be opened by clicking the Task View button on the task bar at the bottom of the screen.

Windows10TaskView

The location of important settings seems to change with just about every new release of Windows. It would be nice if (almost) all settings could be found within a single interface. A trick that worked with older versions of Windows also works with Windows 10. Right-click the desktop (or inside any folder) and select New – Folder. Give that folder the following name:

GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}

Double-clicking that folder to open it gives you just what you need (until the touchpad starts randomly sending zoom in/out messages).

Windows10GodMode1
Windows10GodMode2
Windows10GodMode3
Windows10GodMode4

Oh, it appears that the Oracle Database 11.2.0.3 database instance survived all of the upgrading nonsense, although it did take an exceptionally long time to start SQL Plus.

I have been working with computers for a long time – started back in the early 1980s. My new boss (almost five months now) at work for some reason thinks that I am a programmer… just a programmer? Last week I casually mentioned to him that I had read 30+ books about Oracle Database, and hinted that I co-authored another. I wonder if he has a clue about the books that I read about network administration (including a couple of CNE books in the late 1990s), operating systems, Microsoft Exchange (two for Exchange 2013, and others for previous versions), computer hardware troubleshooting, and a variety of other computer related skills. Well, at least he thinks that I am a programmer, and not just one who fixes computers (not that there is anything wrong with fixing computers, right Microsoft/Sony/AMD?).

As a suggestion, before you attempt a Windows 10 upgrade on a computer with an AMD video card, ask yourself if you have 24+ hours to spare and how much you like seeing frowns on blue screens. If the above is not enough of a warning, consider the before and after webcam photos captured by the Sony laptop’s webcam that follow.

Before picture: staring down Windows 8.1, trying to figure out why the window zoom keeps randomly changing:

Windows10Before

After Windows 10 is installed picture: note that more than the background changed.  I think that Cortana is stalking me now (why do I have a purple square target on my head?).  Just what is she doing in the middle of the night when the computer spontaneously turns on from a deep sleep?:

Windows10After

Humor aside, like Windows NT 4.0, once you get through the blue screens of death during the driver installs, Windows 10 will likely be rock solid stable.  I am already liking it more than Windows 8.1, even if it is not as fast on the same hardware.

Update August 4, 2015:

Below is a possible fix for the random zoom in/zoom out problem if your computer has a Synaptics touchpad.  This solution will probably work on Windows 8/8.1 also.  Near the bottom right of the screen, click the up pointing arrow to show all of the tray program icons.  There may be a tray icon named Synaptics Pointing Device that looks like a rectangle with two smaller rectangles below – click that tray icon, and then click Pointing Device Properties from the menu.  Note that if you also want to turn off the potentially annoying left-clicks due to touching the touchpad with a bit too much force, click the Tap to Click menu item to remove the checkmark in front of that item.

Windows10FixUnintentionalZoom1

On the Device Settings tab, click the Settings button.  Remove the checkmark in front of Pinch Zoom, click the OK button, and then click the OK button in the Mouse Properties window.  This random zoom behavior has plagued the laptop for some time, but only became beyond bearable when the randomly activating feature affected the desktop icons in Windows 10.

Windows10FixUnintentionalZoom2

Update August 5, 2015:

If your desktop icons are super-sized as shown below (icons are shown at actual size) due to random zoom-in/zoom-out messages from your touchpad, there is a simple quick-fix (after shutting off this feature using the instructions above).  Right-click an open area of the desktop, select View and then Medium icons.

Windows10FixOverZoomedDesktopIcons

Microsoft released the first cumulative update KB3081424 for Windows 10 64 bit computers within the last hour – that update is being rolled out and installed automatically on all Windows 10 64 bit computers (Windows 10 computers that are joined to a domain may or may not automatically install this update).  After the update installed, I told the computer to go ahead and reboot.  After rebooting the computer, the Windows Start menu would not appear, Cortana would not respond, and the Action Center would not display.  I saw this same behavior when testing the automatic updates in the various Technical Preview versions – in those cases I had to format the hard drive and reinstall Windows to recover.  Fortunately, in this case the functionality was restored by simply rebooting the computer (if this problem happens to you, there is no obvious way to tell Windows to reboot – just hold down Ctrl, then Alt, and press the Delete key once – a power option should appear near the bottom right of the screen to permit you to restart the computer).  This Start menu, Cortana, and Action Center functionality loss, if permanent and widespread, could cause a bit of havoc.

Windows10UpdateKillsStartMenu

Note that I am still fighting the Intel and AMD video card driver issues (Intel driver will not install, AMD driver installs with a blue screen).  I might have found a solution for the computer waking up unexpectedly in the middle of the night, but I will wait to post what I found until I confirm that the fix works.

Update 2 August 5, 2015:

Fixing the issue where the Intel HD Graphics 4000 device is listed as “Microsoft Basic Display Adapter” in Device Manager and Windows returns an error when installing the correct driver through Windows Update and Device Manager is a bit of a hassle.  The actual error message is not displayed when the driver fails to install.  If you experience this problem, immediately navigate to the C:\Windows\SoftwareDistribution\Download\Install folder, and locate a file that ends with .inf.  Right-click that file and select Install.  If Windows returns an error “There is no driver selected for the device information set or element.” that likely means that the company that set up the driver’s .inf file made an error in the file.

Windows10WindowsUpdateCannotInstallIntelDrivers2

The short-term solution for this problem is to download the Windows 7, 8, and 8.1 graphics driver package from the Intel website.  After a bit of searching, I found a suitable driver for the Intel HD Graphics 4000 device here.  After the installer completed, the computer had to restart – Device Manager then showed the HD Graphics 4000 device in place of the “Microsoft Basic Display Adapter” item.  After the driver installation, Windows Update still shows that the Windows 10 driver for the Intel graphics device fails to install, so there is still an issue that Microsoft/Intel need to rectify with the driver update.

Windows10FixIntel4000Driver

Now to tackle the blue screen of death caused by the AMD Radeon video driver…

Update August 10, 2015:

Peer-to-peer Windows Updates – potentially OK if the computer is connected to a trusted network, and the default Windows Update settings are adjusted; potentially dangerous at the default setting.  The default settings for Windows Updates essentially make your computer part of a world-wide Windows Update torrent.   If your computer is connected to a pay-as-you-go Internet provider, or if your Internet provider charges substantial fees for exceeding a certain monthly data cap (6GB per month, for instance), you should definitely modify the default update settings.  If you are concerned about your computer retrieving and automatically installing Windows updates from computers that have no association with Microsoft, other than the computers are running Windows 10, you should definitely modify the default update settings.  I was made aware of this particular issue when discussing Windows 10 with a relative who is an IT expert.  I also read a couple of articles today that described the torrent-like behavior of Windows 10 updates.

To fix the default setting, select Settings from the Start menu, then click Update & security.  Click Advanced options, then click Choose how updates are delivered.

Windows10DisableWindowsUpdateSharing1

If your computer is always connected to a trusted network (never taken to a coffee shop, hotel, LAN party, etc.), consider changing the default setting to PCs on my local network – this setting could be beneficial if you have a pay-as-you-go or monthly capped Internet connection.  If you ever connect the computer to an untrusted network, consider changing the default On setting to Off.

Windows10DisableWindowsUpdateSharing2

Update August 29, 2015:

On August 5, 2015 I stated, “I might have found a solution for the computer waking up unexpectedly in the middle of the night, but I will wait to post what I found until I confirm that the fix works.”  The fix that I implemented seems to have corrected this problem that has plagued the laptop for quite some time (possibly since new, nearly three years ago).  If your computer wakes up unexpectedly in the middle of the night, and you would like to end that behavior, click the Windows 10 Start Menu, then click Control Panel.  In the search box at the right type power and then click Edit power plan at the left.

Windows10DisableWakeUp1

In blue lettering near the top of the window you should see the words “Change settings for the plan: ” followed by the selected power plan name (High performance in the picture below) – make note of the selected power plan name.  Click Change advanced power settings.

Windows10DisableWakeUp2

Check to make certain that the same power plan appears selected in the list – if not, select the power plan that appeared in the previous window.  Click the + next to Sleep, then the + next to Allow wake timers.  Change both the On battery and Plugged in settings to Disable.  Finally, click OK to save the changed settings.  Note: it may be necessary to click the Change settings that are currently disabled link prior to making these changes.

Windows10DisableWakeUp3

I have now upgraded a computer from Windows 8.1 Pro (Ultimate?) to Windows 10 Pro (Sony laptop), another from Windows 8.1 Home to Windows 10 Home, another from Windows 7 Ultimate to Windows 10 Pro, and a fourth (and fifth) from Windows 7 Home to Windows 10 Home.  The Sony laptop has at least twice had a panic attack a couple of minutes after coming out of sleep.  During the panic attack the fan ramps up to maximum speed, the touch pad and keyboard stop responding, and the power button must be held in for 10 seconds to shut the computer off (the laptop does not go to sleep with a quick press of the power button).  The Sony computer has also crashed a couple of times with an irql_not_less_or_equal blue screen, at least once in the last week.  The computer that was upgraded from Windows 7 Pro to Windows 10 Pro has spontaneously rebooted three times (twice within 30 minutes) while building Windows large icon thumbnails for a number of Apple Quicktime .mov video files.  The computer is connected to a true sine wave UPS, so I know that the reboot is not caused by a power problem (I was next to the computer each time the reboot happened, no blue screen, no notification entries in the Windows event logs except that the previous shut down was unexpected).  One of the computers upgraded from Windows 7 Home to Windows 10 Home is a Dell laptop where the G and H keys did not work correctly before or after the upgrade (some people have reported that this is a known intermittent problem with this particular Dell laptop model) – BIOS upgrades for the laptop refused to install following the upgrade to Windows 10 Home.  I do not yet have a status update from the other two upgraded computers.





To N or Not to N, is SYS_OP_C2C the Question; Oracle NVARCHAR Slow?

8 02 2015

February 8, 2015

I was recently contacted about an odd performance issue with a software product named TDM, where certain SQL statements submitted to an Oracle 9.2.x database required roughly 30 seconds to execute when a nearly instantaneous response was expected.  The person who reported the issue to me provided a sample SQL statement that appeared something similar to the following:

SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;

In the SQL statement, notice the N character that is immediately before ‘112’ in the SQL statement.  The person indicated that the SQL statement executed quickly if that N character were removed from the SQL statement.  At this time the developer of that application is unwilling to release a bug fix to remove the N character from this (and likely other) SQL statements.

I did not initially have the table datatype descriptions (retrieved with DESC MVIS_DATA), so I made a couple of guesses about the datatypes.  What if the TOOLID column were defined as a number, and were also the primary key column for the table (indicating that there must be an index on that column)?  It might be the case that the developer of the application decided that in all SQL statements that are submitted with literal values (rather than using bind variables), all numbers would be submitted in single quotes.  I created a testing table for a mock up in Oracle Database 11.2.0.2:

CREATE TABLE MVIS_DATA_NUM (
  TOOLID NUMBER,
  DATATYPE NUMBER,
  COMMISSIONID NUMBER,
  OPERATIONID NUMBER,
  COMMISSIONLISTPOS NUMBER,
  DATA VARCHAR2(100),
  PRIMARY KEY (TOOLID));
 
INSERT INTO
  MVIS_DATA_NUM
SELECT
  ROWNUM TOOLID,
  MOD(ROWNUM,2) DATATYPE,
  NULL COMMISSIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) OPERATIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) COMMISSIONLISTPOS,
  LPAD('A',100,'A') DATA
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA_NUM',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)

With the testing table created with 100,000 rows, and statistics gathered for the table and primary key index, I then tried executing a query and retrieving the execution plan for that query so that I could determine if the Predicate Information section of the execution plan provided any clues.  I executed the following statements; the first SQL statement retrieved one row, and the second SQL statement retrieved the execution plan for the first SQL statement:

SET LINESIZE 140
SET PAGESIZE 1000
 
SELECT
  DATA
FROM 
  MVIS_DATA_NUM
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The execution plan output is as follows:

SQL_ID  gzzrppktqkbmu, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA_NUM WHERE   TOOLID = N'112'   AND
DATATYPE = 0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1080991
 
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA_NUM |     1 |   113 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0050817  |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND
              "DATATYPE"=0 AND "COMMISSIONID" IS NULL))
   2 - access("TOOLID"=112)

Nothing too unusual appears in the above execution plan; the N‘112’ portion of the SQL statement was simply changed by the query optimizer to the number 112, which permitted the use of the table’s primary key index.  Obviously, Oracle Database 11.2.0.2 contains a few optimizations that are not available in Oracle Database 9.2.x, so maybe the outcome would be different in Oracle Database 9.2.x.  However, when a number value is compared to a character (for example VARCHAR2) value, Oracle Database will attempt to implicitly convert the character value to a number value when performing the comparison, so the outcome should be the same on Oracle Database 9.2.x.
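The implicit conversion described above can be made visible by writing it out by hand.  The following is only a sketch, assuming the MVIS_DATA_NUM test table created earlier; the explicit TO_NUMBER call is my own addition and is not part of the application's SQL statement.  Because the conversion is applied to the literal and not to the TOOLID column, the primary key index should remain usable:

```sql
-- Explicit form of the conversion that Oracle performs implicitly when a
-- NUMBER column is compared with a character literal: the literal is
-- converted to a number, and the TOOLID column is left untouched.
SELECT
  DATA
FROM
  MVIS_DATA_NUM
WHERE
  TOOLID = TO_NUMBER(N'112')
  AND DATATYPE = 0
  AND COMMISSIONID IS NULL
  AND OPERATIONID IS NULL
  AND COMMISSIONLISTPOS IS NULL;

-- Display the plan of the statement just executed in this session
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
```

If the access predicate in the resulting plan again references only the TOOLID column and the number 112, that would be consistent with the conversion being applied to the literal side of the comparison.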

What if that TOOLID column were defined as VARCHAR2?  Below is another test table with that column defined as VARCHAR2:

CREATE TABLE MVIS_DATA (
  TOOLID VARCHAR2(15),
  DATATYPE NUMBER,
  COMMISSIONID NUMBER,
  OPERATIONID NUMBER,
  COMMISSIONLISTPOS NUMBER,
  DATA VARCHAR2(100),
  PRIMARY KEY (TOOLID));
 
INSERT INTO
  MVIS_DATA
SELECT
  TO_CHAR(ROWNUM) TOOLID,
  MOD(ROWNUM,2) DATATYPE,
  NULL COMMISSIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) OPERATIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) COMMISSIONLISTPOS,
  LPAD('A',100,'A') DATA
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
COMMIT;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)

With the new test table created, let’s try the SQL statement again.  A 10053 trace file will be enabled in the event that you are interested in examining any potential automatic transformations of the SQL statement:

SET LINESIZE 140
SET PAGESIZE 1000
 
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'SQL_10053V';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
 
SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));
 
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

The first of the above SQL statements output one row.  Here is the execution plan that was output:

SQL_ID  5pkwzs079jwu2, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA WHERE   TOOLID = N'112'   AND DATATYPE =
0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 353063534
 
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |   227 (100)|          |
|*  1 |  TABLE ACCESS FULL| MVIS_DATA |   122 | 13908 |   227   (3)| 00:00:01 |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL
              AND "DATATYPE"=0 AND SYS_OP_C2C("TOOLID")=U'112' AND "COMMISSIONID" IS
              NULL))

Notice the SYS_OP_C2C function in the Predicate Information section of the execution plan; that is a clue that there might be a performance problem lurking.  Also notice that the INDEX UNIQUE SCAN operation was replaced with a TABLE ACCESS FULL operation, which is also a clue that a performance problem may be lurking.  This section of the execution plan also indicates that the N‘112’ portion of the SQL statement was changed to U‘112’.  Consulting the 10053 trace file indicates that the query optimizer rewrote the submitted SQL statement to the following:

SELECT
  "MVIS_DATA"."DATA" "DATA"
FROM
  "TESTUSER"."MVIS_DATA" "MVIS_DATA"
WHERE
  SYS_OP_C2C("MVIS_DATA"."TOOLID")=U'112'
  AND "MVIS_DATA"."DATATYPE"=0
  AND "MVIS_DATA"."COMMISSIONID" IS NULL
  AND "MVIS_DATA"."OPERATIONID" IS NULL
  AND "MVIS_DATA"."COMMISSIONLISTPOS" IS NULL

SYS_OP_C2C is an internal characterset conversion function.
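Since the conversion is between the database character set (used by VARCHAR2 columns) and the national character set (used by NVARCHAR2 columns and N literals), it may help to know which two character sets are involved.  A minimal sketch of that check follows; it only reads the NLS_DATABASE_PARAMETERS view and makes no assumptions about the test tables:

```sql
-- NLS_CHARACTERSET applies to CHAR/VARCHAR2 data,
-- NLS_NCHAR_CHARACTERSET applies to NCHAR/NVARCHAR2 data and N'...' literals
SELECT
  PARAMETER,
  VALUE
FROM
  NLS_DATABASE_PARAMETERS
WHERE
  PARAMETER IN ('NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET')
ORDER BY
  PARAMETER;
```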

What happens to the execution plan if the N character is removed from the SQL statement?

SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = '112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

Below is the execution plan that was output:

SQL_ID  d70jxj3ypy60g, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA WHERE   TOOLID = '112'   AND DATATYPE =
0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1051843381
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA    |     1 |   114 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0050814 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND
              "DATATYPE"=0 AND "COMMISSIONID" IS NULL))
   2 - access("TOOLID"='112')

Notice that the SYS_OP_C2C function does not appear in the Predicate Information section of the execution plan this time, and that the primary key index is used, rather than requiring a full table scan.  Unfortunately, the DBMS_XPLAN.DISPLAY_CURSOR function does not exist in Oracle Database 9.2.0.x, otherwise the reason for the performance problem may have been much more readily apparent to the person who reported the issue to me.
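On a release without DBMS_XPLAN.DISPLAY_CURSOR, one possible substitute is EXPLAIN PLAN combined with DBMS_XPLAN.DISPLAY.  The sketch below rests on a few assumptions that I have not verified on the reporter's system: that DBMS_XPLAN.DISPLAY is available in the 9.2.x release in use, that a PLAN_TABLE already exists in the schema (normally created with the utlxplan.sql script), and that the predicate columns are populated.  Also keep in mind that EXPLAIN PLAN shows a predicted plan, which is not guaranteed to match the plan actually used at run time:

```sql
-- Ask the optimizer for the predicted plan without executing the query
EXPLAIN PLAN FOR
SELECT
  DATA
FROM
  MVIS_DATA
WHERE
  TOOLID = N'112'
  AND DATATYPE = 0
  AND COMMISSIONID IS NULL
  AND OPERATIONID IS NULL
  AND COMMISSIONLISTPOS IS NULL;

-- Format the most recently explained statement from PLAN_TABLE,
-- including the Predicate Information section if it is available
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```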

So, what is the purpose of that N character in the SQL statement?  I recall seeing SQL statements similar to this one in the past, which converts a character string to a date:

SELECT DATE'2015-02-08' FROM DUAL;
 
DATE'2015
---------
08-FEB-15

After a fair amount of digging through the Oracle documentation, I located the following note about that N character:

“The TO_NCHAR function converts the data at run time, while the N function converts the data at compilation time.”

Interesting.  That quote suggests that the author of the SQL statement may have been trying to convert ‘112’ to an NVARCHAR2 (or NCHAR).  Time for another test; the below script creates a table with the TOOLID column defined as NVARCHAR2, populates the table with 100,000 rows, and then collects statistics on the table and its primary key index:

CREATE TABLE MVIS_DATA_N (
  TOOLID NVARCHAR2(15),
  DATATYPE NUMBER,
  COMMISSIONID NUMBER,
  OPERATIONID NUMBER,
  COMMISSIONLISTPOS NUMBER,
  DATA VARCHAR2(100),
  PRIMARY KEY (TOOLID));
 
INSERT INTO
  MVIS_DATA_N
SELECT
  TO_CHAR(ROWNUM) TOOLID,
  MOD(ROWNUM,2) DATATYPE,
  NULL COMMISSIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) OPERATIONID,
  DECODE(MOD(ROWNUM,2),0,NULL,MOD(ROWNUM,2)) COMMISSIONLISTPOS,
  LPAD('A',100,'A') DATA
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;
 
COMMIT;
 
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA_N',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)
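As a quick aside, here is a hedged sketch for comparing a plain literal, an N literal, and a TO_NCHAR call.  The DUMP function reports the internal representation of each expression, so any difference between the three should be visible in its output.  The column aliases are my own, and the exact output depends on the database and national character sets, so no output is shown here:

```sql
-- Compare the internal representation of the same three characters
-- supplied as a plain literal, an N literal, and through TO_NCHAR
SELECT
  DUMP('112') PLAIN_LITERAL,
  DUMP(N'112') N_LITERAL,
  DUMP(TO_NCHAR('112')) TO_NCHAR_CALL
FROM
  DUAL;
```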

Executing the query against this table also returns one row:

SELECT
  DATA
FROM 
  MVIS_DATA_N 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The execution plan follows:

SQL_ID  1yuzz9rqkvnpv, child number 0
-------------------------------------
SELECT   DATA FROM   MVIS_DATA_N WHERE   TOOLID = N'112'   AND DATATYPE
= 0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1044325464
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA_N  |     1 |   119 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C0050815 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND
              "DATATYPE"=0 AND "COMMISSIONID" IS NULL))
   2 - access("TOOLID"=U'112')

Notice in the above Predicate Information section that the SYS_OP_C2C function does not appear, and the N‘112’ portion of the SQL statement was still changed to U‘112’.  The execution plan also shows that the primary key index was used, while a full table scan was required when the TOOLID column was defined as a VARCHAR2.
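Since the outcome depends entirely on how the TOOLID column is declared, a data dictionary query can stand in for running DESC against each table.  This is a sketch that assumes the three test tables from this article exist in the current schema:

```sql
-- List the declared datatype of the TOOLID column in each test table
SELECT
  TABLE_NAME,
  COLUMN_NAME,
  DATA_TYPE,
  CHAR_LENGTH
FROM
  USER_TAB_COLUMNS
WHERE
  TABLE_NAME IN ('MVIS_DATA_NUM','MVIS_DATA','MVIS_DATA_N')
  AND COLUMN_NAME = 'TOOLID'
ORDER BY
  TABLE_NAME;
```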

The person who reported the issue to me later provided the output of DESC MVIS_DATA, which indicated that the TOOLID column was in fact defined as a VARCHAR2 column.  If this person were running a more recent version of Oracle Database, he might be able to create a function based index that uses the SYS_OP_C2C function on the TOOLID column.  Such an index might look something like this:

CREATE INDEX IND_TOOLID_FIX ON MVIS_DATA (SYS_OP_C2C("TOOLID"));

Gathering statistics on the table and its indexes, executing the original SQL statement, and outputting the execution plan:

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'MVIS_DATA',CASCADE=>TRUE,NO_INVALIDATE=>FALSE)
 
SELECT
  DATA
FROM 
  MVIS_DATA 
WHERE 
  TOOLID = N'112' 
  AND DATATYPE = 0 
  AND COMMISSIONID IS NULL 
  AND OPERATIONID IS NULL 
  AND COMMISSIONLISTPOS IS NULL;
 
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'TYPICAL'));

The first of the above queries output the expected one row, while the second query output the following execution plan:

SQL_ID  5pkwzs079jwu2, child number 1
-------------------------------------
SELECT   DATA FROM   MVIS_DATA WHERE   TOOLID = N'112'   AND DATATYPE =
0   AND COMMISSIONID IS NULL   AND OPERATIONID IS NULL   AND
COMMISSIONLISTPOS IS NULL
 
Plan hash value: 1497912695
 
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| MVIS_DATA      |     1 |   125 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_TOOLID_FIX |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("OPERATIONID" IS NULL AND "COMMISSIONLISTPOS" IS NULL AND "DATATYPE"=0
              AND "COMMISSIONID" IS NULL))
   2 - access("MVIS_DATA"."SYS_NC00007$"=U'112')

In the Predicate Information section of the execution plan, notice the absence of the SYS_OP_C2C function on the TOOLID column – those values are pre-computed in the virtual column (SYS_NC00007$) created for the function based index.  An index range scan is reported in the execution plan, rather than an index unique scan, because the function based index was not declared as unique when it was created.  Even so, the SQL statement should execute much faster than the roughly 30 seconds required by the SQL statement without the function based index.

So, what are the options that were mentioned above?

  • Have the application programmer fix the SQL statements.
  • Upgrade to a version of Oracle Database (if that version is supported by the application) that supports the SYS_OP_C2C function, and create a function based index using that function.
  • If the TOOLID column only contains numbers, that column should be defined as NUMBER, rather than VARCHAR2.  Consider redefining that column as a NUMBER datatype.
  • If that N character is always used when this column is referenced, that column probably should be defined as NVARCHAR2 rather than VARCHAR2.  Consider redefining that column as an NVARCHAR2 datatype.
  • Consider that the application is working as designed, and that it is nice to receive 30 second breaks every now and then.
  • Take a DUMP and share it with the application developer.  On second thought, such an approach may have limited success in helping to solve this problem.

Here is a little experiment with the DUMP function, which reveals Oracle’s internal representation of data – refer to the Internal datatypes and datatype codes in the Oracle documentation for help in decoding the Typ= values.

SELECT DUMP(112) A FROM DUAL;
 
A
---------------------
Typ=2 Len=3: 194,2,13
 
/* ------------------- */
SELECT DUMP(TO_CHAR(112)) A FROM DUAL;
 
A
---------------------
Typ=1 Len=3: 49,49,50
 
/* ------------------- */
SELECT DUMP('112') A FROM DUAL;
 
A
----------------------
Typ=96 Len=3: 49,49,50
 
/* ------------------- */
SELECT DUMP(N'112') A FROM DUAL;
 
A
----------------------------
Typ=96 Len=6: 0,49,0,49,0,50
 
/* ------------------- */
SELECT DUMP(SYS_OP_C2C('112'))  A FROM DUAL;
 
A
----------------------------
Typ=96 Len=6: 0,49,0,49,0,50
 
/* ------------------- */
SELECT DUMP(TO_NCHAR('112')) A FROM DUAL;
 
A
---------------------------
Typ=1 Len=6: 0,49,0,49,0,50

It is possibly interesting to note that the internal representation for N'112' is CHAR (or NCHAR), while the internal representation for TO_NCHAR('112') (and TO_NCHAR(112)) is VARCHAR2 (or NVARCHAR2).
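The byte lists in the DUMP output can be reproduced outside of the database.  What follows is a minimal shell sketch, assuming that the database character set stores these digits as single bytes and that the national character set is AL16UTF16 (big-endian UTF-16, which is consistent with the 0,49,0,49,0,50 output above).  The dump_bytes helper name is made up for this illustration, and the od and iconv utilities are assumed to be available:

```shell
# Print the decimal value of every byte read from stdin, comma separated,
# in the same style as the byte list returned by Oracle's DUMP function.
# (dump_bytes is a hypothetical helper name, not an Oracle utility.)
dump_bytes() {
    od -An -v -tu1 | tr -s ' \n' ' ' | sed -e 's/^ //' -e 's/ $//' -e 's/ /,/g'
}

# '112' as single-byte characters, compare with DUMP('112')
printf '112' | dump_bytes; echo

# '112' converted to big-endian UTF-16, compare with DUMP(N'112')
# (assumes the national character set is AL16UTF16)
printf '112' | iconv -f ASCII -t UTF-16BE | dump_bytes; echo
```

The first command should print 49,49,50 and the second 0,49,0,49,0,50.  The doubled length with a leading zero byte per character is what the SYS_OP_C2C conversion produces when a VARCHAR2 value is compared with an N'...' literal.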

This blog’s statistics indicate that the search engine search term Oracle NVARCHAR slow resulted in two page views of this blog yesterday.  I can’t help but wonder if the person who performed that search might have been helped by some of the above analysis.





Install Nagios on a Synology DiskStation DS415+ (Plex Support Also Added)

24 12 2014

December 24, 2014 (Modified December 29, 2014 – Added Plex Support Section, January 11, 2015 – Added Nagios Web Status and Fixed Status Logging, April 5, 2015 – Added Plex/Nagios/Wordpress Stress Test, May 25, 2015 – current Plex download supports CPU in DS415+)

(Back to the Previous Post in the Series)

This article describes how to compile and run Nagios 4.0.8 (with 2.0.3 plugins) on a Synology DiskStation DS415+ (64 bit), which utilizes an Intel Atom Avoton processor (cat /proc/cpuinfo indicates that the unit is using a 2.40GHz Atom C2538) and runs the DSM 5.1-5021 (the latest version as of December 24, 2014) operating system.  Nagios is a very useful network monitoring (and even Oracle Database monitoring with plugins) utility that uses email or other communication means to report device or service outages.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors (such as the DS212+), but I have not yet fully tested the procedure with many different NAS units.  Note that even though the DS415+ and DS1813+ both use Intel CPUs running a 64 bit DSM version, the installation steps are a bit different (I tried various unsuccessful approaches over the last two months, and finally found a working combination).

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins (see my article that describes setting up the Oracle client on a DiskStation to permit Oracle Database monitoring).  File paths vary from one Linux distribution to the next, so that adds a bit of challenge to make certain that the files are placed in the required directory.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal (note that the steps are more efficient than those for the DS1813+ and DS412+) – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, website accessibility, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal & SNMP.

nagiosds415_enable telnet1

Then put a checkmark next to Enable Telnet service, and click the Apply button.

nagiosds415_enable_telnet2

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

nagiosds415_enable telnet_computer

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.  For example, if the DiskStation has an IP address of 192.168.1.55, the screen would appear similar to the following just before pressing the Enter key:

nagiosds415_telnet_to_diskstation

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

nagiosds415_telnet_to_diskstation_command_test

We first need to add ipkg support to the DiskStation; detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for the DS415+, DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in a new downloads directory – copy and paste the lines to the Telnet session, one line at a time – simply right-click inside the telnet window to paste, if you are using Windows 7/8’s telnet client):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Press the i key to switch to insert mode, place a # character in front of the two lines that contain the word PATH, and save the file (press the Esc key then type ZZ):

nagiosds415_comment_out_path
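If you would rather not make this change by hand in vi, the same edit can be sketched with sed.  The two function names below are made up for this illustration, and the sketch assumes that every line containing the word PATH should be commented out, as described above.  Both functions only print the modified text, so the result can be reviewed before it replaces the real file:

```shell
# Print FILE with a # placed in front of every line that contains PATH.
# Lines that are already commented end up with a single leading #.
# (comment_out_path and uncomment_path are hypothetical helper names.)
comment_out_path() {
    sed '/PATH/ s/^#*/#/' "$1"
}

# Reverse the change: strip the leading # from every line that contains PATH.
# Note that this also uncomments any line that was a comment to begin with.
uncomment_path() {
    sed '/PATH/ s/^#*//' "$1"
}

# Example (keeps a backup copy of the original file):
#   cp /root/.profile /root/.profile.bak
#   comment_out_path /root/.profile.bak > /root/.profile
```

The uncomment_path function covers the later step in this article, where the # characters are removed again after the compilers are extracted.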

Reboot the DiskStation using the Synology interface by clicking the Options button (the head shaped icon) near the top-right, and then click Restart and confirm that the DiskStation should be restarted:

nagiosds415_restart diskstation

Once the Synology reboots, reconnect a telnet session to the Synology.  Update ipkg support and install some ipkg items that will be either directly used by Nagios, or during the installation of Nagios.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – simply ignore that error.  An error will also appear when installing Apache – that problem will be fixed in the step that follows.

ipkg update
ipkg list
ipkg update wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install MySQL
ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils
ipkg install apache

The ipkg install apache command will likely result in an error related to ext_filter_module.  Edit /opt/etc/apache2/httpd.conf using vi, and put a # character in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so :

nagiosds415_comment_out ext_filter_module
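The httpd.conf change can also be sketched with sed.  The disable_ext_filter name is made up for this illustration; the function prints the modified file rather than changing it in place, so the output should be checked before it is copied over /opt/etc/apache2/httpd.conf:

```shell
# Print CONF with the LoadModule line for ext_filter_module commented out.
# All other lines pass through unchanged.
# (disable_ext_filter is a hypothetical helper name.)
disable_ext_filter() {
    sed '/^[[:space:]]*LoadModule[[:space:]][[:space:]]*ext_filter_module/ s/^/#/' "$1"
}

# Example:
#   cp /opt/etc/apache2/httpd.conf /opt/etc/apache2/httpd.conf.bak
#   disable_ext_filter /opt/etc/apache2/httpd.conf.bak > /opt/etc/apache2/httpd.conf
```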

Save the file, and re-execute the ipkg install apache command.

ipkg install apache

Download the compilers for the Avoton CPU, which is installed in the DS415+ (the second compiler downloaded below will be used with the Nagios install – the first compiler typically resulted in errors (ex: ../lib/libnagios.a: could not read symbols: Archive has no index; run ranlib to add one) when compiling the Nagios program):

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%205.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.40%20%28Avoton%29/gcc473_glibc217_x86_64_avoton-GPL.tgz
tar zxpf gcc473_glibc217_x86_64_avoton-GPL.tgz -C /usr/local/
wget http://sourceforge.net/projects/dsgpl/files/DSM%205.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.40%20%28Avoton%29/gcc463_glibc213_i686_avoton-GPL.tgz 
tar zxpf gcc463_glibc213_i686_avoton-GPL.tgz -C /usr/local/

The above commands will create two directory structures named x86_64-pc-linux-gnu and i686-pc-linux-gnu in the /usr/local directory.  Edit the .profile file to remove the # character that was earlier added in front of the lines containing the word PATH, save the file, then reboot the Synology (see the earlier provided screen captures for the lines that need to be modified, and the process for rebooting the Synology):

vi /root/.profile

Reconnect to the DiskStation using Telnet after the Synology reboots, and compile libltdl – note that the PATH modification below seems to be necessary to avoid an ldconfig: command not found error or a message about Perl not being found:

PATH=$PATH:/opt/bin/
cd /opt/share/libtool/libltdl/
env CC=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-gcc \
LD=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ld \
RANLIB=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-pc-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-pc-linux-gnu/lib" \
./configure --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/opt
 
make all
make install
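The same five compiler settings are repeated in this article for libltdl, Nagios, net-snmp, and the Nagios plugins.  As a rough sketch, those settings could be wrapped in a small shell function so that they are typed only once per telnet session.  The cross_configure name and the TOOLCHAIN and TRIPLET variables are made up for this illustration; the paths are the ones used in the commands above:

```shell
# Location and name prefix of the 32 bit Avoton tool chain extracted earlier.
# (TOOLCHAIN, TRIPLET and cross_configure are hypothetical names.)
TOOLCHAIN=/usr/local/i686-pc-linux-gnu
TRIPLET=i686-pc-linux-gnu

# Run ./configure in the current directory with the cross compiler settings,
# passing along any extra options.  The function body is a subshell, so the
# exported variables do not leak into the interactive session.
cross_configure() (
    export CC="$TOOLCHAIN/bin/$TRIPLET-gcc"
    export LD="$TOOLCHAIN/bin/$TRIPLET-ld"
    export RANLIB="$TOOLCHAIN/bin/$TRIPLET-ranlib"
    export CFLAGS="-I$TOOLCHAIN/include"
    export LDFLAGS="-L$TOOLCHAIN/lib"
    ./configure --host="$TRIPLET" --target="$TRIPLET" --build="$TRIPLET" \
        --prefix=/opt "$@"
)

# Example, from inside a source directory:
#   cross_configure
#   cross_configure --with-command-group=nagios
```

I have not tested this wrapper on the DiskStation itself, so treat it as a convenience sketch rather than a replacement for the commands shown in this article.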

Note that the following message may appear later when we attempt to verify that Nagios compiled correctly:

/opt/bin/nagios: error while loading shared libraries: libltdl.so.3: cannot open shared object file: No such file or directory

To avoid that error, execute the following commands to copy a couple of files to the correct locations (if the files do not already exist in those locations):

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/libltdl.so
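The cp commands above overwrite a file that is already in place.  A minimal sketch of the "only if the file does not already exist" check follows; copy_if_missing is a made-up helper name:

```shell
# Copy SRC to DST only when DST does not already exist.
# (copy_if_missing is a hypothetical helper name.)
copy_if_missing() {
    if [ -e "$2" ]; then
        echo "skipped: $2 already exists"
    else
        cp "$1" "$2" && echo "copied: $1 -> $2"
    fi
}

# Example, using the same files as the cp commands above:
#   copy_if_missing /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
#   copy_if_missing /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
#   copy_if_missing /opt/lib/libltdl.so /usr/lib/libltdl.so
```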

Using the DiskStation’s Control Panel, click Group, and create a nagios group – the group does not require any special DiskStation permissions.

nagiosds415_create_nagios_group

Using the DiskStation’s Control Panel, click User and create a nagios user and add that user to the nagios group. The nagios user does not require any specific DiskStation permissions.

Next, switch back to the telnet session, download the Nagios source code, and compile the source code (note that if a mistake is made, and the nagios-4.0.8 directory must be removed, change to the /volume1/downloads directory and use this command to remove the directory, then re-execute the tar command: rm -rf nagios-4.0.8 ):

cd /volume1/downloads
wget http://sourceforge.net/projects/nagios/files/nagios-4.x/nagios-4.0.8/nagios-4.0.8.tar.gz
tar xzf nagios-4.0.8.tar.gz
cd nagios-4.0.8
 
env CC=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-gcc \
LD=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ld \
RANLIB=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-pc-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-pc-linux-gnu/lib" \
./configure --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/opt --with-command-group=nagios
 
make all
make install
make install-init
make install-config
make install-commandmode

Try running the Nagios verify command by executing the command on the following line (a Segmentation fault (core dumped) error message is bad news, which may indicate that the wrong compiler was used):

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the compile process worked correctly, the telnet session should appear similar to below when the above verify command is executed:

nagiosds415_verify_ok

If the above command was successful, copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

chown nagios:nagios /opt/bin/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file) – we will take a look at those files later:

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the  /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }
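Both notification commands build the email text with printf "%b", which expands the \n sequences into line breaks, and then pipe that text to sendmail.  The sketch below prints a shortened version of the host alert to the screen so that the formatting can be checked without sending an email.  The host_alert name and the sample values (printer1, 192.168.1.70) are made up for this illustration and stand in for the Nagios macros:

```shell
# Build a shortened host alert the same way notify-host-by-email does.
# The four arguments stand in for the $NOTIFICATIONTYPE$, $HOSTNAME$,
# $HOSTSTATE$ and $HOSTADDRESS$ macros.  (host_alert is a hypothetical name.)
host_alert() {
    printf "%b" "Subject: $1 Host Alert: $2 is $3\n\n***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\n"
}

# Sample values only; the first line printed is the email subject.
host_alert PROBLEM printer1 DOWN 192.168.1.70
```

The first line of the output is Subject: PROBLEM Host Alert: printer1 is DOWN, followed by a blank line that separates the header from the message body.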

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.
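The check_ping definition above uses the threshold pairs 3000,25% (warning) and 5000,90% (critical), where the first number is the average round trip time in milliseconds and the second is the packet loss percentage.  The sketch below shows how a single ping result would be classified, assuming that a value at or above either number in a pair counts as a breach (that is my reading of the plugin, so check the check_ping documentation).  The ping_state name is made up for this illustration:

```shell
# Classify a ping result against the thresholds from the check_ping definition:
# warning at 3000 ms or 25% loss, critical at 5000 ms or 90% loss.
# Arguments: average round trip time in ms, packet loss in percent.
# (ping_state is a hypothetical helper; the >= comparison is an assumption.)
ping_state() {
    awk -v rta="$1" -v pl="$2" 'BEGIN {
        if (rta >= 5000 || pl >= 90)      print "CRITICAL"
        else if (rta >= 3000 || pl >= 25) print "WARNING"
        else                              print "OK"
    }'
}

ping_state 12.5 0     # prints OK
ping_state 40 33      # prints WARNING
ping_state 6000 0     # prints CRITICAL
```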

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile and change the line beginning with HTTPD_CONF to show HTTPD_CONF=/opt/etc/apache2/conf.d, then save the file.

cd /volume1/downloads/nagios-4.0.8
vi Makefile

nagiosds415_change_httpd_conf

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, and specify a password when prompted (edit: January 11, 2015: the file location specified by the command below is incorrect if you plan to use the Nagios monitoring web pages – if you intend to use the Nagios monitoring web pages, execute the command as shown below, and we will later recreate the file in the correct location):

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Next, we need to start setting up the plugins for Nagios.  First the net-snmp source code is downloaded and extracted:

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

Execute the following to compile the net-snmp source (note that this command uses the second compiler that was downloaded):

env CC=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-gcc \
LD=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ld \
RANLIB=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-pc-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-pc-linux-gnu/lib" \
./configure --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/opt

Several prompts will appear on the screen when the command is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatup, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget https://www.nagios-plugins.org/download/nagios-plugins-2.0.3.tar.gz
tar xzf nagios-plugins-2.0.3.tar.gz
cd nagios-plugins-2.0.3
PATH=$PATH:/opt/bin/

Now compile the Nagios plugins:

env CC=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-gcc \
LD=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ld \
RANLIB=/usr/local/i686-pc-linux-gnu/bin/i686-pc-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-pc-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-pc-linux-gnu/lib" \
./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"  --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/opt
 
make -i
make install -i

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user. Without the following fix (described here), you might see the following error message when attempting to execute a command as the nagios user:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory
su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.
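A typing mistake in /etc/passwd can lock users out of the DiskStation, so it may be worth previewing the change before making it.  The sketch below prints the passwd file with only the named user's shell changed; the set_ash_shell name is made up for this illustration, and nothing is written back to /etc/passwd:

```shell
# Print PASSWD_FILE with the login shell of USER changed from /sbin/nologin
# to /bin/ash.  Lines for all other users pass through unchanged.
# (set_ash_shell is a hypothetical helper name.)
set_ash_shell() {
    sed "/^$1:/ s#:/sbin/nologin\$#:/bin/ash#" "$2"
}

# Example (preview only, shows just the nagios line):
#   set_ash_shell nagios /etc/passwd | grep '^nagios:'
```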

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name (such as localhost) that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If there were no problems, then something similar to the following should appear (a Segmentation fault (core dumped) error message is bad news, which may indicate that the wrong compiler was used):

nagiosds415_check_ping_test

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified DNS name was not specified (MyDeviceHere.MyDomainHere.com), edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following line (replacing MyDomainHere.com with your DNS domain name for the network):

search MyDomainHere.com
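If this step is repeated (for example, after a DSM update resets the file), it is easy to end up with duplicate search lines.  A small sketch that appends the line only when the domain is not already listed follows; add_search_domain is a made-up helper name, and the sketch assumes that the file ends with a line break:

```shell
# Append "search DOMAIN" to FILE unless a search line already lists DOMAIN.
# (add_search_domain is a hypothetical helper name.)
add_search_domain() {
    if awk -v d="$1" '$1 == "search" { for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
                      END { exit !found }' "$2"; then
        echo "search entry for $1 is already present"
    else
        echo "search $1" >> "$2"
    fi
}

# Example:
#   add_search_domain MyDomainHere.com /etc/resolv.conf
```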

Verify that sendmail works for the Nagios user.  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again (replace MyEmailAddressHere@MyDomainHere.com with your email address):

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

If you see the following message, then there are two problems:

/opt/etc/mail/sendmail.cf: line 73: fileclass: cannot open '/opt/etc/mail/local-host-names': Group writable directory
WARNING: local host name (DS415) is not qualified; see cf/README: WHO AM I?

To fix the second problem (the warning), edit the /etc/hosts file using vi. Locate the line with the Synology’s IP address and the Synology’s name (for example DS415).  Between those two entries add the Synology’s name followed by a period and the domain name.  For example, the line may look like the following once edited.  Save the file and exit vi:

192.168.1.55   DS415.MyDomainHere.com DS415

To fix the “cannot open… Group writable directory” error, try the following (note that it is possible that only the first line is necessary, so try the sendmail test again after executing just the first line below):

chmod 555 /opt/etc
chmod 755 /opt/etc/mail
chmod 444 /opt/etc/mail/local-host-names

Then repeat the email test by executing:

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

—-

As a general tip, it is important to always verify the Nagios configuration before starting (or restarting after a configuration change) Nagios.  To verify the Nagios configuration type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

Once the monitoring target devices are defined, it is possible to start Nagios as a background task (daemon) by executing the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg
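The verify and start steps can be combined so that the daemon is never started with a broken configuration.  The sketch below assumes that the nagios -v command returns a non-zero exit status when the configuration check fails; the verify_then_start name and the /tmp/nagios_verify.log file are made up for this illustration:

```shell
# Paths used throughout this article.
NAGIOS_BIN=/opt/bin/nagios
NAGIOS_CFG=/opt/etc/nagios.cfg

# Start the daemon only when the configuration check exits successfully.
# The output of the check is kept in a hypothetical log file for review.
verify_then_start() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /tmp/nagios_verify.log 2>&1; then
        "$NAGIOS_BIN" -d "$NAGIOS_CFG"
    else
        echo "Configuration check failed; see /tmp/nagios_verify.log" >&2
        return 1
    fi
}

# Example:
#   verify_then_start
```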

To stop Nagios that is executing as a background task, use the ps command to locate the Nagios process, find the lowest process ID (PID) for the line that includes nagios.cfg (17346 in this case), then kill the process that is associated with that PID:

ps | grep 'nagios'

nagiosds415_stop_nagios_daemon
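Picking the lowest PID by eye works, but it can also be sketched as a short pipeline.  The lowest_nagios_pid name is made up for this illustration, and the sketch assumes that the PID is the first column of the ps output, as it is in the screen capture above:

```shell
# Read ps output on stdin and print the lowest PID among the lines that
# mention nagios.cfg, ignoring the grep command itself.
# (lowest_nagios_pid is a hypothetical helper; assumes the PID is column 1.)
lowest_nagios_pid() {
    awk '/nagios\.cfg/ && !/grep/ { print $1 }' | sort -n | head -n 1
}

# Example:
#   pid=$(ps | lowest_nagios_pid)
#   [ -n "$pid" ] && kill "$pid"
```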

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you were following the above directions, we have not yet instructed Nagios which devices to monitor, and to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  See the previous article for the steps to use Microsoft Excel.

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file include the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, access points are monitored round t
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host temp
       check_period            24x7            ; By default, cameras are monitored round t
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if device are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

If you have chosen to use any of the custom files that were created based on my previous article, instruct Nagios to read the associated file by adding entries to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg
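A cfg_file entry that points to a file that does not exist will cause the Nagios configuration check to fail, so it may help to look for missing files first.  A minimal sketch follows; missing_cfg_files is a made-up helper name:

```shell
# Print every cfg_file= path listed in the given nagios.cfg that does not
# exist on disk.  Nothing is printed when all of the files are present.
# (missing_cfg_files is a hypothetical helper name.)
missing_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1" | while read -r path; do
        [ -f "$path" ] || echo "$path"
    done
}

# Example:
#   missing_cfg_files /opt/etc/nagios.cfg
```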

A large number of changes were likely made to the Nagios configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are found in the configuration, terminate (kill) nagios and then restart Nagios as described above.

—-

(Added January 11, 2015)

While trying to find a way to allow the Nagios status web pages to work on the DS415+, I discovered that Nagios was not running in daemon mode once I had made entries into the various configuration files to specify the devices to be monitored.  Taking a look at my previous article, I quickly found the solution for the daemon mode problem.  In a telnet session enter the following:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
mkdir /opt/var/nagios/rw/
chown nagios:nagios /opt/var/nagios -R
chmod g+rwx /opt/var/nagios/rw
chmod g+s /opt/var/nagios/rw
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
object_cache_file=/opt/var/nagios/objects.cache
precached_object_file=/opt/var/nagios/objects.precache
status_file=/opt/var/nagios/status.dat
command_file=/opt/var/nagios/rw/nagios.cmd
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

Save and exit vi.  If the Nagios daemon is running in the background, find the process and kill it (replace 24532 with the process ID of the first line that contains /opt/bin/nagios -d /opt/etc/nagios.cfg):

ps | grep 'nagios'
kill 24532
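
Because ps | grep 'nagios' can also list the grep command itself, the process ID has to be picked out by eye.  A small filter can make that selection instead.  This is only a sketch: nagios_daemon_pid is a hypothetical name, and it assumes that the process ID is the first column of the ps output on your DiskStation.

```shell
# Hypothetical helper: read `ps` output on stdin and print the first column
# (assumed to be the process ID) of the first line that shows the Nagios
# daemon command line. Lines belonging to awk or grep are skipped.
nagios_daemon_pid() {
    awk '!found && !/awk|grep/ && index($0, "/opt/bin/nagios -d /opt/etc/nagios.cfg") {
        print $1
        found = 1
    }'
}

# Assumed usage on the DiskStation:
#   pid=$(ps | nagios_daemon_pid)
#   [ -n "$pid" ] && kill "$pid"
```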

Verify the Nagios configuration, and if there were no problems, then start Nagios in daemon mode:

/opt/bin/nagios -v /opt/etc/nagios.cfg
/opt/bin/nagios -d /opt/etc/nagios.cfg
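
Since the daemon should only be started when the verification succeeds, the two commands can be chained on the exit status of the check.  The following is a minimal sketch, assuming that nagios -v returns a non-zero exit status when it finds configuration errors (confirm that on your build); the function name start_nagios_if_valid is hypothetical.

```shell
# Hypothetical wrapper: start Nagios in daemon mode only when the
# configuration check succeeds. Both arguments are optional and default to
# the paths used in this article.
start_nagios_if_valid() {
    nagios_bin="${1:-/opt/bin/nagios}"
    cfg="${2:-/opt/etc/nagios.cfg}"
    if "$nagios_bin" -v "$cfg" >/dev/null; then
        "$nagios_bin" -d "$cfg"
    else
        echo "Configuration check failed; Nagios was not started" >&2
        return 1
    fi
}

# Assumed usage on the DiskStation:
#   start_nagios_if_valid
```

The verification output is discarded here to keep the sketch short; if the check fails, run the -v command by hand to read the error messages.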

Check the Nagios log file for problems, displaying just the last 100 lines:

tail -n 100 /opt/var/nagios/nagios.log
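
If the directory setup at the start of this update ever has to be repeated (for example, after reinstalling DSM), the mkdir and chmod commands can be collapsed into one function.  This is a sketch only: make_nagios_dirs is a made-up name, and the chown step is left as a comment because it requires the nagios user and group to exist.

```shell
# Hypothetical helper: create the Nagios runtime directory tree under the
# given base directory (the article uses /opt/var/nagios).
make_nagios_dirs() {
    base="$1"
    # -p creates missing parent directories and ignores existing ones.
    mkdir -p "$base/archives" "$base/spool/checkresults" "$base/rw" || return 1
    # Group members may write to rw/, and new files inherit the directory's group.
    chmod g+rwx "$base/rw" && chmod g+s "$base/rw"
}

# Assumed usage on the DiskStation, as root:
#   make_nagios_dirs /opt/var/nagios
#   chown nagios:nagios /opt/var/nagios -R
```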

I never bothered to determine how to make the Nagios monitoring status web pages work with the Synology DS412+, DS1813+, and DS212+.  I thought that I would see what steps would be required to make that functionality work on a Synology DS415+ (note that the following steps may work exactly the same on the Synology DS412+, DS1813+, and DS212+). The Synology DSM operating system utilizes port 5000 for web traffic.  If you add the WordPress package to the Synology, that package uses the standard port 80 for web traffic.  If you followed the above steps for installing Nagios, you installed a second copy of the Apache web server on the Synology that uses port 8000 for web traffic.  If your Synology has an IP address of 192.168.1.60, then you would use the following website address to access the second web server on the Synology: http://192.168.1.60:8000/

The Nagios monitoring status web pages, once set up, will appear as a directory of that second web server, as shown below:

nagiosds415_web_status

The value of the Nagios monitoring status web pages becomes apparent fairly quickly by reviewing the above screen capture.  The above Tactical Overview shows that there are 18 monitored devices that are up, one monitored device that is down, and another device that is in the process of being checked.  The links at the left provide additional information about the monitored devices.  Let’s configure the second copy of Apache on the Synology to support the Nagios monitoring status web pages.

The second copy of Apache uses the configuration file /opt/etc/apache2/httpd.conf.  Edit that file:

vi /opt/etc/apache2/httpd.conf

Add the following directly below the # Supplemental configuration heading, which is near the end of the file:

# Added per Charles Hooper's Nagios installation guide for Synology DS415+
Include etc/apache2/conf.d/nagios.conf
Include etc/apache2/conf.d/php.conf

Save the file and exit vi.  Next, we need to create the htpasswd.users file in the correct location.  In the above steps, we executed the following command:

/opt/sbin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

That htpasswd.users file is expected to be in the /opt/etc directory.  Execute the following command to create the file in the correct directory (you will be prompted for a password for the nagiosadmin user):

/opt/sbin/htpasswd -c /opt/etc/htpasswd.users nagiosadmin

The Nagios monitoring status web pages require PHP support to be added to the second Apache installation.  Execute the following commands to install PHP:

/opt/bin/ipkg update
/opt/bin/ipkg list
/opt/bin/ipkg install php
/opt/bin/ipkg install php-apache

Next, we need to modify the /opt/etc/apache2/conf.d/nagios.conf file that was created during the Nagios installation.

vi /opt/etc/apache2/conf.d/nagios.conf

Below the heading <Directory "/opt/share"> add the following line:

   DirectoryIndex index.php

Save the file and exit vi.  Next, we need to adjust the php.ini file on the Synology:

vi /opt/etc/php.ini

In the file (near line 108), locate the following line:

output_buffering = Off

Change that line to show:

output_buffering = 8192

Locate the following line (near line 248) in the file:

memory_limit = 8M

Change that line to show:

memory_limit = 128M

There may be a second php.ini file located in the /etc/php directory.  If so, make the same changes to that file, then save the file and exit vi:

vi /etc/php/php.ini
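
The two php.ini edits above can also be applied without opening vi.  The following is a minimal sketch, assuming that the two lines begin exactly as shown above (output_buffering = Off and memory_limit = 8M); the function name tune_php_ini is hypothetical, and a .bak copy of the file is kept.

```shell
# Hypothetical helper: apply the output_buffering and memory_limit changes
# to the given php.ini, keeping the original as <file>.bak.
tune_php_ini() {
    ini="$1"
    cp "$ini" "$ini.bak" || return 1
    # Read from the backup and write over the original so that the original
    # file's permissions are preserved.
    sed -e 's/^output_buffering = Off/output_buffering = 8192/' \
        -e 's/^memory_limit = 8M/memory_limit = 128M/' \
        "$ini.bak" > "$ini"
}

# Assumed usage on the DiskStation:
#   tune_php_ini /opt/etc/php.ini
#   [ -f /etc/php/php.ini ] && tune_php_ini /etc/php/php.ini
```

If either line in your php.ini differs from what is shown above, the sed expressions will leave that line unchanged, so check the result before restarting Apache.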

Perform a graceful restart of the second copy of Apache:

/opt/sbin/apachectl -k graceful

Try to access the Nagios status monitoring pages in a web browser (replace 192.168.1.60 with the IP address of your Synology): http://192.168.1.60:8000/nagios/

You should be prompted to enter a username and password.  Enter nagiosadmin for the username, and enter the password for that user that you created when executing the /opt/sbin/htpasswd command earlier.  If the web page does not display, take a look at the last 100 lines of the Apache error log to see if any clues are provided:

tail -n 100 /opt/var/apache2/log/error_log

Some of the interesting error messages that I experienced included the following:

[Sun Jan 04 14:07:55 2015] [error] [client 192.168.1.218] (2)No such file or directory: Could not open password file: /opt/etc/htpasswd.users
 
[Sun Jan 04 14:23:04 2015] [error] [client 192.168.1.218] Directory index forbidden by Options directive: /opt/share/
 
[Sun Jan 04 15:07:09 2015] [error] [client 192.168.1.218] File does not exist: /opt/share/<, referer: http://192.168.1.60:8000/nagios/
 
[Sun Jan 04 17:53:06 2015] [notice] child pid 15126 exit signal Segmentation fault (11)
 
[Sun Jan 11 09:41:57 2015] [error] [client 192.168.1.213] script not found or unable to stat: /opt/sbin/statusmap.cgi, referer: http://192.168.1.60:8000/nagios/side.php

The last of the above errors remains unresolved: the file /opt/sbin/statusmap.cgi does not exist.  For now, I will let someone else find a solution for that problem.
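
When the Apache log is busy, it may be easier to show only the entries that are marked [error], as most of the messages listed above are.  A minimal sketch follows; recent_apache_errors is a hypothetical name, and the default log path is the one used in this article.

```shell
# Hypothetical helper: show only the [error] entries among the last 100
# lines of the Apache error log.
recent_apache_errors() {
    log="${1:-/opt/var/apache2/log/error_log}"
    # -F treats the pattern as a fixed string, so the brackets are literal.
    tail -n 100 "$log" | grep -F '[error]'
}

# Assumed usage on the DiskStation:
#   recent_apache_errors
```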

If you find that a monitored device is down, using the web interface it is possible to disable future checking of the device, as well as alter a number of other notification options.  Unfortunately, clicking any of the options under the Host Commands heading will result in a permission error.

nagiosds415_change_notifications

To fix the permission error:

Using the Synology Control Panel interface, create a new group named www – this group requires no specific Synology permissions.  Next, using the Synology Control Panel interface, create a user named apache – make this user a member of the nagios, users, and www groups.  This user requires no specific Synology permissions.  Then change the username and group under which Apache executes:

vi /opt/etc/apache2/httpd.conf

In the httpd.conf file, locate the following two lines:

User nobody
Group #-1

Change those lines to appear as follows:

User apache
Group www

Save the file and exit vi.  Gracefully restart the second copy of Apache:

/opt/sbin/apachectl -k graceful

Refresh the Nagios monitoring status web page – the options under the Host Commands heading should no longer return an error.

—-


Adding Plex Support (Added December 29, 2014)

Note: Some of these steps may use utilities that were installed during the Nagios install.  Plex does not yet support the CPU type in the DS415+, so it is necessary to modify a configuration file during the install.  (Note May 25, 2015: The Plex download version 0.9.12.1.1079 natively supports the DS415+’s CPU, so after downloading the file it is possible to skip to adjusting the Trust Level setting step in the Synology Package Center.)

Connect to the DS415+ using a telnet session and the root username and password (see the directions above, if the steps are not clear).  Change to the downloads directory that was created above, and use the wget command to download what is as of today the current version of Plex:

cd /volume1/downloads/
wget https://downloads.plex.tv/plex-media-server/0.9.11.7.803-87d0708/PlexMediaServer-0.9.11.7.803-87d0708-x86.spk

Create a directory, extract the downloaded PlexMediaServer-0.9.11.7.803-87d0708-x86.spk file to the just created directory, then switch to that directory:

mkdir PlexMediaServer-0.9.11.7.803-87d0708-x86
tar -xvf PlexMediaServer-0.9.11.7.803-87d0708-x86.spk -C /volume1/downloads/PlexMediaServer-0.9.11.7.803-87d0708-x86
cd PlexMediaServer-0.9.11.7.803-87d0708-x86

Next, we need to edit the INFO file to add support for the CPU that is installed in the DS415+:

vi INFO

If you are not familiar with using vi to edit files, you will need to switch vi to edit mode by typing i on the keyboard (note that is a lowercase i) before attempting to make any changes to the file.  When you are finished making changes, press the Esc key on the keyboard, then type ZZ to save and exit (note that is uppercase ZZ).

In the file, change the line:

arch="x86 cedarview bromolow evansport"

to:

arch="x86 cedarview bromolow evansport avoton"

When finished with the edit, the file should appear as below.  Save the file and exit vi.

plexds415_modify_arch
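
The same INFO edit can be made without vi.  This is a sketch under the assumption that the INFO file holds a single arch="..." line, as shown above; add_arch is a made-up function name, and the original file is kept as INFO.bak.

```shell
# Hypothetical helper: append an architecture name to the arch="..." line
# of a package INFO file, unless that name is already listed.
add_arch() {
    info="$1"
    new_arch="$2"
    # Leave the file alone when the architecture is already present.
    if grep '^arch=' "$info" | grep -qw "$new_arch"; then
        return 0
    fi
    cp "$info" "$info.bak" || return 1
    # Re-emit the arch line with the new name added before the closing quote.
    sed "s/^arch=\"\(.*\)\"\$/arch=\"\1 $new_arch\"/" "$info.bak" > "$info"
}

# Assumed usage, from inside the extracted package directory:
#   add_arch INFO avoton
```

Note that the INFO.bak file is left in the directory, so delete it before building the new spk file if you do not want it included in the package.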

Next, we need to create a new spk file that includes the modified INFO file, then copy that file to a shared folder (mysharedfolder in this example) that was previously created on the DS415+ and that is accessible from a computer with access to the DS415+’s DSM interface.

tar -cvf /volume1/downloads/PlexMediaServerMod-0.9.11.7.803-87d0708-x86.spk *
cd ..
cp PlexMediaServerMod-0.9.11.7.803-87d0708-x86.spk /volume1/mysharedfolder/PlexMediaServerMod-0.9.11.7.803-87d0708-x86.spk

Now on the computer that has access to the DS415+’s DSM interface, click Package Center, then click Settings.  Under the Trust Level heading, change the setting from Synology Inc. to “Any publisher”, then click OK.

plexds415_modify_package_settings

Next, install Plex using the Manual Install feature.  Click the Manual Install button, then click the Browse button.  Switch to the shared folder where the spk file was copied (mysharedfolder in the above example), and select to open the PlexMediaServerMod-0.9.11.7.803-87d0708-x86.spk file.  Click the Next button to complete the install of Plex.

plexds415_manual_install

I have only just started experimenting with Plex, so I do not have a full grasp of its capabilities yet.  There are several “channels” that can be added to watch certain types of recorded video.  After experimenting with a couple of the easily accessible channels, I stumbled across this page, which described how to add several additional “unsupported” channels.  The following steps seem to work to install the additional channels from a telnet session connected to the DS415+.  Download and extract the unsupported channels:

cd /volume1/downloads/
wget https://dl.dropboxusercontent.com/u/15227710/UnSupportedAppstore.bundle.zip
mkdir plexunsupported
unzip UnSupportedAppstore.bundle.zip -d ./plexunsupported/

Change to the directory where the Plex plugins are stored, create a directory for the new plugin that was just downloaded and extracted, copy the downloaded plugin files to the directory that was just created, then change the ownership of the downloaded and extracted files to the plex user:

cd /volume1/Plex/Library/Application\ Support/Plex\ Media\ Server/Plug-ins/
mkdir UnSupportedAppstore.bundle
cp -r /volume1/downloads/plexunsupported/UnSupportedAppstore.bundle/* ./UnSupportedAppstore.bundle
chown -R plex:users ./UnSupportedAppstore.bundle

As far as I could determine, the unsupported channels will only appear when the Plex package is stopped and then started again.  In the Package Center’s Installed list click Plex Media Server.  Click the down arrow to the right of Action, then click Stop.  Wait 30 seconds, then click the down arrow again and select Run.

plexds415_restart_plex

If the DS415+ is named ds415, then it should be possible to access Plex on the Synology using this link: http://ds415:32400/web/index.html (replace ds415 with the unit’s IP address if the unit’s network name is not known).  Clicking the Channels link at the left should show the Unsupported AppStore (the black icon below with the white border).

plexds415_after_restart_unsupported1

Clicking the Unsupported AppStore icon should then display a list of categories at the left.  The New category contains quite a few channels, as shown below.  To add a channel, click the channel, then click the Install link.

plexds415_after_restart_unsupported2

I am sure that there is much more to Plex than just offering an alternative to a Roku player or the streaming video options of some Blu-Ray players and network capable TVs (which are also able to retrieve the channels from Plex), but I have not had much time to experiment yet.  I understand that Plex is very good at transcoding video streams for different media consumption devices, such as tablets, but I am not sure that I have a use for that functionality yet.

—-


Plex/Nagios/WordPress Stress Test (Added April 5, 2014)

A question was asked on the Synology forums whether or not I had any evidence that replacing the pre-installed 2GB of memory with an 8GB Crucial memory card made a measurable difference in the Synology DS415+’s performance.  That is a very difficult question to answer – the answer will depend largely on how the DS415+ is used.  If the DiskStation is only used to receive 10+ GB files from a single client, the additional memory probably will make no difference in the performance of the DS415+.

When the DS415+ was first released there were several discussion threads on the Synology forums indicating that the Synology DS415+ was not powerful enough to handle 1080p video for clients wishing to play back that video on various streaming devices, while the less powerful DS415Play could support 1080P playback for clients because it had additional coprocessors to support video transcoding.  So, I thought that I would try a quick test with the DS415+.  The DS415+ is running Nagios to monitor several devices, including 15+ high definition security cameras, as well as acting as the FTP target for those security cameras.  The DS415+ is also running WordPress, although WordPress has not seen much action lately on the unit.  Plex is occasionally used on the DS415+, but had not been used in the last 10 to 12 hours.  The DS415+ with 8 GB of memory installed showed the following memory statistics in its Resource Monitor: Reserved: 217.6 MB, Used: 741.9 MB,  Buffer: 151.7 MB, Cached: 6.8 GB, Free: 166.5 MB.  So, most of the additional 6 GB of memory was used for file caching, which helps speed up the apparent write speed of files, and the read speed of frequently accessed files.
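
As a quick arithmetic check of the Resource Monitor figures quoted above, the five values can be added up and the cached share computed.  This sketch uses awk for the floating-point math and assumes that 1 GB is counted as 1024 MB; memory_summary is a hypothetical name.

```shell
# Sum the Resource Monitor figures quoted above (all converted to MB) and
# report how much of the total is file cache.
memory_summary() {
    awk 'BEGIN {
        reserved = 217.6; used = 741.9; buffer = 151.7; free = 166.5
        cached = 6.8 * 1024          # 6.8 GB expressed in MB
        total = reserved + used + buffer + cached + free
        printf "total_mb=%.1f cached_pct=%.1f\n", total, 100 * cached / total
    }'
}

memory_summary
```

The total works out to a little over 8 GB, which matches the installed memory, with the file cache accounting for well over three quarters of it.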

So, I opened the WordPress website on the DS415+ and viewed a couple of pages.  I then went to the Plex website hosted on the DS415+ using a Windows 8.1 tablet and requested the playback of a 1080P movie – the tablet supports 1080P video.  The DS415+ still had a bit of CPU capacity left, and the video was playing back smoothly on the tablet.  Just for fun I then used a Sony Blu-Ray player to request a different 1080P movie from the DS415+ while the first movie was still streaming to the tablet.  Below is a screen capture of a portion of the DSM interface on the DS415+ while both video streams were playing back (with occasional buffering):

plexds415_plex_playback_2_clients

I still cannot say for certain if the additional memory helped in this particular stress test.  However, for the current cost of the 8GB memory card ($55 USD), the additional memory could prove to be helpful depending on how the DS415+ is used – all that it would take in this case is for a WordPress cache optimizer to consume 400 MB of additional memory to push the DS415+ over 2 GB of total memory consumption.

July 19, 2013 (Modified July 27, 2013, July 28, 2013, November 19, 2013)

—-

Update July 27, 2013:

  • I now have Nagios running on an ARM based Synology DiskStation DS212+.  Most of the steps are the same as outlined below, however there are a few additional errors that must be addressed (see below additional steps).
  • All of the ./configure commands should have included --prefix=/opt (rather than --prefix=/usr/local or completely omitting that parameter).  That change eliminates the need to copy the Nagios plugins to the correct location.  Possibly related, the -i parameter was unnecessary for the snmp and Nagios plugins make and make install commands when the ./configure command included the --prefix=/opt prefix.
  • The wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11… download step for the gcc compiler is apparently unnecessary, at least on the Synology DiskStation DS212+ (see below).

—-

This article describes how to compile and run Nagios on a Synology DiskStation DS1813+ (64 bit) or Synology DiskStation DS412+ (32 bit, the 32 bit steps should also apply to the DS1812+) NAS, both of which utilize Intel Atom processors (cat /proc/cpuinfo indicates that the DS412+ is using a 2.13GHz Atom D2700, while the DS1813+ is using a 2.13GHz Atom D2701), and utilize the DSM 4.2 operating system.  Not all Synology DiskStation NAS devices use Intel based CPUs – some of the less expensive DiskStations use ARM type processors (see this link to determine the type of CPU installed in a specific DiskStation).  It may be possible to produce a working version of Nagios on NAS devices that do not have Intel 32 bit or 64 bit processors, but I have not yet fully tested the procedure.

Warning: A lot of what follows is based on experimentation, with the end goal of having Nagios running on a Synology DiskStation having the ability to ping devices on the network or the Internet, with an email sent to an administrator when a device stops responding to ping requests, and to send a second email when the device resumes responding to ping requests.  This functionality represents a small fraction of Nagios’ capabilities through the use of plugins.  File paths vary from one Linux distribution to the next, so that adds a bit of challenge to make certain that the files are placed in the required directory.  Copying a file to the wrong directory may temporarily disable the DiskStation and require the reinstallation of the Synology DSM operating system.  The directions below are not final, and quite likely do not represent the most efficient approaches to accomplish the end goal – but the directions will hopefully be “close enough to correct” to allow the average reader of this blog to ping and send email alerts from a DiskStation.

I have relied on the free Nagios network monitoring solution since 2002 to provide an early warning of problems associated with network attached equipment including servers, production floor computers, switches, printers, wireless access points, IP cameras, Internet connection stability, etc.  While I rely on Nagios’ alerting system, I am not an expert at configuring the Nagios network monitoring system; the Nagios configuration documentation may be downloaded here.

First, make certain that the Telnet Service (or SSH Service if that is preferred) is enabled on the DiskStation.  In the DiskStation’s Control Panel, click Terminal.

InstallNagiosDiskStation1

Place a checkmark next to Enable Telnet service (if the item is not already checked), and then click the Apply button.

InstallNagiosDiskStation2

Verify that the computer that you intend to use has a Telnet client.  For Windows 7, access the Programs link in the Control Panel, and then click the Turn Windows features on or off link.  Make certain that there is a checkmark next to Telnet Client, then click the OK button.

InstallNagiosDiskStation3

Open a command line (in Windows, Start – Run – type  cmd  and press the Enter key).  On the command line, type telnet followed by either the name of the DiskStation or the IP address of the DiskStation, then press the Enter key.  When prompted for a username, type root and press the Enter key.  Type the admin user’s password (that is used to access the DSM interface in a web browser) and press the Enter key.

InstallNagiosDiskStation4

The command line on the DiskStation is very similar to the command line on a Unix or Linux computer, and is somewhat similar to a Windows command line or MS-DOS command line (use / rather than \, use ls rather than dir, use vi rather than edit):

InstallNagiosDiskStation5

We first need to add ipkg support to the DiskStation; detailed directions may be viewed at this link.  The exact directions may be different for other DiskStation models, but the following directions work for both the DS1813+ and DS412+ (note that all files downloaded from the Internet will be placed on volume1 in the downloads directory – copy and paste the lines to the Telnet session, one line at a time):

cd /volume1
mkdir downloads
cd downloads
wget http://ipkg.nslu2-linux.org/feeds/optware/syno-i686/cross/unstable/syno-i686-bootstrap_1.2-7_i686.xsh
chmod +x syno-i686-bootstrap_1.2-7_i686.xsh
sh syno-i686-bootstrap_1.2-7_i686.xsh

The vi editor is used on the DiskStation to modify files; that vi editor is a bit challenging to use at first sight, so you may need help with a couple of basic commands (see this quick reference for other commands).  The commands in vi are case sensitive (i is not the same as I).  When a file is opened, press the i key on the keyboard to allow making changes to the file (such as typing commands, or deleting commands).  When finished making changes to the file press the Esc key.  Once the Esc key is pressed, type ZZ to save the changed file and quit, or :q! to quit without saving the changes.

Next, we must modify the file that establishes the environment for the root user, when that user connects to the DiskStation.  This change is needed as part of the ipkg installation.  Edit the .profile file used by the root user:

vi /root/.profile

Add a # character in front of the two lines that contain the word PATH, then save the file (see the brief directions above to switch between command and insert mode in vi):

InstallNagiosDiskStation6
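
Since this change has to be undone later in the procedure, it may be convenient to make it with a pair of small filters instead of editing by hand.  This is a minimal sketch: the two function names are hypothetical, and the filters act on every line that contains the word PATH, so review the result if your .profile mentions PATH in other places.

```shell
# Hypothetical filters: read a profile on stdin and write the edited copy
# to stdout.
# Add a # in front of every line that contains PATH and is not yet commented.
comment_path_lines()   { sed '/PATH/ s/^[^#]/#&/'; }
# Remove one leading # from every line that contains PATH.
uncomment_path_lines() { sed '/PATH/ s/^#//'; }

# Assumed usage on the DiskStation (keeps a copy of the original file):
#   cp /root/.profile /root/.profile.orig
#   comment_path_lines < /root/.profile.orig > /root/.profile
```

Running uncomment_path_lines over the edited file in the same way reverses the change when the # characters need to be removed again.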

Next, reboot the DiskStation by clicking the Restart button in the Synology DSM interface (note: it should be possible to type reboot in the Telnet interface, however the DiskStation locked up the one time I attempted to execute that command).

InstallNagiosDiskStation7

Once the DiskStation reboots, reconnect to the DiskStation using Telnet, connecting as the root user, just as was done earlier.

The ipkg command should now work on the command line.  First, request that an updated list of available packages is downloaded, then display that list of packages:

ipkg update
ipkg list

Next, download a couple of packages that will be used by the Nagios network monitoring tool.  Note that using ipkg to install packages is a lot easier than compiling source code, so have fun with the ipkg utility.  When installing the optware-devel package, an error may appear stating that there is an incompatibility between wget and wget-ssl – just ignore that error for now.

ipkg update wget-ssl
ipkg install optware-devel
ipkg install gcc
ipkg install libtool
ipkg install mysql

Next, we need to compile a file and copy a couple of files:

cd /opt/share/libtool/libltdl/
./configure --prefix=/opt
make all
make install

cp /usr/syno/apache/modules/mod_ext_filter.so /opt/libexec/mod_ext_filter.so
cp /usr/syno/apache/modules/*.* /opt/libexec/

Now, install the Apache package:

ipkg install apache

If an error message is displayed on screen about mod_ext_filter.so, then modify the /opt/etc/apache2/httpd.conf file and add a # in front of the line LoadModule ext_filter_module libexec/mod_ext_filter.so and save the file.  Re-execute the ipkg install apache command (note that the up arrow on the keyboard may be pressed to quickly retype one of the previously executed commands).

InstallNagiosDiskStation8

Using the DiskStation’s Control Panel, create a nagios group and a nagcmd group (the nagcmd group probably will not be used for anything specific).  These groups do not require any special DiskStation permissions.

InstallNagiosDiskStation9

Using the DiskStation’s Control Panel, create a nagios user and add that user to the nagios and nagcmd groups.  The nagios user does not require any specific DiskStation permissions.

Next, switch back to the Telnet session, download the Nagios source code, and compile the source code:

DiskStation DS212+ Notes:

The following ./configure call was used on the DS212+:

./configure --prefix=/opt --with-command-group=nagios --disable-nanosleep --enable-nanosleep=no

The ./configure aborted with the following error message:

checking for pthread_create in -lpthread... no
checking for pthread_mutex_init in -lpthread... no
checking for pthread_create in -lpthreads... no
checking for pthread_create in -llthread... no
checking if we need -pthread for threads... no
checking for library containing nanosleep... no
Error: nanosleep() needed for timing operations.

The test that threw the error is located roughly 63% of the way through the configure file (on roughly line 5635).  If the exit 1 line in the configure file is commented out, then the configure step will complete.  However, the make all command will then fail with the following error messages:

/volume1/downloads/nagios/base/nebmods.c:363: undefined reference to `dlclose'
nebmods.o: In function `neb_load_module':
/volume1/downloads/nagios/base/nebmods.c:218: undefined reference to `dlopen'
/volume1/downloads/nagios/base/nebmods.c:249: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:266: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:299: undefined reference to `dlsym'
/volume1/downloads/nagios/base/nebmods.c:225: undefined reference to `dlerror'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v2@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_rt_sa_restorer_v1@GLIBC_PRIVATE'
/opt/lib/gcc/arm-none-linux-gnueabi/4.2.3/../../../../arm-none-linux-gnueabi/lib/libpthread.so: undefined reference to `__default_sa_restorer_v1@GLIBC_PRIVATE'
collect2: ld returned 1 exit status
make[1]: *** [nagios] Error 1
make[1]: Leaving directory `/volume1/downloads/nagios/base'
make: *** [all] Error 2

After a bit of searching on the Internet, I found a page that suggested making the following changes (note that I unsuccessfully tried a couple of other steps that may have also partially corrected the issue):

mkdir /opt/arm-none-linux-gnueabi/lib_disabled
mv /opt/arm-none-linux-gnueabi/lib/libpthread* /opt/arm-none-linux-gnueabi/lib_disabled

cp /lib/libpthread.so.0 /opt/arm-none-linux-gnueabi/lib/
cd /opt/arm-none-linux-gnueabi/lib/
ln -s libpthread.so.0 libpthread.so
ln -s libpthread.so.0 libpthread-2.5.so

After making the above changes, I was able to run the configure and make all commands without receiving an error.

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzf nagios-3.5.0.tar.gz
cd nagios
./configure --prefix=/opt --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

We apparently need to copy a couple of files to different locations at this point:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Undo the changes that were earlier made to the /root/.profile file, where # characters were added in front of any line that contained the word PATH.  Remove those # characters and save the file:

vi /root/.profile

(This part still needs some fine tuning to make the web interface work with Nagios.)  Edit the Nagios Makefile and change the line beginning with HTTPD_CONF to show HTTPD_CONF=/opt/etc/apache2/conf.d  Then save the file.

cd /volume1/downloads/nagios
vi Makefile

InstallNagiosDiskStation10

Execute the following command:

make install-webconf

Create a nagiosadmin user for the web administration, specify a password when prompted:

htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin

Update November 19, 2013:

GabrielM reported in a comment below that it may be necessary to specify the full path to the htpasswd program:

/usr/syno/apache/bin/htpasswd -c /usr/local/etc/htpasswd.users nagiosadmin
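
Because the location of htpasswd can differ between installations, one option is to try a short list of candidate paths and use the first one that is executable.  The sketch below is generic: first_executable is a hypothetical helper, and the two candidate paths in the usage comment are simply the ones that appear in this article.

```shell
# Hypothetical helper: print the first argument that is an executable file,
# or return a non-zero status when none of the candidates qualifies.
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Assumed usage on the DiskStation:
#   htpasswd_bin=$(first_executable /usr/syno/apache/bin/htpasswd /opt/sbin/htpasswd) &&
#       "$htpasswd_bin" -c /usr/local/etc/htpasswd.users nagiosadmin
```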

Install a couple of additional ipkg packages that will be used by Nagios (the last package adds a ping utility that may be used by Nagios – the security permissions on the DiskStation prevent non-root users from using the built-in ping utility):

ipkg install openssl
ipkg install openssl-dev
ipkg install sendmail
ipkg install inetutils

A step that may or may not be required is to download a functioning C++ compiler (some of the commands below point to files provided with the C++ compiler) – it appears that there should already be a compiler on the DiskStation at this point (in /opt/bin), so the successful completion of this task of downloading a usable C++ compiler might not be required.

DiskStation DS212+ Notes:

These wget and tar steps were completely skipped on the DS212+

For the DiskStation DS1813+ 64 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.1%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Cedarview%29/gcc420_glibc236_x64_cedarview-GPL.tgz
tar zxpf gcc420_glibc236_x64_cedarview-GPL.tgz -C /usr/local/

For the DiskStation DS412+ 32 bit:

cd /volume1/downloads
wget http://sourceforge.net/projects/dsgpl/files/DSM%204.2%20Tool%20Chains/Intel%20x86%20Linux%203.2.11%20%28Bromolow%29/gcc421_glibc236_x86_bromolow-GPL.tgz
tar zxpf gcc421_glibc236_x86_bromolow-GPL.tgz -C /usr/local/

Now the net-snmp source code is downloaded and extracted:

DiskStation DS212+ Notes:

The ./configure call on the DS212+ (might also work on the other DiskStation models):

./configure --prefix=/opt

The make call threw several errors, including:

/bin/sh: arm-none-linux-gnueabi-ld: not found
make[2]: *** [../blib/arch/auto/NetSNMP/default_store/default_store.so] Error 127

Before running the make command on the DS212+, execute the following command:

ln -s /opt/bin/ld /opt/bin/arm-none-linux-gnueabi-ld

The -i parameter may be omitted when running the make and make install commands.

cd /volume1/downloads
wget http://sourceforge.net/projects/net-snmp/files/net-snmp/5.7.2/net-snmp-5.7.2.tar.gz
tar xzf net-snmp-5.7.2.tar.gz
cd net-snmp-5.7.2

For the DiskStation DS1813+ 64 bit, execute the following to compile the net-snmp source (note that this command uses the compiler that was downloaded):

env CC=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-gcc \
LD=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ld \
RANLIB=/usr/local/x86_64-linux-gnu/bin/x86_64-linux-gnu-ranlib \
CFLAGS="-I/usr/local/x86_64-linux-gnu/include" \
LDFLAGS="-L/usr/local/x86_64-linux-gnu/lib" \
./configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux --prefix=/usr/local

For the DiskStation DS412+ 32 bit, execute the following to compile the net-snmp source (note: I could not use any of the different compilers that I tried downloading due to the compilers crashing with one of two error messages, so this command uses the compiler in /opt/bin):

env CC=/opt/bin/i686-linux-gnu-gcc \
LD=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ld \
RANLIB=/usr/local/i686-linux-gnu/bin/i686-linux-gnu-ranlib \
CFLAGS="-I/usr/local/i686-linux-gnu/include" \
LDFLAGS="-L/usr/local/i686-linux-gnu/lib" \
./configure --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local

Several prompts will appear on the screen when either of the two commands is executed.  I entered the following for the prompts:

Default version of SNMP to use (3): 3
System Contact Information: (Enter)
System Location (Unknown): (Enter)
Location to write logfile (/var/log/snmpd.log): /opt/var/snmpd.log
Location to write persistent information (/var/net-snmp): (Enter)

Two additional commands to execute:

make -i
make install -i

Now we need to download the source code for the Nagios plugins (check_apt, check_breeze, check_by_ssh, check_clamd, check_cluster, check_dhcp, check_disk, check_disk_smb, check_dns, check_dummy, check_file_age, check_flexlm, check_ftp, check_http, check_icmp, check_ide_smart, check_ifoperstatup, check_ifstatus, check_imap, check_ircd, check_jabber, check_ldap, check_ldaps, check_load, check_log, check_mailq, check_mrtg, check_mrtgtraf, check_mysql, check_mysql_query, check_nagios, check_nntp, check_nntps, check_nt, check_ntp, check_ntp_peer, check_ntp_time, check_nwstat, check_oracle, check_overcr, check_ping, check_pop, check_procs, check_real, check_rpc, check_sensors, check_simap, check_smtp, check_snmp, check_spop, check_ssh, check_ssmtp, check_swap, check_tcp, check_time, check_udp, check_ups, check_users, check_wave) that allow Nagios to perform various monitoring tasks:

cd /volume1/downloads
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/

Update November 19, 2013:

GabrielM reported in a comment below that the occasionally changing “current version” of the Nagios plugins makes it difficult to download the plugins from the source shown above.  If you open the http://prdownloads.sourceforge.net/sourceforge/nagiosplug/ web page in a web browser, the web browser will be redirected to http://sourceforge.net/projects/nagiosplug/files/ which contains the following statement:

“The Nagios Plugins are no longer distributed via SourceForge. For downloads and other information, please visit: https://www.nagios-plugins.org/
Source: README.md, updated 2013-10-01”

If you follow that link and then click the Download heading at the top of the page, there should be a link on the page that allows access to the current version of the Nagios plugins.  That link is currently: https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

The command that GabrielM provided should work:

wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz

DiskStation DS212+ Notes:

The following configure, make, and make install commands were used:

./configure --prefix=/opt --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w"
make
make install

For the DiskStation DS1813+ 64 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=x86_64-linux-gnu --target=x86_64-linux-gnu --build=x86_64-pc-linux
make -i 
make install -i

For the DiskStation DS412+ 32 bit:

./configure --with-openssl=/usr/syno/bin/openssl --with-nagios-user=nagios --with-nagios-group=nagios --with-ping-command="/opt/bin/ping -c %d %s" --psdir=/bin --with-ps-varlist="&procpid,&procppid,&procvsz,&procrss,procprog,&pos" --with-ps-cols=6 --with-ps-format="%d %d %d %d %s %n" --with-ps-command="/bin/ps -w" --host=i686-linux-gnu --target=i686-linux-gnu --build=i686-linux-gnu --prefix=/usr/local
make -i 
make install -i

Copy the Nagios plugins to the location expected by Nagios:

DiskStation DS212+ Notes:

The plugins were already installed in the correct location on the DS212+, so the copy commands below should not be needed.

cp /usr/local/nagios/libexec/*.* /opt/libexec
cp /usr/local/nagios/libexec/* /opt/libexec
cp /usr/local/libexec/check_* /opt/libexec

Update November 19, 2013:

GabrielM reported in a comment below that the third command above may fail.  Depending on the compile options used, the plugins are installed in either /usr/local/nagios/libexec or /usr/local/libexec, so either the first two commands or the third command is expected to fail, but all three commands should not fail.  I should have explained this potential area of concern better.
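
The three copy commands can be combined so that only the files that actually exist are copied, and so that a failure is reported only when nothing at all was found.  A minimal sketch (copy_if_present is a hypothetical helper name, not something supplied by Nagios):

```shell
# Hypothetical helper: copy every file argument that exists into a directory.
# $1 = destination directory, remaining arguments = candidate files
copy_if_present() {
    dest=$1
    shift
    copied=0
    for f in "$@"; do
        # A glob that matched nothing is passed through literally, so skip it
        [ -f "$f" ] || continue
        cp -p "$f" "$dest"/ && copied=$((copied + 1))
    done
    echo "$copied file(s) copied"
    # Succeed only if at least one file was copied
    [ "$copied" -gt 0 ]
}
# Usage on the DiskStation:
#   copy_if_present /opt/libexec /usr/local/nagios/libexec/* /usr/local/libexec/check_*
```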

Copy the Nagios startup script to the correct location so that Nagios will automatically start when the DiskStation is rebooted:

cp /usr/local/etc/rc.d/nagios /opt/etc/init.d/S81nagios

Verify that the ownership of the nagios directory is set correctly:

DiskStation DS212+ Notes:

The file is actually in the /opt/bin directory, so use this command instead:

chown nagios:nagios /opt/bin/nagios/nagios -R
chown nagios:nagios /usr/local/nagios -R

In addition to the main /opt/etc/nagios.cfg Nagios file, there are several other configuration files that are potentially used by Nagios (defined in the nagios.cfg file):

/opt/etc/objects/commands.cfg
/opt/etc/objects/contacts.cfg
/opt/etc/objects/timeperiods.cfg
/opt/etc/objects/templates.cfg
/opt/etc/objects/localhost.cfg
/opt/etc/objects/windows.cfg
/opt/etc/objects/server.cfg
/opt/etc/objects/switch.cfg
/opt/etc/objects/printer.cfg

We need to make a couple of adjustments in the /opt/etc/objects/commands.cfg file.

vi /opt/etc/objects/commands.cfg

Change the ‘notify-host-by-email’ command definition section as follows:

define command{
    command_name notify-host-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘notify-service-by-email’ command definition section as follows:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "Subject: $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$\n\n***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /opt/sbin/sendmail -vt $CONTACTEMAIL$
    }

Change the ‘check_ping’ command definition section as follows (feel free to read the documentation for check_ping and specify different values):

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,25% -c 5000,90% -p 3 
        }

Save the file and exit vi.

At this point, the Nagios network monitoring utility will likely experience an error similar to the following when attempting to send an alert email:

output=collect: Cannot write ./dfr6BFFPC7027203 (bfcommit, uid=1026, gid=25): Permission denied

Execute the following commands, which should fix the above problem:

chmod g+w /opt/var/spool/clientmqueue
chmod 444 /opt/etc/mail/*.cf
chmod 7555 /opt/sbin/sendmail

We will need to use su to test the execution of various commands as the nagios user.  Without this fix (described here), you might see the following error message:

su: warning: cannot change directory to /var/services/homes/nagios: No such file or directory
su: /sbin/nologin: No such file or directory

Enter the following commands:

mkdir /var/services/homes
mkdir /var/services/homes/nagios
chown nagios:nagios /var/services/homes/nagios -R
vi /etc/passwd

Locate the line in the passwd file for the Nagios user.  Near the end of the line, /sbin/nologin should appear.  Replace that text with /bin/ash then save and exit vi.
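
The same edit can be made without opening vi.  A minimal sketch, assuming that awk is available on the DiskStation and that the login shell is the last colon-separated field of the line (set_login_shell is a hypothetical helper name); it keeps a backup copy of the file first:

```shell
# Hypothetical helper: change the login shell of one user in a passwd-format file.
# $1 = passwd file, $2 = user name, $3 = new shell
set_login_shell() {
    passwd_file=$1
    user=$2
    new_shell=$3
    # Keep a backup so that a mistake can be undone
    cp -p "$passwd_file" "$passwd_file.bak" || return 1
    # The shell is the last colon-separated field; rewrite it for the matching user only
    awk -F: -v OFS=: -v u="$user" -v s="$new_shell" \
        '$1 == u { $NF = s } { print }' "$passwd_file.bak" > "$passwd_file"
}
# Usage on the DiskStation:
#   set_login_shell /etc/passwd nagios /bin/ash
```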

Verify that the Nagios user is able to execute the check_ping plugin.  Replace MyDeviceHere with either an IP address or a network device name that is on your network:

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"
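
Nagios judges a plugin by its exit status rather than by the text that it prints (0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN), so it can help to look at that status when testing a plugin by hand.  A minimal sketch (plugin_state is a hypothetical helper name, and I am assuming that su passes the exit status of the plugin back to the calling shell):

```shell
# Hypothetical helper: translate a Nagios plugin exit status into its state name.
# $1 = exit status of the plugin
plugin_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        # 3 is the documented UNKNOWN status; this sketch treats any other value the same way
        *) echo UNKNOWN ;;
    esac
}
# Usage, immediately after running a plugin:
#   plugin_state $?
```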

If the ping command (called by check_ping) is not able to resolve a network device name, and the fully qualified DNS name (MyDeviceHere.MyDomainHere.com) was not specified, edit the /etc/resolv.conf file:

vi /etc/resolv.conf

On a new line in the file, add the following line (replacing MyDomainHere.com with the DNS domain name for your network):

search MyDomainHere.com
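
If the same change must be made more than once, it may be safer to script it so that the line is not added twice.  A minimal sketch (add_search_domain is a hypothetical helper name) that appends the line only when an identical line is not already present:

```shell
# Hypothetical helper: append "search <domain>" to a resolv.conf-style file once.
# $1 = file to edit, $2 = DNS domain name
add_search_domain() {
    conf=$1
    domain=$2
    # -x matches whole lines only, -F treats the text literally rather than as a pattern
    if grep -qxF "search $domain" "$conf"; then
        return 0
    fi
    printf 'search %s\n' "$domain" >> "$conf"
}
# Usage on the DiskStation:
#   add_search_domain /etc/resolv.conf MyDomainHere.com
```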

Verify that sendmail works for the Nagios user (replace MyEmailAddressHere@MyDomainHere.com with your email address).  At the prompt that appears, type a short message, press the Enter key, type a period, then press the Enter key again:

su - nagios -c "/opt/sbin/sendmail -vt MyEmailAddressHere@MyDomainHere.com"

----

It is important to always verify the Nagios configuration before starting Nagios (or restarting it after a configuration change).  To verify the configuration, type the following:

/opt/bin/nagios -v /opt/etc/nagios.cfg

To start up Nagios as a background task (daemon), execute the following:

/opt/bin/nagios -d /opt/etc/nagios.cfg

To stop Nagios when it is executing as a background task, first list the running processes:

ps

[Screenshot InstallNagiosDiskStation11: output of the ps command]

Then search through the list of processes for the first line that shows /opt/bin/nagios -d /opt/etc/nagios.cfg.  The number at the left of that line, 31152 in this case, is used to stop Nagios.  To stop Nagios, type the following (replace 31152 with the number shown on your screen):

kill 31152
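
Searching the process list by eye can be replaced with a short filter.  A minimal sketch, assuming that awk is available on the DiskStation and that ps prints the process number in the first column, as in the listing described above (nagios_pid is a hypothetical helper name):

```shell
# Hypothetical helper: read ps output on standard input and print the process
# number (first column) of the first line that shows the Nagios daemon command.
nagios_pid() {
    awk '/\/opt\/bin\/nagios -d \/opt\/etc\/nagios\.cfg/ { print $1; exit }'
}
# Usage on the DiskStation:
#   pid=$(ps | nagios_pid)
#   [ -n "$pid" ] && kill "$pid"
```

The pattern is written with escaped slashes, so the awk process itself does not match when it appears in the ps listing.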

Side note: I tried installing quite a few different C++ compilers that supposedly work with the Synology DSM (see here).  As such, I had to find a way to remove a directory, that directory’s subdirectories, and files.  The following command will completely remove the /usr/local/i686-linux-gnu directory, should the need arise:

rm -rf /usr/local/i686-linux-gnu

At this point, Nagios will hopefully run as a background task, and it should be able to ping and send email alerts.  However, if you were following the above directions, we have not yet instructed Nagios which devices to monitor, and to whom the alert emails should be sent.  The next step is to define the email contacts by modifying the /opt/etc/objects/contacts.cfg file (see the documentation for assistance):

vi /opt/etc/objects/contacts.cfg

After setting up the contacts, we should probably tell Nagios which devices to monitor.  If there are a lot of devices on your network to be monitored, you might find that using Microsoft Excel rather than vi to create the object definitions makes the task more manageable.  Set up a simple worksheet with four columns.  Column A will be used to specify the short host_name for the object to be monitored.  Column B will be used to specify the alias (long description for the object).  Column C will be used to either specify the IP address for the device or the network name for the device.  Column D will be used to identify the group to which the object belongs and the file name to which the definition is saved (the Excel macro supports the following groups: ap, camera, computer, external, other, printer, server, switch).

[Screenshot InstallNagiosDiskStation13: the four-column Excel worksheet]

The Excel macro is set up to read a tab delimited file, rather than reading the object description directly from the Excel worksheet.  Highlight all of the rows in the worksheet except for the top header row, and press Ctrl C (or edit – Copy) to copy the definitions to the Windows clipboard in tab delimited format.  Start Notepad (Start – Run – Notepad), and then press Ctrl V (or edit – Paste) to paste the tab delimited object descriptions into Notepad.  The Excel macro code expects the text file to be saved as nagioshosts.txt.
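
The macro recognizes only the eight group names listed above and expects four columns on every row, so it may be worth checking the text file before running the macro.  A minimal sketch, assuming that the file is available on a system with a shell and awk, such as the DiskStation (validate_hosts is a hypothetical helper name):

```shell
# Hypothetical helper: report rows of the tab delimited file that do not have
# four columns or that name an unsupported group. $1 = path to nagioshosts.txt
validate_hosts() {
    awk -F '\t' '
        # Drop the carriage return that Notepad adds to the end of each line
        { sub(/\r$/, "") }
        NF != 4 {
            printf "line %d: expected 4 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $4 !~ /^(ap|camera|computer|external|other|printer|server|switch)$/ {
            printf "line %d: unknown group \"%s\"\n", NR, $4
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
# Usage:
#   validate_hosts nagioshosts.txt && echo "nagioshosts.txt looks fine"
```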

The Excel macro code follows (I imagine that not many computers still have a second floppy drive installed, so change the B:\Hardware Documentation\Synology\ path as appropriate for your environment):

Private Sub cmdProcessText_Click()
    Dim intFileNumRead As Integer
    Dim intFileNumAP As Integer
    Dim intFileNumCamera As Integer
    Dim intFileNumComputer As Integer
    Dim intFileNumExternal As Integer
    Dim intFileNumOther As Integer
    Dim intFileNumPrinter As Integer
    Dim intFileNumServer As Integer
    Dim intFileNumSwitch As Integer
    Dim intFileNumWrite As Integer

    Dim strLine As String
    Dim strItem() As String

    intFileNumRead = FreeFile
    Open "B:\Hardware Documentation\Synology\nagioshosts.txt" For Input As #intFileNumRead

    intFileNumAP = FreeFile
    Open "B:\Hardware Documentation\Synology\ap.cfg" For Output As intFileNumAP
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "# ap.cfg - lists the wireless access points to be monitored"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10);
    Print #intFileNumAP, "###############################################################################"; Chr(10);
    Print #intFileNumAP, "#"; Chr(10); Chr(10);
    Print #intFileNumAP, "define hostgroup{"; Chr(10);
    Print #intFileNumAP, "        hostgroup_name  ap                      ; The name of the hostgroup"; Chr(10);
    Print #intFileNumAP, "        alias           Local Access Points       ; Long name of the group"; Chr(10);
    Print #intFileNumAP, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumCamera = FreeFile
    Open "B:\Hardware Documentation\Synology\camera.cfg" For Output As intFileNumCamera
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "# camera.cfg - lists the IP cameras to be monitored"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10);
    Print #intFileNumCamera, "###############################################################################"; Chr(10);
    Print #intFileNumCamera, "#"; Chr(10); Chr(10);
    Print #intFileNumCamera, "define hostgroup{"; Chr(10);
    Print #intFileNumCamera, "        hostgroup_name  camera                  ; The name of the hostgroup"; Chr(10);
    Print #intFileNumCamera, "        alias           Local IP Cameras          ; Long name of the group"; Chr(10);
    Print #intFileNumCamera, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumComputer = FreeFile
    Open "B:\Hardware Documentation\Synology\computer.cfg" For Output As intFileNumComputer
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "# computer.cfg - lists the shop floor computers to be monitored"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10);
    Print #intFileNumComputer, "###############################################################################"; Chr(10);
    Print #intFileNumComputer, "#"; Chr(10); Chr(10);
    Print #intFileNumComputer, "define hostgroup{"; Chr(10);
    Print #intFileNumComputer, "        hostgroup_name  computer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumComputer, "        alias           Domain Computers          ; Long name of the group"; Chr(10);
    Print #intFileNumComputer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumExternal = FreeFile
    Open "B:\Hardware Documentation\Synology\external.cfg" For Output As intFileNumExternal
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "# external.cfg - lists the devices external to the LAN network to be monitored"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10);
    Print #intFileNumExternal, "###############################################################################"; Chr(10);
    Print #intFileNumExternal, "#"; Chr(10); Chr(10);
    Print #intFileNumExternal, "define hostgroup{"; Chr(10);
    Print #intFileNumExternal, "        hostgroup_name  external               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumExternal, "        alias           Monitored devices External to the Network ; Long name of the group"; Chr(10);
    Print #intFileNumExternal, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumOther = FreeFile
    Open "B:\Hardware Documentation\Synology\other.cfg" For Output As intFileNumOther
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "# other.cfg - lists the miscellaneous devices to be monitored"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10);
    Print #intFileNumOther, "###############################################################################"; Chr(10);
    Print #intFileNumOther, "#"; Chr(10); Chr(10);
    Print #intFileNumOther, "define hostgroup{"; Chr(10);
    Print #intFileNumOther, "        hostgroup_name  other                 ; The name of the hostgroup"; Chr(10);
    Print #intFileNumOther, "        alias           Miscellaneous Devices ; Long name of the group"; Chr(10);
    Print #intFileNumOther, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumPrinter = FreeFile
    Open "B:\Hardware Documentation\Synology\printer.cfg" For Output As intFileNumPrinter
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "# printer.cfg - lists the printer devices to be monitored"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10);
    Print #intFileNumPrinter, "###############################################################################"; Chr(10);
    Print #intFileNumPrinter, "#"; Chr(10); Chr(10);
    Print #intFileNumPrinter, "define hostgroup{"; Chr(10);
    Print #intFileNumPrinter, "        hostgroup_name  printer               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumPrinter, "        alias           Printers and Copiers  ; Long name of the group"; Chr(10);
    Print #intFileNumPrinter, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumServer = FreeFile
    Open "B:\Hardware Documentation\Synology\server.cfg" For Output As intFileNumServer
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "# server.cfg - lists the servers to be monitored"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10);
    Print #intFileNumServer, "###############################################################################"; Chr(10);
    Print #intFileNumServer, "#"; Chr(10); Chr(10);
    Print #intFileNumServer, "define hostgroup{"; Chr(10);
    Print #intFileNumServer, "        hostgroup_name  server               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumServer, "        alias           Server and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumServer, "        }"; Chr(10); Chr(10); Chr(10);

    intFileNumSwitch = FreeFile
    Open "B:\Hardware Documentation\Synology\switch.cfg" For Output As intFileNumSwitch
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "# switch.cfg - lists the network equipment type devices to be monitored"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# Last Modified: "; Now; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "# HOST GROUP DEFINITIONS"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10);
    Print #intFileNumSwitch, "###############################################################################"; Chr(10);
    Print #intFileNumSwitch, "#"; Chr(10); Chr(10);
    Print #intFileNumSwitch, "define hostgroup{"; Chr(10);
    Print #intFileNumSwitch, "        hostgroup_name  switch               ; The name of the hostgroup"; Chr(10);
    Print #intFileNumSwitch, "        alias           Switches and Similar Devices ; Long name of the group"; Chr(10);
    Print #intFileNumSwitch, "        }"; Chr(10); Chr(10); Chr(10);

    Do While Not (EOF(intFileNumRead))
        Line Input #intFileNumRead, strLine
        strItem = Split(strLine, vbTab)
        'strItem(0) = host_name
        'strItem(1) = alias
        'strItem(2) = address
        'strItem(3) = hostgroups
        Select Case strItem(3)
            Case "ap"
                intFileNumWrite = intFileNumAP
            Case "camera"
                intFileNumWrite = intFileNumCamera
            Case "computer"
                intFileNumWrite = intFileNumComputer
            Case "external"
                intFileNumWrite = intFileNumExternal
            Case "other"
                intFileNumWrite = intFileNumOther
            Case "printer"
                intFileNumWrite = intFileNumPrinter
            Case "server"
                intFileNumWrite = intFileNumServer
            Case "switch"
                intFileNumWrite = intFileNumSwitch
        End Select

        Print #intFileNumWrite, "define host{"; Chr(10);
        Select Case strItem(3)
            Case "ap"
                Print #intFileNumWrite, "        use             ap              ; Inherit default values from a template"; Chr(10);
            Case "camera"
                Print #intFileNumWrite, "        use             camera          ; Inherit default values from a template"; Chr(10);
            Case "computer"
                Print #intFileNumWrite, "        use             computer        ; Inherit default values from a template"; Chr(10);
            Case "external"
                Print #intFileNumWrite, "        use             external        ; Inherit default values from a template"; Chr(10);
            Case "other"
                Print #intFileNumWrite, "        use             other           ; Inherit default values from a template"; Chr(10);
            Case "printer"
                Print #intFileNumWrite, "        use             printer         ; Inherit default values from a template"; Chr(10);
            Case "server"
                Print #intFileNumWrite, "        use             server          ; Inherit default values from a template"; Chr(10);
            Case "switch"
                Print #intFileNumWrite, "        use             switch          ; Inherit default values from a template"; Chr(10);
        End Select
        Print #intFileNumWrite, "        host_name       "; strItem(0); "         ; The name we're giving to this device"; Chr(10);
        Print #intFileNumWrite, "        alias           "; strItem(1); "         ; A longer name associated with the device"; Chr(10);
        Print #intFileNumWrite, "        address         "; strItem(2); "         ; IP address of the device"; Chr(10);
        Print #intFileNumWrite, "        hostgroups      "; strItem(3); "         ; Host groups this device is associated with"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);

        Print #intFileNumWrite, "define service{"; Chr(10);
        Print #intFileNumWrite, "        use                     generic-service ; Inherit values from a template"; Chr(10);
        Print #intFileNumWrite, "        host_name               "; strItem(0); "        ; The name of the host the service is associated with"; Chr(10);
        Print #intFileNumWrite, "        service_description     PING            ; The service description"; Chr(10);
        Print #intFileNumWrite, "        check_command           check_ping!3000,25%!5000,90%    ; The command used to monitor the service"; Chr(10);
        Print #intFileNumWrite, "        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions"; Chr(10);
        Print #intFileNumWrite, "        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined"; Chr(10);
        Print #intFileNumWrite, "        }"; Chr(10); Chr(10);
    Loop

    Close #intFileNumRead
    Close #intFileNumAP
    Close #intFileNumCamera
    Close #intFileNumComputer
    Close #intFileNumExternal
    Close #intFileNumOther
    Close #intFileNumPrinter
    Close #intFileNumServer
    Close #intFileNumSwitch
End Sub

The files that are created use Unix/Linux standard line feed end of line marker characters, rather than the Windows standard carriage return/line feed combination characters.  As such, opening the generated files using Notepad is not advised.  Copy the generated files back to the /opt/etc/objects/ path on the DiskStation (copy the files to a Shared Folder on the DiskStation, then use the cp command to copy the files from the share location to /opt/etc/objects/ – the Shared Folders are typically created as a subdirectory in the /volume1/ directory).
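
If one of the generated files was opened and saved by a Windows editor anyway, the carriage returns can be detected and removed after the file is copied to the DiskStation.  A minimal sketch, assuming that grep and tr are available (has_cr and strip_cr are hypothetical helper names):

```shell
# Hypothetical helper: succeed when the file contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: remove every carriage return from the file.
strip_cr() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
# Usage on the DiskStation:
#   has_cr /opt/etc/objects/ap.cfg && strip_cr /opt/etc/objects/ap.cfg
```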

If you decided to use some of the non-standard Nagios group names (as I did), those non-standard group names must be defined in the /opt/etc/objects/templates.cfg file:

vi /opt/etc/objects/templates.cfg

A portion of the additional entries that I made in this file include the following:

define host{
       name                    ap      ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, access points are monitored round the clock
       check_interval          5               ; Actively check the access point every 5 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each access point 10 times (max)
       check_command           check_ping      ; Default command to check if access points are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   30              ; Resend notifications every 30 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              ap ; Host groups that access points should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

define host{
       name                    camera  ; The name of this host template
       use                     generic-host    ; Inherit default values from the generic-host template
       check_period            24x7            ; By default, cameras are monitored round the clock
       check_interval          60              ; Actively check the device every 60 minutes
       retry_interval          1               ; Schedule host check retries at 1 minute intervals
       max_check_attempts      10              ; Check each device 10 times (max)
       check_command           check_ping      ; Default command to check if devices are "alive"
       notification_period     24x7            ; Send notification out at any time - day or night
       notification_interval   240             ; Resend notifications every 240 minutes
       notification_options    d,r             ; Only send notifications for specific host states
       contact_groups          admins          ; Notifications get sent to the admins by default
       hostgroups              camera ; Host groups that cameras should be a member of
       register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
       }

Nagios will not know that it should read the additional configuration files until it is told to do so by modifying the /opt/etc/nagios.cfg file.

vi /opt/etc/nagios.cfg

Add the following lines to the nagios.cfg file:

# Charles Hooper's object types
cfg_file=/opt/etc/objects/ap.cfg
cfg_file=/opt/etc/objects/camera.cfg
cfg_file=/opt/etc/objects/computer.cfg
cfg_file=/opt/etc/objects/external.cfg
cfg_file=/opt/etc/objects/other.cfg
cfg_file=/opt/etc/objects/printer.cfg
cfg_file=/opt/etc/objects/server.cfg
cfg_file=/opt/etc/objects/switch.cfg

We have made a large number of changes to the configuration files, so it is important to verify that there are no errors in the configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If no errors are found in the configuration, terminate (kill) nagios and then restart as described above.
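
A missing object file is an easy mistake to make after adding that many cfg_file lines.  The following is a minimal sketch (not part of the original setup) that lists any cfg_file entries pointing at files that do not exist; the function name missing_cfg_files is made up for illustration, and the sketch assumes that the paths contain no spaces:

```shell
#!/bin/sh
# Print every uncommented cfg_file= target in a Nagios main config file
# that does not exist on disk.  Returns 1 if anything is missing, else 0.
# Assumption: the configured paths contain no whitespace.
missing_cfg_files() {
    main_cfg="$1"
    status=0
    # sed prints only lines that start with cfg_file=, with the key removed,
    # so commented-out entries (#cfg_file=...) are skipped.
    for f in $(sed -n 's/^cfg_file=//p' "$main_cfg"); do
        if [ ! -f "$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return $status
}

# Example, using the paths from this article:
#   missing_cfg_files /opt/etc/nagios.cfg && /opt/bin/nagios -v /opt/etc/nagios.cfg
```

The Nagios verification step reports the same problem, so this is only a quicker way to spot a mistyped file name.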

—-

Update July 28, 2013:

When attempting to start Nagios in daemon mode (/opt/bin/nagios -d /opt/etc/nagios.cfg) I encountered a couple of problems related to permissions for the Nagios user.  The nagios process was not listed when I used the ps command.  I then tried executing the following commands:

touch /opt/var/nagios.log
chown nagios:nagios /opt/var/nagios.log

Nagios was then able to start in daemon mode, but wrote messages similar to the following in the /opt/var/nagios.log file:

[1375058364] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058364] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058364] Bailing out due to errors encountered while attempting to daemonize… (PID=11451)
[1375058656] Nagios 3.5.0 starting… (PID=12936)
[1375058656] Local time is Sun Jul 28 20:44:16 EDT 2013
[1375058656] LOG VERSION: 2.0
[1375058656] Warning: Could not open object cache file ‘/opt/var/objects.cache’ for writing!
[1375058656] Failed to obtain lock on file /opt/var/nagios.lock: Permission denied
[1375058656] Bailing out due to errors encountered while attempting to daemonize… (PID=12936)
[1375060107] Error: Unable to create temp file for writing status data: Permission denied
[1375060117] Error: Unable to create temp file for writing status data: Permission denied
[1375060127] Error: Unable to create temp file for writing status data: Permission denied
[1375060137] Error: Unable to create temp file for writing status data: Permission denied
[1375060147] Error: Unable to create temp file for writing status data: Permission denied
[1375060157] Error: Unable to create temp file for writing status data: Permission denied

I tried to set the permissions for a couple of other files, only to find another long list of Permission denied messages:

touch /opt/var/objects.cache
touch /opt/var/nagios.lock
touch /opt/var/nagios.tmp
chown nagios:nagios /opt/var/objects.cache
chown nagios:nagios /opt/var/nagios.lock
chown nagios:nagios /opt/var/nagios.tmp

I then recalled that I had seen similar messages on the DiskStation DS412+.  I then tried a different approach, creating a nagios directory in the /opt/var directory, creating a couple of subdirectories in that directory, and then assigning nagios as the owner of that directory structure:

mkdir /opt/var/nagios
mkdir /opt/var/nagios/archives
mkdir /opt/var/nagios/spool
mkdir /opt/var/nagios/spool/checkresults
chown nagios:nagios /opt/var/nagios -R
vi /opt/etc/nagios.cfg

In the nagios.cfg file, I made the following changes:

log_file=/opt/var/nagios/nagios.log
status_file=/opt/var/nagios/status.dat
lock_file=/opt/var/nagios/nagios.lock
temp_file=/opt/var/nagios/nagios.tmp
log_archive_path=/opt/var/nagios/archives
check_result_path=/opt/var/nagios/spool/checkresults
state_retention_file=/opt/var/nagios/retention.dat
debug_file=/opt/var/nagios/nagios.debug

After saving the file and exiting vi, I restarted Nagios in daemon mode.  Reading the last 100 lines of the Nagios log file is now accomplished with this command:

tail -n 100 /opt/var/nagios/nagios.log
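
The nagios.cfg changes listed above were made by hand in vi, but the same edits could be scripted.  A minimal sketch, assuming that the keys and values contain no | or & characters (true for the paths used here); the function name set_cfg_value is hypothetical:

```shell
#!/bin/sh
# Set key=value in a Nagios-style config file.  An existing uncommented
# "key=" line is replaced; if there is none, the setting is appended.
set_cfg_value() {
    cfg="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$cfg"; then
        # | is the sed delimiter because the values are paths containing /
        # Writing back with cat keeps the original file's owner and mode.
        sed "s|^${key}=.*|${key}=${value}|" "$cfg" > "$cfg.new" &&
            cat "$cfg.new" > "$cfg" &&
            rm -f "$cfg.new"
    else
        echo "${key}=${value}" >> "$cfg"
    fi
}

# Example, using two of the settings from this article:
#   set_cfg_value /opt/etc/nagios.cfg log_file /opt/var/nagios/nagios.log
#   set_cfg_value /opt/etc/nagios.cfg lock_file /opt/var/nagios/nagios.lock
```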

—-

There are a lot of seemingly interesting Nagios plugins, including check_oracle (I believe that this plugin requires the Oracle client to be installed – good luck with that install).  On one of the DiskStations the check_snmp plugin did not compile, while on the other DiskStation the check_http plugin did not compile.

It might be interesting to see what solutions readers are able to develop from the above starting point.  The above information is the result of many hours of experimentation as well as a couple minutes reading through sections of the Nagios documentation (it reads like the Oracle Database documentation, so it should be an easy read once I am in the right mood) and hopelessly scanning the ‘net for information about obscure error messages.  Have fun, and try not to put the DiskStation out of service due to a mistaken file copy.

Update November 19, 2013:

Installing an updated version of the Synology DSM operating system may temporarily disable Nagios.  Make backups of all Nagios configuration files (copying the files with the cp command to a directory in /volume1 is generally safe) before installing different versions of the Synology DSM operating system.
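
One possible way to make that backup is sketched below.  The function name backup_config_dir and the /volume1/nagios_backup destination are assumptions for illustration only; any directory in /volume1 should work:

```shell
#!/bin/sh
# Copy a configuration directory (including subdirectories) into a new
# date-stamped directory under dest_root, then print the backup location.
backup_config_dir() {
    src="$1"; dest_root="$2"
    dest="$dest_root/nagios-etc-$(date +%Y%m%d-%H%M%S)"
    # "$src/." copies the contents of src rather than src itself
    mkdir -p "$dest" && cp -R "$src/." "$dest/" && echo "$dest"
}

# Example (hypothetical destination directory):
#   backup_config_dir /opt/etc /volume1/nagios_backup
```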

The DSM 4.3 operating system installation apparently removed the /var/services/homes directory.  That directory removal makes it impossible for the Nagios user to log in to run various commands.  I assume that the removal of the homes directory is intentional, so a workaround for that problem follows:

mkdir /var/services/home
mkdir /var/services/home/nagios
chown nagios:nagios /var/services/home/nagios -R
vi /etc/passwd

In the /etc/passwd file, change all /homes/ entries to /home/ then save and exit vi.
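
If you would rather not edit /etc/passwd by hand, the same substitution could be done with sed.  This is only a sketch (the function name fix_home_paths is made up); it keeps a .bak copy of the file, and it is worth trying on a copy of /etc/passwd first, since a damaged passwd file can lock you out of the DiskStation:

```shell
#!/bin/sh
# Replace every /homes/ path component with /home/ in a passwd-style file.
# The original file is preserved as <file>.bak before it is rewritten.
fix_home_paths() {
    passwd_file="$1"
    cp -p "$passwd_file" "$passwd_file.bak" || return 1
    sed 's|/homes/|/home/|g' "$passwd_file.bak" > "$passwd_file"
}

# Example:
#   cp /etc/passwd /tmp/passwd.test && fix_home_paths /tmp/passwd.test
```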

The installation of the different DSM version (including versions before 4.3) will likely also replace/remove the libltdl.* files located in /opt/local/lib and /usr/lib, so we need to copy those files back into the correct directories:

cp /opt/lib/libltdl.so.3 /opt/local/lib/libltdl.so.3
cp /opt/lib/libltdl.so.3 /usr/lib/libltdl.so.3
cp /opt/lib/libltdl.so /usr/lib/

Once the above items are copied, try executing the check_ping command as the nagios user (replace MyDeviceHere with either an IP address or the name of a device on your network).

su - nagios -c "/opt/libexec/check_ping -H MyDeviceHere -w 5000,80% -c 5000,80% -p 5"

If the DiskStation reports that the check_ping command was not found, then copy that file back to the /opt/libexec/ directory.  If the above command was successful, try verifying the Nagios configuration:

/opt/bin/nagios -v /opt/etc/nagios.cfg

If the verification was successful, start Nagios as a daemon:

/opt/bin/nagios -d /opt/etc/nagios.cfg

Execute the ps command and verify that the above command is listed in the running processes:

ps
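
The ps listing can be long, so the check could be narrowed with grep.  A minimal sketch, assuming that the ps listing on your DiskStation shows the full command line; the function name nagios_daemon_listed is hypothetical:

```shell
#!/bin/sh
# Read a process listing on standard input and succeed (exit status 0)
# if the Nagios daemon command line appears in it.
nagios_daemon_listed() {
    # The [/] bracket keeps the pattern from matching the grep process
    # itself, should ps happen to list it.
    grep -q '[/]opt/bin/nagios -d'
}

# Example:
#   ps | nagios_daemon_listed && echo "Nagios is running"
```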

Finally, verify that Nagios is still set to start automatically as a daemon:

ls /opt/etc/init.d/S81nagios

If a file is listed when the above command is executed, then Nagios should now be fully repaired.

Update August 11, 2015:

You may at some point need to verify that an HTTP web server is online.  If you execute the following command (replace www.mydomain.com with the host name of an actual web server to be monitored):

/opt/libexec/check_http -H www.mydomain.com

You may see one of the following errors:

/opt/libexec/check_http: error while loading shared libraries: libssl.so.0.9.8: cannot open shared object file: No such file or directory
/opt/libexec/check_http: error while loading shared libraries: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory

The problem is likely caused by two missing symbolic links.  The following commands worked on the DS415+ (and also both the DS1813+ and the DS412+):

ln -s /usr/lib/libssl.so.1.0.0 /usr/lib/libssl.so.0.9.8
ln -s /lib/libcrypto.so.1.0.0 /usr/lib/libcrypto.so.0.9.8
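
The library locations may differ between DSM versions, so it may be safer to create each link only when the target library exists and nothing is already present at the link location.  A minimal sketch, with a hypothetical function name link_if_missing:

```shell
#!/bin/sh
# Create the symbolic link link_path -> target, but only if the target
# exists and nothing (file or link) is already present at link_path.
link_if_missing() {
    target="$1"; link_path="$2"
    if [ ! -e "$target" ]; then
        echo "target not found: $target" >&2
        return 1
    fi
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "already present: $link_path" >&2
        return 0
    fi
    ln -s "$target" "$link_path"
}

# Example, using the paths from the commands above:
#   link_if_missing /usr/lib/libssl.so.1.0.0 /usr/lib/libssl.so.0.9.8
#   link_if_missing /lib/libcrypto.so.1.0.0 /usr/lib/libcrypto.so.0.9.8
```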




Review of Synology DS415+, How to Add Memory to the DS415+, Web Pages Not Databases

19 12 2014

December 19, 2014

As frequent readers of this blog likely know, I have not posted very many articles recently.  It seems that lately I struggle to find time to post interesting items to the blog, which is mostly focused on Oracle Database notes, so this is a three in one off-topic post.  I have been busy with a lot of items that are completely unrelated to Oracle Database.  One of those busy (non) work items is setting up a Synology DiskStation DS415+ NAS with four 6TB hard drives.

Part 1: Reviewing the Synology DS415+

Below is my review of that NAS, as posted on Amazon:


http://www.amazon.com/Synology-America-Station-Attached-DS415/dp/B00IKTSSIO/

I have previously purchased and implemented Synology Diskstation DS1813+, DS412+, DS214+, DS212+, DS213j, and DS112j units, so Synology network attached storage (NAS) devices are not entirely new to me (I also have experience with administering various Linux and Windows servers). Most of the Synology NAS units are configured primarily as FTP destinations, although the units also provide one or more Windows shares to network computers using either Active Directory integration or Synology Diskstation internal user accounts, as well as offering network time protocol (NTP) services (to security cameras, Active Directory, and/or a PBX system), and Nagios network monitoring.

For the most part, the Synology NAS units have been very reliable. That said, I have experienced occasional problems with most of the NAS units that provide FTP services to security cameras. Eventually, all of the permitted client connections become “in use” due to the Synology sometimes remembering FTP connections long after the security cameras have forgotten about those connections. This connection “remembering” issue causes a situation where client computers attempting to connect for Windows file sharing are denied access to the server, and the problem also affects the web-based access to the Synology DSM operating system. There have been issues with the DiskStation DS412+ locking up roughly 90% of the time that a reboot is attempted through the web-based DSM, resulting in a blue flashing light on the front console that could only be fixed by pulling the electrical power cord (note that it is usually possible to kill phantom connections from the DSM interface, if that interface will display, so that a reboot is typically not required to recover from the “remembered” connections). None of the other DiskStations have experienced lockups during an attempted reboot (or any other lockups that I am able to recall).

The DS415+ was bought to take the place of a DS212+, whose CPU simply cannot keep pace with 15+ high definition security cameras feeding the NAS with motion triggered video clips via FTP. I had considered purchasing the new version of the DS1813+ (possibly called a DS1815+), but that model has not been released yet, probably would have the same Intel CPU model as the DS415+ (the DS1812+, DS1813+, and DS412+ all have essentially the same CPU model), and likely would have had a higher power consumption compared to the DS415+ if I filled all drive bays. So, I selected the DS415+ as a device that had some known compromises, but also with some power efficiency benefits that are not present in the DS1813+ and DS412+.

The DS415+ ships with 2GB of memory in a regular memory slot, rather than being soldered to the system board as is the case for the DS412+, opening the possibility for future memory expansion. With two gigabit network ports, two USB 3 ports (one USB 2), and one eSATA port, the Synology DiskStation DS415+ offers decent storage expansion options, although those options are more limited than what is offered by the DS1813+. The DS415+ internally supports up to four hard drives in one of several software RAID levels (SHR, RAID 1, RAID 5, RAID 6, and RAID 10). Drive installs are potentially performed without using a screwdriver, although screws are provided to hold the drives in place if the screw-less arrangement seems too flimsy. Unlike the DS1813+, the drive carriages are held in place by a thumb-release locking clip, rather than a flimsy lock and key mechanism. The DiskStation DS415+ more than triples in weight with four typical hard drives installed – the lightweight construction seems to be typical of the various Synology NAS units (at least those that support eight or fewer drives).

The DS415+ ships without an installed operating system, so the first task after powering on the DS415+ with the hard drives installed involves installing the latest DSM operating system. The process for installing the operating system is fairly simple, unless there is another DiskStation NAS on the same LAN (the directions provided in the printed quick start guide caused the DSM web page for another already set up Synology NAS to appear, rather than the operating system installation page for the DS415+ – the old Synology setup program that used to ship on CD with the NAS units probably would have helped in this situation). Once the NAS has nearly automatically downloaded the latest version of the operating system, the operating system installation should complete in a couple of minutes without a lot of issues.

The Synology DSM operating system offers a fantastic graphical user interface which implements HTML5 and CSS, displaying the interface in a web browser. Unfortunately, Synology tends to rearrange the location of various settings with each DSM version (and change the shape/color of icons), which makes it a little confusing when managing different Synology NAS units. Much like Windows Explorer, the File Station utility that is built into the DSM operating system supports context sensitive drag and drop, as well as right mouse button popup menus. The File Station utility that is included in the latest DSM version supports displaying more than 300 files in a paged view – that 300 file limit was an irritation when attempting to copy, move, or delete several thousand security camera videos on a daily basis through the GUI using older DSM versions. Like the other DiskStation models, the DS415+ supports telnet sessions, which allow access to the Linux command line and the configuration of scheduled script execution through the modification of the /etc/crontab file (side note: I have had issues with only the DS112j automatically resetting the contents of the /etc/crontab file when the DiskStation was power cycled – I believe that problem was caused by the use of spaces rather than tabs as field delimiters in the file).

A plain vanilla install of the DSM 5.0-4528 (as of today at update 1) offers support for network shares (Windows, MAC, and NFS), iSCSI, Active Directory Integration, FTP (standard FTP, anonymous FTP, FTPS, SFTP, TFTP), website hosting, WebDAV, SNMP, network time protocol (NTP), remote command line with telnet or SSH, integrated firewall, VPN client, USB printer sharing, and a handful of other capabilities. The DSM operating system’s native functionality is easily expanded through the download of free software packages from the Package Center. The packages extend the DS415+’s capabilities to include antivirus, Asterisk IP phone server, Internet radio rebroadcasting to networked computers, DNS server functionality, iTunes Server, VPN server, RADIUS server, email server, CRM and ERP packages, WordPress, IP camera monitoring (now includes a license for two IP cameras, additional licenses are roughly $50 per camera), and a variety of other features. Additionally, ipkg support permits the installation of more than 900 additional applications, including C++ compilers – which in theory suggests that the source for the Nagios network monitoring utility can be downloaded and compiled on the DS415+ (I was able to compile Nagios on a DS1813+, DS412+, and DS212+, and am close to having Nagios working on the DS415+).

I installed four new Western Digital Red 6TB drives, configured in a software RAID 10 array (DSM offered to automatically configure the drives in a SHR array during the initial setup, but did not offer a RAID 10 configuration at that time, so configuring the drives for RAID 10, to reduce recovery time in the event of a drive failure, requires a couple of additional mouse clicks). Peak single network link data transfer speeds so far have been impressive, at close to the maximum possible transfer rate for a gigabit network (achieving roughly 112-115MB/s ~ 919Mb/s), which is virtually identical to the speed seen with the DS1813+ that was using four 3TB Western Digital Red drives, and significantly faster than the DS212+ which has a much slower non-Intel CPU and two Western Digital Green 2TB drives. Pushing approximately 41.6GB of large files to the DS415+ from a client computer consumed between 9% and 11% of the DS415+’s CPU (for comparison, this test consumed 20% of the DS1813+ CPU capacity).

I did not test the DiskStation’s IEEE 802.3ad dynamic link aggregation – there was no apparent benefit when I tested the feature with the DS1813+, an HP 4208vl switch, and two client computers. The gigabit switch to which the DS415+ is attached does not support IEEE 802.3ad dynamic link aggregation, so it would have been a very bad idea to connect both of the supplied network cables to the switch.

Power Consumption of the DS415+ (based on the output of a Kill-A-Watt meter):
* 1.1 watts when powered off
* 16 watts with no drives installed and unit is sitting idle
* 44 watts with four Western Digital Red 6TB drives while the unit is receiving files at a rate of 112-115MB/s (for comparison, this test required 46 watts with the DS1813+ when outfitted with four Western Digital Red 3TB drives)
* 39 watts with four Western Digital Red 6TB drives installed while the unit is sitting idle for a couple of minutes (identical to the value measured for the DS1813+)
* 14.5 watts with four Western Digital Red 6TB drives hibernating

Even though the throughput and CPU of the DS415+ with software based RAID are no match for the performance and capacity of a high end Windows or Linux server, the Synology NAS units consume far less electrical power, are competitively priced (even though these units are expensive once four 6TB drives are added), should yield a lower total cost of ownership (TCO), and are likely easier to configure and maintain for their intended purpose than either a Windows or Linux server. Like the DS1813+, the DS415+ supports up to 512 concurrent remote connections from other devices (a computer with five mapped drives pointing to the DS415+ consumes five of those 512 concurrent connections). The 512 connection count may not be the hard upper limit on the Synology NAS units – I have encountered some problems with the DS112J blocking connection attempts long before its 64 concurrent limit is reached – I do not yet know if this issue affects any of the other Synology device models. The lack of an available redundant power supply is a shortcoming of the DS1813+ and other less expensive Synology NAS units, but the power supply for the DS415+ (and the DS412+) is external, so it should be easier to obtain and install replacement power supplies for the DS415+ should the need arise (the power supply may not have a standardized connection, which would permit a replacement power supply to be purchased from a third party supplier).

Synology offers a group of customer support forums. However, those forums are apparently not actively monitored by Synology support staff. So far, other than whether or not Plex on the DS415+ is able to transcode 1080P videos, there have been no significant negative comments about the DS415+ on the Synology forums.

The Synology DiskStation DS212+ has served its role surprisingly well for the last two and a half years, even when equipped with slow Western Digital Green drives in a software RAID 1 array. While that NAS was able to support 15+ cameras that potentially simultaneously send video clips via FTP, concurrently allowing a Windows client to connect to the share for the purpose of reviewing the video clips was often just a bit too much of a load for the less powerful DS212+. I am expecting few problems from the DS415+ when serving in a similar role along with supporting a couple of optional packages such as the Media Server, Audio Station, Nagios (currently receiving a Segmentation fault (core dumped) error message when executing the check_ping test command found in my “Install Nagios on a Synology DiskStation DS1813+ or DS412+” blog article), and possibly Plex. Most of the optional Synology packages appear to be decent. However, the Synology Surveillance Station, while possibly useful, still seems to be an overly fragile, overly expensive, experimental package that tends to tax the wireless and wired network much more than the FTP solution that I use with my cameras (your experience with that package may be different than mine).


Part 2: Voiding the Warranty on the Synology DS415+ (Upgrading the Memory to 8GB)

The DS415+ ships with 2GB of DDR3 1600 MT/s (PC3-12800) CL11 SODIMM memory pre-installed, and from what I am able to determine, Synology does not and will not offer memory upgrades for the DS415+.  The memory is installed in a laptop style memory socket, so… I installed a Crucial 8GB memory card into the DS415+.  The greatest difficulty in the memory upgrade, other than the concern for destroying a $630 device, was breaking into the DS415+ case without destroying the plastic clips that hold the two halves of the case together.  I posted the upgrade process to one of the Synology forum threads, but I thought that I would also post the process in this blog article so that it is easier to find the steps for the process later (the pictures do not fully display in the Synology forum thread).

If you have never disassembled a desktop or laptop computer, consider just being happy with the installed 2GB of memory to avoid damaging the Synology. If you have never removed a memory card from a laptop, consider just being happy with the installed 2GB of memory to avoid damaging the Synology. Upgrading the memory will likely void the warranty – there was a label on the installed memory card indicating that the warranty was void if the label was removed from the memory – the label is still attached to my old memory card (so, maybe my warranty is still in effect ;-) ).

Step 1, unplug the Synology and attach labels to each of the hard drives. Write the numbers 1 through 4 on the labels to indicate the ordered position of the drives in the NAS. Release the latch, and remove the drives. Make certain that you ground yourself by touching a large metal object before attempting to open the NAS. When working on the NAS, do not touch any of the contacts inside the NAS or on the memory card:

[Picture]

There are three screws on the back of the Synology that will need to be removed using a #1 Phillips screw driver. Wait to remove the screw that is pointed to by the screw driver in this picture until you have successfully separated the two halves of the NAS case:

[Picture]

There are plastic clips permanently attached to the smaller half of the NAS case. Use a flat blade screw driver to gently pry up on the larger section of the case at the top-back of the case near the seam between the two sections of the case. A popping sound should be heard when the latch releases. When the first latch releases, move the screw driver to the position of the next latch and gently pry up to again slightly raise the larger section of the case at the seam until the next latch releases. Continue working to release the remaining latches along the seam. Once all of the latches on the top are released, it should be possible to pivot the larger portion of the case so that the bottom latches release. Separate the two halves, being careful not to damage the retaining clips that normally hold the hard drives in place:

[Picture]

There are four screws on each side of the drive cage – four of the screws are long, and four are short. Remove the screws using the #1 Phillips screw driver. Make note of where the longer screws were installed. Remove the third screw from the back of the NAS if it was not already removed:

[Picture]

Gently lift the drive cage straight up and then set it aside. Note that there are two slots pointed to by the arrows in the picture below – when it is time to re-assemble the NAS, the gold colored ends of the two circuit boards must be reinserted into those slots. There are two fan connectors circled in the picture – if the fan connectors are carefully removed from the sockets, it is possible to move the drive cage out of the way. The circuit board holding the memory card is below the metal plate – that metal plate should lift straight out of the enclosure, although there may still be some wires that attach it to the enclosure:

[Picture]

There is a metal clip at each end of the pre-installed memory card. Gently push the left clip to the left, and the right clip to the right until the memory card releases. When removing the memory card, make note of the location of the cut out section of the slot, so that the replacement memory card may be installed in the same orientation:

[Picture]

Slide the replacement memory card into the slot, and gently tilt it down until the two clips lock the memory card in location.

[Picture]

Reverse the order of the steps to reassemble the NAS. Reinsert the drives in the correct order by referring to the labels. Plug in the NAS – the blue light on the front of the unit may flash for a minute or two.

If all goes well, the Resource Monitor in the Synology interface should show 8GB of memory installed:

dsm51-8gb

After a day or so, the NAS may show in Resource Monitor that it is using 6.9GB (or 7.0GB) of memory for the file cache, as shown below.

DSM51-8GB-2

Why install additional memory in the DS415+?  The 2GB of memory should be sufficient for most tasks that are typically assigned to a NAS.  I was mostly just curious after seeing a couple of questions on Amazon about memory upgrades, as well as on the Synology forums, without a clear description of the upgrade process, and only a passing mention of the memory specifications in a review of the DS415+.  There were a handful of discussion threads on the Synology forums where people were trying various memory modules in their DS1815+ units, and mostly failing to locate compatible memory modules (the Crucial memory module that I used was reported to not work in the DS1815+’s easily accessible memory slot).  So, I bought the memory, tried to figure out how to break into the DS415+ case, and took pictures as I put the unit back together (I thought that there was little point in me taking pictures while disassembling the NAS, especially if I destroyed the NAS during the upgrade attempt, but doing so while disassembling the unit is probably a good idea).

How does the DS415+ utilize the additional memory?  Mostly for the file cache (the NAS runs Linux at its core) – a day after the upgrade I checked the Resource Monitor and found that the Cached statistic increased from 670.5MB to roughly 7GB.  If there is an interest in running various background packages on the NAS (for instance, Nagios, Plex, WordPress, etc.), then the additional memory could have a significant positive impact on performance.  My installation directions for installing Nagios on a Synology DS412+, DS1813+, and DS212+ almost work with the DS415+.  I receive a Segmentation Fault, Core Dumped error message when trying to run Nagios or the Nagios check_ping plugin – I seem to recall seeing similar error messages when trying to find a compiler that would work on the DS412+ and DS1813+, so maybe there is still hope for Nagios on the DS415+ if I try downloading a different compiler (the ToolChain library for the DS415+ was released roughly a week ago, so there may be a solution – I was able to compile Nagios once and it executed without a Segmentation Fault error, but I could not reproduce the result a second time).

Part 3: Web Pages Not Databases

While I have had this blog on wordpress.com for a bit over five years, I had never tried using the standalone version of WordPress.  An opportunity developed recently to use the standalone version of WordPress.  The website for the company where I work has been in the process of being redesigned since roughly June by an outside web development company.  That web development company was making very slow progress on the website, chose to use the standalone version of WordPress as the development environment, and somehow was apparently given the task of designing the website so that it looked great on an Apple iPad, Apple iPhone, and even a now extinct Motorola Xoom tablet – any compatibility with Windows desktop computers using Internet Explorer seemed to be purely accidental, but the website apparently appeared fine on the developer’s Mac.  (Hint for developers: Test what you create using equipment that is similar to your target audience’s equipment.)

I became involved in the new website development a couple of weeks ago, trying to refocus the web development company on what the target viewers of the website will likely be using to view and interact with the new website – chances are that a too-large-to-ignore percentage of those target viewers are still running Windows Vista or Windows XP, and will be accessing the site using some version of Internet Explorer other than the latest version (some websites still are not compatible with Internet Explorer 11, so the potential target viewer may still be forced to run Internet Explorer 8 or 9 – Internet Explorer 8 is the last version supported on Windows XP, and Internet Explorer 10 is the last version supported on Windows Vista).  Ability to Print?  No, people have no need to print the website’s contents (especially not using a PDF virtual printer from Adobe or BlackIce, where all of the text from the new website was replaced by odd symbols) and have it appear on the printed page anything like what appears on-screen.  Viewing the website in a non-maximized window – who would be so silly to do such a thing?  Hamburgers are not on the menu – they are the menu, or it seems that is the official name for the three parallel white lines that sometimes appear on screen and sometimes in the printed copy.  Developers are full of fun surprises some days.

A week ago (now two weeks ago) the web development company was told to stop development on the website for a variety of reasons.  A lost six months of development, or an opportunity to beat one’s head on the table and hammer out a solution for the issues that still existed with the website?  I installed the WordPress package on a Synology DS213j NAS and on the Synology DS415+ NAS and had a go at fixing the issues with the website without affecting what the web development company had done to date.  I picked up the development process reasonably quickly (the five years of blogging on WordPress helped), but found that I was repeatedly flipping back and forth between WordPress’ Visual editor and the Text editor while trying to fix the issues and add additional text information to the pages.  Additionally, the path to files (and web pages) on the Synology must also include /wordpress (when working in a Telnet session, the actual path is /volume1/web/wordpress).  My HTML and cascading style sheet (CSS) skills were very rusty, so tasks that are incredibly easy in Microsoft Excel and Microsoft Word, such as manipulating tables, took quite a bit of Google search skill, for instance determining how to right align or center certain columns in a table without altering the HTML TD attributes of each table cell in the column when composing the table using the WordPress Text editor.  The WordPress pages appeared acceptable on the Synology NAS units, so the same changes were applied to the web development company’s best efforts – hopefully this is not true, but I think that I made about as much progress on the website in three days’ time as did the web development company in the last three months.  Since then I have been fixing other minor issues, such as the search box disappearing when the website is viewed using a Windows 8.1 tablet that lacks a keyboard and mouse, and improving the website appearance.  I learned, or relearned, a couple of skills along the way, so this process definitely was not a wasted effort.

The VP of Manufacturing at the company where I work has become somewhat of an expert recently at creating video and composing written content for the website, so the newly redesigned website is a good platform for his many years of manufacturing experience.  If you feel so inclined, take a look at the new company website, and leave a message here to let me know what you think about the website.  The website development company didn’t like my green highlight when the mouse pointer passed over links – I guess that explains one reason why I am not a graphics artist.

For future reference, this is the CSS code that I constructed to format some of the tables that appear on the website.  In the HTML code, I assigned the table to have a class of km_equipment_table, and then added the following to the style sheet’s CSS:

Handle general formatting of the table:

.km_equipment_table { width:100%; border:1px solid; padding:8px;  }
.km_equipment_table td { padding:8px; border:1px solid; }
.km_equipment_table th {background: #0072BC;}

Alternate between two colors for all but the header row of the table – note that this code is ignored by Internet Explorer 8.0:

.km_equipment_table tr:nth-child(even) { /*(even) or (2n+0)*/
 background: #F1F1F1; border:1px solid;
}
.km_equipment_table tr:nth-child(odd) { /*(odd) or (2n+1)*/
 background: #FFFFFF; border:1px solid;
}

Set the column alignment of all tables that were assigned the class of km_equipment_table – the first column is number 1 (not 0) – note that this code is ignored by Internet Explorer 8.0:

.km_equipment_table td:nth-child(1) {
    text-align: left;
}
.km_equipment_table td:nth-child(2) {
    text-align: center;
}
.km_equipment_table td:nth-child(3) {
    text-align: right;
}
.km_equipment_table td:nth-child(4) {
    text-align: right;
}
.km_equipment_table td:nth-child(5) {
    text-align: right;
}
.km_equipment_table td:nth-child(6) {
    text-align: right;
}
.km_equipment_table td:nth-child(7) {
    text-align: right;
}

To keep the printed copy of the page appearing correct, I had to specify @media screen for several of the style sheets.  As such, a special style sheet, print.css, was previously set up to handle formatting when printing.  Among other adjustments in that print.css style sheet, I added the following so that the column alignment worked correctly in the printed copy of the web pages (note that this code did not work on Internet Explorer 8.0):

.km_equipment_table {
    border:solid #000 !important;
    border-width:1px 0 0 1px !important;
}
.km_equipment_table th, .km_equipment_table td {
    border:solid #000 !important;
    border-width:0 1px 1px 0 !important;
}
.km_equipment_table td:nth-child(1) {
    text-align: left;
}
.km_equipment_table td:nth-child(2) {
    text-align: center;
}
.km_equipment_table td:nth-child(3) {
    text-align: right;
}
.km_equipment_table td:nth-child(4) {
    text-align: right;
}
.km_equipment_table td:nth-child(5) {
    text-align: right;
}
.km_equipment_table td:nth-child(6) {
    text-align: right;
}
.km_equipment_table td:nth-child(7) {
    text-align: right;
}

In the WordPress Text editor for the page containing the table, I constructed the HTML code for the table to begin as follows – the style and border specifications probably could have been handled in the style sheet, but the printed output was not ideal without these changes:

<table class="km_equipment_table" style="border-collapse: collapse;" border="1" width="100%">

To handle cases where the filenames on the old web server had to be redirected to the correct page on the new web server, adjustments had to be made to the .htaccess file – .htaccess files found in parent directories will apply to child directories also.  I spent a couple of days trying unsuccessfully to make the page redirections work, and then stumbled on a solution, again using the WordPress package on a Synology NAS as a test bed.  The old website had a number of web page addresses that contained ? characters, such as this one:

/page.php?menu_id=10

To send requests for that page to the WordPress permalinks naming convention that uses the article titles as the web page address, I added the following to the .htaccess file (R=301 indicates that this is a permanent redirect, while L indicates that this is the last rule that should be processed):

RewriteCond %{QUERY_STRING} menu_id=10
RewriteRule (.*) /industries/? [R=301,L]

To redirect a web page found on the old server that was named large_machining.htm to the appropriate page in WordPress, I added the following to the .htaccess file:

RewriteRule ^large_machining.htm/?$ /large-machining/ [R=301,L]
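
As a possible refinement (a sketch, not what was deployed on the site), the dot in the pattern can be escaped so that it matches only a literal period rather than any character, and the NC (nocase) flag can be added so that one rule covers both upper and lower case spellings of the old filename.  Whether the old server's filenames actually varied in case is an assumption:

```apache
# \. matches a literal dot; an unescaped . matches any single character.
# NC makes the match case-insensitive, so LARGE_MACHINING.HTM is also redirected.
RewriteRule ^large_machining\.htm/?$ /large-machining/ [R=301,L,NC]
```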

Those rewrite rules must be wrapped in an <IfModule mod_rewrite.c> block in the file, so a portion of the file may appear as follows – note that if there were a page on the old server with a menu_id=100, that entry must appear before the entry for menu_id=10, and the entry for menu_id=10 (and menu_id=11) must appear before the entry for menu_id=1, because the pattern menu_id=1 also matches the beginning of menu_id=10 and menu_id=100 – otherwise the person attempting to visit the website from a saved bookmark (favorite) may be sent to the wrong web page:

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
     
    # industries
    RewriteCond %{QUERY_STRING} menu_id=10
    RewriteRule (.*) /industries/? [R=301,L]
     
    # careers
    RewriteCond %{QUERY_STRING} menu_id=12
    RewriteRule (.*) /careers/? [R=301,L]
     
    # ... many more
    RewriteCond %{QUERY_STRING} menu_id=1
    RewriteRule (.*) /about/? [R=301,L]
    
    # Misc. redirects
    RewriteRule ^ABOUT.HTM/?$ /about/ [R=301,L]
    
    RewriteRule ^large_machining.htm/?$ /large-machining/ [R=301,L]
</IfModule>
    
# BEGIN WordPress
<IfModule mod_rewrite.c>
    
    # ... Standard WordPress entries here
</IfModule>
# END WordPress
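
The ordering requirement comes from the unanchored patterns: menu_id=1 is also a substring of menu_id=10 and menu_id=100.  One possible alternative, offered as a sketch rather than what was deployed on the site, anchors each parameter with (^|&) and (&|$) so that the order of the entries stops mattering.  Restricting the rules to page.php is an assumption based on the example address above; the original rules match any path:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    # (^|&) requires the parameter to start the query string or follow an &;
    # (&|$) requires it to end the query string or be followed by an &.
    # With both anchors, menu_id=1 no longer matches menu_id=10 or menu_id=100.
    RewriteCond %{QUERY_STRING} (^|&)menu_id=1(&|$)
    RewriteRule ^page\.php$ /about/? [R=301,L]

    RewriteCond %{QUERY_STRING} (^|&)menu_id=10(&|$)
    RewriteRule ^page\.php$ /industries/? [R=301,L]
</IfModule>
```

With these patterns, a request for /page.php?menu_id=10 fails the first condition and is redirected to /industries/ by the second rule, even though the menu_id=1 entry is listed first.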

Part 4: Idle Thoughts (Yes, this was supposed to be a three part article)

So, I was becoming impatient while waiting for the web development company to finally push the new version of the website over to the publicly accessible website (I expected this process to happen a week ago, last Friday, but it did not go live until midnight today).  This past Monday I was beginning to think that this task was too much for the web development company.  So, I set out to learn a bit about Amazon’s AWS hosting options.  It turns out that Amazon offers a t2.micro hosting solution that is free for a year.  At the time I was almost completely unfamiliar with Amazon’s hosting offerings, so I signed up for a free account.  After puttering around for a bit, I managed to create an instance on AWS, figured out that I needed to assign an elastic IP address to the instance, determined how to connect to the instance using PuTTY, installed LAMP in the t2.micro instance, set up an FTP server in the instance, installed WordPress in the instance, imported the new website’s contents, changed the domain’s global DNS settings, and had a fully functional website in probably five or six hours – not too bad for having so little experience with those items.  The website in the t2.micro instance crashed three times on me this past Tuesday during final testing due to consuming the full 1GB of memory that is allocated to those types of instances, but at least I think that I was able to demonstrate that the web development company was either seriously dragging this project out longer than necessary, or there is a problem with their method of assigning priority to projects.  The new website went live this morning with a couple of minor issues: 1) No one onsite at my company could view the website because the development company insisted on removing the www. portion of the website address (at companies that use an Active Directory internal domain, that causes the DNS name resolution to point to the web server on one of the internal domain controllers – none of those domain controllers has web server software installed); 2) The .htaccess file that I spent a couple of days creating was overwritten by a file written by the developer – that file either redirected most of the traffic to the wrong page (see my notes above about why the order of entries in that file is important), or sent the person to a 404 error page (page not found) for all historic web page filenames; 3) College was spelled as Collage.  Oh, well.  Close counts in website development, just as it does with horseshoes and hand grenades.





On the Topic of Technology… 7

26 09 2014

September 26, 2014

(Back to the Previous Post in the Series)  (Forward to the Next Post in the Series)

As I was walking from the back of the facility where I work to my office just recently, I started repeatedly saying to myself as I approached the office door, “that huge computer was not here when I started calling my office a second home”.  I guess that I have worked at the same place for so long that I became blind to the interesting technology gadgets (if you can call multi-million dollar computers simply just gadgets) that surround me on a daily basis.

A couple of years ago BlackBerry released a small 8″ tablet, and Motorola released a 10″ Xoom tablet (I think that Motorola sold out their Mobility division to Google, who then sold that division to a farmer to use as fertilizer).  At the time the Xoom and BlackBerry tablets were released, my boss was really excited about the Apple iPads, but he did not care to spend $500 or more of his own money for a toy to use at home.  He had a Windows computer at home, but he seemed to always view that computer as excessively slow (mostly when viewing websites), even though he spent close to $3,000 on the computer six years earlier.  I am not much of an Apple fan, so I decided to have a little fun with my boss’ situation.

On the day that the Xoom tablet became available on Amazon, I placed an order for the tablet.  When it arrived, I brought it into work and showed the boss how quickly it could pull up web pages, along with its support of Adobe Flash playback (the iPad never supported Adobe Flash).  Yet, he continued to go on about the iPad, even showing me newspaper articles written by tech gurus that boasted about the fantastic features of the iPad.  A year earlier I had bought a small Windows netbook with a 10” display, trying to convince him that such a netbook was even better than an iPad, so obviously that prior attempt failed.

When the BlackBerry tablet was released, I made a special trip to Best Buy just to grab the tablet.  I set the tablet up to work with the BlackBerry phone that I had at the time.  Oh neat, I am able to look at the company emails that I receive on the phone using the tablet – certainly, that will convince the boss that something is better than the iPad.  I showed my boss, who was also using a BlackBerry phone at the time, the neat BlackBerry tablet that could not only quickly pull up web pages (along with showing Adobe Flash contents), but could also show company emails and use the phone as a mobile hotspot for viewing web pages.  He spent a couple of minutes looking over the BlackBerry tablet before handing it back to me.  I found a couple more newspaper articles about the iPad on my desk in the weeks that followed.

On a Sunday afternoon, I decided to do some video testing with the two tablets, in a final attempt to convince the boss that something other than an iPad is ideal for his use at home.  I took the two tablets to my second home (that’s the place where my office, and all of those huge computers are located), and decided to do a head to head video test with the two tablets.  I planned to show the best looking video from the two tablets to the boss, and finally win him over.  I held the two tablets side-by-side as I walked down the aisles of the huge computers.  As I walked, I wondered what that 40,000 pound part was doing in the big pit that was dug for one of the computers that was expected to arrive in another month or two.  No matter, I continued with my video testing, holding the tablets at head level as I walked.  I received some strange looks from the other employees as I walked about – I simply reassured the other employees that I was just trying to impress the boss.  I took the tablets home and processed the video from the tablets to eliminate meaningless portions of the video.  It seems that both tablets produced 720P video at either 29 or 30 frames per second that was virtually identical in video quality, but the BlackBerry video would play back directly in the Windows Media Player, while the Xoom video required conversion to a compatible format.  I showed the boss the resulting video, demonstrating that not only could the BlackBerry tablet quickly pull up web pages (along with showing Adobe Flash contents), show company emails and use the phone as a mobile hotspot for viewing web pages, but also record 720P video that easily plays back on your Windows computer at home.  The boss thought for a minute or two, and then said, “did you have a chance to read Walt Mossberg’s latest Wall Street Journal article, there is a new iPad out now.”

Ah, fond memories.

I recently found the video clips that I recorded using the tablets back in 2011, and after reviewing the videos, I still can’t see much difference between the videos captured by either tablet.  The video looks nice when playing back, but pausing either video to take a screen capture results in a blurry single-frame mess 90% of the time.  The video showed the big pit that was dug for the large computer – yep, that pit now contains a multi-million dollar computer, and the wall that had been next to the pit was removed during a later expansion project.

In the nearly five years since I created the first article on this blog, I really have not said much about the company where I work.  I have posted a lot of Oracle Database book reviews on Amazon, as well as several reviews of security cameras.  Some readers on Amazon were convinced that I worked for a couple of book publishing companies, writing fake book reviews to promote the publishers’ books; people who actually read the book reviews should know better than that – the reviews are brutally honest.  Some other customers on Amazon thought that I was working for a security camera company and/or living in California; no, not the case.  As a result, I put together an article that shows some of the interesting technology and multi-million dollar computers that are located just feet from my office at work.  In the article, I included some still frames from the video that I captured in the walk through with the tablets in 2011.

Below are three pictures from the article that I recently posted.  I am still trying to come up with good captions for the last two pictures; captions such as “taking a break” and “breaking in a new truck” come to mind.

[Picture: Cincinnati CL-707 Laser Burner Slicing Through 1]
[Picture: In the Deep End]
[Picture: Need a Bigger Truck]







