AWS Storage Blog

Auditing user and administrative actions in Amazon FSx for NetApp ONTAP using Splunk

Update (10/26/2022): This blog was updated to recommend that you validate that your file system can contact the log forwarding IP addresses.


Log and audit event monitoring is a vital part of any organization’s security practices. For file systems, this involves logging end-user activities (such as file access attempts) as well as administrative actions that modify a file system’s configuration. Amazon FSx for NetApp ONTAP has security features to help organizations validate their security posture and identify possible gaps in their access control policies.

Amazon FSx for NetApp ONTAP is a storage service that allows you to launch and run fully managed NetApp ONTAP file systems in the AWS Cloud. It provides the familiar features, performance, capabilities, and APIs of ONTAP file systems with the agility, scalability, and simplicity of a fully managed AWS service.

In this post, I walk through FSx for ONTAP’s built-in logging and file access auditing capabilities. Both administrative and file access logs can be pulled and aggregated into external search and analytics tools for reporting, monitoring, and visualization. This provides a way to automate, monitor, and react to end-user activities in near real time by using AWS services or AWS Partner solutions like Splunk. To meet compliance objectives, organizations need to know, and be able to demonstrate, who is accessing files, folders, and file shares and what actions are performed on them.

Solution overview

The first portion of this blog covers forwarding audit log events from an FSx for ONTAP file system. The second portion covers setting up file access auditing for SMB and NFS volumes.

Figure 1: Overview of FSx for ONTAP logs and file access auditing with Splunk

Audit log events

FSx for ONTAP sends audit events for SET operations (those that modify the file system) that originate from the ONTAP CLI and the ONTAP API. The audit events are forwarded to a syslog destination on the Splunk Enterprise and Universal Forwarder (UF) instances. Log forwarding from FSx for ONTAP multi-AZ file systems is limited to the preferred and standby local subnets, so a syslog server is needed in each subnet. I use syslog-ng as the syslog server on each Splunk Enterprise and UF node, while Splunk monitors the log files for new events.

File access auditing

Configuring file access auditing involves using the ONTAP CLI to create and enable audit policies and configuring System Access Control Lists (SACLs) on the SMB shares. SACLs define which access types, and which users or groups, generate audit events. You can use the ONTAP CLI or Windows File Explorer to configure SACLs on SMB shares.

To visualize and monitor the NTFS and Unix file access events, I use Splunk to index the audit events, and create a dashboard to show near real-time monitoring of file access events.

Configuring audit log forwarding

In this section, I configure FSx for ONTAP to forward management activities performed on a cluster to a syslog destination.

Step 1: Set up Splunk index, Universal Forwarder (UF), and syslog-ng

To get started, create a separate Splunk index for storing the audit events and set up syslog-ng on the Splunk Enterprise and UF. Set up the forwarding and receiving between the Splunk Enterprise and UF, and monitor the log files.

  1. Select Settings, then select Indexes.
  2. On the top right corner, select New Index.
  3. Enter an index name, for example, ontap_syslogs.
  4. Leave the default settings and select Save to create the index.

Figure 2: Create Splunk index for logs

Now set up the syslog server on the Splunk Enterprise and Universal Forwarders:

  1. Install syslog-ng on the Splunk Enterprise and UF servers.
    sudo amazon-linux-extras install epel
    sudo yum install syslog-ng
  2. Modify the syslog-ng configuration file (/etc/syslog-ng/syslog-ng.conf) and add the following lines:
    source s_fsx {
        tcp(ip(0.0.0.0) port(514));
    };

    destination d_fsx {
        file(
            "/var/log/fsx/fsxontap.log"
            create_dirs(yes)
            owner("root")
            group("root")
            perm(0755)
        );
    };

    log { source(s_fsx); destination(d_fsx); };
  3. Restart the syslog-ng service.
    sudo systemctl restart syslog-ng
  4. Repeat steps 1 to 3 on the Universal Forwarder.
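
Before pointing the file system at the listener, it can help to confirm that syslog-ng accepts TCP connections on port 514 and writes to /var/log/fsx/fsxontap.log. The following Python sketch sends a single test line. The function name and message text are my own, and it only shows reachability from the host you run it on, not from the FSx for ONTAP nodes.

```python
import socket


def send_test_message(host, port=514, text="fsx syslog-ng listener test"):
    """Send one syslog-style line over TCP and return the bytes that were sent."""
    # <14> is the syslog priority for facility "user" (1 * 8) plus severity "info" (6).
    payload = f"<14>{text}\n".encode("utf-8")
    # create_connection raises OSError if the port is closed or filtered.
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
    return payload
```

Calling send_test_message with the IP address of a syslog-ng server, from another instance in the same subnet, should result in a new line containing the test text in /var/log/fsx/fsxontap.log.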

Next, configure receiving data on the Splunk Enterprise server:

  1. On the Splunk Enterprise console, go to Settings, select Forwarding and Receiving.
  2. From the Receive data section, select Configure receiving. On the top-right corner, select New Receiving Port.
  3. Enter a port to listen on, for example, 9997, and select Save.

Figure 3: Set up Splunk to receive syslog data

Now configure the Splunk Universal Forwarder to send events to the Splunk Enterprise server and monitor the local /var/log/fsx/fsxontap.log file.

  1. On the Universal Forwarder:

a. Configure the UF to forward data.

./splunk add forward-server <Splunk_Enterprise_IP>:9997

b. Monitor the log file for events.

./splunk add monitor /var/log/fsx/fsxontap.log -index ontap_syslogs -sourcetype syslog
  2. On the Splunk Enterprise server:

a. Monitor the log file for events.

./splunk add monitor /var/log/fsx/fsxontap.log -index ontap_syslogs -sourcetype syslog

Step 2: Enable log forwarding from ONTAP to Splunk

To configure audit log-forwarding to Splunk, you can use the ONTAP CLI (see Using the ONTAP CLI in the FSx for ONTAP user guide). To set up logging to your Splunk endpoint, enter the following commands:

cluster log-forwarding create -destination <splunk-enterprise-IP> -port 514 -protocol tcp-unencrypted -verify-server false -facility user

cluster log-forwarding create -destination <universal-forwarder-IP> -port 514 -protocol tcp-unencrypted -verify-server false -facility user

If you get the following error:

Error: command failed: Cannot contact destination host (172.31.34.125) from node "FsxId01234abcdef56-01". Verify connectivity to desired host or skip the connectivity check with the "-force" parameter.

Do not use the -force parameter. Instead, validate that your security groups and routing are configured so that the connectivity test succeeds. This ensures that your file system operates as expected.

Once you’ve validated that splunk-enterprise-IP and universal-forwarder-IP can be contacted from both nodes, verify that the log-forwarding destination has been created:

::> cluster log-forwarding show
                                                Verify Syslog
Destination Host         Port   Protocol        Server Facility
------------------------ ------ --------------- ------ --------
172.31.2.69              514    tcp-unencrypted false  user
172.31.29.178            514    tcp-unencrypted false  user

To verify that audit logs are being forwarded to Splunk, create a volume through Amazon FSx, or through the ONTAP CLI.

FsxIdxxxxxxxxxxxxxxxxx::> volume create -volume cli_vol -aggregate aggr1 -vserver <vserver_name>
[Job 685] Job succeeded: Successful

Now perform a throughput scaling operation to force a failover using the AWS CLI:

aws fsx update-file-system --file-system-id fs-abcdefghij123456 --ontap-configuration ThroughputCapacity=256

Search the Splunk index (ontap_syslogs) to make sure the audit events are delivered. In the Splunk console, go to Search and Reporting and search for incoming events.

index="ontap_syslogs" | table _time,_raw | sort -_time

Figure 4: Searching Splunk for log events

Here is a sample Success event for a throughput scaling operation:

2022-03-07T17:56:57+00:00 ip-172-31-16-155.eu-west-1.compute.internal FsxIdXXXXXXXXXXXXXXXXX-02: FsxIdXXXXXXXXXXXXXXXXX-02: 00000002.00007e75 00006ba9 Mon Mar 07 2022 17:56:50 +00:00 [kern_audit:info:4089] 8503e90000000246 :: FsxIdXXXXXXXXXXXXXXXXX:http :: 52.210.44.239:3167 :: FsxIdXXXXXXXXXXXXXXXXX:fsx-control-plane :: POST /api/private/cli/storage/failover/takeover : {"ofnode":" FsxIdXXXXXXXXXXXXXXXXX-02","halt":"true","option":"normal"} :: Success:
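
The sample event uses " :: " as a field separator, which makes it straightforward to break apart outside Splunk as well. The following Python sketch splits such a line into labeled fields. The field names (application, client_ip, user, and so on) are my own labels inferred from this single sample, not an official schema.

```python
SAMPLE = (
    '2022-03-07T17:56:57+00:00 ip-172-31-16-155.eu-west-1.compute.internal '
    'FsxIdXXXXXXXXXXXXXXXXX-02: FsxIdXXXXXXXXXXXXXXXXX-02: 00000002.00007e75 00006ba9 '
    'Mon Mar 07 2022 17:56:50 +00:00 [kern_audit:info:4089] 8503e90000000246 :: '
    'FsxIdXXXXXXXXXXXXXXXXX:http :: 52.210.44.239:3167 :: '
    'FsxIdXXXXXXXXXXXXXXXXX:fsx-control-plane :: '
    'POST /api/private/cli/storage/failover/takeover : '
    '{"ofnode":" FsxIdXXXXXXXXXXXXXXXXX-02","halt":"true","option":"normal"} :: Success:'
)


def parse_audit_event(line):
    """Split one forwarded audit line into labeled fields (labels are assumptions)."""
    parts = [part.strip() for part in line.strip().split(" :: ")]
    if len(parts) < 6:
        raise ValueError("expected at least six ' :: ' separated fields")
    cluster, _, application = parts[1].partition(":")
    client_ip, _, client_port = parts[2].rpartition(":")
    _, _, user = parts[3].partition(":")
    # The request text could itself contain the separator, so rejoin the middle parts.
    request = " :: ".join(parts[4:-1])
    # The last field looks like "Success:"; keep any text after the colon as detail.
    status, _, detail = parts[-1].partition(":")
    return {
        "header": parts[0],
        "cluster": cluster,
        "application": application,
        "client_ip": client_ip,
        "client_port": client_port,
        "user": user,
        "request": request,
        "status": status,
        "detail": detail.strip(),
    }
```

For the sample above, parse_audit_event(SAMPLE) reports the user as fsx-control-plane and the status as Success.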

Configuring NTFS access auditing

File access audit logs are not integrated with the log-forwarding framework in FSx for ONTAP. The access audit events must be saved to a local path on the file system.

To configure NTFS access auditing using the ONTAP CLI, start by creating a volume to store the file access audit events and then enabling file access auditing on the SVM. Then configure audit policies on NTFS security-style files and directories on the volume.

Step 1: Enable storage virtual machine (SVM) auditing

  1. Create a volume to store access audit logs.
volume create -volume audit -vserver <vserver_name> -aggregate aggr1 -size 10G -state online -security-style mixed -junction-path /audit
  2. Create a vserver audit configuration that sends the logs to the /audit path and rotates them, for example, every 5 minutes. Depending on your auditing requirements, you can rotate the logs less often and retain them based on your retention needs.
vserver audit create -vserver <vserver_name> -destination /audit -rotate-limit 1440 -events file-ops,cifs-logon-logoff,cap-staging,file-share,user-account,security-group,authorization-policy-change -format xml -rotate-schedule-minute 0,5,10,15,20,25,30,35,40,45,50,55 -rotate-size 100MB
  3. Enable the audit configuration for the vserver.
vserver audit enable -vserver <vserver_name>
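
To adapt the rotation interval, the -rotate-schedule-minute list and the resulting retention window can be worked out with a few lines of Python. This is a small sketch with helper names of my own. It assumes that files rotate only on the schedule and that -rotate-limit is the number of rotated files kept.

```python
def rotate_schedule(interval_minutes):
    """Build the comma-separated minute list for a given rotation interval."""
    if not 1 <= interval_minutes <= 60 or 60 % interval_minutes != 0:
        raise ValueError("interval must divide 60 evenly")
    return ",".join(str(minute) for minute in range(0, 60, interval_minutes))


def retention_hours(rotate_limit, interval_minutes):
    """Hours of audit history covered by rotate_limit files, one per interval."""
    return rotate_limit * interval_minutes / 60
```

With the values used above, a 5-minute interval and a rotate limit of 1440 files cover 1440 × 5 = 7200 minutes, or 120 hours (5 days), of audit history.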

Step 2: Enable NTFS SACLs (audit policies) on NTFS files and folders for file access auditing

  1. Create a volume with NTFS security style.
volume create -volume ntfs -aggregate aggr1 -size 10G -security-style ntfs -type RW -junction-path /ntfs -vserver <vserver_name>
  2. Create a share for the volume just created.
cifs share create -share-name ntfs -path /ntfs -share-properties oplocks,browsable,show-previous-versions -vserver <vserver_name>
  3. Create an NTFS security descriptor. This requires advanced privileges in the ONTAP CLI.
set -privilege advanced
vserver security file-directory ntfs create -ntfs-sd sd1 -vserver <vserver_name> -owner EXAMPLE\Admin
  4. Add NTFS SACL access control entries to the NTFS security descriptor, creating one entry each for successful and failed access attempts.
vserver security file-directory ntfs sacl add -vserver <vserver_name> -ntfs-sd sd1 -access-type failure -account Everyone -rights full-control

vserver security file-directory ntfs sacl add -vserver <vserver_name> -ntfs-sd sd1 -access-type success -account Everyone -rights full-control
  5. Create an audit policy for the SVM. A policy acts as a container for tasks, each of which associates an NTFS security descriptor with file and folder paths.
vserver security file-directory policy create -policy-name policy1 -vserver <vserver_name>
  6. Add a task to the security policy. The task applies the success and failure access control entries in the security descriptor sd1 to the files and folders under /ntfs.
vserver security file-directory policy task add -vserver <vserver_name> -policy-name policy1 -path /ntfs -security-type ntfs -ntfs-mode propagate -ntfs-sd sd1 -index-num 1 -access-control file-directory
  7. Finally, apply the security policy to the NTFS files and folders within the /ntfs path.
vserver security file-directory apply -vserver <vserver_name> -policy-name policy1

To verify that file access audit events are logged, access the ntfs share using SMB and create some content. Then access the audit volume to see the .xml formatted log files containing the access events.

Figure 5: File access audit logs in XML

Configuring UNIX access auditing

Now let’s configure file access auditing for NFS access to UNIX security-style files and folders. We accomplish this by adding audit access control entries (ACEs) to NFSv4.x ACLs. For FSx for ONTAP to audit NFS events, NFSv4 must be enabled.

  1. On the FSx for ONTAP file system, enable NFSv4 ACL support.
vserver nfs modify -vserver <vserver_name> -v4.0 enabled -v4.0-acl enabled -v4.1-acl enabled
  2. Create a volume with UNIX security style.
volume create -volume unix -aggregate aggr1 -size 10G -security-style unix -type RW -junction-path /unix -vserver <vserver_name>

3. On an EC2 instance, mount the volume /unix.

sudo mkdir /mnt/unix
sudo mount -t nfs <svm-nas-endpoint>:/unix /mnt/unix
cd /mnt/unix

4. Recursively add the auditing flags to the /mnt/unix directory.

nfs4_setfacl -R -a U:fdS:EVERYONE@:Cd /mnt/unix

5. On the FSx for ONTAP file system, enable SVM auditing. Follow the instructions in Step 1 of Configuring NTFS access auditing above. If you have already done this, move on to the next action.

6. Create some files in the /mnt/unix directory from a UNIX instance. Mount the /audit volume and verify that UNIX access events on the /mnt/unix directory are logged.

cd /mnt/unix/
sudo sh -c "echo 'Hello World' > file01"
sudo mkdir /mnt/audit
sudo mount -t nfs <svm-nas-endpoint>:/audit /mnt/audit
cd /mnt/audit
grep "file01" audit_svm01_*
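
The ACE string passed to nfs4_setfacl in step 4 follows the type:flags:principal:permissions layout. The following Python sketch decodes it using lookup tables based on my reading of the nfs4_acl man page. The function and table names are my own.

```python
ACE_TYPES = {"A": "allow", "D": "deny", "U": "audit", "L": "alarm"}
ACE_FLAGS = {
    "f": "file-inherit",
    "d": "directory-inherit",
    "n": "no-propagate-inherit",
    "i": "inherit-only",
    "g": "group",
    "S": "successful-access",
    "F": "failed-access",
}
ACE_PERMISSIONS = {
    "r": "read-data", "w": "write-data", "a": "append-data", "x": "execute",
    "d": "delete", "D": "delete-child", "t": "read-attributes",
    "T": "write-attributes", "n": "read-named-attributes",
    "N": "write-named-attributes", "c": "read-acl", "C": "write-acl",
    "o": "write-owner", "y": "synchronize",
}


def decode_ace(ace):
    """Expand an NFSv4 ACE such as 'U:fdS:EVERYONE@:Cd' into readable names."""
    fields = ace.split(":")
    if len(fields) != 4:
        raise ValueError("expected type:flags:principal:permissions")
    ace_type, flags, principal, permissions = fields
    return {
        "type": ACE_TYPES[ace_type],
        "flags": [ACE_FLAGS[flag] for flag in flags],
        "principal": principal,
        "permissions": [ACE_PERMISSIONS[perm] for perm in permissions],
    }
```

Read this way, U:fdS:EVERYONE@:Cd is an audit entry, inherited by files and directories, that records successful ACL changes and deletes by anyone. To also capture failed attempts or other operations, you would presumably add the F flag or further permission letters.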

Visualizing audit and file access logs with Splunk

In this section, I create a dashboard to monitor access events. Splunk mounts the /audit volume, monitors the files for changes, and forwards the content into an index.

Step 1: Ingesting access events into Splunk

  1. To simplify the search queries, install the Splunk Add-on for Microsoft Windows. Go to the Splunk console, choose + Find More Apps, then search for “Add-on for Microsoft Windows.” Select Install, enter your Splunk.com user name and password, accept the EULA conditions, then log in and install.

Figure 6: Splunk Add-on for Microsoft Windows

  2. Mount the /audit volume on your Splunk Enterprise or UF server.
sudo mkdir /mnt/audit
sudo mount -t nfs <svm-nas-endpoint>:/audit /mnt/audit

3. Create a separate Splunk index for file access auditing.

a. Select Settings, then select Indexes.

b. On the top right corner, select New Index.

c. Enter an Index Name, for example, ontap_access.

d. Leave the default settings and select Save to create the index.

4. To monitor the /mnt/audit directory and forward the events into the ontap_access index, create an inputs.conf file with the content below. Replace <svm_name> with the SVM name.

/opt/splunk/etc/system/local/inputs.conf

[monitor:///mnt/audit/audit_<svm_name>_D*]
disabled = false
index = ontap_access
source = XmlWinEventLog:Security
initCrcLength = 1024

This captures the newly rotated files every 5 minutes and excludes the audit_<svm_name>_last.xml log file. By default, Splunk uses the first 256 bytes (the head) of a file to determine whether it has seen the file before, and the last 256 bytes (the tail) to see whether the file has changed since it was last read. I override the default initCrcLength with a higher value of 1024 bytes so that the CRC check can tell rotated files apart.

  5. Next, create a props.conf file to split the access events into separate lines.
/opt/splunk/etc/system/local/props.conf

[XmlWinEventLog:Security]
SHOULD_LINEMERGE = false
KV_MODE = xml
MUST_BREAK_AFTER = \</Event\>
  6. Restart the Splunk server via the Splunk console (Settings > Server controls > Restart Splunk) or via the server CLI (sudo $SPLUNK_HOME/bin/splunk restart).
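
The effect of initCrcLength in the inputs.conf above can be illustrated with a short Python sketch. This is not Splunk's actual implementation. It only shows why a checksum over a longer head can separate files that begin with the same bytes. The 300-byte shared header is an assumption made for the example.

```python
import zlib


def head_crc(data, init_crc_length=256):
    """Checksum of the first init_crc_length bytes, mimicking a head-only check."""
    return zlib.crc32(data[:init_crc_length])


# Two hypothetical rotated files that share an identical 300-byte header.
header = b"<Events>".ljust(300)
file_a = header + b"<Event>1</Event>"
file_b = header + b"<Event>2</Event>"

same_at_256 = head_crc(file_a) == head_crc(file_b)
same_at_1024 = head_crc(file_a, 1024) == head_crc(file_b, 1024)
```

Here same_at_256 is True because both heads fall entirely inside the shared header, while same_at_1024 is False because the longer window reaches the differing event data.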

Step 2: Searching and creating a dashboard

Before I create a dashboard, I verify that events are being delivered to Splunk. Go to the Splunk console, select Search and Reporting, and search for events delivered to the file access auditing index created in Step 1.3 above, that is, ontap_access.

index="ontap_access"

Figure 7: Searching for file access events in Splunk

Now let’s build a dashboard to display these events. Go to the Splunk console and select Search and Reporting from the Apps list on the left-hand side. Enter the following search queries in the search field:

  • Audit events timeline for success and failure
index="ontap_access" | replace "0x8010000000000000" with "Failure", "0x8020000000000000" with Success | timechart count by Keywords
  • Count the number of deletes
index="ontap_access" EventCode="4659" OR EventCode="4660" | stats count
  • Distinct SMB users accessing data
index="ontap_access" Source="CIFS" SubjectUserName | spath output=CifsUser path=Event.EventData.Data{6} | stats dc(CifsUser) as CifsUser
  • Distinct UNIX users accessing data
index="ontap_access" Source="NFSv4" SubjectUnix | spath output=UnixUser path=Event.EventData.Data{@Uid} | stats dc(UnixUser) as UnixUser
  • Count of successful access events
index="ontap_access" Keywords=0x8020000000000000 | stats count
  • Count of failed access events
index="ontap_access" Keywords=0x8010000000000000 | stats count
  • Users performing bulk deletes >50
index="ontap_access" EventCode="4659" OR EventCode="4660" | spath output=SubjectUserName path=Event.EventData.Data{6} | stats count by SubjectUserName | where count > 50
  • Top 10 SMB users generating events
index="ontap_access" Source="CIFS" SubjectUserName | spath output=SubjectUserName path=Event.EventData.Data{6} | table SubjectUserName | top limit=10 SubjectUserName
  • Top 10 UNIX users generating events
index="ontap_access" Source="NFSv4" SubjectUnix | spath output=SubjectUnix path=Event.EventData.Data{@Uid} | table SubjectUnix | top limit=10 SubjectUnix
  • SMB event summary
index="ontap_access" Source="CIFS" SubjectUserName | spath output=SubjectUserName path=Event.EventData.Data{6} | spath output=ObjectName path=Event.EventData.Data{10} | table _time,SubjectUserName,ObjectName,Keywords,EventCode | replace "0x8010000000000000" with "Failure", "0x8020000000000000" with Success | sort -_time
  • UNIX event summary
index="ontap_access" Source="NFSv4" SubjectUnix | spath output=SubjectUnix path=Event.EventData.Data{@Uid} | spath output=ObjectName path=Event.EventData.Data{6} | table _time,SubjectUnix,ObjectName,Keywords,EventCode | replace "0x8010000000000000" with "Failure", "0x8020000000000000" with Success | sort -_time
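
The logic behind the bulk-delete and success/failure queries can also be sketched outside Splunk. The following Python example assumes the events have already been parsed into dictionaries with EventCode and SubjectUserName keys. That input shape and the function names are hypothetical. The event codes and Keywords values are the ones used in the queries above.

```python
from collections import Counter

KEYWORD_LABELS = {
    "0x8010000000000000": "Failure",
    "0x8020000000000000": "Success",
}
DELETE_EVENT_CODES = {"4659", "4660"}


def label_result(keywords):
    """Translate a Keywords value into Success or Failure, or pass it through."""
    return KEYWORD_LABELS.get(keywords, keywords)


def bulk_deleters(events, threshold=50):
    """Return users with more than `threshold` delete events, as {user: count}."""
    counts = Counter(
        event.get("SubjectUserName", "unknown")
        for event in events
        if str(event.get("EventCode")) in DELETE_EVENT_CODES
    )
    return {user: count for user, count in counts.items() if count > threshold}
```

As in the SPL query, the comparison is strict, so a user with exactly 50 delete events is not reported.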

For each search result above, go to the Visualization tab and modify the settings:

  • Select Single Value from the Select visualization options for Deletes, Distinct SMB and UNIX users, and Successful and failed access events.
  • Select Pie Chart for Top 10 SMB and UNIX users generating events.
  • Select Bar Chart for Users performing bulk deletes >50.
  • Select Line Chart for Audit events timeline for success and failure.

Figure 8: Splunk query visualization

For each search result, add the modified visualization to a dashboard:

  1. Go to the Save As dropdown menu, select New Dashboard, enter a Dashboard Title and a Panel Title for the search result, and choose Save to Dashboard.
  2. After creating the dashboard for the first search result, add subsequent visualizations to the same dashboard. Select Save As, then Existing Dashboard, then choose your existing dashboard name and enter a Panel Title, and choose Save to Dashboard.

Figure 9: Creating a dashboard

Access the dashboard by going to Dashboards and selecting the dashboard created from the list. Edit the dashboard with the Edit button, and move and resize the charts to your desired layout. Here’s a sample dashboard displaying the access activities:

Figure 10: Sample dashboard for file access events monitoring

Conclusion

In this blog post, I demonstrated capturing FSx for ONTAP file system audit events. I also demonstrated capturing when SMB and NFS files and folders are accessed, and how to visualize the events in a dashboard with Splunk.

This solution can help you meet compliance and regulatory standards by logging administrative user events when storing sensitive file data, whether financial, personal, or medical, on AWS.

Splunk is an AWS Competency Partner. Their software and cloud services enable customers to search, monitor, analyze, and visualize machine-generated big data from websites, applications, servers, networks, IoT, and mobile devices.