New – AWS Application Discovery Service Console

Update (March 2020) – In the years that have passed since this post was published, we have removed the network visualization feature.

AWS Application Discovery Service helps you to plan your migration to the cloud. As a central component of the AWS Cloud Adoption Framework, it automates the process of discovering and collecting important information about your systems (read New – AWS Application Discovery Service – Plan Your Cloud Migration to learn more).

There are two data collection options. You can install a lightweight agent on your physical servers or VMs, or you can run the Agentless Discovery Connector in your VMware environment. Either way, AWS Application Discovery Service collects the following information:

System identification information (hostname, IP addresses, MAC addresses, operating system name & version)
System resource specifications (CPU, RAM, storage)
System-level resource utilization

The lightweight agent also collects information about TCP listening ports and associated processes.

The information is collected, stored locally for optional review, and then uploaded to the cloud across a secure connection on port 443. It is processed and correlated, and then stored in a repository in encrypted form. You can then use the information to analyze the total cost of ownership (TCO) of running your existing on-premises environment on AWS. You can also use it to group the discovered servers into applications for migration planning.

New Application Discovery Service Console
The new AWS Application Discovery Service Console is now part of the AWS Migration Hub, which simplifies tracking of migrations once the discovery and grouping process is complete. The landing page gives you an overview of the service, with a listing of the benefits and features. Click on Get Started with AWS Application Discovery Service to move ahead:

Choose your data collection option (agent on the servers or VMs, or agentless in your VMware environment). You can click on Learn more for detailed setup instructions:

With the agents and connectors (you can use both together) set up and ready to go, you can start discovery from selected agents/connectors by clicking on Start data collection on the Data Collectors page:

You can see the servers as they are discovered:

You can select one or more servers and group them into a named application, again with a couple of clicks:

You can add one or more tags to each server:

You can see all of the detailed information for each server and export the network connections and processes that are producing or consuming network traffic (for agents only) by clicking the Export server details button:

You can see a list of the applications (each one consisting of one or more discovered servers) in the Discover->Applications section:

Export System Performance Data for all Servers
After starting the data collection, you can export a summary of the system performance data for all the servers discovered by agents and connectors from the AWS Command Line Interface (AWS CLI). Install and configure the CLI, using us-west-2 for the default region and text for the default output format.

Begin by starting an export task:

$ aws discovery start-export-task
{
    "exportId": "export-8125f0db-49b4-474d-b75b-0efd7f85d3c5"
}

Describe the export task by passing its exportId, and capture the configurationsDownloadUrl from the response (I have simplified it here for clarity, and also replaced my AWS account ID with xxxxxxxxxxxx):

$ aws discovery --region us-west-2 describe-export-tasks --export-ids export-8125f0db-49b4-474d-b75b-0efd7f85d3c5
{
    "exportsInfo": [
        {
            "exportId": "export-8125f0db-49b4-474d-b75b-0efd7f85d3c5",
            "exportStatus": "SUCCEEDED",
            "exportRequestTime": 1524580989.0,
            "configurationsDownloadUrl": "https://s3.us-west-2.amazonaws.com/prod.pdx.poseidon.discovery.exporter/xxxxxxxxxxxx/xxxxxxxxxxxx_export-8125f0db-49b4-474d-b75b-0efd7f85d3c5.zip",
            "isTruncated": false,
            "statusMessage": "Data export ran successfully and is accessible from the download URL. The URL will expire in 24 hours. The export data expires in 10 days."
        }
    ],
    "nextToken": ""
}
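
If you would rather pick out the download URL with a script than by eye, the response can be parsed in a few lines of Python. This is a minimal sketch, assuming the JSON shown above has been captured as a string; the helper name find_download_url is hypothetical, and only field names visible in the response are used.

```python
import json
from datetime import datetime, timezone

# Copy of the describe-export-tasks response shown above (account ID masked).
RESPONSE = """
{
    "exportsInfo": [
        {
            "exportId": "export-8125f0db-49b4-474d-b75b-0efd7f85d3c5",
            "exportStatus": "SUCCEEDED",
            "exportRequestTime": 1524580989.0,
            "configurationsDownloadUrl": "https://s3.us-west-2.amazonaws.com/prod.pdx.poseidon.discovery.exporter/xxxxxxxxxxxx/xxxxxxxxxxxx_export-8125f0db-49b4-474d-b75b-0efd7f85d3c5.zip",
            "isTruncated": false
        }
    ],
    "nextToken": ""
}
"""


def find_download_url(response_text, export_id):
    """Return (url, request_time_utc) for a SUCCEEDED export, else None."""
    for info in json.loads(response_text).get("exportsInfo", []):
        if info.get("exportId") != export_id:
            continue
        if info.get("exportStatus") != "SUCCEEDED":
            return None  # task exists but has not finished successfully
        requested = datetime.fromtimestamp(info["exportRequestTime"], tz=timezone.utc)
        return info["configurationsDownloadUrl"], requested
    return None  # no task with that exportId in the response


result = find_download_url(RESPONSE, "export-8125f0db-49b4-474d-b75b-0efd7f85d3c5")
if result is not None:
    url, requested = result
    print(requested.isoformat(), url)
```

Returning None for an unfinished or unknown task lets a caller retry later without having to handle an exception.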

Use the URL to download a ZIP file that contains system performance data for all of the discovered servers:

$ unzip xxxxxxxxxxxx_export-8125f0db-49b4-474d-b75b-0efd7f85d3c5.zip
Archive:  xxxxxxxxxxxx_export-8125f0db-49b4-474d-b75b-0efd7f85d3c5.zip
  inflating: xxxxxxxxxxxx_Server.csv
  inflating: xxxxxxxxxxxx_NetworkInterface.csv
  inflating: xxxxxxxxxxxx_SystemPerformance.csv
  inflating: xxxxxxxxxxxx_Applications.csv
  inflating: xxxxxxxxxxxx_Tags.csv
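
Before loading the files into another tool, it can help to see which columns each CSV contains. The following is a small sketch using only the Python standard library; the function name csv_headers is hypothetical, and it makes no assumption about the column names in the export beyond each file having a header row.

```python
import csv
import io
import zipfile


def csv_headers(zip_source):
    """Map each CSV member of a ZIP archive to its header row (a list of column names).

    zip_source may be a file path or a binary file-like object.
    """
    headers = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                # utf-8-sig tolerates an optional byte-order mark at the start of the file
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers
```

For example, csv_headers("xxxxxxxxxxxx_export-8125f0db-49b4-474d-b75b-0efd7f85d3c5.zip") would return one entry for each of the five files listed above.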

Data Exploration in Athena
You can also use the AWS Discovery Utilities scripts to download the system performance data and transform it for use in Amazon Athena, using an S3 bucket for storage. The utilities package includes the following scripts:

export.py – Perform a bulk, CSV-format export of all servers discovered by agents.

convert_csv.py – Convert the CSV files to Parquet format and upload them to an S3 bucket.

discovery_athena.ddl – Import the Parquet files to Athena (this script must be modified to reference the actual S3 bucket).

After you have run all of the scripts, you can query the data in the Athena console using SQL commands.

Here’s a query that identifies network communication between servers on a per-port basis:

WITH valid_ips AS (
    SELECT DISTINCT source_ip
    FROM source_process_connection
),
outer_query AS (
    SELECT agent_id,
           source_ip,
           destination_ip,
           destination_port,
           count(*) AS frequency
    FROM source_process_connection
    WHERE ip_version = 'IPv4'
      AND destination_ip IN (SELECT * FROM valid_ips)
    GROUP BY agent_id, source_ip, destination_ip, destination_port
)
SELECT source_ip AS Source,
       'Port ' || cast(destination_port AS varchar(20)) AS Edge,
       destination_ip AS Target,
       Frequency
FROM outer_query;

Here’s the result:
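
The logic of the network query can also be traced outside Athena. The sketch below mirrors it in plain Python over a handful of made-up rows; the agent IDs, IP addresses, and the port_edges helper are illustrative assumptions, not part of the service or the utilities. It keeps IPv4 connections whose destination also appears as a source somewhere in the data, counts them per agent, source, destination, and port, and emits Source, Edge, Target, Frequency tuples.

```python
from collections import Counter

# Hypothetical sample rows:
# (agent_id, source_ip, destination_ip, destination_port, ip_version)
ROWS = [
    ("a1", "10.0.0.1", "10.0.0.2", 443, "IPv4"),
    ("a1", "10.0.0.1", "10.0.0.2", 443, "IPv4"),
    ("a2", "10.0.0.2", "10.0.0.1", 22, "IPv4"),
    ("a1", "10.0.0.1", "8.8.8.8", 53, "IPv4"),   # dropped: destination is never a source
    ("a1", "fe80::1", "10.0.0.2", 443, "IPv6"),  # dropped: not IPv4
]


def port_edges(rows):
    """Return sorted (source, edge_label, target, frequency) tuples."""
    # Equivalent of the valid_ips CTE: every distinct source address.
    valid_ips = {source_ip for _, source_ip, _, _, _ in rows}
    # Equivalent of outer_query: filter, then count per grouping key.
    counts = Counter(
        (agent_id, src, dst, port)
        for agent_id, src, dst, port, version in rows
        if version == "IPv4" and dst in valid_ips
    )
    return sorted(
        (src, f"Port {port}", dst, freq)
        for (_agent_id, src, dst, port), freq in counts.items()
    )


for edge in port_edges(ROWS):
    print(edge)
```

Restricting destinations to addresses that also appear as sources limits the output to traffic between discovered servers, which is what matters when deciding which servers belong in the same application.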

Here’s a query that identifies system performance data for cost analysis:

SELECT SP.AGENT_ID,
         OS.OS_NAME,
         OS.OS_VERSION,
         MAX(SP.total_num_cores) AS Cores,
         MAX(SP.total_num_cpus) AS CPU,
         MAX(SP.total_disk_size_in_gb) AS StorageTotal,
         MAX(SP.total_disk_free_size_in_gb) AS StorageFree,
         MAX(SP.total_ram_in_mb) AS RAM,
         MAX(SP.total_disk_read_ops_per_sec) AS IOPS_Read,
         MAX(SP.total_disk_bytes_written_per_sec_in_kbps) AS Disk_Write_KBps
FROM SYSTEM_PERFORMANCE AS SP
JOIN OS_INFO AS OS
    ON SP.AGENT_ID = OS.AGENT_ID
GROUP BY  SP.AGENT_ID, OS.OS_NAME, OS.OS_VERSION;
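
The query above takes the maximum of each metric per agent, so sizing is based on the peak observed value rather than a single sample. Here is a rough Python sketch of the same join-and-maximum logic over hypothetical samples. The values and the peak_by_agent helper are made up for illustration, and the sketch assumes one OS_INFO row per agent.

```python
# Hypothetical samples; the column names follow the query above.
SYSTEM_PERFORMANCE = [
    {"agent_id": "a1", "total_ram_in_mb": 8192, "total_disk_read_ops_per_sec": 120.0},
    {"agent_id": "a1", "total_ram_in_mb": 8192, "total_disk_read_ops_per_sec": 310.5},
    {"agent_id": "a2", "total_ram_in_mb": 4096, "total_disk_read_ops_per_sec": 45.0},
    {"agent_id": "a3", "total_ram_in_mb": 2048, "total_disk_read_ops_per_sec": 5.0},
]
OS_INFO = [
    {"agent_id": "a1", "os_name": "Linux", "os_version": "4.14"},
    {"agent_id": "a2", "os_name": "Windows", "os_version": "2016"},
]


def peak_by_agent(samples, os_rows, metrics):
    """Join samples to OS info on agent_id and keep the peak of each metric."""
    os_by_agent = {row["agent_id"]: row for row in os_rows}
    peaks = {}
    for sample in samples:
        agent_id = sample["agent_id"]
        os_row = os_by_agent.get(agent_id)
        if os_row is None:
            continue  # inner join: agents without OS info are skipped
        entry = peaks.setdefault(
            agent_id,
            {"os_name": os_row["os_name"], "os_version": os_row["os_version"]},
        )
        for metric in metrics:
            entry[metric] = max(entry.get(metric, sample[metric]), sample[metric])
    return peaks


METRICS = ["total_ram_in_mb", "total_disk_read_ops_per_sec"]
for agent_id, row in peak_by_agent(SYSTEM_PERFORMANCE, OS_INFO, METRICS).items():
    print(agent_id, row)
```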

With this information at hand, you will be ready to plan and execute your migration to the AWS Cloud! To learn more, read the Application Discovery Service User Guide.

— Jeff;

PS – Our Application Discovery Service Partners would love to help you with your cloud migration.

PPS – I edited this post on April 24, 2018 to address some changes to the service.

AWS News Blog
