How Exxeta Improves IT Planning with Use-Case Driven Architecture on AWS

By Johannes Müller, Manager Data Engineering – Exxeta
By Ulrich Buschbaum, Manager Data Engineering – Exxeta
By Tobias Maier, Sr. Account Manager ISV – AWS

When IT managers design solution architectures, it may seem pragmatic to use familiar technology, but this one-size-fits-all approach can mean missed opportunities for improved performance, lower maintenance costs, and faster results.

Exxeta is an IT consultancy that proposes a different way: the use-case driven architecture approach, which can help organizations fit IT components to business processes, resulting in faster response times and lower maintenance costs.

Exxeta is an AWS Partner that combines technology and business know-how, advising customers across a variety of industries on market innovations, new business models, and technical solutions leveraging Amazon Web Services (AWS).

Use-Case Driven Architecture: Technology That Fits

In a use-case driven architecture, the business process scenario, which is also referred to as a use case, serves as the central point of focus. This approach prioritizes the perspective and needs of users during the design phase, while also taking into account the existing IT components.

Figure 1 – Three steps to use-case driven architecture design.

The use-case driven architecture process involves three steps:

Identify and outline use cases: Define data sources, processes, and interfaces, while collecting required components and defining priorities.
Specify primary use cases: Derive concrete requirements, select tools and infrastructure, and develop a universally applicable architecture.
Implement the architecture: Based on the primary use cases, build a prototype, test, benchmark, and gather user feedback.

Real-World Example

Exxeta applied the use-case driven architecture approach to design a data analytics solution for an automotive company. The company’s IT department had set out to design a data warehouse solution to replace its existing database, a popular product that was reaching its 8 TB capacity limit.

At first, the project team considered setting up another off-the-shelf data warehouse solution, but instead decided to work with Exxeta and follow the use-case driven architecture approach.

The first step was to interview users of the existing solution to create a use-case map, which resulted in identifying four use cases:

Use case 1: Read and filter data from the error table for a defined plant and timeframe.
Use case 2: Find combinations of errors in the error table for a defined plant and timeframe.
Use case 3: Correlate errors with data from other tables; for example, data describing the type of vehicle (passenger car or van).
Use case 4: Write planning data to the database.

Use cases 1 and 2 both work directly on the error table, and together they made up nearly 80% of the data analysis work done by employees on this database. Furthermore, the error table (including indices) accounted for 6 of the total 8 TB of data. This made use cases 1 and 2 the obvious primary use cases.

The next step was to select a database system that fit the primary use cases. Instead of a data warehouse, the team chose Apache HBase, an open-source, non-relational distributed database. As a fast, distributed, and scalable wide-column store, it fits use cases 1 and 2 well and provides the necessary performance at a lower cost than a comparable data warehouse solution.

Configuring Amazon EMR

For these first two use cases, entries in the new database can be selected using a combination of plant identifier and timestamp as the row key (the table's primary key). This allows the database to return relevant data with low latency.
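The key design described above can be sketched in a few lines of Python. This is only an illustration: the key layout (plant identifier, underscore, fixed-width UTC timestamp) and the helper names `make_row_key` and `scan_bounds` are assumptions for this example, not the format used in the project. The point is that a fixed-width timestamp makes byte order match chronological order, so one plant and timeframe becomes a single contiguous key range.

```python
from datetime import datetime, timezone
from typing import Tuple


def make_row_key(plant_id: str, ts: datetime) -> bytes:
    """Build a composite key: plant identifier, then a fixed-width UTC timestamp."""
    # Hypothetical layout; fixed width keeps lexicographic and chronological order aligned.
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{plant_id}_{stamp}".encode("ascii")


def scan_bounds(plant_id: str, start: datetime, end: datetime) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) for one plant and timeframe.

    HBase scans treat the start row as inclusive and the stop row as exclusive.
    """
    return make_row_key(plant_id, start), make_row_key(plant_id, end)


start_row, stop_row = scan_bounds(
    "4000",
    datetime(2021, 3, 1, tzinfo=timezone.utc),
    datetime(2021, 4, 1, tzinfo=timezone.utc),
)
print(start_row, stop_row)
```

Any key built for plant 4000 in March 2021 sorts between the two bounds, which is what lets the database answer use cases 1 and 2 with a range scan instead of a full table scan.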

Setup and configuration are similar to other Hadoop ecosystem tools, and preconfigured Amazon EMR templates reduce setting up the database cluster to just one command line statement:

aws emr create-cluster --name "Test cluster" --release-label emr-5.36.0 \
--applications Name=HBase --use-default-roles --ec2-attributes KeyName=myKey \
--instance-type m5.xlarge --instance-count 3

Afterwards, you need to create your HBase table, which you can do using the HBase shell once you have connected to the cluster's primary node:

> ./bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.4.8, rf844d09157d9dce6c54fcd53975b7a45865ee9ac, Wed Oct 27 08:48:57 PDT 2021
Took 0.0027 seconds
hbase> create 'productiondata', 'd', 'm'
hbase> exit

Here, we are setting up a productiondata table with two column families (‘d’ and ‘m’) for data and metadata. Column families can improve lookup times tremendously, but they should have very short names for performance reasons, because the family name is stored with every cell.
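To get a feeling for why short family names matter, the following sketch estimates the size of a single cell. It assumes the classic uncompressed KeyValue layout (length fields, row key, family, qualifier, timestamp, type, value) and ignores tags, block encoding, and compression, so treat the numbers as rough. The function name `cell_size` and the sample values are made up for illustration.

```python
def cell_size(row_key: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate bytes for one uncompressed cell, assuming the classic KeyValue layout."""
    key_len = (
        2 + len(row_key)      # row length field + row key
        + 1 + len(family)     # family length field + family name
        + len(qualifier)      # column qualifier
        + 8                   # timestamp
        + 1                   # key type
    )
    return 4 + 4 + key_len + len(value)  # key length + value length fields


row = b"4000_20210301000000"  # hypothetical row key
short_name = cell_size(row, b"d", b"data", b"E1234")
long_name = cell_size(row, b"data", b"data", b"E1234")
print(short_name, long_name, long_name - short_name)
```

A few bytes per cell look harmless, but they are repeated for every cell of every row, which adds up quickly in a table of several terabytes.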

If you want to preload some data, you can leave the HBase shell and use the ImportTsv tool as follows:

# use s3-dist-cp first to copy your data from S3 into HDFS -> https://aws.amazon.com/premiumsupport/knowledge-center/copy-s3-hdfs-emr/
s3-dist-cp --src="s3://vds-sample-data/gen2/2021_03/0320_2021_03_01_400_10.csv" --dest="hdfs:///copied"
# now you can run ImportTsv (arguments: column mapping, target table, input path)
> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
        -Dimporttsv.separator=',' \
        -Dimporttsv.columns='HBASE_ROW_KEY,d:dataname,d:data,d:proddate,m:prodid,m:hardwareid,m:hostid,m:countrycode' \
        factuserdata file:///tmp/factuserdata_hbase.csv
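Before starting a large import, it can help to check that every input line matches the column mapping. The sketch below is one possible pre-flight check, not part of the ImportTsv tool: the helper `find_bad_lines` and the sample rows are hypothetical, and it assumes a plain split on the separator without quote handling.

```python
COLUMNS = (
    "HBASE_ROW_KEY,d:dataname,d:data,d:proddate,"
    "m:prodid,m:hardwareid,m:hostid,m:countrycode"
)


def find_bad_lines(text: str, columns: str = COLUMNS, separator: str = ",") -> list:
    """Return 1-based numbers of lines whose field count differs from the mapping."""
    expected = len(columns.split(","))
    bad = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split(separator)  # plain split, no CSV quoting (assumption)
        if len(fields) != expected or not fields[0]:
            bad.append(line_no)  # wrong field count or empty row key
    return bad


# Hypothetical sample rows: the second one is missing five fields.
sample = (
    "4000_20210301000000,temp,21.5,2021-03-01,p1,hw1,host1,DE\n"
    "4000_20210301000100,temp,21.7\n"
)
print(find_bad_lines(sample))
```

Running a check like this on a sample of the data is cheap compared with discovering malformed lines halfway through a MapReduce job.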

Here, you can refer to a file location in your Hadoop Distributed File System (HDFS). If you’re loading larger amounts of data, it’s key to pre-split your table into partitions (regions). A good number to start with is four partitions per node in your cluster.

Divide the practically possible keyspace of your table into roughly equally sized portions and run the following command in your HBase shell when creating the table:

create 'factuserdata', 'm', 'd', SPLITS =>['1000_A', '4000_A', … , '8000_M']

Now, your cluster won’t have to repartition while you preload your data.
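The split keys can also be computed instead of written by hand. The following sketch assumes, purely for illustration, that row keys start with a four-digit numeric plant identifier followed by an underscore; the functions `split_points` and `create_statement` are hypothetical helpers. It applies the rule of thumb of four partitions per node and prints a statement you could paste into the HBase shell.

```python
def split_points(first_plant: int, last_plant: int, nodes: int, regions_per_node: int = 4) -> list:
    """Divide the plant-identifier range into equally sized regions and return the boundaries."""
    regions = nodes * regions_per_node
    span = last_plant - first_plant + 1
    if span < regions:
        raise ValueError("keyspace is smaller than the number of regions")
    # N regions need N - 1 split keys.
    return [f"{first_plant + span * i // regions:04d}_" for i in range(1, regions)]


def create_statement(table: str, families: list, splits: list) -> str:
    """Format an HBase shell 'create' statement with explicit split keys."""
    fams = ", ".join(f"'{f}'" for f in families)
    keys = ", ".join(f"'{s}'" for s in splits)
    return f"create '{table}', {fams}, SPLITS => [{keys}]"


points = split_points(1000, 8999, nodes=3)  # 3 nodes -> 12 regions -> 11 split keys
print(create_statement("factuserdata", ["m", "d"], points))
```

Equal slices of the keyspace only balance the load if the data is spread evenly across plants; if a few plants dominate, choose the split keys from the actual key distribution instead.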

Using Amazon S3 with EMR

The project team chose Amazon Simple Storage Service (Amazon S3) as the underlying file storage instead of a dedicated HDFS. Amazon S3 combines scalability, ease of use, security, and reliability in one storage solution at a low price point.

In combination with Amazon EMR (if used in a serverless variant), the benefits include:

No operating system or virtual machine to be maintained.
Scalable storage capacity, compute power, and memory.
Storage replication, encryption, versioning, backup, and automated tiering.

Furthermore, the combination of Amazon S3 and EMR can be upgraded to additionally support use case 3, correlating errors with data from other tables.

Figure 2 – Technical solution diagram supporting all use cases.

For the relational data in the remaining part of the database, Exxeta chose Amazon Aurora PostgreSQL in its serverless variant, which best matches that data.

On the rare occasions when use case 3 arises and more complex, data warehouse-like queries with joins are required, a query engine supporting multiple backends can be used. Trino, an open-source distributed SQL query engine that is also available through preconfigured EMR templates, supports Apache HBase and PostgreSQL as backends and enables queries spanning both systems.

Using Trino, it was possible to correlate the error data from Apache HBase to data from PostgreSQL.

This made joining the vehicle type to error occurrences possible. Because the amount of data in the relational tables was relatively small and the data in HBase could be selected with a primary key range condition, query response times were fast despite spanning two database systems and 8 TB of data.
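Why this stays fast can be shown with a small in-memory model. The sketch below is not Trino and does not talk to HBase or PostgreSQL; it only mimics the two ingredients named above with made-up data: a key range selection on sorted error rows, followed by a lookup in a small relational table. All names and values are hypothetical.

```python
from bisect import bisect_left

# Error rows sorted by row key, as a wide-column store keeps them (hypothetical data).
errors = [
    ("4000_20210301080000", "V1", "E17"),
    ("4000_20210315120000", "V2", "E42"),
    ("4000_20210402090000", "V1", "E17"),
    ("5000_20210310100000", "V2", "E08"),
]
error_keys = [row[0] for row in errors]  # built once, reused for every query

# Small relational table: vehicle id -> vehicle type (hypothetical data).
vehicle_types = {"V1": "passenger car", "V2": "van"}


def errors_with_vehicle_type(start_key: str, stop_key: str) -> list:
    """Select errors in [start_key, stop_key) and attach the vehicle type."""
    lo = bisect_left(error_keys, start_key)  # binary search stands in for a range scan
    hi = bisect_left(error_keys, stop_key)
    return [
        (key, code, vehicle_types.get(vehicle_id, "unknown"))
        for key, vehicle_id, code in errors[lo:hi]
    ]


result = errors_with_vehicle_type("4000_20210301000000", "4000_20210401000000")
for row in result:
    print(row)
```

Only the rows inside the key range are touched, and the join partner is small enough to look up directly, which mirrors the reasoning for the fast cross-system queries.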

Conclusion

In this post, we defined and shared the benefits of a use-case driven architecture, which finds targeted IT solutions to fit business challenges informed by user experience.

The project team in this real-world example found a much simpler and cheaper solution by following the use-case driven architecture approach. Instead of a huge and expensive data warehouse, the automotive company received a fast, scalable, and low-maintenance solution at a lower price point by fitting the IT components to its business processes.

This process results in improved business performance, faster response times, and lower maintenance costs.

Exxeta is well-versed in deploying use-case driven architectures and is ready to help find the best-fitting IT solution based on your business needs.

If you’re interested in how AWS approaches problem understanding and use-case definition from a customer perspective, learn about the AWS Digital Innovation Program. The program is inspired by the same customer-centric methods used by Amazon to develop breakthrough innovations our customers love, such as Amazon Prime, Kindle, AWS, Amazon Echo, and Alexa. Learn about Amazon’s peculiar, customer-centric approach to innovation, the key elements of Amazon’s culture of innovation, how to frame your business challenge, and define a new product, service, or experience that will delight your customers.

Exxeta – AWS Partner Spotlight

Exxeta is an AWS Partner that combines technology and business know-how, advising customers across a variety of industries on market innovations, new business models, and technical solutions leveraging AWS.
