Overview
Joom Spark Platform contains the essential components to run Apache Spark in your own AWS account. We provide pre-configured images, a Hive Metastore, a Spark History Server, and more in a turn-key product, so that you can be productive right away.
Highlights
- Open-source Spark preconfigured for AWS. The minimal set of AWS libraries is already included in the images, and the defaults are changed to play well with S3 data stores.
- Hive metastore is included and Spark is configured to use it, so you can operate with databases and tables right away (see the sketch after this list).
- Spark History Server provides observability into all the jobs.
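For illustration, a PySpark job running on the platform can address S3 data and the metastore directly. The sketch below is only an example; the bucket, path, and table references are placeholders, not part of the product.
# Illustrative sketch; bucket, path, and database names are placeholders.
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("s3-metastore-demo").getOrCreate()
# S3 paths work out of the box thanks to the bundled AWS libraries.
df = spark.read.parquet("s3a://your-bucket/path/to/data/")
# The bundled Hive metastore is already wired in, so catalog operations just work.
spark.sql("SHOW DATABASES").show()
df.show(10)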
Pricing
Vendor refund policy
This product is free; therefore, no refunds are possible or offered.
Delivery details
Helm chart
- Amazon EKS
Helm charts are Kubernetes YAML manifests combined into a single package that can be installed on Kubernetes clusters. The containerized application is deployed on a cluster by running a single Helm install command to install the seller-provided Helm chart.
Version release notes
Initial public release
- Spark images with built-in support for AWS
- Contains the Spark Operator, Hive Metastore, and PostgreSQL
Additional details
Usage instructions
Please start by creating a namespace and a service account. For initial testing, create a service account with read-only access to S3:
kubectl create namespace spark
eksctl create iamserviceaccount \
--name spark \
--namespace spark \
--cluster <ENTER_YOUR_CLUSTER_NAME_HERE> \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve \
--override-existing-serviceaccounts
Use Helm 3.8.0 or later, and log in to the Helm registry:
aws ecr get-login-password \
--region us-east-1 | helm registry login \
--username AWS \
--password-stdin 709825985650.dkr.ecr.us-east-1.amazonaws.com/joom
Then, install the Helm chart:
helm install --namespace spark joom-spark-platform oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/joom/joom-spark-platform --version 1.0.0
Obtain the example Spark application manifest and apply it:
aws s3 cp s3://joom-analytics-cloud-public/examples/minimal/minimal.yaml minimal.yaml
kubectl apply -f minimal.yaml
Finally, watch the output of the Spark job:
kubectl -n spark logs demo-minimal-driver -f
You should see a Spark session starting, and a test dataframe printed.
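For orientation, the minimal example's behavior corresponds roughly to the PySpark sketch below. This is not the shipped source; the app name and test data are illustrative.
# Illustrative sketch only; the shipped example's source may differ.
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("demo-minimal").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "value"])
df.show()  # this is the test dataframe you should see in the driver logs
spark.stop()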
If you want to test writing data, you need to decide on an S3 bucket for testing. We recommend creating a new bucket. Then, give write permissions to the service account:
eksctl delete iamserviceaccount --name spark --namespace spark --cluster <ENTER_YOUR_CLUSTER_NAME_HERE>
eksctl create iamserviceaccount \
--name spark \
--namespace spark \
--cluster <ENTER_YOUR_CLUSTER_NAME_HERE> \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess \
--approve \
--override-existing-serviceaccounts
Obtain a Spark application manifest:
aws s3 cp s3://joom-analytics-cloud-public/examples/minimal/minimal-write.yaml minimal-write.yaml
In the file, modify the DATA_BUCKET environment variable to the name of your bucket.
Then, apply the manifest and review the job logs:
kubectl apply -f minimal-write.yaml
kubectl -n spark logs demo-minimal-write-driver
In the logs, you will see that test data is written to S3 and registered in the Hive metastore.
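Conceptually, the write example behaves roughly like the PySpark sketch below. This is not the shipped source; the table name and path layout are assumptions for illustration, and the bucket name is read from the DATA_BUCKET environment variable you set in the manifest.
# Illustrative sketch only; the shipped example's source may differ.
import os
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("demo-minimal-write").getOrCreate()
bucket = os.environ["DATA_BUCKET"]  # set via the manifest you edited above
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
# Write the test data to S3 and register it as a table in the bundled Hive metastore.
(df.write
   .mode("overwrite")
   .option("path", f"s3a://{bucket}/demo/minimal_write")  # hypothetical path layout
   .saveAsTable("demo_minimal_write"))
spark.stop()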
At this point, you can modify the manifests to use your Spark workloads.
Support
Vendor support
This free product does not come with a support SLA, but we welcome feedback by email at vladimir@joom.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.