Pushing data via Amazon S3
Synoptic data aggregation leverages Amazon’s S3 object storage service as a platform for receiving data from providers in a secure and highly available manner.
About S3
S3 stands for "Simple Storage Service," an object storage service operated by Amazon Web Services (AWS) that is used to upload and download data. Objects in this context can be thought of as named files. Using standardized technical interfaces, any provider can securely connect to the Amazon S3 service and publish their data files.
S3 requires object storage service interfaces; you generally cannot use older technologies like SFTP or SCP to interact with object storage services.
Synoptic leverages additional AWS functionality on top of S3, such as services that trigger automatic processing of newly uploaded files so the submitted data is ingested quickly.
Amazon S3 is one of many object storage services, which generally share the same mechanics for interaction; other cloud vendors operate equivalent services.
Core concepts
The two main concepts in S3 are buckets, which are isolated inventories of objects, and keys (not to be confused with access keys), which give the location of an object - essentially the file path. Synoptic gives providers exclusive access to a single bucket for publishing data.
You can store an unlimited number of objects in a bucket, and the use of slashes in keys (e.g. a/b/c) is optional (the key is the whole string), but slashes are used by some file browsers to organize objects.
We encourage uploading additional data as new objects under new keys rather than rewriting existing ones. Using timestamp-related elements in keys can be useful for optimizing reading, but the main efficiency gains come from avoiding re-pushing data you have already pushed and avoiding overwriting existing data.
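For illustration, here is a minimal sketch of building a timestamp-based key in Python; the key layout shown is only a hypothetical example, not a required convention.

from datetime import datetime, timezone

# The whole string below is the key; the slashes are optional but help
# file browsers group objects by date. This layout is only an example.
now = datetime.now(timezone.utc)
key = now.strftime("obs/%Y/%m/%d/observations_%Y%m%dT%H%M%SZ.csv")
print(key)  # e.g. obs/2026/01/15/observations_20260115T120000Z.csv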
Accessing S3
You will need to access S3 either programmatically or via a file browser tool such as Transmit, much as you would with SFTP. To authenticate, there are three options we support:
Authenticating with AWS Roles
The most robust way for us to provide you secure access to a bucket is to use AWS’s IAM system. If a provider already operates in AWS, we can simply permit a role from your account to write data to a bucket we own. That way we never have to exchange credentials.
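As a rough sketch of what this can look like from the provider side in Python with boto3, the snippet below assumes a cross-account role and builds an S3 client from the temporary credentials it returns. The role ARN and session name are hypothetical placeholders; the actual values would be agreed with our ingest team.

import boto3

# Hypothetical ARN of the role we permit to write into our bucket
ROLE_ARN = "arn:aws:iam::123456789012:role/example-provider-write-role"

# Assume the cross-account role to obtain temporary credentials
sts = boto3.client("sts")
assumed = sts.assume_role(RoleArn=ROLE_ARN, RoleSessionName="example-data-push")
creds = assumed["Credentials"]

# Build an S3 client from the temporary credentials; this client can then be
# used for the upload calls shown later in this guide
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)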
A provider who does not operate in AWS can also implement methods to assert identity with AWS via trusted certificates. This arrangement is more complicated and can be worked out between individual providers and our ingest engineering team.
Authenticating with keys and secrets
A simpler but less preferred method of accessing an S3 bucket is to use provisioned access keys and secrets. These credentials can be securely provided to a provider, and the code that publishes the data then uses those values to access the bucket.
This is much less secure, as compromised keys can be used by anyone who obtains them. To reduce this risk, we require providers using keys to rotate them at least annually, or more frequently depending on your compliance requirements. Our ingest team will work with you to ensure these credentials are used securely and can be rotated on our cycle.
AWS credentials are very different from Synoptic product credentials.
If this is the right option for you, we will provide credentials to you.
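As an illustration only, with placeholder values, a provider using this option would typically supply the key and secret to boto3 through the standard environment variables or credentials file, or pass them explicitly when building the client:

import boto3

# Credentials are normally supplied via the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables or ~/.aws/credentials,
# but they can also be passed explicitly. The values here are placeholders;
# never commit real credentials to source control.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAEXAMPLEKEYID",          # placeholder
    aws_secret_access_key="example-secret-value",  # placeholder
)

# A quick sanity check: list the first page of objects in your bucket
response = s3.list_objects_v2(Bucket="example-provider-bucket")
for item in response.get("Contents", []):
    print(item["Key"])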
Accessing a bucket you own
A third option, which is really a variant of the first, reverses the process: we access an object store that you own. If you are using AWS S3, we can work with you to configure role-based access for our data provider account, and then we can read and track your data. If you use an alternative object storage service, we would treat that as fetch access, which is not addressed in this guide.
It is very secure for us to access a bucket in your control, but it does require you to have your own AWS account.
Programmatic access
Because S3 uses an open protocol, a variety of software tools and packages can be used to connect. Since we are specifically addressing the Amazon AWS service, the easiest way to implement the full authentication process is to use one of the AWS developer toolkits, such as boto3 for Python.
It would be best to refer to the documentation relevant to the toolkit you use, but you will likely need the following:
- Functions for targeting a bucket (we will give you the bucket reference)
- Functions for uploading data, such as put_object
- Functions for reviewing and reading data, such as list_objects_v2 and get_object
Functions in other toolkits will normally use similar expressions; a short sketch using boto3 follows below.
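Here is a minimal sketch of those calls using boto3; the bucket name and keys are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")
bucket = "example-provider-bucket"  # placeholder; we will give you the real bucket reference

# Upload a local file under a chosen key
with open("observations.csv", "rb") as f:
    s3.put_object(Bucket=bucket, Key="obs/2026/01/observations.csv", Body=f)

# Review what has been uploaded under a given prefix
response = s3.list_objects_v2(Bucket=bucket, Prefix="obs/2026/01/")
for item in response.get("Contents", []):
    print(item["Key"], item["Size"])

# Read an object back
obj = s3.get_object(Bucket=bucket, Key="obs/2026/01/observations.csv")
data = obj["Body"].read()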
If you receive an error, read it carefully; it likely names a function you are calling that we did not grant you permission to perform. As always, contact our provider help desk for any assistance.
Helpful scripts
The following scripts are examples, using Python and boto3, of how you can interface with an S3 bucket.
Check if a file was uploaded
import boto3


def verify_file_in_s3(bucket_name, s3_key):
    """
    Verifies whether a specific file exists in the S3 bucket.

    Args:
        bucket_name (str): Name of the S3 bucket.
        s3_key (str): The key (filename) to check in the bucket.

    Returns:
        bool: True if the file is found, False otherwise.
    """
    # Initialize S3 client
    s3 = boto3.client('s3')
    try:
        # List objects in the bucket, narrowed to the key we are looking for
        # (an unfiltered listing only returns the first 1000 objects per page)
        response = s3.list_objects_v2(Bucket=bucket_name, Prefix=s3_key)
        if 'Contents' not in response:
            print(f"No objects matching '{s3_key}' were found in the bucket '{bucket_name}'.")
            return False
        # Check if the specified file (s3_key) is present in the listing
        filenames = [item['Key'] for item in response['Contents']]
        if s3_key in filenames:
            print(f"File '{s3_key}' exists in the bucket '{bucket_name}'.")
            return True
        else:
            print(f"File '{s3_key}' not found in the bucket '{bucket_name}'.")
            return False
    except Exception as e:
        print(f"An error occurred: {e}")
        return False
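A quick usage example, with a hypothetical bucket name and key:

# Hypothetical bucket and key; substitute the bucket reference we give you
if verify_file_in_s3("example-provider-bucket", "obs/2026/01/observations.csv"):
    print("Upload confirmed.")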