Pushing data via Amazon S3
Synoptic data aggregation leverages Amazon’s S3 object storage service as a platform for receiving data from providers in a secure and highly available manner.
About S3
S3 stands for "Simple Storage Service," an object storage service operated by Amazon Web Services (AWS) that is used to upload and download data. Objects in this context can be thought of as named files. Using standardized technical interfaces, any provider can securely connect to the Amazon S3 service and publish their data files.
S3 requires object storage service interfaces; you generally cannot use older technologies like SFTP or SCP to interact with object storage services.
Synoptic leverages additional AWS functionality on top of S3, such as services that trigger automatic processing of newly uploaded files so the submitted data is ingested quickly.
Amazon S3 is one of many object storage services, which generally share the same mechanics for interaction; other cloud vendors operate equivalent services.
Core concepts
The two main concepts in S3 are buckets, which are isolated inventories of objects, and keys (not to be confused with access keys), which give the location of an object - essentially the file path. Synoptic gives providers exclusive access to a single bucket for publishing data.
You can store an unlimited number of objects in a bucket, and the use of slashes in keys (e.g. a/b/c) is optional (the key is the whole string), but slashes are used by some file browsers to organize objects.
We encourage uploading additional data as new objects under new keys rather than rewriting existing ones. Using timestamp-related elements in keys can be useful for optimizing reading, but the main efficiency gains come from avoiding re-pushing data you have already pushed and avoiding overwriting existing data.
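For illustration, here is a minimal sketch of building a timestamp-based key in Python; the key layout shown is only a hypothetical example, not a required convention.

from datetime import datetime, timezone

# The whole string below is the key; the slashes are optional but help
# file browsers group objects by date. This layout is only an example.
now = datetime.now(timezone.utc)
key = now.strftime("obs/%Y/%m/%d/observations_%Y%m%dT%H%M%SZ.csv")
print(key)  # e.g. obs/2026/01/15/observations_20260115T120000Z.csv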
Accessing S3
You will need to access S3 either programmatically or via a file browser tool such as Transmit, much as you would with SFTP. To authenticate, there are three options we support:
Authenticating with AWS Roles
The most robust way for us to provide you secure access to a bucket is to use AWS’s IAM system. If a provider already operates in AWS, we can simply permit a role from your account to write data to a bucket we own. That way we never have to exchange credentials.
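As a rough sketch of what this can look like from the provider side in Python with boto3, the snippet below assumes a cross-account role and builds an S3 client from the temporary credentials it returns. The role ARN and session name are hypothetical placeholders; the actual values would be agreed with our ingest team.

import boto3

# Hypothetical ARN of the role we permit to write into our bucket
ROLE_ARN = "arn:aws:iam::123456789012:role/example-provider-write-role"

# Assume the cross-account role to obtain temporary credentials
sts = boto3.client("sts")
assumed = sts.assume_role(RoleArn=ROLE_ARN, RoleSessionName="example-data-push")
creds = assumed["Credentials"]

# Build an S3 client from the temporary credentials; this client can then be
# used for the upload calls shown later in this guide
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)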
A provider who does not operate in AWS can also implement methods to assert identity with AWS via trusted certificates. This arrangement is more complicated and can be worked out between individual providers and our ingest engineering team.
Authenticating with keys and secrets
A simpler but less preferred method of accessing an S3 bucket is to use provisioned access keys and secrets. These credentials can be securely provided to a provider, and the code that publishes the data then uses those values to access the bucket.
This is much less secure, as compromised keys can be used by anyone who obtains them. To reduce this risk, we require providers using keys to rotate them at least annually, or more frequently depending on your compliance requirements. Our ingest team will work with you to ensure these credentials are used securely and can be rotated on our cycle.
AWS credentials are very different from Synoptic product credentials.
If this is the right option for you, we will provide credentials to you.
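As an illustration only, with placeholder values, a provider using this option would typically supply the key and secret to boto3 through the standard environment variables or credentials file, or pass them explicitly when building the client:

import boto3

# Credentials are normally supplied via the AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY environment variables or ~/.aws/credentials,
# but they can also be passed explicitly. The values here are placeholders;
# never commit real credentials to source control.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAEXAMPLEKEYID",          # placeholder
    aws_secret_access_key="example-secret-value",  # placeholder
)

# A quick sanity check: list the first page of objects in your bucket
response = s3.list_objects_v2(Bucket="example-provider-bucket")
for item in response.get("Contents", []):
    print(item["Key"])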
Accessing a bucket you own
A third option, which is really a variant of the first, reverses the process: we access an object store that you own. If you are using AWS S3, we can work with you to configure role-based access for our data provider account, and then we can read and track your data. If you use an alternative object storage service, we would treat that as fetch access, which is not addressed in this guide.
It is very secure for us to access a bucket in your control, but it does require you to have your own AWS account.
Programmatic access
Because S3 uses an open protocol, a variety of software tools and packages can be used to connect. Since we are specifically addressing the Amazon AWS service, the easiest way to implement the full authentication process is to use one of the AWS developer toolkits, such as boto3 for Python.
It would be best to refer to the documentation relevant to the toolkit you use, but you will likely need the following:
- Functions for targeting a bucket (we will give you the bucket reference)
- Functions for uploading data, such as put_object
- Functions for reviewing and reading data, such as list_objects_v2 and get_object
Functions in other toolkits will normally use similar expressions; a short sketch using boto3 follows below.
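Here is a minimal sketch of those calls using boto3; the bucket name and keys are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")
bucket = "example-provider-bucket"  # placeholder; we will give you the real bucket reference

# Upload a local file under a chosen key
with open("observations.csv", "rb") as f:
    s3.put_object(Bucket=bucket, Key="obs/2026/01/observations.csv", Body=f)

# Review what has been uploaded under a given prefix
response = s3.list_objects_v2(Bucket=bucket, Prefix="obs/2026/01/")
for item in response.get("Contents", []):
    print(item["Key"], item["Size"])

# Read an object back
obj = s3.get_object(Bucket=bucket, Key="obs/2026/01/observations.csv")
data = obj["Body"].read()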
If you receive an error, read it carefully; it likely names a function you are calling that we did not grant you permission to perform. As always, contact our provider help desk for any assistance.
Helpful scripts
The following scripts are examples, using Python and boto3, of how you can interface with an S3 bucket.
Check if a file was uploaded
import boto3


def verify_file_in_s3(bucket_name, s3_key):
    """
    Verifies whether a specific file exists in the S3 bucket.

    Args:
        bucket_name (str): Name of the S3 bucket.
        s3_key (str): The key (filename) to check in the bucket.

    Returns:
        bool: True if the file is found, False otherwise.
    """
    # Initialize S3 client
    s3 = boto3.client('s3')
    try:
        # List objects in the bucket, narrowed to the key we are looking for
        # (an unfiltered listing only returns the first 1000 objects per page)
        response = s3.list_objects_v2(Bucket=bucket_name, Prefix=s3_key)
        if 'Contents' not in response:
            print(f"No objects matching '{s3_key}' were found in the bucket '{bucket_name}'.")
            return False
        # Check if the specified file (s3_key) is present in the listing
        filenames = [item['Key'] for item in response['Contents']]
        if s3_key in filenames:
            print(f"File '{s3_key}' exists in the bucket '{bucket_name}'.")
            return True
        else:
            print(f"File '{s3_key}' not found in the bucket '{bucket_name}'.")
            return False
    except Exception as e:
        print(f"An error occurred: {e}")
        return False
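A quick usage example, with a hypothetical bucket name and key:

# Hypothetical bucket and key; substitute the bucket reference we give you
if verify_file_in_s3("example-provider-bucket", "obs/2026/01/observations.csv"):
    print("Upload confirmed.")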