surveydata.storagesystem module

Core interface (informal) for survey data storage systems.

class surveydata.storagesystem.StorageSystem

Bases: object

Largely-abstract base class for survey data storage systems.
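
As a rough illustration of how a concrete backend might fill in this interface, the sketch below defines a hypothetical in-memory class. It deliberately does not import surveydata: InMemoryStorage and its internal _submissions dict are assumptions made for illustration, and only the method names and return conventions are taken from the documentation on this page. A real backend would presumably subclass StorageSystem instead.

```python
class InMemoryStorage:
    """Hypothetical stand-in mirroring a few documented StorageSystem methods."""

    def __init__(self):
        # Assumption: submissions live in a plain dict keyed by submission ID.
        self._submissions = {}

    def store_submission(self, submission_id: str, submission_data: dict):
        # Copy the data so later changes by the caller do not affect storage.
        self._submissions[submission_id] = dict(submission_data)

    def query_submission(self, submission_id: str) -> bool:
        return submission_id in self._submissions

    def get_submission(self, submission_id: str) -> dict:
        # Documented convention: empty dictionary if the submission is not found.
        return dict(self._submissions.get(submission_id, {}))

    def list_submissions(self) -> list:
        return list(self._submissions)

    def get_submissions(self) -> list:
        return [dict(data) for data in self._submissions.values()]

    def attachments_supported(self) -> bool:
        # This sketch does not implement attachments.
        return False


storage = InMemoryStorage()
storage.store_submission("uuid:1234", {"name": "Ada", "age": "36"})
```

Callers can then rely on the documented "empty dictionary if not found" convention rather than catching an exception when a submission ID is unknown.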

__init__()

Initialize storage system.

attachments_supported() → bool

Query whether storage system supports attachments.

Returns

True if attachments are supported; otherwise False

Return type

bool

get_attachment(attachment_location: str = '', submission_id: str = '', attachment_name: str = '') → BinaryIO

Get submission attachment from storage.

Parameters
  • attachment_location (str) – Attachment location string (as returned when attachment stored)

  • submission_id (str) – Unique submission ID (in lieu of attachment_location)

  • attachment_name (str) – Attachment filename (in lieu of attachment_location)

Returns

Attachment as a file-like object (note: it does not support seeking)

Return type

BinaryIO

Must pass either attachment_location or both submission_id and attachment_name.
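
The argument rule above can be sketched as a small check. The helper name resolve_attachment_args is hypothetical, and raising ValueError on invalid input is an assumption; the documentation here does not say how the library itself reacts when neither form is supplied.

```python
def resolve_attachment_args(attachment_location: str = '',
                            submission_id: str = '',
                            attachment_name: str = '') -> tuple:
    """Hypothetical helper: decide which documented lookup form was supplied."""
    if attachment_location:
        # A location string alone is enough to identify the attachment.
        return ("location", attachment_location)
    if submission_id and attachment_name:
        # Otherwise both the submission ID and the filename are required.
        return ("id_and_name", (submission_id, attachment_name))
    # Assumption: invalid combinations are reported with ValueError.
    raise ValueError(
        "Must pass either attachment_location or both "
        "submission_id and attachment_name."
    )
```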

get_data_timezone() → timezone

Get the timezone for timestamps in the data.

Returns

Timezone for timestamps in the data (defaults to datetime.timezone.utc if unknown)

Return type

datetime.timezone
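
One way the returned timezone might be used is to attach it to naive timestamps parsed from submission data. The sketch below uses only the standard library; the localize helper is hypothetical, and its default of datetime.timezone.utc simply mirrors the documented fallback.

```python
from datetime import datetime, timedelta, timezone


def localize(naive_timestamp: datetime,
             data_tz: timezone = timezone.utc) -> datetime:
    """Hypothetical helper: attach the data timezone to a naive timestamp."""
    return naive_timestamp.replace(tzinfo=data_tz)


# With no known timezone, fall back to UTC as the documentation describes.
stamp = localize(datetime(2023, 5, 1, 12, 30))

# With a known fixed offset (UTC+3 here, chosen only for illustration).
offset_tz = timezone(timedelta(hours=3))
local_stamp = localize(datetime(2023, 5, 1, 12, 30), offset_tz)
```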

get_dataframe(metadata_id: str) → DataFrame

Get Pandas DataFrame from a binary file in storage.

Parameters

metadata_id (str) – Unique metadata ID (should begin and end with __ and not conflict with any submission ID)

Returns

Pandas DataFrame read from the binary file in storage

Return type

pandas.DataFrame

get_dataframe_csv(metadata_id: str) → DataFrame

Get Pandas DataFrame from a .csv file in storage.

Parameters

metadata_id (str) – Unique metadata ID (should begin and end with __ and not conflict with any submission ID)

Returns

Pandas DataFrame read from the .csv file in storage

Return type

pandas.DataFrame

get_metadata(metadata_id: str) → str

Get metadata string from storage.

Parameters

metadata_id (str) – Unique metadata ID (should not conflict with any submission ID)

Returns

Metadata string from storage, or empty string if no such metadata exists

Return type

str
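
Because a missing metadata entry comes back as an empty string rather than an error, callers may want to treat the empty value as "not set". The sketch below assumes, purely for illustration, that the metadata string holds JSON; parse_json_metadata is a hypothetical helper and not part of the module.

```python
import json


def parse_json_metadata(metadata: str, default=None):
    """Hypothetical helper: decode JSON metadata, treating '' as missing."""
    if not metadata:
        # Empty string is the documented result when no such metadata exists.
        return default
    return json.loads(metadata)


missing = parse_json_metadata("")
cursor = parse_json_metadata('{"cursor": 42}')
```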

get_metadata_binary(metadata_id: str) → bytes

Get metadata bytes from storage.

Parameters

metadata_id (str) – Unique metadata ID (should not conflict with any submission ID)

Returns

Metadata bytes from storage, or empty bytes array if no such metadata exists

Return type

bytes

get_submission(submission_id: str) → dict

Get submission data from storage.

Parameters

submission_id (str) – Unique submission ID

Returns

Submission data (or empty dictionary if submission not found)

Return type

dict

get_submissions() → list

Get all submission data from storage.

Returns

List of dictionaries, one for each submission

Return type

list

get_submissions_df() → DataFrame

Get all submission data from storage, organized into a Pandas DataFrame.

Returns

Pandas DataFrame containing all submissions currently in storage

Return type

pandas.DataFrame

list_attachments(submission_id: str = '') → list

List all attachments currently in storage.

Parameters

submission_id (str) – Optional submission ID, to list only attachments for specific submission

Returns

List of attachments, each as dict with name, submission_id, and location_string

Return type

list
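
To make the shape of the returned list concrete, the sketch below groups attachment names by submission. It assumes the dict keys are literally "name", "submission_id", and "location_string" as the description suggests; the sample data and the group_by_submission helper are made up for illustration.

```python
from collections import defaultdict


def group_by_submission(attachments: list) -> dict:
    """Hypothetical helper: map each submission ID to its attachment names."""
    grouped = defaultdict(list)
    for attachment in attachments:
        grouped[attachment["submission_id"]].append(attachment["name"])
    return dict(grouped)


# Made-up sample in the documented shape (assumed key names).
sample = [
    {"name": "photo.jpg", "submission_id": "uuid:1", "location_string": "loc-a"},
    {"name": "audio.m4a", "submission_id": "uuid:1", "location_string": "loc-b"},
    {"name": "photo.jpg", "submission_id": "uuid:2", "location_string": "loc-c"},
]
grouped = group_by_submission(sample)
```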

list_submissions() → list

List all submissions currently in storage.

Returns

List of submission IDs

Return type

list

query_attachment(attachment_location: str = '', submission_id: str = '', attachment_name: str = '') → bool

Query whether specific submission attachment exists in storage.

Parameters
  • attachment_location (str) – Attachment location string (as returned when attachment stored)

  • submission_id (str) – Unique submission ID (in lieu of attachment_location)

  • attachment_name (str) – Attachment filename (in lieu of attachment_location)

Returns

True if attachment exists in storage; otherwise False

Return type

bool

query_submission(submission_id: str) → bool

Query whether specific submission exists in storage.

Parameters

submission_id (str) – Unique submission ID

Returns

True if submission exists in storage; otherwise False

Return type

bool

set_data_timezone(tz: timezone)

Set the timezone for timestamps in the data.

Parameters

tz (datetime.timezone) – Timezone for timestamps in the data

store_attachment(submission_id: str, attachment_name: str, attachment_data: BinaryIO) → str

Store submission attachment in storage.

Parameters
  • submission_id (str) – Unique submission ID

  • attachment_name (str) – Attachment filename

  • attachment_data (BinaryIO) – File-type object containing the attachment data

Returns

Location string for stored attachment

Return type

str
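
Since attachment_data is a file-type object, bytes already held in memory can be wrapped in io.BytesIO before being passed in. The sketch below also shows how a backend might consume such a stream front to back without seeking; read_all is a hypothetical helper, not something this module is known to provide.

```python
import io
from typing import BinaryIO


def read_all(stream: BinaryIO, chunk_size: int = 8192) -> bytes:
    """Hypothetical helper: read a binary stream to the end without seek()."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read signals end of stream.
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Wrap in-memory bytes so they can be passed as a BinaryIO argument.
attachment_data = io.BytesIO(b"fake image bytes")
payload = read_all(attachment_data, chunk_size=4)
```

Reading sequentially in this way also suits streams that cannot be rewound.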

store_dataframe(metadata_id: str, df: DataFrame)

Store Pandas DataFrame as binary file in storage.

Parameters
  • metadata_id (str) – Unique metadata ID to save as (should begin and end with __ and not conflict with any submission ID)

  • df (pandas.DataFrame) – Pandas DataFrame to store as binary file

store_dataframe_csv(metadata_id: str, df: DataFrame)

Store Pandas DataFrame as .csv file in storage.

Parameters
  • metadata_id (str) – Unique metadata ID to save as (should begin and end with __ and not conflict with any submission ID)

  • df (pandas.DataFrame) – Pandas DataFrame to store as .csv file

store_metadata(metadata_id: str, metadata: str)

Store metadata string in storage.

Parameters
  • metadata_id (str) – Unique metadata ID (should begin and end with __ and not conflict with any submission ID)

  • metadata (str) – Metadata string to store
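
The naming convention for metadata IDs (begin and end with __) can be checked before storing. Both helpers below are hypothetical and only encode the convention as it is worded here; whether the library enforces it is not stated in this documentation.

```python
def is_conventional_metadata_id(metadata_id: str) -> bool:
    """Hypothetical check: ID begins and ends with '__' around a non-empty name."""
    return (
        len(metadata_id) > 4
        and metadata_id.startswith("__")
        and metadata_id.endswith("__")
    )


def make_metadata_id(name: str) -> str:
    """Hypothetical helper: wrap a bare name in the '__' delimiters."""
    return f"__{name.strip('_')}__"


cursor_id = make_metadata_id("CURSOR")
```

Keeping metadata IDs visually distinct in this way makes an accidental clash with a submission ID less likely.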

store_metadata_binary(metadata_id: str, metadata: bytes)

Store metadata bytes in storage.

Parameters
  • metadata_id (str) – Unique metadata ID (should begin and end with __ and not conflict with any submission ID)

  • metadata (bytes) – Metadata bytes to store

store_submission(submission_id: str, submission_data: dict)

Store submission data in storage.

Parameters
  • submission_id (str) – Unique submission ID

  • submission_data (dict) – Submission data to store