empml.data

Object                Description
CSVDownloader         Class for reading a CSV file and returning a Polars LazyFrame.
ParquetDownloader     Class for reading a Parquet file and returning a Polars LazyFrame.
ExcelDownloader       Class for reading an Excel file and returning a Polars LazyFrame.
SQLDownloader         Class for reading data from any SQL database via connection URI.
PostgreSQLDownloader  Class for reading data from PostgreSQL.
MySQLDownloader       Class for reading data from MySQL.
MSSQLDownloader       Class for reading data from Microsoft SQL Server.
SQLiteDownloader      Class for reading data from a SQLite database.
OracleDownloader      Class for reading data from Oracle Database.
RedshiftDownloader    Class for reading data from Amazon Redshift.
BigQueryDownloader    Class for reading data from Google BigQuery.
SnowflakeDownloader   Class for reading data from Snowflake.
DatabricksDownloader  Class for reading data from Databricks SQL.

CSVDownloader

Class for reading a CSV file and returning a Polars LazyFrame.

Methods

def __init__(self, path: str, separator: str = ';'):
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.scan_csv(self.path, separator=self.separator)
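
The following is a minimal sketch of what the lazy scan shown above implies in practice, not the library's own code. It writes a small semicolon-delimited file (matching the default separator) and defines a hypothetical helper, `load_large_amounts`, that scans, filters, and only then materializes the data. The file name, column names, and helper names are illustrative assumptions. Polars is imported inside the helper so that the sample-file part runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(path: Path) -> None:
    """Write a tiny semicolon-delimited file to experiment with."""
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter=";")
        writer.writerow(["id", "amount"])
        writer.writerow([1, 10.5])
        writer.writerow([2, 3.0])


def load_large_amounts(path: Path, separator: str = ";"):
    """Hypothetical helper: scan lazily, filter, then collect."""
    import polars as pl  # deferred import; requires Polars to be installed

    lazy = pl.scan_csv(path, separator=separator)  # builds a query plan only
    return lazy.filter(pl.col("amount") > 5).collect()  # the file is read here


sample = Path(tempfile.mkdtemp()) / "sample.csv"
write_sample_csv(sample)
```

Calling `load_large_amounts(sample)` runs the query; nothing is parsed until `.collect()` is reached.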

ParquetDownloader

Class for reading a Parquet file and returning a Polars LazyFrame.

Methods

def __init__(self, path: str):
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.scan_parquet(self.path)

ExcelDownloader

Class for reading an Excel file and returning a Polars LazyFrame.

Methods

def __init__(self, path: str, sheet_name: str | None = None):
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_excel(self.path, sheet_name=self.sheet_name).lazy()

SQLDownloader

Class for reading data from any SQL database via connection URI.

Uses connectorx under the hood (pip install connectorx). Supported URI schemes: postgresql://, mysql://, mssql://, sqlite://, oracle://, and more.

Methods

def __init__(self, query: str, connection_uri: str):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    connection_uri : str
        Full connection URI (e.g., 'postgresql://user:pass@host:5432/db').
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()
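
To make the shape of a "full connection URI" concrete, the sketch below splits the example URI from the docstring into its parts using only the standard library. It illustrates the URI layout and is not part of empml; the `summary` dictionary is a hypothetical name.

```python
from urllib.parse import urlsplit

uri = "postgresql://user:pass@host:5432/db"
parts = urlsplit(uri)

# Each piece of the URI maps to one connection setting.
summary = {
    "scheme": parts.scheme,            # selects the database driver
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}
```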

PostgreSQLDownloader

Class for reading data from PostgreSQL and returning a Polars LazyFrame.

Requires connectorx (pip install connectorx).

Methods

def __init__(self, query: str, host: str, user: str, password: str, database: str, port: int = 5432):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    host : str
        PostgreSQL server hostname.
    user : str
        Database username.
    password : str
        Database password.
    database : str
        Database name.
    port : int
        Server port (default: 5432).
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()
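
`get_data` reads `self.connection_uri`, which is not a constructor parameter, so the URI is presumably assembled from the individual arguments. The sketch below shows one way that could work. `PostgresURISketch` is a hypothetical stand-in, not the library's class, and the percent-encoding of credentials is an assumption added here so that characters such as `@`, `:` or `/` in a password do not break the URI.

```python
from urllib.parse import quote


class PostgresURISketch:
    """Hypothetical stand-in showing how connection_uri might be built."""

    def __init__(self, query: str, host: str, user: str, password: str,
                 database: str, port: int = 5432):
        self.query = query
        # Percent-encode credentials so reserved characters stay unambiguous.
        self.connection_uri = (
            f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}"
        )


sketch = PostgresURISketch(
    "SELECT 1",
    host="db.example.com",
    user="analyst",
    password="p@ss:w/rd",
    database="sales",
)
```

The same pattern would apply to the other host-based downloaders, with a different scheme and default port.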

MySQLDownloader

Class for reading data from MySQL and returning a Polars LazyFrame.

Requires connectorx (pip install connectorx).

Methods

def __init__(self, query: str, host: str, user: str, password: str, database: str, port: int = 3306):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    host : str
        MySQL server hostname.
    user : str
        Database username.
    password : str
        Database password.
    database : str
        Database name.
    port : int
        Server port (default: 3306).
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()

MSSQLDownloader

Class for reading data from Microsoft SQL Server and returning a Polars LazyFrame.

Also works with Azure SQL Database and Azure Synapse Analytics, which use the same TDS wire protocol as SQL Server. Requires connectorx (pip install connectorx).

Methods

def __init__(self, query: str, host: str, user: str, password: str, database: str, port: int = 1433):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    host : str
        SQL Server hostname.
    user : str
        Database username.
    password : str
        Database password.
    database : str
        Database name.
    port : int
        Server port (default: 1433).
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()

SQLiteDownloader

Class for reading data from a SQLite database and returning a Polars LazyFrame.

Requires connectorx (pip install connectorx).

Methods

def __init__(self, query: str, path: str):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    path : str
        Path to the SQLite database file.
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()
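
As a rough end-to-end illustration, the sketch below creates a throwaway SQLite file with the standard library and derives a `sqlite://` URI from its path, the form connectorx accepts for absolute paths. How the class itself turns `path` into `connection_uri` is an assumption. The table, the file name, and the `read_sales` helper are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Create a small database file to query.
db_path = Path(tempfile.mkdtemp()) / "example.db"
conn = sqlite3.connect(db_path)
with conn:  # commits on success
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.5), (2, 3.0)])
conn.close()

# Assumed URI form: the scheme followed directly by the file path.
connection_uri = f"sqlite://{db_path.as_posix()}"


def read_sales(query: str, uri: str):
    """Hypothetical helper mirroring get_data; needs polars and connectorx."""
    import polars as pl  # deferred import of an optional dependency

    return pl.read_database_uri(query, uri).lazy()
```

With both packages installed, `read_sales("SELECT * FROM sales", connection_uri)` returns a LazyFrame over the query result.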

OracleDownloader

Class for reading data from Oracle Database and returning a Polars LazyFrame.

Requires connectorx (pip install connectorx).

Methods

def __init__(self, query: str, host: str, user: str, password: str, database: str, port: int = 1521):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    host : str
        Oracle server hostname.
    user : str
        Database username.
    password : str
        Database password.
    database : str
        Database name (service name).
    port : int
        Server port (default: 1521).
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()

RedshiftDownloader

Class for reading data from Amazon Redshift and returning a Polars LazyFrame.

Requires connectorx (pip install connectorx).

Methods

def __init__(self, query: str, host: str, user: str, password: str, database: str, port: int = 5439):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    host : str
        Redshift cluster endpoint.
    user : str
        Database username.
    password : str
        Database password.
    database : str
        Database name.
    port : int
        Server port (default: 5439).
    """
    pass

def get_data(self) -> pl.LazyFrame:
    return pl.read_database_uri(self.query, self.connection_uri).lazy()

BigQueryDownloader

Class for reading data from Google BigQuery and returning a Polars LazyFrame.

Requires google-cloud-bigquery (pip install google-cloud-bigquery).

Methods

def __init__(self, query: str, project_id: str, credentials_path: str | None = None):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    project_id : str
        Google Cloud project ID.
    credentials_path : str | None
        Path to service account JSON credentials file.
        If None, uses Application Default Credentials.
    """
    pass

def get_data(self) -> pl.LazyFrame:
    # Uses google.cloud.bigquery client and Arrow for efficient transfer
    ...
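
The body of `get_data` is elided above. The sketch below shows one plausible shape for a BigQuery-to-Polars transfer through Arrow and should not be read as the library's implementation. `fetch_bigquery_lazy` is a hypothetical function, and the imports are deferred because google-cloud-bigquery, pyarrow and polars are optional dependencies.

```python
def fetch_bigquery_lazy(query: str, project_id: str,
                        credentials_path: "str | None" = None):
    """Hypothetical sketch: run a query and return a Polars LazyFrame."""
    import polars as pl
    from google.cloud import bigquery

    if credentials_path is not None:
        # Explicit service account JSON file.
        client = bigquery.Client.from_service_account_json(
            credentials_path, project=project_id
        )
    else:
        # Falls back to Application Default Credentials.
        client = bigquery.Client(project=project_id)

    # Wait for the job, then pull the result as an Arrow table.
    arrow_table = client.query(query).result().to_arrow()
    return pl.from_arrow(arrow_table).lazy()
```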

SnowflakeDownloader

Class for reading data from Snowflake and returning a Polars LazyFrame.

Requires snowflake-connector-python (pip install snowflake-connector-python).

Methods

def __init__(self, query: str, account: str, user: str, password: str, warehouse: str, database: str, schema: str):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    account : str
        Snowflake account identifier.
    user : str
        Snowflake username.
    password : str
        Snowflake password.
    warehouse : str
        Snowflake warehouse name.
    database : str
        Snowflake database name.
    schema : str
        Snowflake schema name.
    """
    pass

def get_data(self) -> pl.LazyFrame:
    # Uses snowflake-connector-python and Arrow for efficient transfer
    ...
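
Since the body is elided, the following is only a sketch of how an Arrow-based Snowflake read could look, assuming the connector's `fetch_arrow_all` cursor method. `fetch_snowflake_lazy` is a hypothetical name, and the imports are deferred because snowflake-connector-python and polars are optional dependencies.

```python
def fetch_snowflake_lazy(query: str, account: str, user: str, password: str,
                         warehouse: str, database: str, schema: str):
    """Hypothetical sketch: run a query and return a Polars LazyFrame."""
    import polars as pl
    import snowflake.connector

    with snowflake.connector.connect(
        account=account,
        user=user,
        password=password,
        warehouse=warehouse,
        database=database,
        schema=schema,
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            arrow_table = cur.fetch_arrow_all()

    # Some connector versions return None for an empty result set.
    if arrow_table is None:
        return pl.LazyFrame()
    return pl.from_arrow(arrow_table).lazy()
```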

DatabricksDownloader

Class for reading data from Databricks SQL and returning a Polars LazyFrame.

Requires databricks-sql-connector (pip install databricks-sql-connector).

Methods

def __init__(self, query: str, server_hostname: str, http_path: str, access_token: str):
    """
    Parameters:
    -----------
    query : str
        SQL query to execute.
    server_hostname : str
        Databricks workspace server hostname.
    http_path : str
        HTTP path for the SQL warehouse or cluster.
    access_token : str
        Databricks personal access token.
    """
    pass

def get_data(self) -> pl.LazyFrame:
    # Uses databricks-sql-connector and Arrow for efficient transfer
    ...
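
The body is elided here as well. The sketch below shows one way the Databricks read might be done with the connector's `fetchall_arrow` cursor method, under the assumption that the three connection arguments are passed straight to `sql.connect`. `fetch_databricks_lazy` is a hypothetical name, and the imports are deferred because databricks-sql-connector and polars are optional dependencies.

```python
def fetch_databricks_lazy(query: str, server_hostname: str, http_path: str,
                          access_token: str):
    """Hypothetical sketch: run a query and return a Polars LazyFrame."""
    import polars as pl
    from databricks import sql

    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            arrow_table = cursor.fetchall_arrow()  # result as a pyarrow.Table

    return pl.from_arrow(arrow_table).lazy()
```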