Object storage in Orvanta (S3)

Workspace object storage

Orvanta lets you connect a workspace to an S3, Azure Blob Storage, or Google Cloud Storage bucket. Users can then read and write files in that bucket without having direct access to its credentials. Data flows are tracked automatically through the Assets feature.

Setup

After creating an S3, Azure Blob, or Google Cloud Storage resource, navigate to workspace settings > S3 Storage, select your resource, and save.

Resource permissions

Advanced S3 permissions (Enterprise only) enable fine-grained access control. By default, users can read/write all files but cannot list them. The permission system enforces path-based rules:

User alice can access files in u/alice/**/* and in shared group paths such as g/group1/**/*
Read-only folder access prevents write/delete operations
Rules use Unix glob syntax and support interpolated variables like {username}
Admins always have full access
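To make the glob semantics concrete, here is a minimal sketch of how path rules of this kind could be evaluated. It is not Orvanta's implementation: the PathRule shape, the interpolate, globToRegExp and accessFor helpers, and the "any matching writable rule wins" ordering are assumptions. Group membership is not modelled, so the g/group1 rule simply applies to everyone.

```typescript
// Hypothetical rule shape; Orvanta's real configuration format may differ.
interface PathRule {
  pattern: string; // Unix glob, may contain the {username} variable
  canWrite: boolean; // false means read-only
}

type Access = 'none' | 'read' | 'write';

// Substitute the interpolated {username} variable into a rule pattern.
function interpolate(pattern: string, username: string): string {
  return pattern.split('{username}').join(username);
}

// Translate a glob into an anchored RegExp: "**/" matches zero or more
// directories, "**" matches anything, "*" and "?" stay within one segment.
function globToRegExp(glob: string): RegExp {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (glob.startsWith('**/', i)) {
      re += '(?:.*/)?';
      i += 2;
    } else if (glob.startsWith('**', i)) {
      re += '.*';
      i += 1;
    } else if (c === '*') {
      re += '[^/]*';
    } else if (c === '?') {
      re += '[^/]';
    } else {
      re += c.replace(/[.+^$()|[\]{}\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Admins always get full access; otherwise any matching writable rule wins,
// and a matching read-only rule grants read access.
function accessFor(path: string, username: string, rules: PathRule[], isAdmin = false): Access {
  if (isAdmin) return 'write';
  let access: Access = 'none';
  for (const rule of rules) {
    if (globToRegExp(interpolate(rule.pattern, username)).test(path)) {
      if (rule.canWrite) return 'write';
      access = 'read';
    }
  }
  return access;
}

// Usage
const rules: PathRule[] = [
  { pattern: 'u/{username}/**/*', canWrite: true },
  { pattern: 'g/group1/**/*', canWrite: false },
];
console.log(accessFor('u/alice/data/report.csv', 'alice', rules)); // write
console.log(accessFor('g/group1/shared.csv', 'alice', rules)); // read
console.log(accessFor('u/bob/notes.txt', 'alice', rules)); // none
```

With these rules, alice can write under u/alice/, only read under g/group1/, and has no access to u/bob/ unless she is an admin.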

All S3 interactions route through Orvanta’s backend, preventing unauthorized credential exposure.

S3 input and output UI

Scripts that accept S3 files display a file uploader or a bucket explorer. Output files can be downloaded or previewed directly (text, CSV, images, PDFs, Parquet). For previews, the backend streams only the rows it needs, so objects of arbitrary size can be handled.

Multiple files can be returned as arrays:

export async function main() {
  return [{ s3: "path/to/file_1" }, { s3: "path/to/file_2" }];
}

Reading files

The loadS3File function loads the entire file content into memory, while loadS3FileStream reads large files incrementally as a stream.

TypeScript (Bun):

import * as orvanta from 'orvanta-client';
import { S3Object } from 'orvanta-client';

export async function main() {
  const example_file: S3Object = { s3: 'path/to/file' };
  const file_content = await orvanta.loadS3File(example_file);
  const decoder = new TextDecoder();
  console.log(decoder.decode(file_content));
}

Python:

import orvanta
from orvanta import S3Object

def main():
    example_file = S3Object(s3='path/to/file')
    file_content = orvanta.load_s3_file(example_file)
    print(file_content.decode('utf-8'))
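The practical difference between the two approaches is that a stream lets you process a file chunk by chunk with bounded memory. The sketch below counts lines that way. It assumes you already hold the object's bytes as a standard ReadableStream<Uint8Array>; how loadS3FileStream exposes its data is not covered here, and countLines and streamOf are hypothetical helpers.

```typescript
// Count lines in a byte stream chunk by chunk, without buffering the whole file.
// Assumption: the file's bytes are available as a standard ReadableStream<Uint8Array>.
async function countLines(stream: ReadableStream<Uint8Array>): Promise<number> {
  const reader = stream.getReader();
  let newlines = 0;
  let lastByte = -1; // -1 means no bytes seen yet
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    const chunk = result.value;
    for (let i = 0; i < chunk.length; i++) {
      // 0x0a ("\n") never appears inside a multi-byte UTF-8 sequence.
      if (chunk[i] === 0x0a) newlines++;
    }
    if (chunk.length > 0) lastByte = chunk[chunk.length - 1];
  }
  // A final line without a trailing "\n" still counts as a line.
  return lastByte !== -1 && lastByte !== 0x0a ? newlines + 1 : newlines;
}

// Hypothetical helper for trying it out: an in-memory stream made of text chunks.
function streamOf(...chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const c of chunks) controller.enqueue(encoder.encode(c));
      controller.close();
    },
  });
}

// A record split across two chunks is still counted once.
countLines(streamOf('id,name\n1,a', 'lice\n2,bob')).then((n) => console.log(n)); // 3
```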

Accepting file inputs

Scripts can declare parameters of type S3Object. The auto-generated UI then renders a file uploader for them, or lets users enter a path manually and browse the bucket.
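A minimal sketch of such a script follows. The local S3Object type is a stand-in for the one exported by orvanta-client (only the s3 and storage fields that appear on this page are assumed), and the .processed.csv naming scheme is hypothetical.

```typescript
// Stand-in for the S3Object type exported by orvanta-client (assumed fields only).
type S3Object = { s3: string; storage?: string };

// Declaring an S3Object parameter is what makes the generated UI offer
// a file uploader, or a path field with bucket browsing, for input_file.
export async function main(input_file: S3Object): Promise<S3Object> {
  if (!input_file.s3.endsWith('.csv')) {
    throw new Error(`Expected a .csv file, got ${input_file.s3}`);
  }
  // Hypothetical naming scheme: data/in.csv -> data/in.processed.csv,
  // kept in the same storage as the input.
  const output: S3Object = {
    ...input_file,
    s3: input_file.s3.replace(/\.csv$/, '.processed.csv'),
  };
  // Reading the input and writing the output (orvanta.loadS3File,
  // orvanta.writeS3File) would happen here.
  return output;
}
```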

Writing files

Use orvanta.writeS3File (TypeScript) or orvanta.write_s3_file (Python) to write a file to S3. The content can be a string, bytes, or a stream.

TypeScript:

import * as orvanta from 'orvanta-client';
import { S3Object } from 'orvanta-client';

export async function main(s3_file_path: string) {
  const s3_file_output: S3Object = { s3: s3_file_path };
  await orvanta.writeS3File(s3_file_output, 'Hello Orvanta!');
  return s3_file_output;
}

Secondary storage

Additional storage buckets can be configured in workspace settings. To use one, set the storage field of the S3 object to the name of the secondary storage:

const file = { s3: 'folder/hello.txt', storage: 'storage_1' };

Supported types include S3, Azure Blob, Google Cloud Storage, AWS OIDC, and Azure Workload Identity.
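For example, a script can return the same key from both storages. This sketch assumes that an S3 object without a storage field targets the primary workspace storage, and that storage_1 is the name of a secondary storage configured in workspace settings; the local S3Object type again stands in for the client's.

```typescript
// Stand-in for orvanta-client's S3Object (assumed shape).
type S3Object = { s3: string; storage?: string };

export async function main(): Promise<S3Object[]> {
  return [
    // No storage field: presumably resolved against the primary workspace storage.
    { s3: 'folder/hello.txt' },
    // Same key in the secondary storage named "storage_1" in workspace settings.
    { s3: 'folder/hello.txt', storage: 'storage_1' },
  ];
}
```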

Polars and DuckDB integration

Data pipelines using Polars and DuckDB handle S3 reading/writing natively and efficiently without manual bucket interaction.

Dynamic S3 object access in public apps

Use orvanta.signS3Object() or orvanta.signS3Objects() (TypeScript) / orvanta.sign_s3_object() or orvanta.sign_s3_objects() (Python) to generate presigned URLs for unauthenticated public app access.

Instance object storage

Enterprise Edition provides instance-level features including large-scale log management and distributed dependency caching via S3, Azure Blob, Google Cloud Storage, or AWS OIDC.

Large job logs management

Orvanta uses the database only as a temporary buffer for job logs (up to 5000 characters per job). On Enterprise Edition, logs are streamed to S3, so large logs can be handled without overwhelming the database or local storage. The Monitor logs on S3 toggle is enabled by default and mirrors service logs and job log chunks to the configured bucket.
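The buffering idea can be illustrated with a small model: keep at most 5000 characters per job in the database and move full chunks to object storage. This is a hypothetical sketch, not Orvanta's actual log pipeline; JobLogBuffer and its chunking policy are assumptions, and uploadedChunks stands in for objects written to S3.

```typescript
// Hypothetical model: at most LIMIT characters of a job's log stay in the
// database buffer; whenever the buffer exceeds it, full chunks move out.
const LIMIT = 5000;

class JobLogBuffer {
  private buffer = '';
  // Stands in for log chunks written to object storage.
  readonly uploadedChunks: string[] = [];

  append(text: string): void {
    this.buffer += text;
    while (this.buffer.length > LIMIT) {
      this.uploadedChunks.push(this.buffer.slice(0, LIMIT));
      this.buffer = this.buffer.slice(LIMIT);
    }
  }

  // What remains in the database for this job.
  get databaseContent(): string {
    return this.buffer;
  }

  // Reassemble the full log: offloaded chunks first, then the live buffer.
  fullLog(): string {
    return this.uploadedChunks.join('') + this.buffer;
  }
}

const log = new JobLogBuffer();
log.append('x'.repeat(12000));
console.log(log.uploadedChunks.length, log.databaseContent.length); // 2 2000
```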

Storage usage by folder

The Object Storage panel in Instance settings includes a “Storage usage by folder” view that lists top-level prefixes sorted by size. It is available to superadmins only, enumerates objects with a 30-second timeout, and is read-only.
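The aggregation behind such a view can be sketched as grouping object sizes by their first path segment. The ObjectInfo shape and usageByTopLevelPrefix are hypothetical (for instance, built from an object listing), and the 30-second timeout is not modelled.

```typescript
// Hypothetical shape of one entry in an object listing.
interface ObjectInfo {
  key: string;
  size: number; // bytes
}

// Sum sizes per top-level prefix ("logs/", "u/", ...), largest first.
// Objects at the bucket root are grouped under the empty prefix "".
function usageByTopLevelPrefix(objects: ObjectInfo[]): Array<[string, number]> {
  const totals = new Map<string, number>();
  for (const { key, size } of objects) {
    const slash = key.indexOf('/');
    const prefix = slash === -1 ? '' : key.slice(0, slash + 1);
    totals.set(prefix, (totals.get(prefix) ?? 0) + size);
  }
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
}

// returns [['u/', 30], ['logs/', 15], ['', 1]]
const usage = usageByTopLevelPrefix([
  { key: 'logs/a.txt', size: 10 },
  { key: 'logs/b.txt', size: 5 },
  { key: 'u/alice/x.parquet', size: 30 },
  { key: 'root.txt', size: 1 },
]);
console.log(usage);
```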

Manual log cleanup

The “Clean up expired logs” button triggers immediate cleanup logic:

Deletes expired service log rows and associated files (in 2000-row batches)
Removes expired job logs with transactional semantics and batches S3 deletes (up to 1000 objects per request)

The panel displays progress, per-phase counters, deleted object counts, errors, and last-run timestamps.
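Batching deletes keeps each request within the object store's limit; for S3, a single DeleteObjects request accepts at most 1000 keys. A minimal sketch, where chunk and deleteInBatches are hypothetical helpers and the deleteBatch callback stands in for one bulk-delete request:

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError('size must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Issue one request per batch of at most 1000 keys and return how many keys were submitted.
// deleteBatch is a hypothetical callback wrapping a single bulk-delete request.
async function deleteInBatches(
  keys: string[],
  deleteBatch: (batch: string[]) => Promise<void>,
): Promise<number> {
  let submitted = 0;
  for (const batch of chunk(keys, 1000)) {
    await deleteBatch(batch);
    submitted += batch.length;
  }
  return submitted;
}

const keys = Array.from({ length: 2500 }, (_, i) => `logs/job_${i}.txt`);
// batch sizes: 1000, 1000, 500
console.log(chunk(keys, 1000).map((b) => b.length));
```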

Distributed cache for Python, Bun, Rust, and Go

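Workers cache dependencies locally, but in larger clusters each worker is less likely to already have a given dependency cached, so cache hit ratios drop. Instance object storage provides a global cache in S3 shared by all workers. For Python, when a dependency version isn’t cached locally, the worker checks S3 for its “piptar” (a snapshot of the pre-installed package). If it is found, the worker extracts it instead of running the slower installation from PyPI. If it is missing, the worker installs the package and then uploads the snapshot to S3 for other workers to reuse.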
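The lookup order described above can be sketched as follows. SnapshotStore, resolveDependency, the install callback, and the package==version key format are hypothetical; extraction and installation are reduced to storing a byte snapshot.

```typescript
// Shared snapshot cache, e.g. backed by instance object storage (hypothetical interface).
interface SnapshotStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, snapshot: Uint8Array): Promise<void>;
}

type Source = 'local' | 'shared' | 'installed';

// Resolve one dependency: local cache first, then the shared cache,
// and only then a slow install whose snapshot is shared with other workers.
async function resolveDependency(
  key: string,
  local: Map<string, Uint8Array>,
  shared: SnapshotStore,
  install: (key: string) => Promise<Uint8Array>,
): Promise<Source> {
  if (local.has(key)) return 'local';
  const cached = await shared.get(key);
  if (cached !== undefined) {
    local.set(key, cached); // stands in for extracting the snapshot locally
    return 'shared';
  }
  const snapshot = await install(key); // e.g. installing from PyPI
  local.set(key, snapshot);
  await shared.put(key, snapshot);
  return 'installed';
}

// Two workers with separate local caches and one shared store.
const store = new Map<string, Uint8Array>();
const shared: SnapshotStore = {
  get: async (key) => store.get(key),
  put: async (key, snapshot) => {
    store.set(key, snapshot);
  },
};
const install = async (key: string) => new TextEncoder().encode(`snapshot of ${key}`);
const workerA = new Map<string, Uint8Array>();
const workerB = new Map<string, Uint8Array>();

(async () => {
  console.log(await resolveDependency('pandas==2.2.2', workerA, shared, install)); // installed
  console.log(await resolveDependency('pandas==2.2.2', workerB, shared, install)); // shared
  console.log(await resolveDependency('pandas==2.2.2', workerB, shared, install)); // local
})();
```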

For Bun, Rust, and Go, binary bundles are cached on disk by default; with instance object storage configured, they can also be shared across workers as a distributed cache.

Service logs storage

When instance object storage is configured, service logs are stored in S3, which scales better for larger deployments that need long-term retention.

S3 proxy

Orvanta exposes workspace storage over the S3 protocol at:

http://{base_url}/api/w/{workspaceId}/s3_proxy

This endpoint hides the resource credentials and enforces advanced permissions. DuckDB scripts use it automatically. Custom S3 clients authenticate with an Orvanta JWT: the token's header and payload form the access key ID, and its signature serves as the secret access key.

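import { S3Client } from '@aws-sdk/client-s3';
// Hypothetical placeholders: instance URL (with scheme), workspace ID, JWT-derived keys
const base_url = 'https://orvanta.example.com', workspaceId = 'my_workspace';
const accessKeyId = '<JWT header and payload>', secretAccessKey = '<JWT signature>';
const s3Client = new S3Client({
  region: 'us-east-1',
  endpoint: `${base_url}/api/w/${workspaceId}/s3_proxy`,
  credentials: { accessKeyId, secretAccessKey }
});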
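Deriving the two keys from a token can be sketched as below. The exact way the header and payload are combined is an assumption here (kept as header.payload, the first two dot-separated segments of the JWT), and jwtToS3Credentials is a hypothetical helper.

```typescript
// Assumption: access key ID = "header.payload" (first two JWT segments),
// secret access key = the signature segment.
function jwtToS3Credentials(token: string): { accessKeyId: string; secretAccessKey: string } {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('Expected a JWT with three dot-separated segments');
  }
  const [header, payload, signature] = parts;
  return { accessKeyId: `${header}.${payload}`, secretAccessKey: signature };
}

// Illustrative, non-functional token.
const { accessKeyId, secretAccessKey } = jwtToS3Credentials('eyJhbGciOi.eyJzdWIiOi.c2lnbmF0dXJl');
console.log(accessKeyId, secretAccessKey); // eyJhbGciOi.eyJzdWIiOi c2lnbmF0dXJl
```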

For Azure Blob Storage, a lightweight translation layer converts S3 requests, though relying on it isn’t recommended.

Streaming large SQL results to S3 (Enterprise)

SQL scripts whose results exceed the 10,000-row limit can stream query results directly to files in S3 instead of returning them to Orvanta. See SQL to S3 streaming.