Object storage in Orvanta (S3)
Workspace object storage
Orvanta lets you connect your workspace to S3, Azure Blob Storage, or Google Cloud Storage buckets. Users can then read and write files without direct access to the credentials. The Assets feature automatically tracks data flows.
After creating an S3, Azure Blob, or Google Cloud Storage resource, navigate to workspace settings > S3 Storage, select your resource, and save.
Resources permissions
Advanced S3 permissions (Enterprise only) enable fine-grained access control. By default, users can read/write all files but cannot list them. The permission system enforces path-based rules:
- User alice accesses files in u/alice/**/* and shared group paths like g/group1/**/*
- Read-only folder access prevents write/delete operations
- Rules use Unix glob syntax and support interpolated variables like {username}
- Admins always have full access
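As an illustration of how path rules of this kind behave, the following TypeScript sketch checks a file path against glob rules after substituting {username}. The `PathRule` shape, the `access` values, and both function names are hypothetical, and the glob handling covers only `*`, `**`, and `?`. It is a minimal sketch of the idea, not Orvanta's implementation.

```typescript
// Hypothetical rule shape: the actual permission configuration format
// is not described in this section.
interface PathRule {
  pattern: string; // Unix glob; may contain the {username} placeholder
  access: 'read' | 'read-write';
}

// Convert a glob into an anchored regular expression.
// Only `**`, `*`, and `?` are supported in this sketch.
function globToRegExp(glob: string): RegExp {
  let source = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      source += '(?:.*/)?'; // zero or more whole directories
      i += 3;
    } else if (glob.startsWith('**', i)) {
      source += '.*'; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === '*') {
      source += '[^/]*'; // anything within a single path segment
      i += 1;
    } else if (glob.charAt(i) === '?') {
      source += '[^/]'; // exactly one non-slash character
      i += 1;
    } else {
      // Escape regex metacharacters that appear as literal text.
      source += glob.charAt(i).replace(/[.+^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp('^' + source + '$');
}

// Returns true when at least one rule grants the requested operation.
// Admins bypass the rules entirely.
function isAllowed(
  rules: PathRule[],
  username: string,
  path: string,
  operation: 'read' | 'write',
  isAdmin: boolean = false
): boolean {
  if (isAdmin) return true;
  return rules.some((rule) => {
    const pattern = rule.pattern.split('{username}').join(username);
    if (!globToRegExp(pattern).test(path)) return false;
    return operation === 'read' || rule.access === 'read-write';
  });
}

const exampleRules: PathRule[] = [
  { pattern: 'u/{username}/**/*', access: 'read-write' },
  { pattern: 'g/group1/**/*', access: 'read' },
];
```

With `exampleRules`, alice can write to u/alice/report.csv, can read but not write g/group1/shared/data.csv, and has no access to u/bob/report.csv.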
All S3 interactions route through Orvanta’s backend, preventing unauthorized credential exposure.
S3 input and output UI
Scripts that accept S3 files display a file uploader or bucket explorer. Output files can be downloaded or previewed directly (text, CSV, images, PDFs, Parquet). The backend streams only the rows it needs, which makes it possible to handle arbitrarily large objects.
Multiple files can be returned as arrays:
```typescript
export async function main() {
  return [{ s3: "path/to/file_1" }, { s3: "path/to/file_2" }];
}
```
Reading files
The loadS3File function loads entire content into memory, while loadS3FileStream processes large files incrementally using streams.
TypeScript (Bun):
```typescript
import * as orvanta from 'orvanta-client';
import { S3Object } from 'orvanta-client';

export async function main() {
  const example_file: S3Object = { s3: 'path/to/file' };
  const file_content = await orvanta.loadS3File(example_file);
  const decoder = new TextDecoder();
  console.log(decoder.decode(file_content));
}
```
Python:
```python
import orvanta
from orvanta import S3Object

def main():
    example_file = S3Object(s3='path/to/file')
    file_content = orvanta.load_s3_file(example_file)
    print(file_content.decode('utf-8'))
```
Accepting file inputs
Scripts can declare S3Object parameters for auto-generated UI file uploaders or manual path entry with bucket browsing support.
Writing files
The writeS3File function (TypeScript) or orvanta.write_s3_file (Python) creates files in S3. Content can be strings, bytes, or streams.
TypeScript:
```typescript
import * as orvanta from 'orvanta-client';
import { S3Object } from 'orvanta-client';

export async function main(s3_file_path: string) {
  const s3_file_output: S3Object = { s3: s3_file_path };
  await orvanta.writeS3File(s3_file_output, 'Hello Orvanta!');
  return s3_file_output;
}
```
Secondary storage
Configure additional storage buckets from workspace settings. Specify secondary storage in S3 objects:
```typescript
const file = { s3: 'folder/hello.txt', storage: 'storage_1' };
```
Supported types include S3, Azure Blob, Google Cloud Storage, AWS OIDC, and Azure Workload Identity.
Polars and DuckDB integration
Data pipelines using Polars and DuckDB handle S3 reading/writing natively and efficiently without manual bucket interaction.
Dynamic S3 object access in public apps
Use orvanta.signS3Object() or orvanta.signS3Objects() (TypeScript) / orvanta.sign_s3_object() or orvanta.sign_s3_objects() (Python) to generate presigned URLs for unauthenticated public app access.
Instance object storage
Enterprise Edition provides instance-level features including large-scale log management and distributed dependency caching via S3, Azure Blob, Google Cloud Storage, or AWS OIDC.
Large job logs management
Orvanta treats the database as a temporary buffer for job logs (up to 5000 characters per job). On Enterprise Edition, logs are streamed to S3, so large logs can be handled without overwhelming local resources. The “Monitor logs on S3” toggle is enabled by default and mirrors service logs and job chunks to the configured bucket.
Storage usage by folder
The Instance settings Object Storage panel displays a “Storage usage by folder” view showing top-level prefixes sorted by size. This superadmin-only tool enumerates objects with a 30-second timeout and doesn’t mutate data.
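A view like this amounts to grouping object sizes by their first path segment and sorting the totals. The sketch below shows one way to compute that over an in-memory listing. The `StoredObject` and `PrefixUsage` shapes, the `(root)` label for keys without a slash, and the function name are illustrative assumptions; a real implementation would page through the bucket listing and stop at the timeout.

```typescript
// Hypothetical shapes for this sketch; sizes are in bytes.
interface StoredObject {
  key: string;
  size: number;
}

interface PrefixUsage {
  prefix: string;
  totalBytes: number;
  objectCount: number;
}

// Group objects by top-level prefix and sort the groups by total size,
// largest first. The input listing is only read, never modified.
function usageByTopLevelPrefix(objects: readonly StoredObject[]): PrefixUsage[] {
  const totals = new Map<string, PrefixUsage>();
  for (const obj of objects) {
    const slash = obj.key.indexOf('/');
    // Keys without a slash go under an assumed "(root)" label.
    const prefix = slash === -1 ? '(root)' : obj.key.slice(0, slash + 1);
    const entry = totals.get(prefix) ?? { prefix, totalBytes: 0, objectCount: 0 };
    entry.totalBytes += obj.size;
    entry.objectCount += 1;
    totals.set(prefix, entry);
  }
  return Array.from(totals.values()).sort((a, b) => b.totalBytes - a.totalBytes);
}
```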
Manual log cleanup
The “Clean up expired logs” button triggers immediate cleanup logic:
- Deletes expired service log rows and associated files (in 2000-row batches)
- Removes expired job logs with transactional semantics and batches S3 deletes (up to 1000 objects per request)
The panel displays progress, per-phase counters, deleted object counts, errors, and last-run timestamps.
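The batching described above can be sketched with a generic chunking helper. The limit of 1000 keys matches what the S3 DeleteObjects API accepts per request; the helper names and the example key layout are hypothetical, and the actual delete call is left out.

```typescript
// S3's DeleteObjects API accepts at most 1000 keys per request.
const MAX_KEYS_PER_DELETE = 1000;
// Batch size the text gives for expired service log rows.
const SERVICE_LOG_ROW_BATCH = 2000;

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Plan the delete requests for a set of expired object keys.
// Each returned batch would become one delete request.
function planDeleteBatches(keys: readonly string[]): string[][] {
  return chunk(keys, MAX_KEYS_PER_DELETE);
}

// Hypothetical key layout, for illustration only.
const expiredKeys = Array.from({ length: 2500 }, (_, i) => `logs/job_${i}.log`);
const deleteBatches = planDeleteBatches(expiredKeys);
```

Here 2500 expired keys yield three batches of 1000, 1000, and 500 keys.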
Distributed cache for Python, Rust, Go
Workers cache dependencies locally, but larger clusters reduce cache hit ratios. Instance object storage provides a global S3-backed cache. For Python, when a dependency version isn’t cached locally, the worker checks S3 for the “piptar” (a pre-installed snapshot). If the snapshot is found, the worker extracts it instead of running the slower PyPI installation. If it is missing, the worker installs the dependency and then uploads the snapshot to S3.
For Bun, Rust, and Go, binary bundles are cached on disk by default but can use instance object storage for distributed caching across workers.
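The Python lookup order described above (local cache, then object storage, then a fresh install followed by an upload) can be sketched as a small function. In this sketch the two `Map` objects stand in for the worker's disk and the S3-backed cache, the `install` callback stands in for the PyPI installation, and all names are hypothetical.

```typescript
type CacheSource = 'local' | 'object-storage' | 'fresh-install';

interface CacheTiers {
  local: Map<string, string>; // stand-in for the worker's on-disk cache
  remote: Map<string, string>; // stand-in for the shared S3-backed cache
}

// Resolve one dependency version and report where it came from.
function resolveDependency(
  key: string,
  tiers: CacheTiers,
  install: (key: string) => string
): CacheSource {
  // 1. Already cached on this worker.
  if (tiers.local.has(key)) return 'local';

  // 2. A snapshot exists in shared storage: extract it locally.
  const snapshot = tiers.remote.get(key);
  if (snapshot !== undefined) {
    tiers.local.set(key, snapshot);
    return 'object-storage';
  }

  // 3. Install from scratch, then upload the snapshot for other workers.
  const built = install(key);
  tiers.local.set(key, built);
  tiers.remote.set(key, built);
  return 'fresh-install';
}
```

With two workers sharing the same `remote` map, the first request for a key reports `fresh-install`, and a second worker asking for the same key reports `object-storage` without calling `install` again.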
Service logs storage
Service logs are stored in S3 when instance object storage is configured, which scales better for larger deployments that require long-term retention.
S3 proxy
Orvanta exposes workspace storages via the S3 protocol at:
http://{base_url}/api/w/{workspaceId}/s3_proxy
This endpoint hides resource credentials and enforces advanced permissions. DuckDB scripts use this automatically. Custom S3 clients can authenticate using JWT tokens, where the header and payload form the access key ID and the signature serves as the secret key.
```typescript
const s3Client = new S3Client({
  region: 'us-east-1',
  endpoint: `${base_url}/api/w/${workspaceId}/s3_proxy`,
  credentials: { accessKeyId, secretAccessKey }
});
```
For Azure Blob Storage, a lightweight translation layer converts S3 requests, though relying on it isn’t recommended.
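The JWT-based credentials mentioned above could be derived with a helper like the one below. This is a sketch under assumptions: the helper name `jwtToS3Credentials` is hypothetical, and joining header and payload with a dot is a guess, since the text only says that the two together form the access key ID.

```typescript
interface S3ProxyCredentials {
  accessKeyId: string;
  secretAccessKey: string;
}

// Split a compact JWT (header.payload.signature) into an S3-style key pair.
function jwtToS3Credentials(token: string): S3ProxyCredentials {
  const parts = token.split('.');
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error('expected a JWT with three non-empty segments');
  }
  const [header, payload, signature] = parts as [string, string, string];
  return {
    // Assumption: header and payload are kept joined by the original dot.
    accessKeyId: `${header}.${payload}`,
    secretAccessKey: signature,
  };
}
```

The returned object has the same shape as the `credentials` option passed to the S3 client.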
Streaming large SQL results to S3 (Enterprise)
SQL scripts that return more rows than the 10,000-row limit can stream query results directly to S3 files instead of returning the data to Orvanta. See SQL to S3 streaming.