<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data-Transformation on Datro - From Data to Action | Tailored Web Apps. Real Business Value.</title><link>https://datro.co.za/tags/data-transformation/</link><description>Recent content in Data-Transformation on Datro - From Data to Action | Tailored Web Apps. Real Business Value.</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 26 Sep 2025 10:37:13 +0200</lastBuildDate><atom:link href="https://datro.co.za/tags/data-transformation/index.xml" rel="self" type="application/rss+xml"/><item><title>Apache Iceberg - Table Format for Data Lakes</title><link>https://datro.co.za/tech/iceberg/</link><pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate><guid>https://datro.co.za/tech/iceberg/</guid><description>&lt;h1 id="apache-iceberg-table-format-for-data-lakes">Apache Iceberg: Table Format for Data Lakes&lt;/h1>
&lt;h2 id="why-we-choose-apache-iceberg">Why We Choose Apache Iceberg&lt;/h2>
&lt;p>Apache Iceberg is an open table format for large analytic datasets. It adds ACID transactions, schema evolution, and time travel to the files in a data lake, which changes how we store, query, and manage large-scale data. Here&amp;rsquo;s why it&amp;rsquo;s the foundation of our modern data architecture.&lt;/p>
&lt;h3 id="acid-compliance-for-data-lakes">&lt;strong>ACID Compliance for Data Lakes&lt;/strong>&lt;/h3>
&lt;p>Iceberg brings enterprise-grade reliability to data lakes:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>ACID Transactions&lt;/strong>: Atomic commits with serializable isolation, so readers only ever see fully committed snapshots&lt;/li>
&lt;li>&lt;strong>Schema Evolution&lt;/strong>: Add, drop, rename, or reorder columns as metadata changes, without rewriting data files&lt;/li>
&lt;li>&lt;strong>Time Travel&lt;/strong>: Query the table as of any retained snapshot or timestamp&lt;/li>
&lt;li>&lt;strong>Hidden Partitioning&lt;/strong>: Partition values are derived from column transforms, so queries filter on the source columns and never reference partition columns directly&lt;/li>
&lt;li>&lt;strong>Metadata Management&lt;/strong>: Efficient metadata handling for large datasets&lt;/li>
&lt;/ul>
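&lt;p>To make the snapshot idea behind atomic commits and time travel concrete, here is a minimal Python sketch. It is a toy model under stated assumptions: the Snapshot and ToyTable classes and the file names are hypothetical and are not the Iceberg API. Each commit publishes a new immutable snapshot, and a time-travel read picks the latest snapshot committed at or before the requested time.&lt;/p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: int   # commit time, e.g. epoch seconds
    data_files: tuple   # immutable tuple of file names


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; not the Iceberg API."""
    snapshots: list = field(default_factory=list)

    def commit_append(self, new_files, committed_at):
        # Build the next snapshot from the current one plus the new files,
        # then publish it in a single step (the "atomic commit").
        current = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, committed_at,
                        current + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, timestamp):
        # Time travel: use the latest snapshot committed at or before timestamp.
        visible = [s for s in self.snapshots if timestamp >= s.committed_at]
        if not visible:
            raise ValueError("no snapshot exists at that time")
        return visible[-1].data_files


table = ToyTable()
table.commit_append(["a.parquet"], committed_at=100)
table.commit_append(["b.parquet"], committed_at=200)
print(table.files_as_of(150))   # ('a.parquet',)
print(table.files_as_of(250))   # ('a.parquet', 'b.parquet')
```

&lt;p>Because earlier snapshots are never modified, a reader working at time 150 keeps seeing the same files even after the second commit lands.&lt;/p>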
&lt;h3 id="performance-and-scalability">&lt;strong>Performance and Scalability&lt;/strong>&lt;/h3>
&lt;p>Iceberg delivers exceptional performance characteristics:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Partition Pruning&lt;/strong>: Skips partitions that cannot match the query filter&lt;/li>
&lt;li>&lt;strong>Column Projection&lt;/strong>: Read only the columns you need&lt;/li>
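&lt;li>&lt;strong>File Skipping&lt;/strong>: Skips data files using per-file column statistics (value bounds and null counts) stored in table metadata&lt;/li>
&lt;li>&lt;strong>Compaction&lt;/strong>: Maintenance jobs rewrite small files into larger ones and expire old snapshots; these are typically scheduled rather than triggered by Iceberg itself&lt;/li>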
&lt;li>&lt;strong>Caching&lt;/strong>: Efficient metadata caching for repeated queries&lt;/li>
&lt;/ul>
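&lt;p>File skipping can be sketched in a few lines of Python. This is an illustration only: the DataFile class, the files_to_scan helper, and the file names are hypothetical, and real engines evaluate many more statistics. The idea is that a file is read only if its recorded value range overlaps the range the query asks for.&lt;/p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    min_ts: int   # smallest value of the filter column in this file
    max_ts: int   # largest value of the filter column in this file


def files_to_scan(files, lower, upper):
    """Keep only files whose [min_ts, max_ts] range overlaps [lower, upper]."""
    return [f.path for f in files
            if f.max_ts >= lower and upper >= f.min_ts]


# Hypothetical per-file statistics, as they might appear in table metadata.
manifest = [
    DataFile("day1.parquet", 0, 99),
    DataFile("day2.parquet", 100, 199),
    DataFile("day3.parquet", 200, 299),
]
selected = files_to_scan(manifest, lower=120, upper=180)
print(selected)   # ['day2.parquet']
```

&lt;p>Two of the three files are ruled out from metadata alone, before any data is read.&lt;/p>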
&lt;h3 id="key-benefits-for-our-clients">&lt;strong>Key Benefits for Our Clients&lt;/strong>&lt;/h3>
&lt;h4 id="1-data-reliability">1. &lt;strong>Data Reliability&lt;/strong>&lt;/h4>
&lt;p>Atomic commits keep your data consistent when several writers run at once, and retained snapshots let you roll a table back to an earlier state.&lt;/p></description></item><item><title>Trino - Distributed SQL Query Engine</title><link>https://datro.co.za/tech/trino/</link><pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate><guid>https://datro.co.za/tech/trino/</guid><description>&lt;h1 id="apache-trino-distributed-sql-query-engine">Trino: Distributed SQL Query Engine&lt;/h1>
&lt;h2 id="why-we-choose-apache-trino">Why We Choose Apache Trino&lt;/h2>
&lt;p>Apache Trino represents the pinnacle of distributed SQL query engines - providing lightning-fast, interactive analytics across multiple data sources with ANSI SQL compliance. Here&amp;rsquo;s why it&amp;rsquo;s the foundation of our data query strategy.&lt;/p>
&lt;h3 id="high-performance-sql-engine">&lt;strong>High-Performance SQL Engine&lt;/strong>&lt;/h3>
&lt;p>Trino delivers exceptional query performance characteristics:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Interactive Queries&lt;/strong>: Low-latency responses suited to ad-hoc analysis; actual times depend on data volume and query complexity&lt;/li>
&lt;li>&lt;strong>Distributed Processing&lt;/strong>: Parallel query execution across multiple nodes&lt;/li>
&lt;li>&lt;strong>Memory-Optimized&lt;/strong>: Pipelined in-memory execution that, by default, avoids writing intermediate results to disk&lt;/li>
&lt;li>&lt;strong>Query Optimization&lt;/strong>: Advanced cost-based query optimization&lt;/li>
&lt;li>&lt;strong>Columnar Processing&lt;/strong>: Efficient columnar data processing&lt;/li>
&lt;/ul>
&lt;h3 id="multi-data-source-federation">&lt;strong>Multi-Data-Source Federation&lt;/strong>&lt;/h3>
&lt;p>Trino excels at querying across diverse data sources:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Unified SQL Interface&lt;/strong>: Single SQL dialect across all data sources&lt;/li>
&lt;li>&lt;strong>Real-Time Queries&lt;/strong>: Read data where it lives, without first copying it through an ETL pipeline&lt;/li>
&lt;li>&lt;strong>Schema Discovery&lt;/strong>: Connectors expose each source&amp;rsquo;s metadata as catalogs, schemas, and tables&lt;/li>
&lt;li>&lt;strong>Federated Queries&lt;/strong>: JOIN data across different systems&lt;/li>
&lt;li>&lt;strong>Extensible Connectors&lt;/strong>: Rich ecosystem of data source connectors&lt;/li>
&lt;/ul>
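&lt;p>A federated JOIN can be pictured with a small Python sketch. It is a simplified illustration, not how Trino is implemented: the hash_join helper, the row data, and the two source names are hypothetical. Rows arrive from two different systems and are matched on a shared key by building a hash table on one side and probing it with the other.&lt;/p>

```python
def hash_join(left_rows, right_rows, key):
    """Join two row lists on `key`: build a hash table on one side, probe with the other."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical rows, as if fetched through two different connectors.
orders_from_postgres = [
    {"customer_id": 1, "order_total": 250},
    {"customer_id": 2, "order_total": 90},
]
customers_from_lake = [
    {"customer_id": 1, "region": "Gauteng"},
    {"customer_id": 3, "region": "Western Cape"},
]

result = hash_join(orders_from_postgres, customers_from_lake, "customer_id")
print(result)
# [{'customer_id': 1, 'region': 'Gauteng', 'order_total': 250}]
```

&lt;p>In a query engine the same join is written as a single SQL statement, and the engine takes care of fetching and matching the rows from each system.&lt;/p>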
&lt;h3 id="key-benefits-for-our-clients">&lt;strong>Key Benefits for Our Clients&lt;/strong>&lt;/h3>
&lt;h4 id="1-lightning-fast-analytics">1. &lt;strong>Lightning-Fast Analytics&lt;/strong>&lt;/h4>
&lt;p>Interactive query performance enables real-time business intelligence and ad-hoc analysis.&lt;/p>
&lt;h4 id="2-data-source-flexibility">2. &lt;strong>Data Source Flexibility&lt;/strong>&lt;/h4>
&lt;p>Query any data source that has a connector through a single SQL interface, reducing data silos.&lt;/p></description></item></channel></rss>