Apache Pinot (incubating) 0.7.1

xiangfu0 released this 07 Apr 17:58

Summary

This release introduces several new features, including a JSON index, lookup-based join support, geospatial support, TLS support for Pinot connections, and various performance optimizations and improvements. It also adds several new APIs for managing segments and uploading data to offline tables, and it contains many key bug fixes. See details below.

The release was cut from the following commit: 78152cd

and the following cherry-picks:

Notable New Features

  • Add a server metric, queriesDisabled, to check whether queries are disabled. (#6586)
  • Optimize GroupKey to avoid the overhead of serializing and deserializing the group keys (#6593) (#6559)
  • Support validation for jsonExtractKey and jsonExtractScalar functions (#6246) (#6594)
  • Real-Time Provisioning Helper tool improvement to take data characteristics as input instead of an actual segment (#6546)
  • Add the isolation level config isolation.level to Kafka consumer (2.0) to ingest transactionally committed messages only (#6580)
  • Enhance StarTreeIndexViewer to support multiple trees (#6569)
  • Improve ADLSGen2PinotFS with service principal-based auth and auto-creation of the container on the initial run. It's backward compatible with key-based auth. (#6531)
  • Add metrics for minion tasks status (#6549)
  • Use minion data directory as tmp directory for SegmentGenerationAndPushTask to ensure directory is always cleaned up (#6560)
  • Add optional HTTP basic auth to pinot broker, which enables user- and table-level authentication of incoming queries. (#6552)
  • Add Access Control for REST endpoints of Controller (#6507)
  • Add date_trunc to scalar functions to support date_trunc during ingestion (#6538)
  • Allow tar.gz files larger than 8 GB (#6533)
  • Add Lookup UDF Join support (#6530), (#6465), (#6383) (#6286)
  • Add cron scheduler metrics reporting (#6502)
  • Support generating derived column during segment load, so that derived columns can be added on-the-fly (#6494)
  • Support chained transform functions (#6495)
  • Add scalar function JsonPathArray to extract arrays from json (#6490)
  • Add a guard against multiple consuming segments for the same partition (#6483)
  • Remove the usage of deprecated range delimiter (#6475)
  • Handle scheduler calls with a proper response when it's disabled. (#6474)
  • Simplify SegmentGenerationAndPushTask handling getting schema and table config (#6469)
  • Add a cluster config to set the number of concurrent tasks per instance for the minion task SegmentGenerationAndPushTaskGenerator (#6468)
  • Replace BrokerRequestOptimizer with QueryOptimizer to also optimize the PinotQuery (#6423)
  • Add additional string scalar functions (#6458)
  • Add additional scalar functions for array type (#6446)
  • Add CRON scheduler for Pinot tasks (#6451)
  • Set default Data Type while setting type in Add Schema UI dialog (#6452)
  • Add ImportData sub command in pinot admin (#6396)
  • H3-based geospatial index (#6409) (#6306)
  • Add JSON index support (#6408) (#6216) (#6346)
  • Make minion tasks pluggable via reflection (#6395)
  • Add compatibility test for segment operations upload and delete (#6382)
  • Add segment reset API that disables and then enables the segment (#6336)
  • Add Pinot minion segment generation and push task. (#6340)
  • Add a version option to pinot admin to show all the component versions (#6380)
  • Add FST index using the Lucene library to speed up the REGEXP_LIKE operator on text (#6120)
  • Add APIs for uploading data to an offline table. (#6354)
  • Allow the use of environment variables in stream configs (#6373)
  • Enhance task schedule API for single type/table support (#6352)
  • Add broker time range based pruner for routing. Query operators supported: RANGE, =, <, <=, >, >=, AND, OR (#6259)
  • Add json path functions to extract values from json object (#6347)
  • Create a pluggable interface for Table config tuner (#6255)
  • Add a Controller endpoint to return table creation time (#6331)
  • Add tooltips and the ability to enable or disable a table to the UI (#6327)
  • Add Pinot Minion client (#6339)
  • Add more efficient use of RoaringBitmap in OnHeapBitmapInvertedIndexCreator and OffHeapBitmapInvertedIndexCreator (#6320)
  • Add decimal percentile support. (#6323)
  • Add API to get status of consumption of a table (#6322)
  • Add UI support for adding offline and realtime tables individually, adding schemas, and listing schemas (#6296)
  • Improve performance for distinct queries (#6285)
  • Allow adding custom configs during the segment creation phase (#6299)
  • Use sorted index based filtering only for dictionary encoded column (#6288)
  • Enhance forward index reader for better performance (#6262)
  • Support for text index without raw (#6284)
  • Add API for cluster manager to get table state (#6211)
  • Perf optimization for SQL GROUP BY ORDER BY (#6225)
  • Add support using environment variables in the format of ${VAR_NAME:DEFAULT_VALUE} in Pinot table configs. (#6271)
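
The ${VAR_NAME:DEFAULT_VALUE} placeholder format mentioned in the last item can be illustrated with a small Python sketch. This is not Pinot's implementation; it only shows the intended semantics under stated assumptions: the function name resolve_placeholders is hypothetical, and leaving a placeholder untouched when the variable is unset and no default is given is an assumption, not documented behavior.

```python
import os
import re

# Matches ${NAME} and ${NAME:DEFAULT}. The default may itself contain colons,
# e.g. ${KAFKA_BROKER:localhost:9092}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def resolve_placeholders(text, env=None):
    """Replace ${VAR:DEFAULT} placeholders in text (hypothetical helper).

    env defaults to the process environment; pass a dict to make the
    behavior deterministic.
    """
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]          # the environment value wins
        if default is not None:
            return default            # fall back to the inline default
        return match.group(0)         # assumption: leave unresolved text as-is

    return _PLACEHOLDER.sub(_substitute, text)
```

For example, a config value written as ${KAFKA_BROKER:localhost:9092} would resolve to localhost:9092 when KAFKA_BROKER is not set, and to the variable's value otherwise.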

Special notes

  • Pinot controller metrics prefix is fixed to add a missing dot (#6499). This is a backward-incompatible change: JMX queries on controller metrics must be updated.

  • Legacy group key delimiter (\t) was removed to be backward-compatible with release 0.5.0 (#6589)

  • Upgrade zookeeper version to 3.5.8 to fix ZOOKEEPER-2184: Zookeeper Client should re-resolve hosts when connection attempts fail. (#6558)

  • Add TLS support for client-pinot and pinot-internode connections (#6418)
    Upgrades to a TLS-enabled cluster can be performed safely and without downtime. To achieve a live-upgrade, go through the following steps:

    • First, configure alternate ingress ports for https/netty-tls on brokers, controllers, and servers. Restart the components with a rolling strategy to avoid cluster downtime.
    • Second, verify manually that https access to controllers and brokers is live. Then, configure all components to prefer TLS-enabled connections (while still allowing unsecured access). Restart the individual components.
    • Third, disable insecure connections via configuration. You may also have to set controller.vip.protocol and controller.vip.port and update the configuration files of any ingestion jobs. Restart components a final time and verify that insecure ingress via http is not available anymore.
  • PQL endpoint on Broker is deprecated (#6607)

    • Apache Pinot has adopted SQL syntax and semantics. Legacy PQL (Pinot Query Language) is deprecated and no longer supported. Use SQL syntax to query Pinot on the broker endpoint /query/sql and the controller endpoint /sql.
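
As a rough illustration of the broker SQL endpoint named above, the following Python sketch posts a SQL statement to /query/sql using only the standard library. The broker address (http://localhost:8099), the table name, and the helper names build_sql_request and run_query are assumptions made for the example; adjust them to your cluster.

```python
import json
import urllib.request


def build_sql_request(broker_url, sql):
    """Build a POST request for the broker's /query/sql endpoint.

    The body is a JSON object with a single "sql" field.
    """
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/query/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(broker_url, sql, timeout=10.0):
    """Send the query and return the decoded JSON response (needs a live broker)."""
    request = build_sql_request(broker_url, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical broker address and table name.
    print(run_query("http://localhost:8099", "SELECT COUNT(*) FROM myTable"))
```

Separating request construction from sending keeps the first helper usable without a running cluster, for instance to inspect the URL and body before pointing a client at the new endpoint.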

Major Bug fixes

  • Fix the SIGSEGV for large index (#6577)
  • Handle creation of segments with 0 rows so segment creation does not fail if the data source has 0 rows. (#6466)
  • Fix QueryRunner tool for multiple runs (#6582)
  • Use URL encoding for the generated segment tar name to handle characters that cannot be parsed to URI. (#6571)
  • Fix a bug of miscounting the top nodes in StarTreeIndexViewer (#6569)
  • Fix the raw bytes column in real-time segment (#6574)
  • Fix a bug to allow using the JSON_MATCH predicate in SQL queries (#6535)
  • Fix the overflow issue when loading the large dictionary into the buffer (#6476)
  • Fix empty data table for distinct query (#6363)
  • Fix the default map return value in DictionaryBasedGroupKeyGenerator (#6712)
  • Fix log message in ControllerPeriodicTask (#6709)
  • Fix bug #6671: RealtimeTableDataManager shuts down SegmentBuildTimeLeaseExtender for all tables in the host (#6682)
  • Fix license headers and plugin checks