The following table lists incompatibilities related to Apache Spark.
| Criterion | Representative case | JIRA | Summary |
|---|---|---|---|
| Changes to external specifications | Changes to command specifications (Note 1) | SPARK-19287 | JavaPairRDD flatMapValues requires function returning Iterable, not Iterator |
| | | SPARK-23429 | Add executor memory metrics to heartbeat and expose in executors REST API |
| | | SPARK-24958 | Add executors' process tree total memory information to heartbeat signals |
| | | SPARK-25865 | Add GC information to ExecutorMetrics |
| | | SPARK-26140 | Enable custom shuffle metrics implementation in shuffle reader |
| | | SPARK-26141 | Enable custom shuffle metrics implementation in shuffle write |
| | | SPARK-26877 | Support user-level app staging directory in yarn mode when spark.yarn.stagingDir specified |
| | | SPARK-27071 | Expose additional metrics in status.api.v1.StageData |
| | | SPARK-27575 | Spark overwrites existing value of spark.yarn.dist.* instead of merging value |
| | | SPARK-31449 | Investigate the difference between JDK and Spark's time zone offset calculation |
| | Changes to option contents/values/default values (Note 2) | SPARK-23472 | Add config properties for administrator JVM options |
| | | SPARK-24203 | Make executor's bindAddress configurable |
| | | SPARK-25040 | Empty string should be disallowed for data types except for string and binary types in JSON |
| | | SPARK-25641 | Change the spark.shuffle.server.chunkFetchHandlerThreadsPercent default to 100 |
| | | SPARK-26089 | Handle large corrupt shuffle blocks |
| | | SPARK-26771 | Make .unpersist(), .destroy() consistently non-blocking by default |
| | | SPARK-27868 | Better document shuffle / RPC listen backlog |
| | | SPARK-31582 | Being able to not populate Hadoop classpath |
| | Stricter validation (Note 3) | SPARK-26340 | Ensure cores per executor is greater than cpu per task |
| | | SPARK-26530 | Validate heartheat arguments in HeartbeatReceiver |
| | | SPARK-31968 | write.partitionBy() creates duplicate subdirectories when user provides duplicate columns |
| | Content/format of published files (Note 4) | SPARK-22860 | Spark workers log ssl passwords passed to the executors |
| | | SPARK-23191 | Workers registration failes in case of network drop |
| | | SPARK-25118 | Need a solution to persist Spark application console outputs when running in shell/yarn client mode |
| | | SPARK-25855 | Don't use Erasure Coding for event log files |
| | | SPARK-29112 | Expose more details when ApplicationMaster reporter faces a fatal exception |
| | Changes to message contents (Note 5) | SPARK-24345 | Improve ParseError stop location when offending symbol is a token |
| | | SPARK-24355 | Improve Spark shuffle server responsiveness to non-ChunkFetch requests |
| | | SPARK-24544 | Print actual failure cause when look up function failed |
| | | SPARK-25683 | Updated the log for the firstTime event Drop occurs. |
| | | SPARK-25689 | Move token renewal logic to driver in yarn-client mode |
| | | SPARK-25712 | Improve usage message of start-master.sh and start-slave.sh |
| | | SPARK-25773 | Cancel zombie tasks in a result stage when the job finishes |
| | | SPARK-26117 | use SparkOutOfMemoryError instead of OutOfMemoryError when catch exception |
| | | SPARK-26195 | Correct exception messages in some classes |
| | | SPARK-26529 | Add debug logs for confArchive when preparing local resource |
| | | SPARK-26600 | Update spark-submit usage message |
| | | SPARK-26660 | Add warning logs for large taskBinary size |
| | | SPARK-26697 | ShuffleBlockFetcherIterator can log block sizes in addition to num blocks |
| | | SPARK-27010 | find out the actual port number when hive.server2.thrift.port=0 |
| | | SPARK-27192 | spark.task.cpus should be less or equal than spark.task.cpus when use static executor allocation |
| | | SPARK-27219 | Misleading exceptions in transport code's SASL fallback path |
| | | SPARK-27989 | Add retries on the connection to the driver |
| | | SPARK-28676 | Avoid Excessive logging from ContextCleaner |
| | | SPARK-28907 | Review invalid usage of new Configuration() |
| | | SPARK-28929 | Spark Logging level should be INFO instead of Debug in Executor Plugin API[SPARK-24918] |
| | | SPARK-29070 | Make SparkLauncher log full spark-submit command line |
| | | SPARK-29833 | Add FileNotFoundException check for spark.yarn.jars |
| | | SPARK-29885 | Improve the exception message when reading the daemon port |
| | | SPARK-31485 | Barrier stage can hang if only partial tasks launched |
| | | SPARK-31532 | SparkSessionBuilder shoud not propagate static sql configurations to the existing active/default SparkSession |
| | | SPARK-31941 | Handling the exception in SparkUI for getSparkUser method |
| | | SPARK-32003 | Shuffle files for lost executor are not unregistered if fetch failure occurs after executor is lost |
| | | SPARK-32560 | improve exception message |
| | Addition/removal of messages (Note 6) | SPARK-9853 | Optimize shuffle fetch of contiguous partition IDs |
| | | SPARK-22590 | Broadcast thread propagates the localProperties to task |
| | | SPARK-25829 | remove duplicated map keys with last wins policy |
| | | SPARK-26060 | Track SparkConf entries and make SET command reject such entries. |
| | | SPARK-26892 | saveAsTextFile throws NullPointerException when null row present |
| | | SPARK-27348 | HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend |
| | | SPARK-27637 | If exception occured while fetching blocks by netty block transfer service, check whether the relative executor is alive before retry |
| | | SPARK-27665 | Split fetch shuffle blocks protocol from OpenBlocks |
| | | SPARK-28483 | Canceling a spark job using barrier mode but barrier tasks do not exit |
| | | SPARK-30416 | Log a warning for deprecated SQL config in `set()` and `unset()` |
| | | SPARK-30590 | can't use more than five type-safe user-defined aggregation in select statement |
| | | SPARK-3137 | Use finer grained locking in TorrentBroadcast.readObject |
| | | SPARK-31632 | The ApplicationInfo in KVStore may be accessed before it's prepared |
| Increased resource usage | Increased memory usage | SPARK-25035 | Replicating disk-stored blocks should avoid memory mapping |
| | | SPARK-25998 | TorrentBroadcast holds strong reference to broadcast object |
| Changes to execution results | Fixes to incorrect implementations (Note 7) | SPARK-23643 | XORShiftRandom.hashSeed allocates unnecessary memory |
| | | SPARK-29273 | Spark peakExecutionMemory metrics is zero |
| | | SPARK-30752 | Wrong result of to_utc_timestamp() on daylight saving day |
| | | SPARK-30793 | Wrong truncations of timestamps before the epoch to minutes and seconds |
| | | SPARK-30826 | LIKE returns wrong result from external table using parquet |
| | | SPARK-30857 | Wrong truncations of timestamps before the epoch to hours and days |
| | | SPARK-31456 | If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook |
| | | SPARK-31500 | collect_set() of BinaryType returns duplicate elements |
| | | SPARK-31519 | Cast in having aggregate expressions returns the wrong result |
| | | SPARK-31663 | Grouping sets with having clause returns the wrong result |
| | | SPARK-31935 | Hadoop file system config should be effective in data source options |
| | | SPARK-32115 | Incorrect results for SUBSTRING when overflow |
| | | SPARK-32167 | nullability of GetArrayStructFields is incorrect |
| | | SPARK-32364 | Use CaseInsensitiveMap for DataFrameReader/Writer options |
| | | SPARK-32377 | CaseInsensitiveMap should be deterministic for addition |
| | | SPARK-32693 | Compare two dataframes with same schema except nullable property |
| | | SPARK-32810 | CSV/JSON data sources should avoid globbing paths when inferring schema |
Note 1: Changes to execution results, execution permissions, execution concurrency, etc.
Note 2: Includes screen information, such as configuration screens and operation screens.
Note 3: Changes to the allowable range of values, consistency checks between definitions, and expansion/reduction of the valid range due to stricter checking.
Note 4: Changes to log file output items and formats, etc.
Note 5: Includes cases where changes to pop-up messages and the like alter previous operations; also covers changes to message contents, changes to message levels, and message improvements.
Note 6: New messages added or removed as a result of bug fixes or improvements within the scope of existing functionality.
Note 7: Cases where external behavior that violated the external specification is corrected to the proper behavior, or where the behavior of a standard technology that was implemented under a mistaken interpretation is corrected to the proper behavior.
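Many of the entries above change configuration defaults or add new configuration switches. When upgrading, behavior that an application depends on can be pinned explicitly in `spark-defaults.conf` rather than relying on the new defaults. The fragment below is an illustrative sketch, not a list of the pre-upgrade defaults; the staging directory path is a hypothetical example, and each JIRA should be checked for the exact value a given workload needs:

```properties
# SPARK-25641: the default of this setting was changed to 100;
# set it explicitly if your shuffle-service tuning assumed another value.
spark.shuffle.server.chunkFetchHandlerThreadsPercent  100

# SPARK-31582: switch controlling whether Spark populates the Hadoop
# classpath on YARN; true preserves the long-standing behavior.
spark.yarn.populateHadoopClasspath  true

# SPARK-26877: a user-level app staging directory is used when
# spark.yarn.stagingDir is specified (path below is illustrative only).
spark.yarn.stagingDir  hdfs:///tmp/spark-staging
```

Settings placed in `spark-defaults.conf` apply cluster-wide; the same keys can instead be passed per job via `--conf` on `spark-submit` when only some applications need the pinned values.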
Reference
For details, refer to the sites below.