Big Data Integration Server V1.3.0 Release Information
FUJITSU Software

2.1.4 Incompatibilities Related to Apache Spark

The incompatibilities related to Apache Spark are listed below.

Criterion

Representative case

JIRA

Summary

Changes to external specifications

Changes to command specifications (Note 1)

SPARK-19287

JavaPairRDD flatMapValues requires function returning Iterable, not Iterator

SPARK-23429

Add executor memory metrics to heartbeat and expose in executors REST API

SPARK-24958

Add executors' process tree total memory information to heartbeat signals

SPARK-25865

Add GC information to ExecutorMetrics

SPARK-26140

Enable custom shuffle metrics implementation in shuffle reader

SPARK-26141

Enable custom shuffle metrics implementation in shuffle write

SPARK-26877

Support user-level app staging directory in yarn mode when spark.yarn.stagingDir specified

SPARK-27071

Expose additional metrics in status.api.v1.StageData

SPARK-27575

Spark overwrites existing value of spark.yarn.dist.* instead of merging value

SPARK-31449

Investigate the difference between JDK and Spark's time zone offset calculation

Changes to option content, values, or default values (Note 2)

SPARK-23472

Add config properties for administrator JVM options

SPARK-24203

Make executor's bindAddress configurable

SPARK-25040

Empty string should be disallowed for data types except for string and binary types in JSON

SPARK-25641

Change the spark.shuffle.server.chunkFetchHandlerThreadsPercent default to 100

SPARK-26089

Handle large corrupt shuffle blocks

SPARK-26771

Make .unpersist(), .destroy() consistently non-blocking by default

SPARK-27868

Better document shuffle / RPC listen backlog

SPARK-31582

Being able to not populate Hadoop classpath

Stricter checks (Note 3)

SPARK-26340

Ensure cores per executor is greater than cpu per task

SPARK-26530

Validate heartbeat arguments in HeartbeatReceiver

SPARK-31968

write.partitionBy() creates duplicate subdirectories when user provides duplicate columns

Content or format of published files (Note 4)

SPARK-22860

Spark workers log ssl passwords passed to the executors

SPARK-23191

Worker registration fails in case of network drop

SPARK-25118

Need a solution to persist Spark application console outputs when running in shell/yarn client mode

SPARK-25855

Don't use Erasure Coding for event log files

SPARK-29112

Expose more details when ApplicationMaster reporter faces a fatal exception

Changes to message content (Note 5)

SPARK-24345

Improve ParseError stop location when offending symbol is a token

SPARK-24355

Improve Spark shuffle server responsiveness to non-ChunkFetch requests

SPARK-24544

Print actual failure cause when look up function failed

SPARK-25683

Updated the log for the firstTime event Drop occurs.

SPARK-25689

Move token renewal logic to driver in yarn-client mode

SPARK-25712

Improve usage message of start-master.sh and start-slave.sh

SPARK-25773

Cancel zombie tasks in a result stage when the job finishes

SPARK-26117

use SparkOutOfMemoryError instead of OutOfMemoryError when catch exception

SPARK-26195

Correct exception messages in some classes

SPARK-26529

Add debug logs for confArchive when preparing local resource

SPARK-26600

Update spark-submit usage message

SPARK-26660

Add warning logs for large taskBinary size

SPARK-26697

ShuffleBlockFetcherIterator can log block sizes in addition to num blocks

SPARK-27010

find out the actual port number when hive.server2.thrift.port=0

SPARK-27192

spark.task.cpus should be less than or equal to spark.executor.cores when using static executor allocation

SPARK-27219

Misleading exceptions in transport code's SASL fallback path

SPARK-27989

Add retries on the connection to the driver

SPARK-28676

Avoid Excessive logging from ContextCleaner

SPARK-28907

Review invalid usage of new Configuration()

SPARK-28929

Spark Logging level should be INFO instead of Debug in Executor Plugin API[SPARK-24918]

SPARK-29070

Make SparkLauncher log full spark-submit command line

SPARK-29833

Add FileNotFoundException check for spark.yarn.jars

SPARK-29885

Improve the exception message when reading the daemon port

SPARK-31485

Barrier stage can hang if only partial tasks launched

SPARK-31532

SparkSessionBuilder should not propagate static SQL configurations to the existing active/default SparkSession

SPARK-31941

Handling the exception in SparkUI for getSparkUser method

SPARK-32003

Shuffle files for lost executor are not unregistered if fetch failure occurs after executor is lost

SPARK-32560

improve exception message

Addition or removal of messages (Note 6)

SPARK-9853

Optimize shuffle fetch of contiguous partition IDs

SPARK-22590

Broadcast thread propagates the localProperties to task

SPARK-25829

remove duplicated map keys with last wins policy

SPARK-26060

Track SparkConf entries and make SET command reject such entries.

SPARK-26892

saveAsTextFile throws NullPointerException when null row present

SPARK-27348

HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend

SPARK-27637

If an exception occurred while fetching blocks by the netty block transfer service, check whether the relevant executor is alive before retrying

SPARK-27665

Split fetch shuffle blocks protocol from OpenBlocks

SPARK-28483

Canceling a spark job using barrier mode but barrier tasks do not exit

SPARK-30416

Log a warning for deprecated SQL config in `set()` and `unset()`

SPARK-30590

can't use more than five type-safe user-defined aggregation in select statement

SPARK-3137

Use finer grained locking in TorrentBroadcast.readObject

SPARK-31632

The ApplicationInfo in KVStore may be accessed before it's prepared

Increase in resource usage

Increase in memory usage

SPARK-25035

Replicating disk-stored blocks should avoid memory mapping

SPARK-25998

TorrentBroadcast holds strong reference to broadcast object

Changes to execution results

Corrections of incorrect implementations (Note 7)

SPARK-23643

XORShiftRandom.hashSeed allocates unnecessary memory

SPARK-29273

Spark peakExecutionMemory metrics is zero

SPARK-30752

Wrong result of to_utc_timestamp() on daylight saving day

SPARK-30793

Wrong truncations of timestamps before the epoch to minutes and seconds

SPARK-30826

LIKE returns wrong result from external table using parquet

SPARK-30857

Wrong truncations of timestamps before the epoch to hours and days

SPARK-31456

If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook

SPARK-31500

collect_set() of BinaryType returns duplicate elements

SPARK-31519

Cast in having aggregate expressions returns the wrong result

SPARK-31663

Grouping sets with having clause returns the wrong result

SPARK-31935

Hadoop file system config should be effective in data source options

SPARK-32115

Incorrect results for SUBSTRING when overflow

SPARK-32167

nullability of GetArrayStructFields is incorrect

SPARK-32364

Use CaseInsensitiveMap for DataFrameReader/Writer options

SPARK-32377

CaseInsensitiveMap should be deterministic for addition

SPARK-32693

Compare two dataframes with same schema except nullable property

SPARK-32810

CSV/JSON data sources should avoid globbing paths when inferring schema

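Note 1) Includes changes to execution results, execution permissions, and the degree of concurrent execution.

Note 2) Includes screen information, such as settings screens and operation screens.

Note 3) Changes to the range of values that can be specified, consistency checks between definitions, and expansion or narrowing of the valid range due to stricter checking.

Note 4) Includes changes to the items output to log files and to their format.

Note 5) Changes to message content or message level, and message improvements. Includes changes to messages, such as pop-up messages, that alter previous operating procedures.

Note 6) Messages newly added or removed as a result of bug fixes, improvements, and similar changes within the scope of existing functionality.

Note 7) Cases where external behavior that contradicted the external specifications is corrected to the proper behavior, or where the behavior of a standard technology implemented under a misinterpretation is corrected to the proper behavior.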
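Two of the entries listed above, SPARK-30793 and SPARK-30857, concern wrong truncation of timestamps before the epoch. The Python sketch below is not Spark's code. It assumes, without checking against the Spark source, that the wrong results came from integer division that rounds toward zero (the behavior of Java's integer `/` operator), and it contrasts that with floor division (as in Java's `Math.floorDiv`). Timestamps are modeled as integer microseconds since 1970-01-01 00:00:00 UTC. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers, not Spark's implementation. Timestamps are modeled
# as integer microseconds since 1970-01-01 00:00:00 UTC.
MICROS_PER_MINUTE = 60 * 1_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncate_toward_zero(micros: int, unit: int) -> int:
    """Truncates using a quotient rounded toward zero, like Java's integer '/'."""
    q = abs(micros) // unit
    return (q if micros >= 0 else -q) * unit


def truncate_floor(micros: int, unit: int) -> int:
    """Truncates using a quotient rounded toward negative infinity, like Math.floorDiv."""
    return (micros // unit) * unit


def to_utc(micros: int) -> datetime:
    """Converts epoch microseconds to an aware UTC datetime for display."""
    return EPOCH + timedelta(microseconds=micros)


if __name__ == "__main__":
    ts = -30 * 1_000_000  # 1969-12-31 23:59:30 UTC, 30 seconds before the epoch
    print(to_utc(truncate_toward_zero(ts, MICROS_PER_MINUTE)).isoformat())
    # 1970-01-01T00:00:00+00:00 (later than ts, so wrong)
    print(to_utc(truncate_floor(ts, MICROS_PER_MINUTE)).isoformat())
    # 1969-12-31T23:59:00+00:00 (start of the enclosing minute)
```

Rounding toward zero moves a negative value forward in time, so a pre-epoch timestamp can be truncated to a point later than itself. Floor division always rounds down to the start of the enclosing unit. For non-negative timestamps the two approaches agree.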

Reference

For details, refer to the following site:

https://issues.apache.org/jira/secure/Dashboard.jspa