Benchmark

Inference (GPU)

Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
CPU Assign=32 (Inference=4, API=26, Prometheus=1, Grafana=1)
Concurrency=100

Throughput

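| Models | Version | URI | GPU=T4 | GPU=A10 | GPU=A30 |
|---|---|---|---|---|---|
| Face | 1.9.0 | /detect | 672 | 683 | 810 |
| Face | 1.9.0 | /feature-only | 674 | 980 | 1,025 |
| Face | 1.9.0 | /feature (+detect) | 352 | 512 | 626 |
| Face | 1.10.0 | /detect | 661 | 657 | 667 |
| Face | 1.10.0 | /feature-only | 820 | 1,015 | 1,036 |
| Face | 1.10.0 | /feature (+detect) | 405 | 587 | 620 |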

Latency

Summary

* Pipeline: Detect (320x320) + Align + Feature-Extraction (112x112)
* Feature extraction throughput: more than 500 images per second (on A10)
* A10 performs about 1.5x better than T4
* Two GPUs perform about 1.5x better than one
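
The A10-versus-T4 figure in the summary can be checked against the GPU throughput numbers above. The following is a minimal sketch with the table values copied by hand; the dictionary layout and the helper name `speedup` are illustrative only and not part of the benchmark tooling.

```python
# Throughput values copied from the GPU table, as (T4, A10, A30)
# per model version and URI.
throughput = {
    ("1.9.0", "/detect"): (672, 683, 810),
    ("1.9.0", "/feature-only"): (674, 980, 1025),
    ("1.9.0", "/feature (+detect)"): (352, 512, 626),
    ("1.10.0", "/detect"): (661, 657, 667),
    ("1.10.0", "/feature-only"): (820, 1015, 1036),
    ("1.10.0", "/feature (+detect)"): (405, 587, 620),
}


def speedup(row):
    """Return the A10/T4 throughput ratio for one table row."""
    t4, a10, _a30 = row
    return a10 / t4


for (version, uri), row in throughput.items():
    print(f"{version:7s} {uri:20s} A10/T4 = {speedup(row):.2f}")
```

On these numbers the ratio is close to 1.45 for the full /feature (+detect) path in both versions, while /detect is nearly flat between T4 and A10.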

Inference (CPU)

Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
CPU Assign=32 (Inference=28, API=4)
Concurrency=50

Throughput & Latency

| Models | Version | URI | RPS | Avg-Latency (ms) |
|---|---|---|---|---|
| Face | 1.0.4 | /detect | 130 | 367 |
| Face | 1.0.4 | /feature-only | 78 | 615 |
| Face | 1.0.4 | /feature (+detect) | 62 | 767 |
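
As a sanity check on these results, throughput and average latency under a fixed number of in-flight requests should roughly satisfy Little's law (concurrency ≈ RPS × latency). The sketch below assumes the load generator was closed-loop with 50 concurrent requests, which the source does not state explicitly; the function name `in_flight` is illustrative.

```python
# (RPS, average latency in ms) copied from the CPU inference results.
cpu_results = {
    "/detect": (130, 367),
    "/feature-only": (78, 615),
    "/feature (+detect)": (62, 767),
}


def in_flight(rps, latency_ms):
    """Little's law: mean requests in flight = throughput x mean latency."""
    return rps * latency_ms / 1000.0


for uri, (rps, latency_ms) in cpu_results.items():
    print(f"{uri:20s} implied concurrency = {in_flight(rps, latency_ms):.1f}")
```

All three rows come out just under 48, close to the configured Concurrency=50, so the RPS and latency columns are mutually consistent.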

Matching

* Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
* Gallery= 100K | 250K | 500K
* CPU Assign=80 (Inference=72, API=8)
* Concurrency=100
* URI=/search

Throughput

| Milvus Version | Gallery Size | URI | RPS |
|---|---|---|---|
| 2.2.2 | 100K | /search | 568 |
| 2.2.2 | 250K | /search | 380 |
| 2.2.2 | 500K | /search | 271 |
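
To quantify how /search throughput changes with gallery size, the three Milvus 2.2.2 results can be expressed relative to the 100K baseline. This is a small sketch using only the numbers reported above; the helper name `relative_to_baseline` is hypothetical.

```python
# (gallery size, RPS) for Milvus 2.2.2, copied from the results above.
search_rps = [(100_000, 568), (250_000, 380), (500_000, 271)]


def relative_to_baseline(rows):
    """Return (gallery growth factor, fraction of baseline RPS) per row."""
    base_size, base_rps = rows[0]
    return [(size / base_size, rps / base_rps) for size, rps in rows]


for growth, retained in relative_to_baseline(search_rps):
    print(f"gallery x{growth:.1f} -> {retained:.0%} of baseline RPS")
```

A gallery 5x larger retains roughly 48% of the 100K throughput, so per-query cost grows more slowly than the gallery size in this range.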

2.1.2 vs 2.2.2

* CPU Assignment (CPUs = 4|8|16|32|80)
  . CPUs= 4: (Milvus= 3, API=1)
  . CPUs= 8: (Milvus= 6, API=2)
  . CPUs=16: (Milvus=14, API=2)
  . CPUs=32: (Milvus=28, API=4)
  . CPUs=80: (Milvus=72, API=8)
* Concurrency=10~100