Source Code and NVMe Protocol Versions

  • SPDK: spdk-17.07.1
  • DPDK: dpdk-17.08
  • NVMe Spec. Rev. 1.4c

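SPDK/DPDK Code Analysis - SSD Device Scan

1. How NVMe SSDs Are Identified

This section walks through the code to see how a PCIe device such as an NVMe SSD is identified. There are two ways to identify the PCI devices attached to a system: matching on Device ID + Vendor ID, the approach used by the Linux kernel NVMe driver, and matching on the Class Code, the approach used by SPDK.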

/* spdk-17.07.1/include/spdk/pci_ids.h */

52 /**
53  * PCI class code for NVMe devices.
54  *
55  * Base class code 01h: mass storage
56  * Subclass code 08h: non-volatile memory
57  * Programming interface 02h: NVM Express
58  */
59 #define SPDK_PCI_CLASS_NVME          0x010802
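  • The Class Code (0x010802) is defined in Chapter 2, PCI Header, of NVMe Spec. Rev. 1.4c as follows.

[Figure: Class Code register definition from the PCI Header chapter of the NVMe specification]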
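
The three fields listed in the pci_ids.h comment can be recovered from the 24-bit value with shifts and masks. The following is a minimal sketch; the macro and helper names (EXAMPLE_PCI_CLASS_NVME, class_base() and so on) are hypothetical and are not part of SPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Same numeric value as SPDK_PCI_CLASS_NVME; the macro name is illustrative. */
#define EXAMPLE_PCI_CLASS_NVME 0x010802u

/* Hypothetical helpers: split a 24-bit class code into its three bytes. */
static inline uint8_t class_base(uint32_t cc)
{
    return (uint8_t)((cc >> 16) & 0xFF);   /* 01h: mass storage */
}

static inline uint8_t class_sub(uint32_t cc)
{
    return (uint8_t)((cc >> 8) & 0xFF);    /* 08h: non-volatile memory */
}

static inline uint8_t class_prog_if(uint32_t cc)
{
    return (uint8_t)(cc & 0xFF);           /* 02h: NVM Express */
}

/* Rebuild the 24-bit class code from its three parts. */
static inline uint32_t class_make(uint8_t base, uint8_t sub, uint8_t prog_if)
{
    uint32_t cc = ((uint32_t)base << 16) | ((uint32_t)sub << 8) | prog_if;

    assert(cc <= 0xFFFFFFu);
    return cc;
}
```

For example, class_make(0x01, 0x08, 0x02) yields 0x010802, the value SPDK matches against.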


2. Hello World Example

When you start learning a new language or development kit, the first example is always "Hello World", and SPDK is no exception. Looking at how the main() function in hello_world.c uses the SPDK NVMe driver API is a good way to understand how to work with an NVMe SSD.

/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */

306 int main(int argc, char **argv)
307 {
308     int rc;
309     struct spdk_env_opts opts;
310
311     /*
312      * SPDK relies on an abstraction around the local environment
313      * named env that handles memory allocation and PCI device operations.
314      * This library must be initialized first.
315      *
316      */
317     spdk_env_opts_init(&opts);
318     opts.name = "hello_world";
319     opts.shm_id = 0;
320     spdk_env_init(&opts);
321
322     printf("Initializing NVMe Controllers\n");
323
324     /*
325      * Start the SPDK NVMe enumeration process.  probe_cb will be called
326      *  for each NVMe controller found, giving our application a choice on
327      *  whether to attach to each controller.  attach_cb will then be
328      *  called for each controller after the SPDK NVMe driver has completed
329      *  initializing the controller we chose to attach.
330      */
331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
332     if (rc != 0) {
333             fprintf(stderr, "spdk_nvme_probe() failed\n");
334             cleanup();
335             return 1;
336     }
337
338     if (g_controllers == NULL) {
339             fprintf(stderr, "no NVMe controllers found\n");
340             cleanup();
341             return 1;
342     }
343
344     printf("Initialization complete.\n");
345     hello_world();
346     cleanup();
347     return 0;
348 }

2.1 The main() Function

main() proceeds as follows.

001 - 317     spdk_env_opts_init(&opts);
002 - 320     spdk_env_init(&opts);
003 - 331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
004 - 345     hello_world();
005 - 346     cleanup();
  • 001-002: initialize the SPDK runtime environment.
  • 003: call spdk_nvme_probe(), the key function analyzed next, to discover NVMe SSD devices.
  • 004: call hello_world() to perform simple read and write operations.
  • 005: call cleanup() to free memory and detach the NVMe SSD devices.

Before analyzing spdk_nvme_probe(), consider the following two questions.

  • Question 1: Every NVMe SSD has a controller, so how are all discovered NVMe SSDs (that is, NVMe controllers) organized together?
  • Question 2: Each NVMe SSD can be divided into several namespaces (similar to logical partitions). How are these namespaces organized together?

For an experienced C programmer, the answer to both questions is easy: a linked list. hello_world.c does exactly that.

Take a look at the code at L39-53.

/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */

39 struct ctrlr_entry {
40      struct spdk_nvme_ctrlr  *ctrlr;
41      struct ctrlr_entry      *next;
42      char                    name[1024];
43 };
44
45 struct ns_entry {
46      struct spdk_nvme_ctrlr  *ctrlr;
47      struct spdk_nvme_ns     *ns;
48      struct ns_entry         *next;
49      struct spdk_nvme_qpair  *qpair;
50 };
51
52 static struct ctrlr_entry *g_controllers = NULL;
53 static struct ns_entry *g_namespaces = NULL;
  • g_controllers is the head of the global linked list that manages all NVMe SSDs (that is, NVMe controllers).
  • g_namespaces is the head of the global linked list that manages all namespaces.

Back at L338-342 of main(): if g_controllers is NULL, no NVMe SSD was found, so the program cleans up and exits.

/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */

338     if (g_controllers == NULL) {
339             fprintf(stderr, "no NVMe controllers found\n");
340             cleanup();
341             return 1;
342     }
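
The list handling behind g_controllers can be sketched as below. This is a stand-alone model, not the code from hello_world.c: the struct and the demo_* functions are hypothetical, the SPDK controller handle is replaced by a void pointer, and inserting new entries at the head of the list is an assumption made for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct ctrlr_entry; the SPDK handle is replaced by void *. */
struct demo_ctrlr_entry {
    void *ctrlr;
    struct demo_ctrlr_entry *next;
    char name[64];
};

/* Push a new entry at the head of the list. Returns 0 on success, -1 on allocation failure. */
static int demo_register_ctrlr(struct demo_ctrlr_entry **head, void *ctrlr, const char *name)
{
    struct demo_ctrlr_entry *entry;

    assert(head != NULL && name != NULL);
    entry = malloc(sizeof(*entry));
    if (entry == NULL) {
        return -1;
    }
    entry->ctrlr = ctrlr;
    snprintf(entry->name, sizeof(entry->name), "%s", name);
    entry->next = *head;
    *head = entry;
    return 0;
}

/* Count entries; an empty list (head == NULL) means no controller was found. */
static size_t demo_count(const struct demo_ctrlr_entry *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}

/* Free every entry and reset the head pointer to NULL. */
static void demo_cleanup(struct demo_ctrlr_entry **head)
{
    while (*head != NULL) {
        struct demo_ctrlr_entry *next = (*head)->next;

        free(*head);
        *head = next;
    }
}
```

With this layout, the "no controllers found" check in main() reduces to testing whether the head pointer is still NULL after probing.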


2.2 The spdk_nvme_probe() Function

Now let's look at how hello_world.c uses spdk_nvme_probe().

/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */

331     rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);

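As the names suggest, probe_cb and attach_cb are callback functions (there is also a remove_cb, which L331 does not use and passes as NULL).

  • probe_cb: called when an NVMe device is enumerated.
  • attach_cb: called once an NVMe device has been attached to the userspace NVMe driver.

The definitions related to probe_cb, attach_cb, and remove_cb are as follows.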

/* spdk-17.07.1/include/spdk/nvme.h */

268 /**
269  * Callback for spdk_nvme_probe() enumeration.
270  *
271  * \param opts NVMe controller initialization options.  This structure will be populated with the
272  * default values on entry, and the user callback may update any options to request a different
273  * value.  The controller may not support all requested parameters, so the final values will be
274  * provided during the attach callback.
275  * \return true to attach to this device.
276  */
277 typedef bool (*spdk_nvme_probe_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
278                                struct spdk_nvme_ctrlr_opts *opts);
279
280 /**
281  * Callback for spdk_nvme_probe() to report a device that has been attached to the userspace NVMe driver.
282  *
283  * \param opts NVMe controller initialization options that were actually used.  Options may differ
284  * from the requested options from the probe call depending on what the controller supports.
285  */
286 typedef void (*spdk_nvme_attach_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
287                                 struct spdk_nvme_ctrlr *ctrlr,
288                                 const struct spdk_nvme_ctrlr_opts *opts);
289
290 /**
291  * Callback for spdk_nvme_probe() to report that a device attached to the userspace NVMe driver
292  * has been removed from the system.
293  *
294  * The controller will remain in a failed state (any new I/O submitted will fail).
295  *
296  * The controller must be detached from the userspace driver by calling spdk_nvme_detach()
297  * once the controller is no longer in use.  It is up to the library user to ensure that
298  * no other threads are using the controller before calling spdk_nvme_detach().
299  *
300  * \param ctrlr NVMe controller instance that was removed.
301  */
302 typedef void (*spdk_nvme_remove_cb)(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr);
303
304 /**
305  * \brief Enumerate the bus indicated by the transport ID and attach the userspace NVMe driver
306  * to each device found if desired.
307  *
308  * \param trid The transport ID indicating which bus to enumerate. If the trtype is PCIe or trid is NULL,
309  * this will scan the local PCIe bus. If the trtype is RDMA, the traddr and trsvcid must point at the
310  * location of an NVMe-oF discovery service.
311  * \param cb_ctx Opaque value which will be passed back in cb_ctx parameter of the callbacks.
312  * \param probe_cb will be called once per NVMe device found in the system.
313  * \param attach_cb will be called for devices for which probe_cb returned true once that NVMe
314  * controller has been attached to the userspace driver.
315  * \param remove_cb will be called for devices that were attached in a previous spdk_nvme_probe()
316  * call but are no longer attached to the system. Optional; specify NULL if removal notices are not
317  * desired.
318  *
319  * This function is not thread safe and should only be called from one thread at a time while no
320  * other threads are actively using any NVMe devices.
321  *
322  * If called from a secondary process, only devices that have been attached to the userspace driver
323  * in the primary process will be probed.
324  *
325  * If called more than once, only devices that are not already attached to the SPDK NVMe driver
326  * will be reported.
327  *
328  * To stop using the the controller and release its associated resources,
329  * call \ref spdk_nvme_detach with the spdk_nvme_ctrlr instance returned by this function.
330  */
331 int spdk_nvme_probe(const struct spdk_nvme_transport_id *trid,
332                 void *cb_ctx,
333                 spdk_nvme_probe_cb probe_cb,
334                 spdk_nvme_attach_cb attach_cb,
335                 spdk_nvme_remove_cb remove_cb);
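
The contract between the two callbacks can be modeled without SPDK. The sketch below is a toy enumerator with hypothetical demo_* types and functions: it asks probe_cb about each device and calls attach_cb only for the devices that probe_cb accepted. It deliberately collapses the real sequence, in which controller initialization happens between the two calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record standing in for a discovered NVMe controller. */
struct demo_dev {
    int id;
};

typedef bool (*demo_probe_cb)(void *cb_ctx, const struct demo_dev *dev);
typedef void (*demo_attach_cb)(void *cb_ctx, const struct demo_dev *dev);

/*
 * Toy enumerator: probe_cb decides per device whether to attach,
 * attach_cb is then invoked for accepted devices only.
 * Returns the number of devices attached.
 */
static int demo_probe(const struct demo_dev *devs, size_t n, void *cb_ctx,
                      demo_probe_cb probe_cb, demo_attach_cb attach_cb)
{
    int attached = 0;

    assert(probe_cb != NULL && attach_cb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!probe_cb(cb_ctx, &devs[i])) {
            continue;               /* the application declined this device */
        }
        attach_cb(cb_ctx, &devs[i]);
        attached++;
    }
    return attached;
}

/* Example probe callback: accept only devices with an even id. */
static bool demo_accept_even(void *cb_ctx, const struct demo_dev *dev)
{
    (void)cb_ctx;
    return dev->id % 2 == 0;
}

/* Example attach callback: accumulate attached ids in the int behind cb_ctx. */
static void demo_sum_ids(void *cb_ctx, const struct demo_dev *dev)
{
    *(int *)cb_ctx += dev->id;
}
```

The opaque cb_ctx pointer plays the same role as in the SPDK signatures: it carries application state into both callbacks without the enumerator knowing its type.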

Rather than getting lost in probe_cb, attach_cb, and remove_cb, let's look at struct spdk_nvme_transport_id and the main logic of spdk_nvme_probe().

/* spdk-17.07.1/include/spdk/nvme.h */

142 /**
143  * NVMe transport identifier.
144  *
145  * This identifies a unique endpoint on an NVMe fabric.
146  *
147  * A string representation of a transport ID may be converted to this type using
148  * spdk_nvme_transport_id_parse().
149  */
150 struct spdk_nvme_transport_id {
151     /**
152      * NVMe transport type.
153      */
154     enum spdk_nvme_transport_type trtype;
155
156     /**
157      * Address family of the transport address.
158      *
159      * For PCIe, this value is ignored.
160      */
161     enum spdk_nvmf_adrfam adrfam;
162
163     /**
164      * Transport address of the NVMe-oF endpoint. For transports which use IP
165      * addressing (e.g. RDMA), this should be an IP address. For PCIe, this
166      * can either be a zero length string (the whole bus) or a PCI address
167      * in the format DDDD:BB:DD.FF or DDDD.BB.DD.FF
168      */
169     char traddr[SPDK_NVMF_TRADDR_MAX_LEN + 1];
170
171     /**
172      * Transport service id of the NVMe-oF endpoint.  For transports which use
173      * IP addressing (e.g. RDMA), this field shoud be the port number. For PCIe,
174      * this is always a zero length string.
175      */
176     char trsvcid[SPDK_NVMF_TRSVCID_MAX_LEN + 1];
177
178     /**
179      * Subsystem NQN of the NVMe over Fabrics endpoint. May be a zero length string.
180      */
181     char subnqn[SPDK_NVMF_NQN_MAX_LEN + 1];
182 };

Whether a transport ID refers to NVMe over PCIe is determined by its "NVMe transport type" field, trtype.

/* spdk-17.07.1/include/spdk/nvme.h */

154    enum spdk_nvme_transport_type trtype;

According to the code at L130-140, two transport types are currently supported: PCIe and RDMA. Since we are interested in NVMe over PCIe, RDMA is not covered here.

130 enum spdk_nvme_transport_type {
131     /**
132      * PCIe Transport (locally attached devices)
133      */
134     SPDK_NVME_TRANSPORT_PCIE = 256,
135
136     /**
137      * RDMA Transport (RoCE, iWARP, etc.)
138      */
139     SPDK_NVME_TRANSPORT_RDMA = SPDK_NVMF_TRTYPE_RDMA,
140 };

Next, let's look at the code of spdk_nvme_probe().

/* spdk-17.07.1/lib/nvme/nvme.c */

396 int
397 spdk_nvme_probe(const struct spdk_nvme_transport_id *trid, void *cb_ctx,
398             spdk_nvme_probe_cb probe_cb, spdk_nvme_attach_cb attach_cb,
399             spdk_nvme_remove_cb remove_cb)
400 {
401     int rc;
402     struct spdk_nvme_ctrlr *ctrlr;
403     struct spdk_nvme_transport_id trid_pcie;
404
405     rc = nvme_driver_init();
406     if (rc != 0) {
407             return rc;
408     }
409
410     if (trid == NULL) {
411             memset(&trid_pcie, 0, sizeof(trid_pcie));
412             trid_pcie.trtype = SPDK_NVME_TRANSPORT_PCIE;
413             trid = &trid_pcie;
414     }
415
416     if (!spdk_nvme_transport_available(trid->trtype)) {
417             SPDK_ERRLOG("NVMe trtype %u not available\n", trid->trtype);
418             return -1;
419     }
420
421     nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
422
423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
424
425     if (!spdk_process_is_primary()) {
426             TAILQ_FOREACH(ctrlr, &g_spdk_nvme_driver->attached_ctrlrs, tailq) {
427                     nvme_ctrlr_proc_get_ref(ctrlr);
428
429                     /*
430                      * Unlock while calling attach_cb() so the user can call other functions
431                      *  that may take the driver lock, like nvme_detach().
432                      */
433                     nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
434                     attach_cb(cb_ctx, &ctrlr->trid, ctrlr, &ctrlr->opts);
435                     nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
436             }
437
438             nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
439             return 0;
440     }
441
442     nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
443     /*
444      * Keep going even if one or more nvme_attach() calls failed,
445      *  but maintain the value of rc to signal errors when we return.
446      */
447
448     rc = nvme_init_controllers(cb_ctx, attach_cb);
449
450     return rc;
451 }

spdk_nvme_probe() proceeds as follows.

001 - 405:     	rc = nvme_driver_init();
002 - 410-414: 	set trid if it is NULL
003 - 416:     	check NVMe trtype via spdk_nvme_transport_available(trid->trtype)
004 - 423:     	nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb); 
005 - 425:     	if this is not the primary process, call attach_cb() for each already-attached controller and return (L426-440)
006 - 448:     	rc = nvme_init_controllers(cb_ctx, attach_cb);
  • 004: call nvme_transport_ctrlr_scan(), the key function analyzed next, to discover NVMe SSD devices.
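
Step 002 is what makes spdk_nvme_probe(NULL, ...) scan the local PCIe bus. A minimal model of that fallback is sketched below; the demo_* types and the demo_resolve_trid() helper are hypothetical, and the numeric enum values are only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for the SPDK transport type and transport ID. */
enum demo_trtype {
    DEMO_TRANSPORT_RDMA = 1,
    DEMO_TRANSPORT_PCIE = 256,
};

struct demo_trid {
    enum demo_trtype trtype;
    char traddr[257];
};

/*
 * Return the transport ID to scan. When the caller passed NULL, fill
 * *fallback with a zeroed PCIe transport ID (empty traddr, meaning the
 * whole bus) and return it instead.
 */
static const struct demo_trid *demo_resolve_trid(const struct demo_trid *trid,
                                                 struct demo_trid *fallback)
{
    assert(fallback != NULL);
    if (trid == NULL) {
        memset(fallback, 0, sizeof(*fallback));
        fallback->trtype = DEMO_TRANSPORT_PCIE;
        trid = fallback;
    }
    return trid;
}
```

Keeping the fallback in caller-provided storage mirrors the local trid_pcie variable at L403, so no allocation is needed for the default case.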


2.3 The nvme_transport_ctrlr_scan() Function

Next, let's look at nvme_transport_ctrlr_scan().

/* spdk-17.07.1/lib/nvme/nvme.c */

423     nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);

/* spdk-17.07.1/lib/nvme/nvme_transport.c#92 */

91 int
92 nvme_transport_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
93                        void *cb_ctx,
94                        spdk_nvme_probe_cb probe_cb,
95                        spdk_nvme_remove_cb remove_cb)
96 {
97      NVME_TRANSPORT_CALL(trid->trtype, ctrlr_scan, (trid, cb_ctx, probe_cb, remove_cb));
98 }

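The NVME_TRANSPORT_CALL macro is defined as follows. For NVMe over PCIe, the call to nvme_transport_ctrlr_scan() therefore expands into a call to nvme_pcie_ctrlr_scan(). Note how token pasting (##) maps the same call onto a transport-specific function for each of the two transport types, PCIe and RDMA.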

/* spdk-17.07.1/lib/nvme/nvme_transport.c#60 */

52 #define TRANSPORT_PCIE(func_name, args)      case SPDK_NVME_TRANSPORT_PCIE: return nvme_pcie_ ## func_name args;
..
60 #define NVME_TRANSPORT_CALL(trtype, func_name, args)         \
61      do {                                                    \
62              switch (trtype) {                               \
63              TRANSPORT_PCIE(func_name, args)                 \
64              TRANSPORT_FABRICS_RDMA(func_name, args)         \
65              TRANSPORT_DEFAULT(trtype)                       \
66              }                                               \
67              SPDK_UNREACHABLE();                             \
68      } while (0)
..
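
The dispatch technique can be tried in isolation. The sketch below reproduces the pattern with hypothetical DEMO_* macros and demo_* functions: the ## operator pastes a transport prefix onto the function name, and each case returns directly from the enclosing function. The enum values and the scan functions' return values are made up for the example.

```c
#include <assert.h>

/* Illustrative transport types; the numeric values are arbitrary here. */
enum demo_trtype {
    DEMO_TRANSPORT_RDMA = 1,
    DEMO_TRANSPORT_PCIE = 256,
};

/* Transport-specific implementations sharing the suffix "ctrlr_scan". */
static int demo_pcie_ctrlr_scan(int arg) { return 1000 + arg; }
static int demo_rdma_ctrlr_scan(int arg) { return 2000 + arg; }

/* Each case pastes the transport prefix onto func_name and returns its result. */
#define DEMO_CASE_PCIE(func_name, args) \
    case DEMO_TRANSPORT_PCIE: return demo_pcie_ ## func_name args;
#define DEMO_CASE_RDMA(func_name, args) \
    case DEMO_TRANSPORT_RDMA: return demo_rdma_ ## func_name args;

#define DEMO_TRANSPORT_CALL(trtype, func_name, args) \
    do {                                             \
        switch (trtype) {                            \
        DEMO_CASE_PCIE(func_name, args)              \
        DEMO_CASE_RDMA(func_name, args)              \
        default: return -1;                          \
        }                                            \
    } while (0)

/* Generic entry point: expands to a switch over the transport type. */
static int demo_transport_ctrlr_scan(enum demo_trtype trtype, int arg)
{
    DEMO_TRANSPORT_CALL(trtype, ctrlr_scan, (arg));
    return -1; /* not reached: every switch branch returns */
}
```

Note that the argument list is passed with its own parentheses, (arg), so that the macro can append it verbatim after the pasted function name.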


2.4 The nvme_pcie_ctrlr_scan() Function

Next, let's look at nvme_pcie_ctrlr_scan().

/* spdk-17.07.1/lib/nvme/nvme_pcie.c#620 */

619 int
620 nvme_pcie_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
621                  void *cb_ctx,
622                  spdk_nvme_probe_cb probe_cb,
623                  spdk_nvme_remove_cb remove_cb)
624 {
625     struct nvme_pcie_enum_ctx enum_ctx = {};
626
627     enum_ctx.probe_cb = probe_cb;
628     enum_ctx.cb_ctx = cb_ctx;
629
630     if (strlen(trid->traddr) != 0) {
631             if (spdk_pci_addr_parse(&enum_ctx.pci_addr, trid->traddr)) {
632                     return -1;
633             }
634             enum_ctx.has_pci_addr = true;
635     }
636
637     if (hotplug_fd < 0) {
638             hotplug_fd = spdk_uevent_connect();
639             if (hotplug_fd < 0) {
640                     SPDK_TRACELOG(SPDK_TRACE_NVME, "Failed to open uevent netlink socket\n");
641             }
642     } else {
643             _nvme_pcie_hotplug_monitor(cb_ctx, probe_cb, remove_cb);
644     }
645
646     if (enum_ctx.has_pci_addr == false) {
647             return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
648     } else {
649             return spdk_pci_nvme_device_attach(pcie_nvme_enum_cb, &enum_ctx, &enum_ctx.pci_addr);
650     }
651 }

We focus on spdk_pci_nvme_enumerate() at L647, because our goal is to understand how SSD devices are discovered using the Class Code.

/* spdk-17.07.1/lib/nvme/nvme_pcie.c */

647         return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
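
The branch at L630-650 chooses between scanning the whole bus and attaching one device, depending on whether traddr is empty. The sketch below models that decision with a simplified parser for the DDDD:BB:DD.F form only; it is not spdk_pci_addr_parse(), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified PCI address: domain, bus, device, function. */
struct demo_pci_addr {
    unsigned int domain, bus, dev, func;
};

/* Parse "DDDD:BB:DD.F" (hexadecimal). Returns 0 on success, -1 on malformed input. */
static int demo_pci_addr_parse(struct demo_pci_addr *addr, const char *bdf)
{
    char tail;

    assert(addr != NULL && bdf != NULL);
    /* The trailing %c detects extra characters after the function number. */
    if (sscanf(bdf, "%x:%x:%x.%x%c", &addr->domain, &addr->bus,
               &addr->dev, &addr->func, &tail) != 4) {
        return -1;
    }
    if (addr->domain > 0xFFFF || addr->bus > 0xFF ||
        addr->dev > 0x1F || addr->func > 7) {
        return -1;
    }
    return 0;
}

enum demo_scan_mode {
    DEMO_SCAN_WHOLE_BUS,   /* empty traddr: enumerate every NVMe device */
    DEMO_SCAN_ONE_DEVICE,  /* valid traddr: attach only that device */
    DEMO_SCAN_ERROR,       /* traddr present but malformed */
};

/* Decide how to scan, filling *addr when a single device is requested. */
static enum demo_scan_mode demo_choose_scan(const char *traddr, struct demo_pci_addr *addr)
{
    if (traddr[0] == '\0') {
        return DEMO_SCAN_WHOLE_BUS;
    }
    return demo_pci_addr_parse(addr, traddr) == 0 ? DEMO_SCAN_ONE_DEVICE : DEMO_SCAN_ERROR;
}
```

Because hello_world.c passes a NULL trid, its traddr ends up empty and the whole-bus path is the one taken.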


2.5 The spdk_pci_nvme_enumerate() Function

Next, let's look at spdk_pci_nvme_enumerate().

/* spdk-17.07.1/lib/env_dpdk/pci_nvme.c */

81 int
82 spdk_pci_nvme_enumerate(spdk_pci_enum_cb enum_cb, void *enum_ctx)
83 {
84      return spdk_pci_enumerate(&g_nvme_pci_drv, enum_cb, enum_ctx);
85 }

Note: the first argument at L84 is the address of the global variable g_nvme_pci_drv (global struct variables are always interesting to look at :-)).

/* spdk-17.07.1/lib/env_dpdk/pci_nvme.c */

38 static struct rte_pci_id nvme_pci_driver_id[] = {
39 #if RTE_VERSION >= RTE_VERSION_NUM(16, 7, 0, 1)
40      {
41              .class_id = SPDK_PCI_CLASS_NVME,
42              .vendor_id = PCI_ANY_ID,
43              .device_id = PCI_ANY_ID,
44              .subsystem_vendor_id = PCI_ANY_ID,
45              .subsystem_device_id = PCI_ANY_ID,
46      },
47 #else
48      {RTE_PCI_DEVICE(0x8086, 0x0953)},
49 #endif
50      { .vendor_id = 0, /* sentinel */ },
51 };
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54      .driver = {
55              .drv_flags      = RTE_PCI_DRV_NEED_MAPPING,
56              .id_table       = nvme_pci_driver_id,
..
66      },
67
68      .cb_fn = NULL,
69      .cb_arg = NULL,
70      .mtx = PTHREAD_MUTEX_INITIALIZER,
71      .is_registered = false,
72 };

Aha! This is where the Class Code (SPDK_PCI_CLASS_NVME = 0x010802) comes in. The global variable g_nvme_pci_drv is defined at L53, and its driver.id_table member points to nvme_pci_driver_id, which is defined at L38.

38 static struct rte_pci_id nvme_pci_driver_id[] = {
..
41              .class_id = SPDK_PCI_CLASS_NVME,
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54      .driver = {
..
56              .id_table       = nvme_pci_driver_id,
..


2.6 The spdk_pci_enumerate() Function

Now we just need to dig into spdk_pci_enumerate() to find out how SSD devices are discovered.

/* spdk-17.07.1/lib/env_dpdk/pci.c#150 */

149 int
150 spdk_pci_enumerate(struct spdk_pci_enum_ctx *ctx,
151                spdk_pci_enum_cb enum_cb,
152                void *enum_ctx)
153 {
...
168
169 #if RTE_VERSION >= RTE_VERSION_NUM(17, 05, 0, 4)
170     if (rte_pci_probe() != 0) {
171 #else
172     if (rte_eal_pci_probe() != 0) {
173 #endif
...
184     return 0;
185 }

Skipping some of the code, we focus on L170.

170     if (rte_pci_probe() != 0) {


2.7 The rte_pci_probe() Function

Starting with rte_pci_probe(), we now move into the internals of DPDK. The code is as follows.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#413 */

407 /*
408  * Scan the content of the PCI bus, and call the probe() function for
409  * all registered drivers that have a matching entry in its id_table
410  * for discovered devices.
411  */
412 int
413 rte_pci_probe(void)
414 {
415     struct rte_pci_device *dev = NULL;
416     size_t probed = 0, failed = 0;
417     struct rte_devargs *devargs;
418     int probe_all = 0;
419     int ret = 0;
420
421     if (rte_pci_bus.bus.conf.scan_mode != RTE_BUS_SCAN_WHITELIST)
422             probe_all = 1;
423
424     FOREACH_DEVICE_ON_PCIBUS(dev) {
425             probed++;
426
427             devargs = dev->device.devargs;
428             /* probe all or only whitelisted devices */
429             if (probe_all)
430                     ret = pci_probe_all_drivers(dev);
431             else if (devargs != NULL &&
432                     devargs->policy == RTE_DEV_WHITELISTED)
433                     ret = pci_probe_all_drivers(dev);
434             if (ret < 0) {
435                     RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
436                              " cannot be used\n", dev->addr.domain, dev->addr.bus,
437                              dev->addr.devid, dev->addr.function);
438                     rte_errno = errno;
439                     failed++;
440                     ret = 0;
441             }
442     }
443
444     return (probed && probed == failed) ? -1 : 0;
445 }

L430 is the line of interest.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c */

430             ret = pci_probe_all_drivers(dev);
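
One detail of rte_pci_probe() worth isolating is its return expression at L444: the call reports failure only when at least one device was visited and every visited device failed. A small sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mirror of the return expression at L444: -1 only if at least one
 * device was probed and all of them failed, otherwise 0.
 */
static int demo_probe_result(size_t probed, size_t failed)
{
    assert(failed <= probed);
    return (probed && probed == failed) ? -1 : 0;
}
```

So an empty bus, or a bus where only some devices fail, still yields 0.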


2.8 The pci_probe_all_drivers() Function

The function is implemented as follows.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#307 */

301 /*
302  * If vendor/device ID match, call the probe() function of all
303  * registered driver for the given device. Return -1 if initialization
304  * failed, return 1 if no driver is found for this device.
305  */
306 static int
307 pci_probe_all_drivers(struct rte_pci_device *dev)
308 {
309     struct rte_pci_driver *dr = NULL;
310     int rc = 0;
311
312     if (dev == NULL)
313             return -1;
314
315     /* Check if a driver is already loaded */
316     if (dev->driver != NULL)
317             return 0;
318
319     FOREACH_DRIVER_ON_PCIBUS(dr) {
320             rc = rte_pci_probe_one_driver(dr, dev);
321             if (rc < 0)
322                     /* negative value is an error */
323                     return -1;
324             if (rc > 0)
325                     /* positive value means driver doesn't support it */
326                     continue;
327             return 0;
328     }
329     return 1;
330 }

L320 is the line of interest.

320             rc = rte_pci_probe_one_driver(dr, dev);
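
The three-way return convention in this loop (negative for an error, positive for "this driver does not support the device", zero for success) can be modeled as follows. The driver callbacks and the integer device handle are hypothetical simplifications of rte_pci_driver and rte_pci_device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver probe: <0 error, >0 not supported, 0 device claimed. */
typedef int (*demo_probe_one_fn)(int dev);

/*
 * Try each driver in turn. Returns 0 when a driver claims the device,
 * -1 when a driver reports an error, and 1 when no driver matched.
 */
static int demo_probe_all_drivers(const demo_probe_one_fn *drivers, size_t n, int dev)
{
    assert(drivers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int rc = drivers[i](dev);

        if (rc < 0) {
            return -1;      /* negative value is an error */
        }
        if (rc > 0) {
            continue;       /* this driver does not support the device */
        }
        return 0;           /* device claimed */
    }
    return 1;               /* no driver found for this device */
}

/* Example drivers used to exercise the convention. */
static int demo_drv_no_match(int dev)  { (void)dev; return 1; }
static int demo_drv_claims_42(int dev) { return dev == 42 ? 0 : 1; }
static int demo_drv_fails(int dev)     { (void)dev; return -5; }
```

A positive result from one driver simply moves the search on to the next registered driver, which is why a device that no SPDK or DPDK driver matches is skipped without an error.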


2.9 The rte_pci_probe_one_driver() Function

The function is implemented as follows.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#200 */

195 /*
196  * If vendor/device ID match, call the probe() function of the
197  * driver.
198  */
199 static int
200 rte_pci_probe_one_driver(struct rte_pci_driver *dr,
201                      struct rte_pci_device *dev)
202 {
203     int ret;
204     struct rte_pci_addr *loc;
205
206     if ((dr == NULL) || (dev == NULL))
207             return -EINVAL;
208
209     loc = &dev->addr;
210
211     /* The device is not blacklisted; Check if driver supports it */
212     if (!rte_pci_match(dr, dev))
213             /* Match of device and driver failed */
214             return 1;
215
216     RTE_LOG(INFO, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
217                     loc->domain, loc->bus, loc->devid, loc->function,
218                     dev->device.numa_node);
219
220     /* no initialization when blacklisted, return without error */
221     if (dev->device.devargs != NULL &&
222             dev->device.devargs->policy ==
223                     RTE_DEV_BLACKLISTED) {
224             RTE_LOG(INFO, EAL, "  Device is blacklisted, not"
225                     " initializing\n");
226             return 1;
227     }
228
229     if (dev->device.numa_node < 0) {
230             RTE_LOG(WARNING, EAL, "  Invalid NUMA socket, default to 0\n");
231             dev->device.numa_node = 0;
232     }
233
234     RTE_LOG(INFO, EAL, "  probe driver: %x:%x %s\n", dev->id.vendor_id,
235             dev->id.device_id, dr->driver.name);
236
237     if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
238             /* map resources for devices that use igb_uio */
239             ret = rte_pci_map_device(dev);
240             if (ret != 0)
241                     return ret;
242     }
243
244     /* reference driver structure */
245     dev->driver = dr;
246     dev->device.driver = &dr->driver;
247
248     /* call the driver probe() function */
249     ret = dr->probe(dr, dev);
250     if (ret) {
251             dev->driver = NULL;
252             dev->device.driver = NULL;
253             if ((dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) &&
254                     /* Don't unmap if device is unsupported and
255                      * driver needs mapped resources.
256                      */
257                     !(ret > 0 &&
258                             (dr->drv_flags & RTE_PCI_DRV_KEEP_MAPPED_RES)))
259                     rte_pci_unmap_device(dev);
260     }
261
262     return ret;
263 }

L212 is the line of interest.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c */

212     if (!rte_pci_match(dr, dev))


2.10 The rte_pci_match() Function

The function is implemented as follows.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#163 */

151 /*
152  * Match the PCI Driver and Device using the ID Table
153  *
154  * @param pci_drv
155  *  PCI driver from which ID table would be extracted
156  * @param pci_dev
157  *  PCI device to match against the driver
158  * @return
159  *  1 for successful match
160  *  0 for unsuccessful match
161  */
162 static int
163 rte_pci_match(const struct rte_pci_driver *pci_drv,
164               const struct rte_pci_device *pci_dev)
165 {
166     const struct rte_pci_id *id_table;
167
168     for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
169          id_table++) {
170             /* check if device's identifiers match the driver's ones */
171             if (id_table->vendor_id != pci_dev->id.vendor_id &&
172                             id_table->vendor_id != PCI_ANY_ID)
173                     continue;
174             if (id_table->device_id != pci_dev->id.device_id &&
175                             id_table->device_id != PCI_ANY_ID)
176                     continue;
177             if (id_table->subsystem_vendor_id !=
178                 pci_dev->id.subsystem_vendor_id &&
179                 id_table->subsystem_vendor_id != PCI_ANY_ID)
180                     continue;
181             if (id_table->subsystem_device_id !=
182                 pci_dev->id.subsystem_device_id &&
183                 id_table->subsystem_device_id != PCI_ANY_ID)
184                     continue;
185             if (id_table->class_id != pci_dev->id.class_id &&
186                             id_table->class_id != RTE_CLASS_ANY_ID)
187                     continue;
188
189             return 1;
190     }
191
192     return 0;
193 }

As the code below shows, we have finally found out how SSD devices are discovered.

/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c */

185             if (id_table->class_id != pci_dev->id.class_id &&
186                             id_table->class_id != RTE_CLASS_ANY_ID)
187                     continue;

The rte_pci_id, rte_pci_device, and rte_pci_driver structures are defined as follows.

/* dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#100 */

96  /**
97   * A structure describing an ID for a PCI driver. Each driver provides a
98   * table of these IDs for each device that it supports.
99   */
100 struct rte_pci_id {
101     uint32_t class_id;            /**< Class ID (class, subclass, pi) or RTE_CLASS_ANY_ID. */
102     uint16_t vendor_id;           /**< Vendor ID or PCI_ANY_ID. */
103     uint16_t device_id;           /**< Device ID or PCI_ANY_ID. */
104     uint16_t subsystem_vendor_id; /**< Subsystem vendor ID or PCI_ANY_ID. */
105     uint16_t subsystem_device_id; /**< Subsystem device ID or PCI_ANY_ID. */
106 };

/* dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#120 */

120 /**
121  * A structure describing a PCI device.
122  */
123 struct rte_pci_device {
124     TAILQ_ENTRY(rte_pci_device) next;       /**< Next probed PCI device. */
125     struct rte_device device;               /**< Inherit core device */
126     struct rte_pci_addr addr;               /**< PCI location. */
127     struct rte_pci_id id;                   /**< PCI ID. */
128     struct rte_mem_resource mem_resource[PCI_MAX_RESOURCE];
129                                             /**< PCI Memory Resource */
130     struct rte_intr_handle intr_handle;     /**< Interrupt handle */
131     struct rte_pci_driver *driver;          /**< Associated driver */
132     uint16_t max_vfs;                       /**< sriov enable if not zero */
133     enum rte_kernel_driver kdrv;            /**< Kernel driver passthrough */
134     char name[PCI_PRI_STR_SIZE+1];          /**< PCI location (ASCII) */
135 };

/* dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#178 */

175 /**
176  * A structure describing a PCI driver.
177  */
178 struct rte_pci_driver {
179     TAILQ_ENTRY(rte_pci_driver) next;       /**< Next in list. */
180     struct rte_driver driver;               /**< Inherit core driver. */
181     struct rte_pci_bus *bus;                /**< PCI bus reference. */
182     pci_probe_t *probe;                     /**< Device Probe function. */
183     pci_remove_t *remove;                   /**< Device Remove function. */
184     const struct rte_pci_id *id_table;      /**< ID table, NULL terminated. */
185     uint32_t drv_flags;                     /**< Flags contolling handling of device. */
186 };


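3. Summary

The SSD device discovery process covered so far can be summarized as follows.

  • 01 - The Class Code (0x010802) is the basis for discovering SSD devices.
  • 02 - When SSD devices are discovered, the function call stack from SPDK down to DPDK is as follows.
00 hello_world.c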
01 -> main()
02 --> spdk_nvme_probe()
03 ---> nvme_transport_ctrlr_scan()
04 ----> nvme_pcie_ctrlr_scan()
05 -----> spdk_pci_nvme_enumerate()
06 ------> spdk_pci_enumerate(&g_nvme_pci_drv, ...)                 | SPDK |
   =========================================================================
07 -------> rte_pci_probe()                                         | DPDK |
08 --------> pci_probe_all_drivers()
09 ---------> rte_pci_probe_one_driver()
10 ----------> rte_pci_match()
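  • 03 - rte_pci_match() in DPDK's Environment Abstraction Layer (EAL) contains the core logic for discovering SSD devices.
  • 04 - The position of the EAL within the DPDK architecture is shown in the following figure.

[Figure: position of the EAL in the DPDK architecture]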