SPDK/DPDK Code Analysis - SSD Device Scan
Source code and NVMe protocol versions
- SPDK: spdk-17.07.1
- DPDK: dpdk-17.08
- NVMe Spec. Rev. 1.4c
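1. How NVMe SSDs are identified
This section walks through the code to see how a PCIe device such as an NVMe SSD is identified. There are two ways to identify PCI devices attached to a system: matching on Device ID + Vendor ID, the approach used by the Linux kernel NVMe driver, and matching on the Class Code, the approach used by SPDK.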
/* spdk-17.07.1/include/spdk/pci_ids.h */
52 /**
53 * PCI class code for NVMe devices.
54 *
55 * Base class code 01h: mass storage
56 * Subclass code 08h: non-volatile memory
57 * Programming interface 02h: NVM Express
58 */
59 #define SPDK_PCI_CLASS_NVME 0x010802
- The Class Code (0x010802) is defined in Chapter 2 (PCI Header) of NVMe Spec. Rev. 1.4c: base class 01h, subclass 08h, programming interface 02h.
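To make the three fields concrete, here is a minimal sketch that splits a 24-bit class code into its parts. The helper names (pci_base_class, pci_subclass, pci_prog_if) are hypothetical and are not part of SPDK or DPDK.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not SPDK or DPDK APIs. */

/* Bits 23:16 hold the base class (01h = mass storage). */
static inline uint8_t pci_base_class(uint32_t class_code)
{
	return (uint8_t)((class_code >> 16) & 0xFF);
}

/* Bits 15:8 hold the subclass (08h = non-volatile memory). */
static inline uint8_t pci_subclass(uint32_t class_code)
{
	return (uint8_t)((class_code >> 8) & 0xFF);
}

/* Bits 7:0 hold the programming interface (02h = NVM Express). */
static inline uint8_t pci_prog_if(uint32_t class_code)
{
	return (uint8_t)(class_code & 0xFF);
}
```

For 0x010802 the three helpers return 0x01, 0x08, and 0x02 respectively.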
2. Hello World example
When you start learning a new language or development kit, the very first example is always "Hello World", and SPDK is no exception. Looking at how the main() function in hello_world.c uses the SPDK NVMe driver API helps in understanding how to use an NVMe SSD.
/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */
306 int main(int argc, char **argv)
307 {
308 int rc;
309 struct spdk_env_opts opts;
310
311 /*
312 * SPDK relies on an abstraction around the local environment
313 * named env that handles memory allocation and PCI device operations.
314 * This library must be initialized first.
315 *
316 */
317 spdk_env_opts_init(&opts);
318 opts.name = "hello_world";
319 opts.shm_id = 0;
320 spdk_env_init(&opts);
321
322 printf("Initializing NVMe Controllers\n");
323
324 /*
325 * Start the SPDK NVMe enumeration process. probe_cb will be called
326 * for each NVMe controller found, giving our application a choice on
327 * whether to attach to each controller. attach_cb will then be
328 * called for each controller after the SPDK NVMe driver has completed
329 * initializing the controller we chose to attach.
330 */
331 rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
332 if (rc != 0) {
333 fprintf(stderr, "spdk_nvme_probe() failed\n");
334 cleanup();
335 return 1;
336 }
337
338 if (g_controllers == NULL) {
339 fprintf(stderr, "no NVMe controllers found\n");
340 cleanup();
341 return 1;
342 }
343
344 printf("Initialization complete.\n");
345 hello_world();
346 cleanup();
347 return 0;
348 }
2.1 The main() function
The processing flow of main() is as follows.
001 - 317 spdk_env_opts_init(&opts);
002 - 320 spdk_env_init(&opts);
003 - 331 rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
004 - 345 hello_world();
005 - 346 cleanup();
- 001-002 initialize the SPDK runtime environment.
- 003 calls spdk_nvme_probe(), the key function analyzed next, to discover NVMe SSD devices.
- 004 calls hello_world() to perform simple read and write operations.
- 005 calls cleanup() to release memory resources, detach the NVMe SSD devices, and so on.
Before analyzing spdk_nvme_probe(), consider the following two questions.
- Question 1: Each NVMe SSD has a controller, so how are all the discovered NVMe SSDs (that is, NVMe controllers) organized together?
- Question 2: Each NVMe SSD can be divided into multiple namespaces (similar to logical partitions). How are these namespaces organized together?
For an experienced C programmer, both questions are easy to answer: a linked list. hello_world.c does exactly that.
See the code at L39-53.
/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */
39 struct ctrlr_entry {
40 struct spdk_nvme_ctrlr *ctrlr;
41 struct ctrlr_entry *next;
42 char name[1024];
43 };
44
45 struct ns_entry {
46 struct spdk_nvme_ctrlr *ctrlr;
47 struct spdk_nvme_ns *ns;
48 struct ns_entry *next;
49 struct spdk_nvme_qpair *qpair;
50 };
51
52 static struct ctrlr_entry *g_controllers = NULL;
53 static struct ns_entry *g_namespaces = NULL;
- g_controllers is the head of the global linked list that manages all NVMe SSDs (that is, NVMe controllers).
- g_namespaces is the head of the global linked list that manages all namespaces.
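The following minimal sketch shows one plausible way to maintain such a head-pointer list: new entries are prepended, and cleanup walks the list and frees it. The names demo_ctrlr_entry, g_demo_controllers, demo_register_ctrlr, and demo_cleanup are hypothetical; the real entries hold a struct spdk_nvme_ctrlr pointer rather than an integer id.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ctrlr_entry (illustration only). */
struct demo_ctrlr_entry {
	int id;
	struct demo_ctrlr_entry *next;
	char name[64];
};

/* Head of the global list; NULL means "no controllers found". */
static struct demo_ctrlr_entry *g_demo_controllers = NULL;

/* Prepend a new entry; returns 0 on success, -1 if allocation fails. */
static int demo_register_ctrlr(int id, const char *name)
{
	struct demo_ctrlr_entry *entry = malloc(sizeof(*entry));

	if (entry == NULL) {
		return -1;
	}
	entry->id = id;
	snprintf(entry->name, sizeof(entry->name), "%s", name);
	entry->next = g_demo_controllers;
	g_demo_controllers = entry;
	return 0;
}

/* Free every entry, leaving the head NULL again. */
static void demo_cleanup(void)
{
	while (g_demo_controllers != NULL) {
		struct demo_ctrlr_entry *next = g_demo_controllers->next;

		free(g_demo_controllers);
		g_demo_controllers = next;
	}
}
```

Because entries are prepended, the most recently registered controller sits at the head of the list.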
Returning to L338-342 of main(): if the g_controllers pointer is NULL, no NVMe SSD was found, so the program cleans up and exits.
/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */
338 if (g_controllers == NULL) {
339 fprintf(stderr, "no NVMe controllers found\n");
340 cleanup();
341 return 1;
342 }
2.2 The spdk_nvme_probe() function
Now let's look at how hello_world.c uses spdk_nvme_probe().
/* spdk-17.07.1/examples/nvme/hello_world/hello_world.c */
331 rc = spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
Judging by their names, probe_cb and attach_cb are two callback functions (there is also a remove_cb, which L331 does not use).
- probe_cb: called when an NVMe device is enumerated.
- attach_cb: called when an NVMe device has been attached to the user-mode NVMe driver.
The definitions of probe_cb, attach_cb, and remove_cb are as follows.
/* spdk-17.07.1/include/spdk/nvme.h */
268 /**
269 * Callback for spdk_nvme_probe() enumeration.
270 *
271 * \param opts NVMe controller initialization options. This structure will be populated with the
272 * default values on entry, and the user callback may update any options to request a different
273 * value. The controller may not support all requested parameters, so the final values will be
274 * provided during the attach callback.
275 * \return true to attach to this device.
276 */
277 typedef bool (*spdk_nvme_probe_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
278 struct spdk_nvme_ctrlr_opts *opts);
279
280 /**
281 * Callback for spdk_nvme_probe() to report a device that has been attached to the userspace NVMe driver.
282 *
283 * \param opts NVMe controller initialization options that were actually used. Options may differ
284 * from the requested options from the probe call depending on what the controller supports.
285 */
286 typedef void (*spdk_nvme_attach_cb)(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
287 struct spdk_nvme_ctrlr *ctrlr,
288 const struct spdk_nvme_ctrlr_opts *opts);
289
290 /**
291 * Callback for spdk_nvme_probe() to report that a device attached to the userspace NVMe driver
292 * has been removed from the system.
293 *
294 * The controller will remain in a failed state (any new I/O submitted will fail).
295 *
296 * The controller must be detached from the userspace driver by calling spdk_nvme_detach()
297 * once the controller is no longer in use. It is up to the library user to ensure that
298 * no other threads are using the controller before calling spdk_nvme_detach().
299 *
300 * \param ctrlr NVMe controller instance that was removed.
301 */
302 typedef void (*spdk_nvme_remove_cb)(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr);
303
304 /**
305 * \brief Enumerate the bus indicated by the transport ID and attach the userspace NVMe driver
306 * to each device found if desired.
307 *
308 * \param trid The transport ID indicating which bus to enumerate. If the trtype is PCIe or trid is NULL,
309 * this will scan the local PCIe bus. If the trtype is RDMA, the traddr and trsvcid must point at the
310 * location of an NVMe-oF discovery service.
311 * \param cb_ctx Opaque value which will be passed back in cb_ctx parameter of the callbacks.
312 * \param probe_cb will be called once per NVMe device found in the system.
313 * \param attach_cb will be called for devices for which probe_cb returned true once that NVMe
314 * controller has been attached to the userspace driver.
315 * \param remove_cb will be called for devices that were attached in a previous spdk_nvme_probe()
316 * call but are no longer attached to the system. Optional; specify NULL if removal notices are not
317 * desired.
318 *
319 * This function is not thread safe and should only be called from one thread at a time while no
320 * other threads are actively using any NVMe devices.
321 *
322 * If called from a secondary process, only devices that have been attached to the userspace driver
323 * in the primary process will be probed.
324 *
325 * If called more than once, only devices that are not already attached to the SPDK NVMe driver
326 * will be reported.
327 *
328 * To stop using the the controller and release its associated resources,
329 * call \ref spdk_nvme_detach with the spdk_nvme_ctrlr instance returned by this function.
330 */
331 int spdk_nvme_probe(const struct spdk_nvme_transport_id *trid,
332 void *cb_ctx,
333 spdk_nvme_probe_cb probe_cb,
334 spdk_nvme_attach_cb attach_cb,
335 spdk_nvme_remove_cb remove_cb);
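The callback contract described in the comments above can be sketched with a small mock enumerator. This is a simplified illustration, not SPDK code: mock_dev, mock_probe, and the callback types are hypothetical, and attach_cb is invoked immediately here, whereas the real driver calls it only after the controller has finished initializing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; stands in for a discovered NVMe controller. */
struct mock_dev {
	const char *addr;
	bool usable;
};

typedef bool (*mock_probe_cb)(void *cb_ctx, const struct mock_dev *dev);
typedef void (*mock_attach_cb)(void *cb_ctx, const struct mock_dev *dev);

/* Calls probe_cb once per device, and attach_cb only where probe_cb returned true. */
static int mock_probe(const struct mock_dev *devs, size_t n, void *cb_ctx,
		      mock_probe_cb probe_cb, mock_attach_cb attach_cb)
{
	size_t i;

	if (probe_cb == NULL || attach_cb == NULL) {
		return -1;
	}
	for (i = 0; i < n; i++) {
		if (probe_cb(cb_ctx, &devs[i])) {
			attach_cb(cb_ctx, &devs[i]);
		}
	}
	return 0;
}

/* Example user context and callbacks that simply count invocations. */
struct mock_counters {
	int probed;
	int attached;
};

static bool count_probe(void *cb_ctx, const struct mock_dev *dev)
{
	struct mock_counters *c = cb_ctx;

	c->probed++;
	return dev->usable;	/* true means "attach to this device" */
}

static void count_attach(void *cb_ctx, const struct mock_dev *dev)
{
	struct mock_counters *c = cb_ctx;

	(void)dev;
	c->attached++;
}
```

The opaque cb_ctx pointer is how the caller's own state reaches both callbacks without global variables.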
So as not to get lost in probe_cb, attach_cb, and remove_cb, let's look at struct spdk_nvme_transport_id and the main logic of spdk_nvme_probe().
/* spdk-17.07.1/include/spdk/nvme.h */
142 /**
143 * NVMe transport identifier.
144 *
145 * This identifies a unique endpoint on an NVMe fabric.
146 *
147 * A string representation of a transport ID may be converted to this type using
148 * spdk_nvme_transport_id_parse().
149 */
150 struct spdk_nvme_transport_id {
151 /**
152 * NVMe transport type.
153 */
154 enum spdk_nvme_transport_type trtype;
155
156 /**
157 * Address family of the transport address.
158 *
159 * For PCIe, this value is ignored.
160 */
161 enum spdk_nvmf_adrfam adrfam;
162
163 /**
164 * Transport address of the NVMe-oF endpoint. For transports which use IP
165 * addressing (e.g. RDMA), this should be an IP address. For PCIe, this
166 * can either be a zero length string (the whole bus) or a PCI address
167 * in the format DDDD:BB:DD.FF or DDDD.BB.DD.FF
168 */
169 char traddr[SPDK_NVMF_TRADDR_MAX_LEN + 1];
170
171 /**
172 * Transport service id of the NVMe-oF endpoint. For transports which use
173 * IP addressing (e.g. RDMA), this field shoud be the port number. For PCIe,
174 * this is always a zero length string.
175 */
176 char trsvcid[SPDK_NVMF_TRSVCID_MAX_LEN + 1];
177
178 /**
179 * Subsystem NQN of the NVMe over Fabrics endpoint. May be a zero length string.
180 */
181 char subnqn[SPDK_NVMF_NQN_MAX_LEN + 1];
182 };
NVMe over PCIe is distinguished by the "NVMe transport type" field.
/* spdk-17.07.1/include/spdk/nvme.h */
154 enum spdk_nvme_transport_type trtype;
According to the code at L130-140, two transport types are currently supported: PCIe and RDMA. Since our interest is NVMe over PCIe, RDMA is not covered here.
130 enum spdk_nvme_transport_type {
131 /**
132 * PCIe Transport (locally attached devices)
133 */
134 SPDK_NVME_TRANSPORT_PCIE = 256,
135
136 /**
137 * RDMA Transport (RoCE, iWARP, etc.)
138 */
139 SPDK_NVME_TRANSPORT_RDMA = SPDK_NVMF_TRTYPE_RDMA,
140 };
Next, let's look at the code of spdk_nvme_probe().
/* spdk-17.07.1/lib/nvme/nvme.c */
396 int
397 spdk_nvme_probe(const struct spdk_nvme_transport_id *trid, void *cb_ctx,
398 spdk_nvme_probe_cb probe_cb, spdk_nvme_attach_cb attach_cb,
399 spdk_nvme_remove_cb remove_cb)
400 {
401 int rc;
402 struct spdk_nvme_ctrlr *ctrlr;
403 struct spdk_nvme_transport_id trid_pcie;
404
405 rc = nvme_driver_init();
406 if (rc != 0) {
407 return rc;
408 }
409
410 if (trid == NULL) {
411 memset(&trid_pcie, 0, sizeof(trid_pcie));
412 trid_pcie.trtype = SPDK_NVME_TRANSPORT_PCIE;
413 trid = &trid_pcie;
414 }
415
416 if (!spdk_nvme_transport_available(trid->trtype)) {
417 SPDK_ERRLOG("NVMe trtype %u not available\n", trid->trtype);
418 return -1;
419 }
420
421 nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
422
423 nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
424
425 if (!spdk_process_is_primary()) {
426 TAILQ_FOREACH(ctrlr, &g_spdk_nvme_driver->attached_ctrlrs, tailq) {
427 nvme_ctrlr_proc_get_ref(ctrlr);
428
429 /*
430 * Unlock while calling attach_cb() so the user can call other functions
431 * that may take the driver lock, like nvme_detach().
432 */
433 nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
434 attach_cb(cb_ctx, &ctrlr->trid, ctrlr, &ctrlr->opts);
435 nvme_robust_mutex_lock(&g_spdk_nvme_driver->lock);
436 }
437
438 nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
439 return 0;
440 }
441
442 nvme_robust_mutex_unlock(&g_spdk_nvme_driver->lock);
443 /*
444 * Keep going even if one or more nvme_attach() calls failed,
445 * but maintain the value of rc to signal errors when we return.
446 */
447
448 rc = nvme_init_controllers(cb_ctx, attach_cb);
449
450 return rc;
451 }
The processing flow of spdk_nvme_probe() is as follows.
001 - 405: rc = nvme_driver_init();
002 - 410-414: set trid if it is NULL
003 - 416: check NVMe trtype via spdk_nvme_transport_available(trid->trtype)
004 - 423: nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
005 - 425: check spdk process is primary, if not, do something at L426-440
006 - 448: rc = nvme_init_controllers(cb_ctx, attach_cb);
- 004 calls nvme_transport_ctrlr_scan(), the key function analyzed next, to discover NVMe SSD devices.
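Step 002 of the flow above, defaulting a NULL transport ID to the local PCIe transport, can be sketched as follows. The types and the function demo_resolve_trid are hypothetical simplifications; only the pattern (zero a local structure, set the transport type, point at it) mirrors L410-414.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the SPDK types (illustration only). */
enum demo_trtype {
	DEMO_TRTYPE_RDMA = 1,
	DEMO_TRTYPE_PCIE = 256,
};

struct demo_trid {
	enum demo_trtype trtype;
	char traddr[257];
};

/*
 * Returns the transport ID to use. When the caller passed NULL, the
 * caller-provided storage is zeroed and set up for a whole-bus PCIe scan
 * (an empty traddr means "the whole bus").
 */
static const struct demo_trid *demo_resolve_trid(const struct demo_trid *trid,
						 struct demo_trid *storage)
{
	if (trid == NULL) {
		memset(storage, 0, sizeof(*storage));
		storage->trtype = DEMO_TRTYPE_PCIE;
		return storage;
	}
	return trid;
}
```

This is why hello_world.c can pass NULL as the first argument and still get a scan of the local PCIe bus.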
2.3 The nvme_transport_ctrlr_scan() function
Next, let's look at nvme_transport_ctrlr_scan().
/* spdk-17.07.1/lib/nvme/nvme.c */
423 nvme_transport_ctrlr_scan(trid, cb_ctx, probe_cb, remove_cb);
/* spdk-17.07.1/lib/nvme/nvme_transport.c#92 */
91 int
92 nvme_transport_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
93 void *cb_ctx,
94 spdk_nvme_probe_cb probe_cb,
95 spdk_nvme_remove_cb remove_cb)
96 {
97 NVME_TRANSPORT_CALL(trid->trtype, ctrlr_scan, (trid, cb_ctx, probe_cb, remove_cb));
98 }
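The macro NVME_TRANSPORT_CALL is defined as follows. The ## operator in TRANSPORT_PCIE pastes the prefix nvme_pcie_ onto func_name, so for NVMe over PCIe, nvme_transport_ctrlr_scan() turns into a call to nvme_pcie_ctrlr_scan(). Note this technique for translating one call into the matching function for each of the two transport types, PCIe and RDMA.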
/* spdk-17.07.1/lib/nvme/nvme_transport.c#60 */
52 #define TRANSPORT_PCIE(func_name, args) case SPDK_NVME_TRANSPORT_PCIE: return nvme_pcie_ ## func_name args;
..
60 #define NVME_TRANSPORT_CALL(trtype, func_name, args) \
61 do { \
62 switch (trtype) { \
63 TRANSPORT_PCIE(func_name, args) \
64 TRANSPORT_FABRICS_RDMA(func_name, args) \
65 TRANSPORT_DEFAULT(trtype) \
66 } \
67 SPDK_UNREACHABLE(); \
68 } while (0)
..
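The token-pasting dispatch can be tried in isolation with the minimal sketch below. Everything prefixed with demo_ or DEMO_ is hypothetical, including the enum values and the two stub scan functions; only the macro shape follows the listing above.

```c
/* Hypothetical transport types and values for illustration only. */
enum demo_transport {
	DEMO_RDMA = 1,
	DEMO_PCIE = 256,
};

/* Stub per-transport implementations; the return values just identify them. */
static int demo_pcie_ctrlr_scan(int arg) { return arg + 1000; }
static int demo_rdma_ctrlr_scan(int arg) { return arg + 2000; }

/* "##" pastes the transport prefix onto func_name, forming the callee's name. */
#define DEMO_CASE_PCIE(func_name, args) case DEMO_PCIE: return demo_pcie_ ## func_name args;
#define DEMO_CASE_RDMA(func_name, args) case DEMO_RDMA: return demo_rdma_ ## func_name args;

#define DEMO_TRANSPORT_CALL(trtype, func_name, args)	\
	do {						\
		switch (trtype) {			\
		DEMO_CASE_PCIE(func_name, args)		\
		DEMO_CASE_RDMA(func_name, args)		\
		default: return -1;			\
		}					\
	} while (0)

/* Expands to a switch that returns demo_<transport>_ctrlr_scan(arg). */
static int demo_transport_ctrlr_scan(enum demo_transport trtype, int arg)
{
	DEMO_TRANSPORT_CALL(trtype, ctrlr_scan, (arg));
}
```

The argument list is passed in its own parentheses so that the macro can place it directly after the pasted function name.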
2.4 The nvme_pcie_ctrlr_scan() function
Next, let's look at nvme_pcie_ctrlr_scan().
/* spdk-17.07.1/lib/nvme/nvme_pcie.c#620 */
619 int
620 nvme_pcie_ctrlr_scan(const struct spdk_nvme_transport_id *trid,
621 void *cb_ctx,
622 spdk_nvme_probe_cb probe_cb,
623 spdk_nvme_remove_cb remove_cb)
624 {
625 struct nvme_pcie_enum_ctx enum_ctx = {};
626
627 enum_ctx.probe_cb = probe_cb;
628 enum_ctx.cb_ctx = cb_ctx;
629
630 if (strlen(trid->traddr) != 0) {
631 if (spdk_pci_addr_parse(&enum_ctx.pci_addr, trid->traddr)) {
632 return -1;
633 }
634 enum_ctx.has_pci_addr = true;
635 }
636
637 if (hotplug_fd < 0) {
638 hotplug_fd = spdk_uevent_connect();
639 if (hotplug_fd < 0) {
640 SPDK_TRACELOG(SPDK_TRACE_NVME, "Failed to open uevent netlink socket\n");
641 }
642 } else {
643 _nvme_pcie_hotplug_monitor(cb_ctx, probe_cb, remove_cb);
644 }
645
646 if (enum_ctx.has_pci_addr == false) {
647 return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
648 } else {
649 return spdk_pci_nvme_device_attach(pcie_nvme_enum_cb, &enum_ctx, &enum_ctx.pci_addr);
650 }
651 }
We focus on spdk_pci_nvme_enumerate() at L647, because the goal is to understand how SSD devices are discovered using the Class Code.
/* spdk-17.07.1/lib/nvme/nvme_pcie.c */
647 return spdk_pci_nvme_enumerate(pcie_nvme_enum_cb, &enum_ctx);
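The branch between a whole-bus scan and a single-device attach depends only on whether traddr is empty. The sketch below illustrates that decision with a hypothetical parser, demo_pci_addr_parse; it is not spdk_pci_addr_parse(), and it accepts only the colon-separated DDDD:BB:DD.F form.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical PCI address record (illustration only). */
struct demo_pci_addr {
	unsigned int domain, bus, dev, func;
};

enum demo_scan_mode {
	DEMO_SCAN_WHOLE_BUS,
	DEMO_SCAN_ONE_DEVICE,
	DEMO_SCAN_ERROR,
};

/* Parses "DDDD:BB:DD.F" (hex fields); returns 0 on success, -1 otherwise. */
static int demo_pci_addr_parse(struct demo_pci_addr *out, const char *s)
{
	char extra;

	/* The trailing %c must NOT match: exactly four fields and nothing after. */
	if (sscanf(s, "%x:%x:%x.%x%c", &out->domain, &out->bus,
		   &out->dev, &out->func, &extra) != 4) {
		return -1;
	}
	return 0;
}

/* Empty traddr: scan the whole bus. Otherwise: parse and target one device. */
static enum demo_scan_mode demo_choose_scan(const char *traddr,
					    struct demo_pci_addr *addr)
{
	if (strlen(traddr) == 0) {
		return DEMO_SCAN_WHOLE_BUS;
	}
	if (demo_pci_addr_parse(addr, traddr) != 0) {
		return DEMO_SCAN_ERROR;
	}
	return DEMO_SCAN_ONE_DEVICE;
}
```

With hello_world.c passing a NULL trid, traddr is empty, which is why the whole-bus path is the one followed in this analysis.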
2.5 The spdk_pci_nvme_enumerate() function
Next, let's look at spdk_pci_nvme_enumerate().
/* spdk-17.07.1/lib/env_dpdk/pci_nvme.c */
81 int
82 spdk_pci_nvme_enumerate(spdk_pci_enum_cb enum_cb, void *enum_ctx)
83 {
84 return spdk_pci_enumerate(&g_nvme_pci_drv, enum_cb, enum_ctx);
85 }
Note: the first argument at L84 is the address of the global variable g_nvme_pci_drv (global struct variables are always exciting to look at :-) ).
/* spdk-17.07.1/lib/env_dpdk/pci_nvme.c */
38 static struct rte_pci_id nvme_pci_driver_id[] = {
39 #if RTE_VERSION >= RTE_VERSION_NUM(16, 7, 0, 1)
40 {
41 .class_id = SPDK_PCI_CLASS_NVME,
42 .vendor_id = PCI_ANY_ID,
43 .device_id = PCI_ANY_ID,
44 .subsystem_vendor_id = PCI_ANY_ID,
45 .subsystem_device_id = PCI_ANY_ID,
46 },
47 #else
48 {RTE_PCI_DEVICE(0x8086, 0x0953)},
49 #endif
50 { .vendor_id = 0, /* sentinel */ },
51 };
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54 .driver = {
55 .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
56 .id_table = nvme_pci_driver_id,
..
66 },
67
68 .cb_fn = NULL,
69 .cb_arg = NULL,
70 .mtx = PTHREAD_MUTEX_INITIALIZER,
71 .is_registered = false,
72 };
Aha! This is where the Class Code (SPDK_PCI_CLASS_NVME = 0x010802) comes in. The global variable g_nvme_pci_drv is defined at L53, and its driver.id_table points to nvme_pci_driver_id, which is defined at L38.
38 static struct rte_pci_id nvme_pci_driver_id[] = {
..
41 .class_id = SPDK_PCI_CLASS_NVME,
..
53 static struct spdk_pci_enum_ctx g_nvme_pci_drv = {
54 .driver = {
..
56 .id_table = nvme_pci_driver_id,
..
2.6 The spdk_pci_enumerate() function
Now we just need to dig into spdk_pci_enumerate() to find out how SSD devices are discovered.
/* spdk-17.07.1/lib/env_dpdk/pci.c#150 */
149 int
150 spdk_pci_enumerate(struct spdk_pci_enum_ctx *ctx,
151 spdk_pci_enum_cb enum_cb,
152 void *enum_ctx)
153 {
...
168
169 #if RTE_VERSION >= RTE_VERSION_NUM(17, 05, 0, 4)
170 if (rte_pci_probe() != 0) {
171 #else
172 if (rte_eal_pci_probe() != 0) {
173 #endif
...
184 return 0;
185 }
Some of the code is omitted; we focus on L170.
170 if (rte_pci_probe() != 0) {
2.7 The rte_pci_probe() function
Starting with rte_pci_probe(), we move into the internals of DPDK. The code is as follows.
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#413 */
407 /*
408 * Scan the content of the PCI bus, and call the probe() function for
409 * all registered drivers that have a matching entry in its id_table
410 * for discovered devices.
411 */
412 int
413 rte_pci_probe(void)
414 {
415 struct rte_pci_device *dev = NULL;
416 size_t probed = 0, failed = 0;
417 struct rte_devargs *devargs;
418 int probe_all = 0;
419 int ret = 0;
420
421 if (rte_pci_bus.bus.conf.scan_mode != RTE_BUS_SCAN_WHITELIST)
422 probe_all = 1;
423
424 FOREACH_DEVICE_ON_PCIBUS(dev) {
425 probed++;
426
427 devargs = dev->device.devargs;
428 /* probe all or only whitelisted devices */
429 if (probe_all)
430 ret = pci_probe_all_drivers(dev);
431 else if (devargs != NULL &&
432 devargs->policy == RTE_DEV_WHITELISTED)
433 ret = pci_probe_all_drivers(dev);
434 if (ret < 0) {
435 RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
436 " cannot be used\n", dev->addr.domain, dev->addr.bus,
437 dev->addr.devid, dev->addr.function);
438 rte_errno = errno;
439 failed++;
440 ret = 0;
441 }
442 }
443
444 return (probed && probed == failed) ? -1 : 0;
445 }
L430 is the line of interest.
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c */
430 ret = pci_probe_all_drivers(dev);
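One detail of rte_pci_probe() worth noting is its return value at L444: it reports failure only when at least one device was visited and every one of them failed. A minimal sketch of that rule, with a hypothetical function name and the per-device results supplied as an array:

```c
#include <stddef.h>

/*
 * Hypothetical model of the loop at L424-444: device_rc[i] is the result of
 * probing device i (negative means the device could not be used).
 * Returns -1 only if at least one device was probed and all of them failed.
 */
static int demo_probe_bus(const int *device_rc, size_t n)
{
	size_t probed = 0, failed = 0, i;

	for (i = 0; i < n; i++) {
		probed++;
		if (device_rc[i] < 0) {
			failed++;
		}
	}
	return (probed && probed == failed) ? -1 : 0;
}
```

An empty bus therefore counts as success, and a single usable device is enough to avoid the error return.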
2.8 The pci_probe_all_drivers() function
The function is implemented as follows.
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#307 */
301 /*
302 * If vendor/device ID match, call the probe() function of all
303 * registered driver for the given device. Return -1 if initialization
304 * failed, return 1 if no driver is found for this device.
305 */
306 static int
307 pci_probe_all_drivers(struct rte_pci_device *dev)
308 {
309 struct rte_pci_driver *dr = NULL;
310 int rc = 0;
311
312 if (dev == NULL)
313 return -1;
314
315 /* Check if a driver is already loaded */
316 if (dev->driver != NULL)
317 return 0;
318
319 FOREACH_DRIVER_ON_PCIBUS(dr) {
320 rc = rte_pci_probe_one_driver(dr, dev);
321 if (rc < 0)
322 /* negative value is an error */
323 return -1;
324 if (rc > 0)
325 /* positive value means driver doesn't support it */
326 continue;
327 return 0;
328 }
329 return 1;
330 }
L320 is the line of interest.
320 rc = rte_pci_probe_one_driver(dr, dev);
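The return-code convention in the loop above (negative is an error, positive means "this driver does not support the device", zero means the device was claimed) can be sketched with stub drivers. All names here are hypothetical, and the stub drivers decide purely on a class code for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver probe signature: 0 = claimed, >0 = unsupported, <0 = error. */
typedef int (*demo_probe_fn)(uint32_t class_code);

static int demo_nic_probe(uint32_t class_code)
{
	return class_code == 0x020000 ? 0 : 1;	/* Ethernet controller class */
}

static int demo_nvme_probe(uint32_t class_code)
{
	return class_code == 0x010802 ? 0 : 1;	/* NVMe class code */
}

static int demo_broken_probe(uint32_t class_code)
{
	(void)class_code;
	return -5;				/* always fails */
}

/* Returns 0 if some driver claimed the device, 1 if none did, -1 on error. */
static int demo_probe_all_drivers(const demo_probe_fn *drivers, size_t n,
				  uint32_t class_code)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = drivers[i](class_code);

		if (rc < 0) {
			return -1;	/* negative value is an error */
		}
		if (rc > 0) {
			continue;	/* driver doesn't support it; try the next */
		}
		return 0;		/* first driver that claims the device wins */
	}
	return 1;
}
```

Note that an error from any driver stops the search, even if a later driver would have claimed the device.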
2.9 The rte_pci_probe_one_driver() function
The function is implemented as follows.
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#200 */
195 /*
196 * If vendor/device ID match, call the probe() function of the
197 * driver.
198 */
199 static int
200 rte_pci_probe_one_driver(struct rte_pci_driver *dr,
201 struct rte_pci_device *dev)
202 {
203 int ret;
204 struct rte_pci_addr *loc;
205
206 if ((dr == NULL) || (dev == NULL))
207 return -EINVAL;
208
209 loc = &dev->addr;
210
211 /* The device is not blacklisted; Check if driver supports it */
212 if (!rte_pci_match(dr, dev))
213 /* Match of device and driver failed */
214 return 1;
215
216 RTE_LOG(INFO, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
217 loc->domain, loc->bus, loc->devid, loc->function,
218 dev->device.numa_node);
219
220 /* no initialization when blacklisted, return without error */
221 if (dev->device.devargs != NULL &&
222 dev->device.devargs->policy ==
223 RTE_DEV_BLACKLISTED) {
224 RTE_LOG(INFO, EAL, " Device is blacklisted, not"
225 " initializing\n");
226 return 1;
227 }
228
229 if (dev->device.numa_node < 0) {
230 RTE_LOG(WARNING, EAL, " Invalid NUMA socket, default to 0\n");
231 dev->device.numa_node = 0;
232 }
233
234 RTE_LOG(INFO, EAL, " probe driver: %x:%x %s\n", dev->id.vendor_id,
235 dev->id.device_id, dr->driver.name);
236
237 if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
238 /* map resources for devices that use igb_uio */
239 ret = rte_pci_map_device(dev);
240 if (ret != 0)
241 return ret;
242 }
243
244 /* reference driver structure */
245 dev->driver = dr;
246 dev->device.driver = &dr->driver;
247
248 /* call the driver probe() function */
249 ret = dr->probe(dr, dev);
250 if (ret) {
251 dev->driver = NULL;
252 dev->device.driver = NULL;
253 if ((dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) &&
254 /* Don't unmap if device is unsupported and
255 * driver needs mapped resources.
256 */
257 !(ret > 0 &&
258 (dr->drv_flags & RTE_PCI_DRV_KEEP_MAPPED_RES)))
259 rte_pci_unmap_device(dev);
260 }
261
262 return ret;
263 }
L212 is the line of interest.
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c */
212 if (!rte_pci_match(dr, dev))
2.10 The rte_pci_match() function
The function is implemented as follows.
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c#163 */
151 /*
152 * Match the PCI Driver and Device using the ID Table
153 *
154 * @param pci_drv
155 * PCI driver from which ID table would be extracted
156 * @param pci_dev
157 * PCI device to match against the driver
158 * @return
159 * 1 for successful match
160 * 0 for unsuccessful match
161 */
162 static int
163 rte_pci_match(const struct rte_pci_driver *pci_drv,
164 const struct rte_pci_device *pci_dev)
165 {
166 const struct rte_pci_id *id_table;
167
168 for (id_table = pci_drv->id_table; id_table->vendor_id != 0;
169 id_table++) {
170 /* check if device's identifiers match the driver's ones */
171 if (id_table->vendor_id != pci_dev->id.vendor_id &&
172 id_table->vendor_id != PCI_ANY_ID)
173 continue;
174 if (id_table->device_id != pci_dev->id.device_id &&
175 id_table->device_id != PCI_ANY_ID)
176 continue;
177 if (id_table->subsystem_vendor_id !=
178 pci_dev->id.subsystem_vendor_id &&
179 id_table->subsystem_vendor_id != PCI_ANY_ID)
180 continue;
181 if (id_table->subsystem_device_id !=
182 pci_dev->id.subsystem_device_id &&
183 id_table->subsystem_device_id != PCI_ANY_ID)
184 continue;
185 if (id_table->class_id != pci_dev->id.class_id &&
186 id_table->class_id != RTE_CLASS_ANY_ID)
187 continue;
188
189 return 1;
190 }
191
192 return 0;
193 }
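As the code below shows, we have finally found how SSD devices are discovered. Because the entry in nvme_pci_driver_id sets every field except class_id to PCI_ANY_ID, the class_id comparison is the one that decides the match.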
/* dpdk-17.08/lib/librte_eal/common/eal_common_pci.c */
185 if (id_table->class_id != pci_dev->id.class_id &&
186 id_table->class_id != RTE_CLASS_ANY_ID)
187 continue;
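The matching rule can be exercised on its own with the reduced sketch below. The struct, the wildcard constants, and demo_pci_match are hypothetical stand-ins: DEMO_ANY_ID and DEMO_CLASS_ANY_ID play the roles of PCI_ANY_ID and RTE_CLASS_ANY_ID, their values are assumptions, and the subsystem fields are left out for brevity.

```c
#include <stdint.h>

/* Assumed wildcard values for this sketch only. */
#define DEMO_ANY_ID		0xffffu
#define DEMO_CLASS_ANY_ID	0xffffffu
#define DEMO_CLASS_NVME		0x010802u

/* Reduced stand-in for struct rte_pci_id. */
struct demo_pci_id {
	uint32_t class_id;
	uint16_t vendor_id;
	uint16_t device_id;
};

/* One entry that matches any vendor/device with the NVMe class code. */
static const struct demo_pci_id demo_nvme_ids[] = {
	{ .class_id = DEMO_CLASS_NVME, .vendor_id = DEMO_ANY_ID, .device_id = DEMO_ANY_ID },
	{ .vendor_id = 0 /* sentinel */ },
};

/* Returns 1 if any table entry matches the device, 0 otherwise. */
static int demo_pci_match(const struct demo_pci_id *table,
			  const struct demo_pci_id *dev)
{
	for (; table->vendor_id != 0; table++) {
		if (table->vendor_id != dev->vendor_id &&
		    table->vendor_id != DEMO_ANY_ID)
			continue;
		if (table->device_id != dev->device_id &&
		    table->device_id != DEMO_ANY_ID)
			continue;
		if (table->class_id != dev->class_id &&
		    table->class_id != DEMO_CLASS_ANY_ID)
			continue;
		return 1;
	}
	return 0;
}
```

A device with class code 0x010802 matches regardless of its vendor and device IDs, while a device with any other class code does not.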
The definitions of struct rte_pci_id, struct rte_pci_device, and struct rte_pci_driver are as follows.
/* dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#100 */
96 /**
97 * A structure describing an ID for a PCI driver. Each driver provides a
98 * table of these IDs for each device that it supports.
99 */
100 struct rte_pci_id {
101 uint32_t class_id; /**< Class ID (class, subclass, pi) or RTE_CLASS_ANY_ID. */
102 uint16_t vendor_id; /**< Vendor ID or PCI_ANY_ID. */
103 uint16_t device_id; /**< Device ID or PCI_ANY_ID. */
104 uint16_t subsystem_vendor_id; /**< Subsystem vendor ID or PCI_ANY_ID. */
105 uint16_t subsystem_device_id; /**< Subsystem device ID or PCI_ANY_ID. */
106 };
/* dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#120 */
120 /**
121 * A structure describing a PCI device.
122 */
123 struct rte_pci_device {
124 TAILQ_ENTRY(rte_pci_device) next; /**< Next probed PCI device. */
125 struct rte_device device; /**< Inherit core device */
126 struct rte_pci_addr addr; /**< PCI location. */
127 struct rte_pci_id id; /**< PCI ID. */
128 struct rte_mem_resource mem_resource[PCI_MAX_RESOURCE];
129 /**< PCI Memory Resource */
130 struct rte_intr_handle intr_handle; /**< Interrupt handle */
131 struct rte_pci_driver *driver; /**< Associated driver */
132 uint16_t max_vfs; /**< sriov enable if not zero */
133 enum rte_kernel_driver kdrv; /**< Kernel driver passthrough */
134 char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
135 };
/* dpdk-17.08/lib/librte_eal/common/include/rte_pci.h#178 */
175 /**
176 * A structure describing a PCI driver.
177 */
178 struct rte_pci_driver {
179 TAILQ_ENTRY(rte_pci_driver) next; /**< Next in list. */
180 struct rte_driver driver; /**< Inherit core driver. */
181 struct rte_pci_bus *bus; /**< PCI bus reference. */
182 pci_probe_t *probe; /**< Device Probe function. */
183 pci_remove_t *remove; /**< Device Remove function. */
184 const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
185 uint32_t drv_flags; /**< Flags contolling handling of device. */
186 };
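3. Summary
The SSD device discovery process covered so far can be summarized as follows.
- 01 - The Class Code (0x010802) is the basis for discovering SSD devices.
- 02 - When SSD devices are discovered, the function call stack from SPDK down to DPDK is as follows.
00 hello_world.c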
01 -> main()
02 --> spdk_nvme_probe()
03 ---> nvme_transport_ctrlr_scan()
04 ----> nvme_pcie_ctrlr_scan()
05 -----> spdk_pci_nvme_enumerate()
06 ------> spdk_pci_enumerate(&g_nvme_pci_drv, ...) | SPDK |
=========================================================================
07 -------> rte_pci_probe() | DPDK |
08 --------> pci_probe_all_drivers()
09 ---------> rte_pci_probe_one_driver()
10 ----------> rte_pci_match()
- 03 - rte_pci_match() in DPDK's Environment Abstraction Layer (EAL) holds the core logic for discovering SSD devices.
- 04 - The following figure shows where the EAL sits in the DPDK architecture.