Unable to create a queue pair with ib_create_qp

I am writing an RDMA kernel module (InfiniBand).

So far, I have successfully created a protection domain and completion queues for the send and receive queues.

But whenever I try to create a queue pair by calling ib_create_qp, the call fails and no queue pair is created. The code I wrote is shown below:

  #include <linux/kernel.h>
  #include <linux/init.h>
  #include <linux/module.h>
  #include <linux/list.h>
  #include <linux/module.h>
  #include <linux/err.h>
  #include "myClient.h"

  struct workqueue_struct *myClient_workqueue;
  struct ib_sa_client myClient_sa_client;
  /*
  static void myClient_add_one(struct ib_device *device);
  static void myClient_remove_one(struct ib_device *device);
  */
  struct ib_pd *mypd;
  struct ib_cq *myrcvcq;
  struct ib_cq *myClientsendcq;
  struct ib_qp *myClientqp;

  void myClient_ib_recvcompletion(struct ib_cq *cq)
  {
      printk("A user-specified callback that is invoked when a completion event occurs on the CQ.\n");
  }

  void myClient_ib_sendcompletion(struct ib_cq *cq)
  {
      printk("A user-specified callback that is invoked when a completion event occurs on the CQ.\n");
  }

  static void my_qp_event_handler(struct ib_event *myqpAsyncEvent, void *anyPointer)
  {
      printk(KERN_INFO "Dummy affiliated asynchronous event occured function called \n");
  }

  static void myClient_add_one(struct ib_device *device)
  {
      union ib_gid tmp_gid;
      int ret;
      int hcaport = 1;
      int result = -ENOMEM;
      u16 port1Pkey;
      struct ib_port_attr attr;

      ret = ib_query_port(device,hcaport,&attr);
      printk("ib query port result %d \n", ret);

      // Creating the Protection Domain for RDMA
      mypd = ib_alloc_pd(device);
      if(IS_ERR(mypd)){
          printk(KERN_INFO "Failed to allocate PD\n");
          return;
      }
      else{
          printk(KERN_INFO "1Successfully allocated the PD\n");
          pdset = true;
      }

      // Creating the receive completion queue for RDMA
      myrcvcq = ib_create_cq(device,myClient_ib_recvcompletion,NULL,NULL,myClient_recvq_size,0);
      if(IS_ERR(myrcvcq)){
          pr_err("%s:%d error code for receive cq%d\n", __func__, __LINE__, PTR_ERR(myrcvcq));
          //printk("Error creating QP: %d \n",PTR_ERR(myClientqp));
      }
      else{
          printk("Recieve CQ successfully created in address: %x \n",myrcvcq);
      }

      // Creating the send completion queue for RDMA
      myClientsendcq = ib_create_cq(device,myClient_ib_sendcompletion, NULL, NULL,myClient_sendq_size,0 );
      if(IS_ERR(myClientsendcq)){
          pr_err("%s:%d scqerror code for send cq%d\n", __func__, __LINE__, PTR_ERR(myClientsendcq));
          //printk("Error creating QP: %d \n",PTR_ERR(myClientqp));
      }
      else{
          printk("1Send CQ successfully created in address: %x \n",myClientsendcq);
      }

      // Creating the queue pair
      // Creating the queue pair
      struct ib_qp_init_attr init_qpattr;
      memset(&init_qpattr,0,sizeof(init_qpattr));
      init_qpattr.event_handler = myClient_qp_event_handler;
      init_qpattr.cap.max_send_wr = 2;
      init_qpattr.cap.max_recv_wr = 2;
      init_qpattr.cap.max_recv_sge = 1;
      init_qpattr.cap.max_send_sge = 1;
      init_qpattr.sq_sig_type = IB_SIGNAL_ALL_WR;
      init_qpattr.qp_type = IB_QPT_UD;
      init_qpattr.send_cq = myClientsendcq;
      init_qpattr.recv_cq = myrcvcq;

      myClientqp = ib_create_qp(mypd,&init_qpattr);
      if(IS_ERR(myClientqp)){
          pr_err("%s:%d error code %d\n", __func__, __LINE__, PTR_ERR(myClientqp));
          //printk("Error creating QP: %d \n",PTR_ERR(myClientqp));
      }
      else{
          printk(KERN_INFO "1The queue pair is successfully created \n");
          qpcreated = true;
      }
  }

  static void myClient_remove_one(struct ib_device *device)
  {
  }

  static struct ib_client my_client = {
      .name = "myRDMAclient",
      .add = myClient_add_one,
      .remove = myClient_remove_one
  };

  static int __init myRDMAclient_init(void)
  {
      int ret;

      ret = ib_register_client(&my_client);
      if(ret){
          //printk(KERN_ALERT "KERN_ERR Failed to register IB client\n");
          goto err_sa;
      }
      printk(KERN_ALERT "lKERN_INFO Successfully registered myRDMAclient module \n");
      return 0;

  err_sa:
      return ret;
  }

  module_init(myRDMAclient_init);

All of the calls succeed except ib_create_qp(mypd, &init_qpattr), which fails to create the queue pair.

Update: I now register memory before creating the queues, but ib_create_qp still fails with an invalid argument error (error code -22).

  #include <linux/kernel.h>
  #include <linux/init.h>
  #include <linux/module.h>
  #include <linux/list.h>
  #include <linux/module.h>
  #include <linux/err.h>
  #include "myClient.h"

  struct workqueue_struct *myClient_workqueue;
  struct ib_sa_client myClient_sa_client;
  /*
  static void myClient_add_one(struct ib_device *device);
  static void myClient_remove_one(struct ib_device *device);
  */
  struct ib_pd *mypd;
  struct ib_cq *myrcvcq;
  struct ib_cq *myClientsendcq;
  struct ib_qp *myClientqp;
  struct ib_mr *mymr;

  void myClient_ib_recvcompletion(struct ib_cq *cq)
  {
      printk("A user-specified callback that is invoked when a completion event occurs on the CQ.\n");
  }

  void myClient_ib_sendcompletion(struct ib_cq *cq)
  {
      printk("A user-specified callback that is invoked when a completion event occurs on the CQ.\n");
  }

  static void my_qp_event_handler(struct ib_event *myqpAsyncEvent, void *anyPointer)
  {
      printk(KERN_INFO "Dummy affiliated asynchronous event occured function called \n");
  }

  static void myClient_add_one(struct ib_device *device)
  {
      union ib_gid tmp_gid;
      int ret;
      int hcaport = 1;
      int result = -ENOMEM;
      u16 port1Pkey;
      struct ib_port_attr attr;

      ret = ib_query_port(device,hcaport,&attr);
      printk("ib query port result %d \n", ret);

      // Creating the Protection Domain for RDMA
      mypd = ib_alloc_pd(device);
      if(IS_ERR(mypd)){
          printk(KERN_INFO "Failed to allocate PD\n");
          return;
      }
      else{
          printk(KERN_INFO "1Successfully allocated the PD\n");
          pdset = true;
      }

      // Registering Memory
      mymr = ib_get_dma_mr(mypd,IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ| IB_ACCESS_REMOTE_WRITE);
      if(IS_ERR(mymr)){
          printk("failed to register memory :( %d \n",PTR_ERR(mymr));
      }else{
          printk(KERN_INFO "Successfully registered memory region :) \n");
      }
      // End Registering Memory

      // Creating the receive completion queue for RDMA
      myrcvcq = ib_create_cq(device,myClient_ib_recvcompletion,NULL,NULL,myClient_recvq_size,0);
      if(IS_ERR(myrcvcq)){
          pr_err("%s:%d error code for receive cq%d\n", __func__, __LINE__, PTR_ERR(myrcvcq));
          //printk("Error creating QP: %d \n",PTR_ERR(myClientqp));
      }
      else{
          printk("Recieve CQ successfully created in address: %x \n",myrcvcq);
      }

      // Creating the send completion queue for RDMA
      myClientsendcq = ib_create_cq(device,myClient_ib_sendcompletion, NULL, NULL,myClient_sendq_size,0 );
      if(IS_ERR(myClientsendcq)){
          pr_err("%s:%d scqerror code for send cq%d\n", __func__, __LINE__, PTR_ERR(myClientsendcq));
          //printk("Error creating QP: %d \n",PTR_ERR(myClientqp));
      }
      else{
          printk("1Send CQ successfully created in address: %x \n",myClientsendcq);
      }

      // Creating the queue pair
      // Creating the queue pair
      struct ib_qp_init_attr init_qpattr;
      memset(&init_qpattr,0,sizeof(init_qpattr));
      init_qpattr.event_handler = myClient_qp_event_handler;
      init_qpattr.cap.max_send_wr = 2;
      init_qpattr.cap.max_recv_wr = 2;
      init_qpattr.cap.max_recv_sge = 1;
      init_qpattr.cap.max_send_sge = 1;
      init_qpattr.sq_sig_type = IB_SIGNAL_ALL_WR;
      init_qpattr.qp_type = IB_QPT_UD;
      init_qpattr.send_cq = myClientsendcq;
      init_qpattr.recv_cq = myrcvcq;

      myClientqp = ib_create_qp(mypd,&init_qpattr);
      if(IS_ERR(myClientqp)){
          pr_err("%s:%d error code %d\n", __func__, __LINE__, PTR_ERR(myClientqp));
          //printk("Error creating QP: %d \n",PTR_ERR(myClientqp));
      }
      else{
          printk(KERN_INFO "1The queue pair is successfully created \n");
          qpcreated = true;
      }
  }

  static void myClient_remove_one(struct ib_device *device)
  {
  }

  static struct ib_client my_client = {
      .name = "myRDMAclient",
      .add = myClient_add_one,
      .remove = myClient_remove_one
  };

  static int __init myRDMAclient_init(void)
  {
      int ret;

      ret = ib_register_client(&my_client);
      if(ret){
          //printk(KERN_ALERT "KERN_ERR Failed to register IB client\n");
          goto err_sa;
      }
      printk(KERN_ALERT "lKERN_INFO Successfully registered myRDMAclient module \n");
      return 0;

  err_sa:
      return ret;
  }

  module_init(myRDMAclient_init);
+6
2 answers

UPDATE

Based on the discussion in the comments below, I assume that you installed the Mellanox OFED drivers on top of your current distribution. Looking at the 3.1 -.0.3 source of the Mellanox OFED kernel drivers, I see that they changed the layout of struct ib_qp_init_attr by adding a few fields. I am sure that your problem is that you are building your module against the original SLES 3.0.76-0.11 kernel headers, so the init_qpattr structure that you pass to the QP creation function does not have the values that you set in the right places.

I don't know how you installed the new out-of-tree drivers, so I can't say exactly how to build your module correctly, but you can try adding something like

  init_qpattr.qpg_type = 0; 

where you set up the structure. (I know that you already memset the whole thing to zero, but this will confirm that the headers you build against have the new qpg_type member in the structure. I think that is a new field added by OFED that isn't in the original kernel headers, so if your module compiles, you are building against the correct headers.)
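To illustrate why a header mismatch shows up as a runtime error instead of a compile error, here is a minimal user-space sketch. Both struct definitions are hypothetical stand-ins, not the real struct ib_qp_init_attr layouts: attr_stock models the headers the module was compiled against, and attr_vendor models a driver whose struct has one extra member (extra_field, an invented name) inserted before qp_type. The module fills in the bytes using the old offsets, and the driver reads the same bytes using the new ones.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout the module was compiled against (stock headers). */
struct attr_stock {
    unsigned int max_send_wr;
    unsigned int max_recv_wr;
    unsigned int qp_type;
};

/* Hypothetical layout the loaded driver expects (one member inserted). */
struct attr_vendor {
    unsigned int max_send_wr;
    unsigned int max_recv_wr;
    unsigned int extra_field;   /* stand-in for a vendor-added member */
    unsigned int qp_type;
};

/*
 * Fill a zeroed buffer using the stock layout, then reinterpret the same
 * bytes with the vendor layout, as a driver built from other headers would.
 */
struct attr_vendor as_seen_by_driver(unsigned int qp_type)
{
    unsigned char buf[sizeof(struct attr_vendor)];
    struct attr_stock mine;
    struct attr_vendor seen;

    memset(buf, 0, sizeof(buf));
    memset(&mine, 0, sizeof(mine));
    mine.max_send_wr = 2;
    mine.max_recv_wr = 2;
    mine.qp_type = qp_type;

    memcpy(buf, &mine, sizeof(mine));   /* what the module hands over */
    memcpy(&seen, buf, sizeof(seen));   /* what the driver reads back */
    return seen;
}

/* Print both offsets of qp_type and the values the driver ends up with. */
void print_layout_mismatch(unsigned int qp_type)
{
    struct attr_vendor seen = as_seen_by_driver(qp_type);

    printf("offset of qp_type: stock=%zu vendor=%zu\n",
           offsetof(struct attr_stock, qp_type),
           offsetof(struct attr_vendor, qp_type));
    printf("module wrote qp_type=%u; driver sees extra_field=%u, qp_type=%u\n",
           qp_type, seen.extra_field, seen.qp_type);
}
```

Calling print_layout_mismatch() with any non-zero value shows that value landing in extra_field while the driver's qp_type stays zero. A real driver handed arguments shifted like this could plausibly reject them as invalid, which would be consistent with the -22 in the question.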

OLD ANSWER:

I suspect that you are running into a bug in the mlx4 driver related to creating such a small QP ( max_send_wr == max_recv_wr == 2 and max_send_sge == max_recv_sge == 1 ). I managed to find the source for the 3.0.76-0.11 kernel that you are using, but unfortunately I do not see an obvious bug.

Some things you might try to help debug this

  • Add the debug_level=1 module parameter to the mlx4_core module when loading it. Update your question with all the output from driver initialization (a number of lines about "Max CQEs:", etc.). A fair amount of logic in the mlx4 driver depends on the parameters returned by the firmware during initialization, and this output would let us see what they are.
  • Along the same lines, it is worth checking whether your HCA firmware is up to date. You may get better results with newer firmware (the driver should work anyway, but you may be hitting a bug in untested driver code because missing firmware features trigger a different code path).
  • Try updating your code to increase these parameters. You can try increasing max_send_sge and max_recv_sge to 2 and increasing max_send_wr and max_recv_wr to, say, 32 or 128. (Try increasing them individually or in combination.)
  • If you know how to enable the function tracer ( This LWN article is useful; I assume the old SLES kernel has all the necessary features), then enabling tracing for the mlx4_ib and mlx4_core modules and then loading your module would be great. If you update your question with the trace, we can see where the QP creation operation fails: for example, does it fail in set_rq_size() , does it get to set_kernel_sq_size() , or does it fail somewhere else?
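The parameter sweep suggested above can be organized as a small loop. The following is only a user-space sketch under stated assumptions: struct qp_caps, qp_probe_fn, first_working_caps() and fake_probe() are all hypothetical names, and fake_probe() is a made-up stand-in whose threshold says nothing about how mlx4 actually behaves. In the kernel module, the probe would instead fill in init_qpattr.cap, call ib_create_qp(), and return PTR_ERR() on failure.

```c
#include <stddef.h>
#include <stdio.h>

/* Candidate QP capacities to try (hypothetical helper type). */
struct qp_caps {
    unsigned int max_wr;    /* used for both max_send_wr and max_recv_wr */
    unsigned int max_sge;   /* used for both max_send_sge and max_recv_sge */
};

/* A probe returns 0 on success or a negative errno on failure. */
typedef int (*qp_probe_fn)(const struct qp_caps *caps);

/*
 * Stand-in probe for illustration only: it pretends that queues with
 * fewer than 32 work requests are rejected with -22 (-EINVAL).
 */
int fake_probe(const struct qp_caps *caps)
{
    return caps->max_wr < 32 ? -22 : 0;
}

/*
 * Try each candidate in order, log the result, and return the index of
 * the first one that succeeds, or -1 if none do.
 */
int first_working_caps(const struct qp_caps *cands, size_t n, qp_probe_fn probe)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int rc = probe(&cands[i]);

        printf("max_wr=%u max_sge=%u -> %d\n",
               cands[i].max_wr, cands[i].max_sge, rc);
        if (rc == 0)
            return (int)i;
    }
    return -1;
}
```

Logging every attempt, not just the first success, matters here: which combinations fail is exactly the information that helps narrow down where in the driver the request is being rejected.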
+3

I think you forgot to register a memory region. The steps to perform before creating a QP are:

  • Create a protection domain
  • Register a memory region
  • Create completion queues

and only then create the QP.

I do not know which device and library you are using, but with the Mellanox IB library it looks like this:

  char mr_buffer[REGION_SIZE];
  // mypd is the protection domain that you allocated
  struct ibv_mr *mr = ibv_reg_mr(mypd, mr_buffer, REGION_SIZE, 0);
  if (!mr) {
      // ERROR MSG
  }
-1