Mass ArcPy GDB Load

Looking for a best practice for loading mass quantities of data into an Oracle GDB using ArcPy.

I'm attempting to load ~10 million records of Polygon and Point shape data into a GDB (~5M per table). I'm currently running into a problem where, as the record count goes up, the time it takes to insert new records increases significantly. This was expected, but NOT to the extent that it's happening: at about ~2M records in, it takes almost 20 minutes to append a 50 MB file. Please correct me if I'm wrong, but I'm assuming this is caused by ArcPy maintaining the spatial and ObjectID indexes on each append.
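If per-row commit and index maintenance really are the bottleneck, one alternative worth testing is to skip Append for the bulk load and write rows through an InsertCursor inside an explicit edit session, so the whole batch is committed as one operation. This is only a sketch, not the workflow below: the connection file, table names, and field names are hypothetical placeholders, and it assumes a nonversioned target.

import arcpy

sde_conn = r"C:\connections\oracle_gdb.sde"         # hypothetical connection file
target_fc = sde_conn + r"\MYSCHEMA.MY_POLYGONS"     # hypothetical target
hold_table = r"C:\work\staging.gdb\hold_table"      # staged local copy

# One edit session around the whole batch: edits are committed together
# instead of row by row.
edit = arcpy.da.Editor(sde_conn)
edit.startEditing(False, False)   # no undo stack, nonversioned editing
edit.startOperation()

with arcpy.da.SearchCursor(hold_table, ["SHAPE@", "myfield"]) as source, \
     arcpy.da.InsertCursor(target_fc, ["SHAPE@", "MYFIELD"]) as sink:
    for row in source:
        sink.insertRow(row)

edit.stopOperation()
edit.stopEditing(True)   # True = save the edits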

This is my primary sequence of commands, simplified:

# First I copy the .shp file to a local FGDB. I need to do this to add
# and update fields used by my company.
arcpy.CopyFeatures_management(shp_path, hold_table)

# Then I add the field I need (AddField expects a type keyword such as "LONG").
arcpy.AddField_management(hold_table, "myfield", "LONG")

# Then I update the field I made.
with arcpy.da.UpdateCursor(hold_table, "myfield") as update_cursor:
    for row in update_cursor:
        row[0] = my_update_value  # company-specific value for the new field
        update_cursor.updateRow(row)

# Now I append to the GDB.
arcpy.Append_management(hold_table, table_to_insert_path, "NO_TEST")

I've also noticed a massive slowdown in the process that removes records from the GDB as the record count increases, even with a where_clause on the update cursor. For instance, removing ~2M records out of 5M total took almost 6 hours.
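The 6-hour delete suggests the rows are going out one at a time. A set-based delete through a layer selection is usually much faster, because the where clause is evaluated by Oracle rather than a cursor looping over every row. A sketch, with a hypothetical target path and a hypothetical batch column:

import arcpy

target = r"C:\connections\oracle_gdb.sde\MYSCHEMA.MY_POLYGONS"   # hypothetical

# Build a layer restricted to the rows to remove, then delete them as a set.
arcpy.MakeFeatureLayer_management(target, "rows_to_drop", "LOAD_BATCH = 42")
arcpy.DeleteFeatures_management("rows_to_drop")
arcpy.Delete_management("rows_to_drop")   # discard the temporary layer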

This is what I want to try next; however, I suspect I will run into the same issue:

Create a local FGDB for each "group" of data I need to insert, and then run a single append (or merge) of all those local FGDBs into the GDB at the end. Instead of running almost 400 files through the above process and appending each one to the GDB immediately, this would create approximately 20 local FGDBs, which would be appended/merged at the end of the whole process (see the sketch below). I would assume the appends at the end would take the same total time as doing them immediately, but then again, this is ArcPy...
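A sketch of that batching plan, assuming roughly 20 groups and placeholder paths throughout (the per-file staging steps would be the same CopyFeatures/AddField/UpdateCursor sequence shown earlier):

import os
import arcpy

staging_root = r"C:\work\staging"                            # hypothetical
target = r"C:\connections\oracle_gdb.sde\MYSCHEMA.TARGET"    # hypothetical

staged = []
for i in range(20):
    name = "group_{0:02d}.gdb".format(i)
    gdb = os.path.join(staging_root, name)
    if not arcpy.Exists(gdb):
        arcpy.CreateFileGDB_management(staging_root, name)
    fc = os.path.join(gdb, "features")
    # ... stage this group's shapefiles into fc here ...
    staged.append(fc)

# One multi-input append at the very end; Append accepts a list of inputs.
arcpy.Append_management(staged, target, "NO_TEST")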

This is ArcGIS 10.3, using the ArcPy installation that comes with ArcGIS Desktop.

I have tried dropping the spatial indexes from the 2 tables I am appending to. This doesn't seem to improve performance. Oddly enough, an identical index appears to get recreated by the append function, but it is not registered as an mdsys.spatial_index.
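One way to confirm what the append actually leaves behind is to list the indexes on the target before and after a load. A minimal check, with a hypothetical target path:

import arcpy

target = r"C:\connections\oracle_gdb.sde\MYSCHEMA.MY_POLYGONS"   # hypothetical

# Print every index ArcGIS sees on the target table.
for idx in arcpy.ListIndexes(target):
    fields = ", ".join(f.name for f in idx.fields)
    print("{0}: fields=({1}) unique={2}".format(idx.name, fields, idx.isUnique))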

Edit 1: In response to Yanes' comment: I am currently appending to an empty schema. The data I am loading has more fields than the target table, so I am using arcpy.FieldMap() to direct the fields. I have tested the load with and without field mapping, and the performance hit is the same either way. I will not always be appending to an empty schema; I will also need to be able to remove records, make edits, and append to the full ~5M record table.
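For reference, the field-directing setup looks roughly like this; the paths and field names are hypothetical stand-ins, not the actual schema:

import arcpy

source = r"C:\work\staging.gdb\hold_table"                   # hypothetical
target = r"C:\connections\oracle_gdb.sde\MYSCHEMA.TARGET"    # hypothetical

field_mappings = arcpy.FieldMappings()

# Map one source field onto its target column; source fields that are
# never mapped are simply dropped by the append.
fm = arcpy.FieldMap()
fm.addInputField(source, "myfield")
out_field = fm.outputField
out_field.name = "MYFIELD"        # column name in the target table
fm.outputField = out_field
field_mappings.addFieldMap(fm)

arcpy.Append_management(source, target, "NO_TEST", field_mappings)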


