More on Pandas Data Loading with ArcGIS (Another Example)

Large datasets can be a major problem on systems running 32-bit Python because there is an upper limit on memory use of about 2 GB per process. In practice, programs often fail before they even reach the 2 GB mark, but that is the ceiling.

When working with large data that cannot fit into that 2 GB of RAM, how can we push it into DataFrames?

One way is to chunk it into groups:

Code:
import itertools

#--------------------------------------------------------------------------
def grouper_it(n, iterable):
    """
    Creates chunks of cursor row objects to make the memory
    footprint more manageable.
    """
    it = iter(iterable)
    while True:
        # take the next n rows from the underlying iterator
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            # the source iterator is exhausted; stop yielding chunks
            return
        # re-attach the first element and hand back the whole chunk
        yield itertools.chain((first_el,), chunk_it)

This function takes any iterable object (one that defines next() in Python 2.7 or __next__() in Python 3) and yields smaller iterators of at most n items each, where n is an integer.
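
To see the chunking behaviour on its own, here is a minimal sketch (assuming grouper_it is defined as above) run over a plain Python range instead of a cursor:

Code:
# each yielded chunk is itself an iterator of at most n items
for chunk in grouper_it(4, range(10)):
    print(list(chunk))
# prints [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]

Note that each chunk should be consumed before moving on to the next one, since all the chunks share the same underlying iterator.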

Example Usage:

Code:
import itertools
import os
import json
import arcpy
import pandas as pd


# fc is assumed to point at an existing feature class (not shown in the original post)
with arcpy.da.SearchCursor(fc, ["Field1", "Field2"]) as rows:
    groups = grouper_it(n=50000, iterable=rows)
    for group in groups:
        # build a DataFrame from one 50,000-row chunk
        df = pd.DataFrame.from_records(group, columns=rows.fields)
        df['Field1'] = "Another Value"
        # append each chunk to the same CSV
        df.to_csv(r"\\server\test.csv", mode='a')
        del group
        del df
    del groups

This is one way to manage your memory footprint: load the records in smaller batches.
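
One caveat with to_csv(mode='a') above is that the column header is written again at the top of every chunk. A minimal sketch of one way around that (the enumerate counter and header flag are my additions, not from the original post), meant to replace the inner loop inside the same with block:

Code:
for i, group in enumerate(groups):
    df = pd.DataFrame.from_records(group, columns=rows.fields)
    df['Field1'] = "Another Value"
    # write the header only with the first chunk, append without it afterwards
    df.to_csv(r"\\server\test.csv", mode='a', header=(i == 0))
    del group
    del df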

Some considerations on 'n': I found that the number of columns, the field lengths, and the data types all affect what size of 'n' is practical.
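
To get a feel for what a given 'n' costs, one rough approach (my own addition, using pandas' memory_usage to report per-column byte counts) is to build a single chunk and measure it:

Code:
# df is a DataFrame built from one chunk, as in the example above
bytes_per_chunk = df.memory_usage(deep=True).sum()
bytes_per_row = bytes_per_chunk / len(df)
print("approx. MB per chunk:", bytes_per_chunk / 1024 ** 2)

# scale n so a chunk stays far below the 2 GB ceiling,
# e.g. aim for roughly 100 MB per chunk
suggested_n = int((100 * 1024 ** 2) / bytes_per_row)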




Copyright AJC

