Interesting! I tried to solve this problem a bit differently, as outlined here. The problem with the approach you’ve outlined is that pd.read_csv has to load the entire dataset into memory at full precision. Although you do get a reduction in memory usage after the transform in reduce_mem_usage, the peak memory usage stays the same, which means you’re still bottlenecked on memory (and on glacially slow swap, if the machine has it enabled). My way around that is to use converters so the data is converted to the more succinct dtype at load time. It isn’t entirely free (load time goes up a bit), but memory usage never peaks at full precision.
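To illustrate the idea, here is a minimal sketch of load-time downcasting. It uses read_csv's `dtype` parameter, which tells the parser the target dtypes up front; the `converters` parameter works in the same spirit but takes a per-value function, which is what adds the extra load time mentioned above. The column names and sample data are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a large CSV on disk.
csv_data = "id,price,qty\n1,9.99,3\n2,4.50,7\n3,1.25,2\n"

# Naive load: pandas defaults to 64-bit dtypes, so peak memory
# scales with the full-precision representation.
df_full = pd.read_csv(io.StringIO(csv_data))

# Load-time downcast: declare the smaller dtypes up front, so the
# compact representation is built as rows are parsed and memory
# never peaks at full precision.
df_small = pd.read_csv(
    io.StringIO(csv_data),
    dtype={"id": np.int32, "price": np.float32, "qty": np.int16},
)

full_bytes = df_full.memory_usage(deep=True).sum()
small_bytes = df_small.memory_usage(deep=True).sum()
print(df_small.dtypes)
print(full_bytes, small_bytes)
```

The same `dtype`/`converters` arguments also work with `chunksize`, so you can stream very large files through a fixed memory budget.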