Memory efficiency tweak #512
Open
rajeee wants to merge 34 commits into develop from ppv2
34 commits
a5e49fc Gather upgrade results a different way (rajeee)
c56a1bf efficient pp (rajeee)
ece7109 More schema cleanup (rajeee)
fdc3e19 Cleanup and gz fix (rajeee)
d26e6bb Default job_id (rajeee)
13b29ed Fix tests (rajeee)
c47ebec Change filename (rajeee)
a4b7aa4 Add ultra low disk mode (rajeee)
9f48d45 Fix test and uld mode (rajeee)
8e147ff Allow some failures (rajeee)
b1696a8 Always add eplusout_error and upgrade (rajeee)
91561d6 Merge branch 'develop' into ppv2 (rajeee)
0d2c71a Use new workflow in hpc run (rajeee)
1b59d31 Merge branch 'develop' into ppv2 (rajeee)
8616be7 HPC fixes (rajeee)
10f4c8f Use 5 significant digits in the timeseries output (rajeee)
055decf Don't write statistics in parquet file for annual results (rajeee)
1497eba Use nvme for workers as well (rajeee)
51bc579 Add upper limit to str length of eplusout_err and step_failures (rajeee)
3eb2d62 Disable enduse level emissions by default (rajeee)
2ad98dc Fix the test (rajeee)
ed5287c Use read_parquet to reduce file I/O (rajeee)
5eb233c Use read_parquet to reduce file I/O (rajeee)
d24996a Streaming parquet writer (rajeee)
2921d6f Streaming parquet writer working for local (rajeee)
0530a8f Path fix (rajeee)
d2c3f51 use pathlib (rajeee)
e24ded7 No need to create dirs beforehand (rajeee)
bbfcc47 Drop upgrade column before writing partitioned parquet (rajeee)
cdb8eca Path fix (rajeee)
b6ed5a5 Handle ultra low disk mode (rajeee)
0acbf82 Skip individual ts writing for local (rajeee)
b8d2936 Make dir if not exist (rajeee)
aad4058 Write building_id in dataframe (rajeee)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,6 +30,7 @@ | |
| import csv | ||
| from collections import defaultdict, Counter | ||
| import pprint | ||
| import pathlib | ||
|
|
||
| from buildstockbatch.__version__ import __schema_version__ | ||
| from buildstockbatch import sampler, workflow_generator, postprocessing | ||
|
|
@@ -107,11 +108,18 @@ def path_rel_to_projectfile(self, x): | |
|
|
||
| def _get_weather_files(self): | ||
| if "weather_files_path" in self.cfg: | ||
| logger.debug("Copying weather files") | ||
| weather_file_path = self.cfg["weather_files_path"] | ||
| with zipfile.ZipFile(weather_file_path, "r") as zf: | ||
| logger.debug("Extracting weather files to: {}".format(self.weather_dir)) | ||
| zf.extractall(self.weather_dir) | ||
| if os.path.isdir(weather_file_path): | ||
|
Collaborator, Author: The weather file path can now be a directory too (not just a zip file). This will make buildstock_local much faster when using a pre-unzipped directory.
Member: Sweet! |
||
| if os.path.isdir(self.weather_dir) and os.path.samefile(self.weather_dir, weather_file_path): | ||
| logger.debug(f"Weather files already exist at {self.weather_dir}") | ||
| return | ||
| else: | ||
| logger.debug(f"Copying weather files from directory: {weather_file_path} to {self.weather_dir}") | ||
| shutil.copytree(weather_file_path, self.weather_dir, dirs_exist_ok=True) | ||
| else: | ||
| with zipfile.ZipFile(weather_file_path, "r") as zf: | ||
| logger.debug(f"Extracting weather files to: {self.weather_dir}") | ||
| zf.extractall(self.weather_dir) | ||
| else: | ||
| logger.debug("Downloading weather files") | ||
| r = requests.get(self.cfg["weather_files_url"], stream=True) | ||
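The review comment in this hunk says `weather_files_path` may now point to a directory as well as a zip file. The branching can be sketched as below. This is only an illustration under stated assumptions: `stage_weather_files` is a hypothetical standalone helper, not a function in buildstockbatch, and the returned labels exist only to show which branch ran.

```python
import os
import shutil
import tempfile
import zipfile


def stage_weather_files(source_path, weather_dir):
    """Hypothetical helper: make weather files available in weather_dir.

    Returns a label naming the branch taken (illustration only).
    """
    if os.path.isdir(source_path):
        # Source is a pre-unzipped directory.
        if os.path.isdir(weather_dir) and os.path.samefile(weather_dir, source_path):
            # Source and destination are the same directory: nothing to copy.
            return "already-present"
        # dirs_exist_ok requires Python 3.8 or newer.
        shutil.copytree(source_path, weather_dir, dirs_exist_ok=True)
        return "copied"
    # Otherwise treat the source as a zip archive.
    with zipfile.ZipFile(source_path, "r") as zf:
        zf.extractall(weather_dir)
    return "extracted"


def demo():
    """Exercise all three branches inside a throwaway temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "weather_src")
        os.makedirs(src)
        with open(os.path.join(src, "example.epw"), "w") as f:
            f.write("placeholder")
        zip_path = os.path.join(tmp, "weather.zip")
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.write(os.path.join(src, "example.epw"), arcname="example.epw")
        labels = [
            stage_weather_files(src, src),
            stage_weather_files(src, os.path.join(tmp, "from_dir")),
            stage_weather_files(zip_path, os.path.join(tmp, "from_zip")),
        ]
        files_ok = all(
            os.path.isfile(os.path.join(tmp, d, "example.epw"))
            for d in ("from_dir", "from_zip")
        )
        return labels, files_ok
```

Pointing the path at the directory that already serves as the weather directory skips both the copy and the unzip, which is presumably where the speedup mentioned in the comment comes from.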
|
|
@@ -188,7 +196,9 @@ def make_sim_dir(building_id, upgrade_idx, base_dir, overwrite_existing=False): | |
| return sim_id, sim_dir | ||
|
|
||
| @staticmethod | ||
| def cleanup_sim_dir(sim_dir, dest_fs, simout_ts_dir, upgrade_id, building_id, low_disk=False): | ||
| def get_timeseries_df( | ||
| sim_dir, dest_fs, simout_ts_dir, upgrade_id, building_id, low_disk="", skip_write: bool = False | ||
| ): | ||
| """Clean up the output directory for a single simulation. | ||
|
|
||
| :param sim_dir: simulation directory | ||
|
|
@@ -201,14 +211,18 @@ def cleanup_sim_dir(sim_dir, dest_fs, simout_ts_dir, upgrade_id, building_id, lo | |
| :type upgrade_id: int | ||
| :param building_id: building id from buildstock.csv | ||
| :type building_id: int | ||
| :param low_disk: If true, remove the simulation directory entirely to save disk space | ||
| :type low_disk: bool | ||
| :param low_disk: If "low_disk", remove the simulation directory entirely to save disk space. | ||
| If "ultra_low_disk_no_timeseries", remove the simulation directory entirely to | ||
| save disk space and also delete the timeseries parquet file. | ||
| :type low_disk: str | ||
| :param skip_write: If True, skip writing the timeseries parquet file to dest_fs. Return only. | ||
| """ | ||
|
|
||
| # Convert the timeseries data to parquet | ||
| # and copy it to the results directory | ||
| # and copy it to the results directory if skip_write is False | ||
| output_dir = os.path.join(sim_dir, "run") | ||
| timeseries_filepath = os.path.join(output_dir, "results_timeseries.csv") | ||
| tsdf = None | ||
| # FIXME: Allowing both names here for compatibility. Should consolidate on one timeseries filename. | ||
| if os.path.isfile(timeseries_filepath): | ||
| units_dict = read_csv(timeseries_filepath, nrows=1).transpose().to_dict()[0] | ||
|
|
@@ -255,15 +269,23 @@ def get_clean_column_name(x): | |
| return x.lower() | ||
|
|
||
| tsdf.rename(columns=get_clean_column_name, inplace=True) | ||
| postprocessing.write_dataframe_as_parquet( | ||
| tsdf, | ||
| dest_fs, | ||
| f"{simout_ts_dir}/up{upgrade_id:02d}/bldg{building_id:07d}.parquet", | ||
| ) | ||
| tsdf["building_id"] = building_id | ||
| if not skip_write: | ||
| pathlib.Path(simout_ts_dir).mkdir(exist_ok=True, parents=True) | ||
| postprocessing.write_dataframe_as_parquet( | ||
| tsdf, | ||
| dest_fs, | ||
| f"{simout_ts_dir}/{building_id}-{upgrade_id}.parquet", | ||
| ) | ||
|
|
||
| if low_disk: | ||
| shutil.rmtree(sim_dir, ignore_errors=True) | ||
| return | ||
| if ( | ||
| low_disk == "ultra_low_disk_no_timeseries" | ||
| ): # only delete after writing to allow testing of writing workflow | ||
| if os.path.exists(f"{simout_ts_dir}/up{upgrade_id:02d}/bldg{building_id:07d}.parquet"): | ||
| os.remove(f"{simout_ts_dir}/up{upgrade_id:02d}/bldg{building_id:07d}.parquet") | ||
| return tsdf | ||
|
|
||
| # Remove files already in data_point.zip | ||
| zipfilename = os.path.join(sim_dir, "run", "data_point.zip") | ||
|
|
@@ -281,6 +303,7 @@ def get_clean_column_name(x): | |
| reports_dir = os.path.join(sim_dir, "reports") | ||
| if os.path.isdir(reports_dir): | ||
| shutil.rmtree(reports_dir, ignore_errors=True) | ||
| return tsdf | ||
|
|
||
| @classmethod | ||
| def validate_project(cls, project_file): | ||
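The docstring changes above turn `low_disk` from a boolean into a mode string and add `skip_write`. A rough sketch of how those two options interact follows. It rests on several assumptions: `finish_simulation` is a hypothetical stand-in, a list of strings replaces the timeseries DataFrame, a plain text file replaces the parquet file, and the sketch deletes the same path it wrote.

```python
import os
import shutil
import tempfile


def finish_simulation(rows, sim_dir, out_path, low_disk="", skip_write=False):
    """Hypothetical stand-in for the post-simulation step.

    rows     -- in-memory timeseries (a list of strings here)
    out_path -- where the per-building file would be written
    low_disk -- "", "low_disk", or "ultra_low_disk_no_timeseries"
    """
    if not skip_write:
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        with open(out_path, "w") as f:
            f.write("\n".join(rows))

    if low_disk:
        # Any non-empty mode removes the whole simulation directory.
        shutil.rmtree(sim_dir, ignore_errors=True)
        # The ultra mode also removes the file that was just written.
        if low_disk == "ultra_low_disk_no_timeseries" and os.path.exists(out_path):
            os.remove(out_path)
    # The data is returned in every mode so a caller can keep using it.
    return rows


def demo():
    """Return (sim_dir exists, output file exists) for each mode."""
    outcomes = {}
    for mode in ("", "low_disk", "ultra_low_disk_no_timeseries"):
        with tempfile.TemporaryDirectory() as tmp:
            sim_dir = os.path.join(tmp, "sim")
            os.makedirs(sim_dir)
            out_path = os.path.join(tmp, "timeseries", "1-0.txt")
            finish_simulation(["a", "b"], sim_dir, out_path, low_disk=mode)
            outcomes[mode] = (os.path.isdir(sim_dir), os.path.exists(out_path))
    return outcomes
```

Returning the data even when nothing is kept on disk is what would let a caller combine results in memory, which seems to be the intent behind `skip_write`.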
|
|
@@ -958,7 +981,7 @@ def process_results(self, skip_combine=False, use_dask_cluster=True, continue_up | |
|
|
||
| fs = self.get_fs() | ||
| if not skip_combine: | ||
| postprocessing.combine_results(fs, self.results_dir, self.cfg, do_timeseries=do_timeseries) | ||
| postprocessing.combine_results(fs, self.results_dir, self.cfg) | ||
|
|
||
| aws_conf = self.cfg.get("postprocessing", {}).get("aws", {}) | ||
| if "s3" in aws_conf or "aws" in self.cfg: | ||
|
|
@@ -980,6 +1003,3 @@ def process_results(self, skip_combine=False, use_dask_cluster=True, continue_up | |
| finally: | ||
| if use_dask_cluster: | ||
| self.cleanup_dask() | ||
|
|
||
| keep_individual_timeseries = self.cfg.get("postprocessing", {}).get("keep_individual_timeseries", False) | ||
| postprocessing.remove_intermediate_files(fs, self.results_dir, keep_individual_timeseries) | ||
I assume this gets changed back after resstock gets updated too.

Yes, this PR relies on changes in resstock that are currently only available on that branch. The two branches depend on each other. Once both sides are ready, we switch the branch to develop on both sides and merge both in quick succession.