spine.io.write.CSVWriter
- class spine.io.write.CSVWriter(file_name: str = 'output.csv', directory: str | None = None, overwrite: bool = False, append: bool = False, accept_missing: bool = False, buffer_size: int = 1)
Writes data to a CSV file with optimized performance.
Builds a CSV file to store the output of the analysis tools. It can only be used to store relatively basic quantities (scalars, strings, etc.).
Performance Optimization: This writer keeps the file handle open for its whole lifetime instead of opening and closing the file on every write, which removes a per-row open/close cost and speeds up writing many rows. By default it uses line buffering (buffer_size=1), so each row is flushed to the file as soon as its line is written.
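The pattern described above can be illustrated in plain Python, independently of CSVWriter: one handle is opened with `buffering=1` (line buffering, text mode only) and reused for every row through the standard `csv` module. This is a minimal sketch; the function `write_rows_open_once` is hypothetical and is not part of spine's API.

```python
import csv
import os
import tempfile


def write_rows_open_once(path, rows, buffering=1):
    """Write dict rows to a CSV file, opening the file handle only once.

    Hypothetical helper, not part of spine: it only illustrates the
    "keep the handle open" pattern that CSVWriter describes.
    """
    # buffering=1 selects line buffering (text mode only): the buffer is
    # flushed whenever a write contains a newline, i.e. after each row.
    with open(path, "w", newline="", buffering=buffering) as handle:
        writer = None
        for row in rows:
            if writer is None:
                # Use the keys of the first row as the header
                writer = csv.DictWriter(handle, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)


out = os.path.join(tempfile.mkdtemp(), "output.csv")
write_rows_open_once(out, [{"col1": 1, "col2": 2}, {"col1": 3, "col2": 4}])
with open(out, newline="") as f:
    print(f.read().splitlines())  # ['col1,col2', '1,2', '3,4']
```

Opening once and writing N rows costs one open/close pair instead of N, which is where the speedup for many rows comes from.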
Usage: The writer should be properly closed when done:
Using context manager (recommended):
    from spine.io.write import CSVWriter

    with CSVWriter('output.csv') as writer:
        writer.append({'col1': 1, 'col2': 2})
        writer.append({'col1': 3, 'col2': 4})
    # File automatically closed and flushed
Manual management (used by AnaBase):
    from spine.io.write import CSVWriter

    writer = CSVWriter('output.csv')
    writer.append({'col1': 1, 'col2': 2})
    writer.close()  # Must call explicitly!
Configuration: The buffer size can be configured in two places:
In analysis scripts (YAML config):
    ana:
      buffer_size: 1  # Line buffered (default, safe and fast)
      my_analysis:
        ...
In driver logging (YAML config):
    base:
      csv_buffer_size: 1  # For driver log file
Methods
append(data) – Append a row of output to the CSV file.
array_diff(array_x, array_y) – Compare the content of two arrays.
close() – Close the file handle and ensure all data is written.
create(data) – Initialize the header of the CSV file and record the keys to be stored.
flush() – Explicitly flush the file buffer to disk.
open() – Open the file handle for writing.
- __init__(file_name: str = 'output.csv', directory: str | None = None, overwrite: bool = False, append: bool = False, accept_missing: bool = False, buffer_size: int = 1) -> None
Initialize the basics of the output file.
- Parameters:
file_name (str, default 'output.csv') – Name of the output CSV file
directory (str, optional) – Output directory. When provided, the CSV file is written under this directory using the basename of file_name.
overwrite (bool, default False) – If True, overwrite the output file if it already exists
append (bool, default False) – If True, add more rows to an existing CSV file
accept_missing (bool, default False) – If True, tolerate missing keys in the data to be written
buffer_size (int, default 1) – Buffer size for file writing. 1 is line buffered (default, safe), -1 uses system default buffering, 0 is unbuffered, >1 is buffer size in bytes
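A minimal sketch of how `directory` and `file_name` combine, based only on the description above (the basename of `file_name` is placed under `directory`). The helper `resolve_output_path` is hypothetical; CSVWriter's own code may differ, for example in how it treats `overwrite` or missing directories.

```python
import os


def resolve_output_path(file_name="output.csv", directory=None):
    """Return the path a CSV file would be written to.

    Hypothetical helper mirroring the documented behaviour: when a
    directory is given, only the basename of file_name is kept and it is
    placed under that directory.
    """
    if directory is None:
        return file_name
    return os.path.join(directory, os.path.basename(file_name))


print(resolve_output_path("runs/output.csv"))              # runs/output.csv
print(resolve_output_path("runs/output.csv", "/tmp/ana"))  # /tmp/ana/output.csv on POSIX
```

The buffer_size values listed above match the `buffering` argument of Python's built-in `open()`, where 0 (unbuffered) is only permitted in binary mode; whether CSVWriter passes the value straight through is not stated here.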
Attributes
- name = 'csv'
- open() -> None
Open the file handle for writing.
If the file handle is already open, this does nothing. The file is opened in append mode if append_file is True and the file exists, otherwise in write mode.
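The mode rule above reduces to a single condition. In this sketch the helper `choose_open_mode` is hypothetical; `append_file` is the attribute named in the text.

```python
import os
import tempfile


def choose_open_mode(path, append_file):
    """Pick the mode used to open the CSV file (hypothetical helper).

    Mirrors the documented rule: append mode only when appending was
    requested *and* the file already exists, otherwise write mode.
    """
    return "a" if append_file and os.path.exists(path) else "w"


tmp_dir = tempfile.mkdtemp()
missing = os.path.join(tmp_dir, "missing.csv")
print(choose_open_mode(missing, append_file=True))  # w (the file does not exist yet)
```

Falling back to write mode when the file is absent means an "append" run can still start a fresh file.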
- close() -> None
Close the file handle and ensure all data is written.
This flushes any buffered data before closing. After calling this, the writer cannot be used unless open() is called again.
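The context-manager usage at the top of this page relies on `__exit__` closing the file. The class below is a hypothetical, minimal stand-in (not spine's CSVWriter) showing one way to wire `open()`, `close()` and the `with` protocol together so the file is flushed and closed even if the block raises.

```python
import os
import tempfile


class ManagedHandle:
    """Hypothetical minimal writer showing the open/close/context-manager pattern.

    Not spine's CSVWriter: it only demonstrates how __exit__ can delegate
    to close() so that a `with` block always flushes and closes the file.
    """

    def __init__(self, path, buffer_size=1):
        self.path = path
        self.buffer_size = buffer_size
        self._handle = None

    def open(self):
        # Do nothing if the handle is already open. Note: this sketch always
        # reopens in write mode, which truncates an existing file.
        if self._handle is None:
            self._handle = open(self.path, "w", buffering=self.buffer_size)

    def write_line(self, text):
        if self._handle is None:
            raise ValueError("writer is closed; call open() first")
        self._handle.write(text + "\n")

    def close(self):
        # Flush buffered data, then release the handle
        if self._handle is not None:
            self._handle.flush()
            self._handle.close()
            self._handle = None

    def __enter__(self):
        self.open()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the with-block raised, so the file is always closed
        self.close()
        return False  # do not suppress exceptions


path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with ManagedHandle(path) as w:
    w.write_line("col1,col2")
```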
- flush() -> None
Explicitly flush the file buffer to disk.
This forces any buffered data to be written to disk without closing the file. Useful for ensuring data persistence at specific checkpoints.
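In Python, `flush()` on a file object moves data from the Python-level buffer to the operating system; it does not by itself guarantee the data has reached physical storage, which is what `os.fsync()` is for. The helper `checkpoint` below is hypothetical and works on any open Python file object; whether CSVWriter.flush() also calls `os.fsync()` is not stated here.

```python
import os
import tempfile


def checkpoint(handle, durable=False):
    """Flush a file handle; optionally force the data to storage.

    Hypothetical helper: flush() empties Python's buffer into the OS,
    os.fsync() additionally asks the OS to write its cache to the device.
    """
    handle.flush()
    if durable:
        os.fsync(handle.fileno())


path = os.path.join(tempfile.mkdtemp(), "log.csv")
with open(path, "w", buffering=8192) as handle:  # block-buffered, not line-buffered
    handle.write("col1,col2\n1,2\n")
    checkpoint(handle, durable=True)
    # The rows are now visible to other readers even though the handle is open
    with open(path) as reader:
        print(reader.read())
```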
- create(data: dict[str, Any]) -> None
Initialize the header of the CSV file and record the keys to be stored.
- Parameters:
data (dict) – Dictionary containing the output of the reconstruction chain
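A sketch of what initializing the header amounts to, assuming the keys of the first data dictionary become the column names in insertion order. The helper `create_header` is hypothetical and writes to any text handle.

```python
import csv
import io


def create_header(handle, data):
    """Write the CSV header from the keys of a data dictionary.

    Hypothetical sketch of what create() describes: the keys of `data`
    become the columns, and the returned list records which keys are stored.
    """
    keys = list(data.keys())  # dict order is insertion order (Python 3.7+)
    csv.writer(handle).writerow(keys)
    return keys


buffer = io.StringIO()
keys = create_header(buffer, {"run": 1, "event": 42, "energy": 1.5})
print(keys)                     # ['run', 'event', 'energy']
print(repr(buffer.getvalue()))  # 'run,event,energy\r\n' (csv's default line terminator)
```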
- append(data: dict[str, Any]) -> None
Append a row of output to the CSV file.
- Parameters:
data (dict) – Dictionary containing the output of the reconstruction chain
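A sketch of appending one row against a recorded header. The treatment of missing keys is an assumption about what `accept_missing` means (missing columns are written as empty cells when it is True, otherwise a `KeyError` is raised); the helper `append_row` is hypothetical.

```python
import csv
import io


def append_row(handle, keys, data, accept_missing=False):
    """Append one row whose columns follow the header `keys`.

    Hypothetical sketch. The handling of missing keys is an ASSUMPTION
    about what accept_missing means: fill them with an empty string when
    True, raise KeyError otherwise.
    """
    missing = set(keys) - set(data)
    if missing and not accept_missing:
        raise KeyError(f"Missing keys in data: {sorted(missing)}")
    # restval fills absent columns; DictWriter raises ValueError for keys
    # that are not in the header (its default extrasaction="raise")
    writer = csv.DictWriter(handle, fieldnames=keys, restval="")
    writer.writerow(data)


buffer = io.StringIO()
keys = ["run", "event", "energy"]
append_row(buffer, keys, {"run": 1, "event": 7, "energy": 0.5})
append_row(buffer, keys, {"run": 1, "event": 8}, accept_missing=True)
print(buffer.getvalue().splitlines())  # ['1,7,0.5', '1,8,']
```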
- static array_diff(array_x: list[str], array_y: list[str]) -> set[str]
Compare the content of two arrays.
This function returns the elements of the first array that do not appear in the second array.
- Parameters:
array_x (List[str]) – First array of strings
array_y (List[str]) – Second array of strings
- Returns:
Set of keys that appear in array_x but not in array_y.
- Return type:
Set[str]
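The documented return value (the keys in array_x that are not in array_y) is a set difference. The sketch below reproduces that contract; whether the actual static method is written this way is not stated, and comparing a header with a row's keys is only an illustrative use.

```python
def array_diff(array_x, array_y):
    """Return the elements of array_x that do not appear in array_y.

    Sketch reproducing the documented behaviour with a set difference;
    duplicates and ordering are lost because the result is a set.
    """
    return set(array_x) - set(array_y)


header = ["run", "event", "energy"]
row_keys = ["run", "event"]
print(array_diff(header, row_keys))  # {'energy'}
print(array_diff(row_keys, header))  # set()
```

The operation is not symmetric: swapping the arguments answers a different question (keys present in a row but absent from the header).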