I’ve recently been working on a Signals pipeline. No secret sauce. Since most of the ideas, and even the code I used, come from previous forum posts or Rocketchat messages, I’ve open sourced it.
I used to work on a low-memory machine, so there is a lot of reading and writing of parquet columns to disk (that’s no longer the case, which may also be noticeable in the most recent parts of the code).
Hope someone finds it useful.
Any feedback is more than welcome!
Thanks @olivepossum! I’m jumping back into Signals now and using this pipeline as my home base. I’m used to programming in R, so I’m still getting my bearings with Python; my apologies for this surely noob-ish question. I made a private clone of the repo, specified my properties JSON file, and ran it. I got a big block of errors that I’m not sure I’m reading correctly. I won’t have time to run the download again until next week, but I wanted to see if my guesses are anywhere near correct. From Googling the errors that came up, I think I may want to use a prepended ‘r’ instead of ‘f’ on the path strings in folders.py (see line 30 of folders.py, for example).
The downloader appears to be working as intended but did not succeed in writing the data files.
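On the ‘r’ versus ‘f’ question, here is a minimal sketch of what the two prefixes do in Python. It is not taken from folders.py; the folder names are made up for illustration. An ‘f’ prefix substitutes variables, while an ‘r’ prefix only stops backslashes from being treated as escape characters, so swapping one for the other would also switch off the substitution. Building paths with pathlib sidesteps the separator problem altogether.

```python
from pathlib import PurePosixPath, PureWindowsPath

name = "raw"  # hypothetical folder name, not taken from the repo

# f-string: {name} is substituted; the backslash has to be escaped as \\
f_version = f"data\\{name}"

# raw string: the backslash is kept literally, but {name} is NOT substituted
r_version = r"data\{name}"

# pathlib picks the separator for the target OS, so no manual slashes are needed
win_path = PureWindowsPath("data") / name
posix_path = PurePosixPath("data") / name

print(f_version)
print(r_version)
print(win_path)
print(posix_path)
```

In real code, `pathlib.Path` would pick the right flavour for whatever OS the script is running on.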
The output looks strange. I haven’t seen it before.
Did you run several executions at the same time? You shouldn’t, if you want it to work.
It might also be related to OS and/or path issues. I’m running the code on a Linux box (Ubuntu 20.04.4 LTS).
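On the point about concurrent runs: one way to guard against accidentally starting two executions at once is a simple lock file. This is only a sketch under my own assumptions and not something the pipeline does as far as this thread shows; `single_instance` and the lock file name are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Refuse to start if another run already holds the lock file (hypothetical helper)."""
    try:
        # O_EXCL makes creation fail if the file already exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"Another run appears to be active: {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record which process owns the lock
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


# Usage: wrap the pipeline entry point (a print stands in for it here)
lock_file = os.path.join(tempfile.mkdtemp(), "pipeline.lock")
with single_instance(lock_file):
    print("pipeline would run here")
```

If a run crashes hard, the lock file can be left behind and has to be deleted by hand before the next run.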
I ran it again and this time there are 25 parquet files in the raw downloaded folder. Looking at the console output, it definitely tried to download again a number of times and hit other weird errors. I think I’ll probably run this in a Linux environment in the near future to avoid having to figure out Windows/PowerShell-specific hang-ups.
This looks like it’s definitely a Windows issue. I’d strongly recommend setting up WSL on your machine (see “WSL | Ubuntu”). I also use Windows Terminal to manage the multiple command-line interfaces.