Full signals pipeline

Hi,

I’ve recently been working on a Signals pipeline. No secret sauce. Since most of the ideas, and even the code I used, come from previous forum posts or RocketChat messages, I’ve open-sourced it.

I used to work on a low-memory machine, so the code does a lot of parquet column reads/writes to disk (that’s no longer the case, which may also be noticeable in the most recent parts of the code).

Hope someone finds it useful.
Any feedback is more than welcome!

The current validation metrics look like this:

Thanks to the whole community, especially to @jrdi @habakan @joakim_arvidsson @kunigaku @ageonsen @katsu1110

Thanks!

Thanks @olivepossum! I’m jumping back into Signals now and using this pipeline as my home base. I’m used to programming in R, so I’m still getting my bearings with Python; my apologies for this surely noob-ish question. I made a private clone of the repo, specified my properties JSON file, and ran it, and got a big block of errors that I’m not sure I’m reading correctly. I won’t have time to run the download again until next week, but I wanted to see if my guesses are anywhere near correct. From Googling the errors that came up, I think I may want to use an ‘r’ string prefix instead of ‘f’ in folders.py (see line 30 of folders.py, for example).
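For anyone else comparing the two prefixes, here is a minimal sketch of how they differ. The folder names are made up for illustration and are not taken from folders.py, and `build_path` is a hypothetical helper, not something in the repo. It only shows how Python treats backslashes and `{}` in each kind of string, plus a `pathlib` alternative that avoids hand-written separators.

```python
from pathlib import PurePosixPath, PureWindowsPath

base = "data"  # hypothetical folder name, not from the repo

# f-string: {base} is interpolated, but \t and \n are still
# interpreted as escape sequences (tab and newline).
f_path = f"{base}\temp\new"

# r-string: backslashes are kept literally, but {base} is NOT interpolated.
r_path = r"{base}\temp\new"

# rf-string: interpolation AND literal backslashes.
rf_path = rf"{base}\temp\new"


def build_path(root, *parts, windows=False):
    """Join path parts with the separator of the chosen OS flavour.

    Hypothetical helper: real code would normally use pathlib.Path,
    which picks the flavour of the OS it runs on.
    """
    flavour = PureWindowsPath if windows else PurePosixPath
    return str(flavour(root, *parts))


print(repr(f_path))
print(repr(r_path))
print(repr(rf_path))
print(build_path(base, "temp", "new"))
print(build_path(base, "temp", "new", windows=True))
```

So swapping ‘f’ for ‘r’ alone would stop the escapes but also stop the variable substitution; ‘rf’ or `pathlib` keeps both working, assuming the path strings do contain backslashes.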

The downloader appears to be working as intended but did not succeed in writing the data files.

Here’s a pastebin of my console output.

Also I’d be glad to contribute to the project after I get my bearings if I come up with anything useful!

Thanks.

Hi @liz good to have you back!

The output looks strange. I haven’t seen it before.
Did you run several executions at the same time? It won’t work if you do.
It might also be related to OS and/or path issues. I’m running the code on a Linux box (Ubuntu 20.04.4 LTS).
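If overlapping runs turn out to be the cause, one way to rule them out is a simple lock file. This is only a sketch of the general idea, not something the pipeline does today: `SingleRunLock` and the `pipeline.lock` file name are hypothetical, and it relies on `os.open` with `O_CREAT | O_EXCL` failing when the file already exists.

```python
import os
import tempfile


class SingleRunLock:
    """Context manager that refuses to start if a lock file already exists."""

    def __init__(self, lock_path):
        self.lock_path = lock_path
        self._fd = None

    def __enter__(self):
        try:
            # O_EXCL makes creation fail if the file is already there.
            self._fd = os.open(
                self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY
            )
        except FileExistsError:
            raise RuntimeError(
                f"another run seems to be active (found {self.lock_path})"
            ) from None
        # Record the owning process id to help debug stale locks.
        os.write(self._fd, str(os.getpid()).encode())
        return self

    def __exit__(self, exc_type, exc, tb):
        os.close(self._fd)
        os.remove(self.lock_path)
        return False  # never swallow exceptions from the guarded block


if __name__ == "__main__":
    # Hypothetical lock location, for demonstration only.
    lock_file = os.path.join(tempfile.gettempdir(), "pipeline.lock")
    with SingleRunLock(lock_file):
        print("pipeline would run here")
```

If a run crashes hard, the lock file is left behind and has to be deleted by hand before the next run.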

Thanks! Yeah, I think I only ran it once, but I’m gonna try again. I’m using PowerShell on Windows 10 Pro.

I ran it again, and this time there are 25 parquet files in the raw downloaded folder. Looking at the console output, it definitely tried to download again a number of times and hit other weird errors. I think I’ll probably run this in a Linux environment in the near future to avoid having to figure out Windows/PowerShell-specific hang-ups.

A Linux box (or VirtualBox w/ Linux) has gotta be better than PowerShell, right?

This looks like it’s definitely a Windows issue. I’d strongly recommend setting up WSL on your machine (WSL | Ubuntu). I also use Windows Terminal to manage the multiple command line interfaces.

(also welcome back!)
