Chapter 2 Overview
Our vision is a scientific culture that is more efficient and collaborative, and can uncover environmental solutions faster.
This Series is going to be fun and empowering! We will talk about a lot of tools and practices to make your science more streamlined. This is really powerful, cool stuff, and not just for data: I made and published this book using the tools and workflows we’ll talk about.
The first half of the Series focuses on efficiency and open culture within the lab, and the second half is about sustained learning and bringing these practices to the broader campus community.
2.1 Why we’re here
We are passionate environmental scientists studying important, time-sensitive topics using data of all kinds. And we were never taught to work efficiently with data.
We are here because I know these files are on your computer — we all have them.
data_final_final.xls
data_final_usethis.xls
...
thesis_v16_new_ch1.docx
thesis_v16.docx
...
And we also send and receive emails with subject lines like:
Re:FWD:Fwd:Data question
Re:Sending again with the correct version
We are going to talk about how to make the data experience better, for you, your lab, your department, and beyond.
Data analysis can be inefficient and demoralizing when you lack the right tools and skills and feel alone.
But! Open tools, practices, and communities exist that are powerful, empowering, and game-changing for science. And we can all learn to use these open practices in our own science.
They are like the Force from Star Wars:
- More powerful than you ever imagined
- Helps you solve your current question powerfully, but also broadens the scope of the questions you can ask
- Learn from Jedi, pass on what you have learned, and have a ton of awesome allies (not all of whom are Jedi)
2.2 What to expect
2.2.1 This is going to be fun and empowering!
We will discuss a wide range of topics and work to seed habits so that you can engage with them and keep learning about them alongside our lab and others on campus.
2.2.2 Exposure to relevant tools & practices, confidence & agency to engage, community to learn with
The plan is to expose you to a lot of great tools and practices that you can confidently use in your research. We will also spend time helping you plan how to weave them incrementally into your existing workflows.
The point is not to overwhelm you, or to make you feel that it's too late for you or that you would need to throw out and redo everything you've ever done in order to take the first step. No. By seeing what's possible, and how shared practices can make your own life easier and your work with your lab and beyond more streamlined and fun, you'll start experimenting with these practices, and in a few years you will be working in a completely different way.
2.2.4 No skills required: we will strategize about general approaches, with specific examples using R/RStudio and GitHub
There are no skills required to participate, and we will not be teaching hands-on how to code or set up databases. But we will talk about why these skills are important, how they fit together in the big picture, and how to get started learning the ones you need. This is an opportunity to discuss existing tools and how to engage with them, meet other labs, discuss next steps, and stay accountable.
We’ll talk about tools and practices broadly, but also with specific examples using R and GitHub. Won’t that software eventually become outdated, you say? Is it worth learning it over something else? The answer is yes: software will change and become outdated; it always has. But seeing what is possible and becoming versed in embracing existing architecture and practices will set you up to make whatever transition comes, and you will make that transition with the community, not alone. Your skills will transfer even as the actual software changes. An analogy: if you have learned one musical instrument, you will pick up another more fluidly than someone who has never played one, because you can read music and understand something about timing and rhythm.
2.2.5 Everyone is coming with different experiences & expectations
Everyone in this workshop is coming from a different place with different experiences and expectations. But everyone will learn something new here, because there is so much innovation in the data science world. You are encouraged to ask questions and answer those of others.
2.2.6 We are all learning together
These tools are new to all of us, and the best ideas can come from anyone’s questions. If you are already familiar with some of this material, think about what your experience was like learning it, and how you might teach it to others. Use these workshop materials not only as a future reference but also as talking points, so you can communicate the importance of these tools to your communities. A big part of this Series is not only for you to learn these skills, but also for you to teach others and increase the value and practice of open data science in science as a whole.
2.2.7 Vulnerability: yes! Shame: no.
Shame is not allowed here. No “I’m 34 and haven’t learned GitHub, it’s too late for me” or any of that. We never had the opportunity to learn these things, and there should be no shame on your part for that. It takes a lot of time and dedicated effort to learn and employ these practices, and they should be valued and taught. That’s why you’re here now, and you should be proud that you are taking the initiative and the time to do this. No shame.
Vulnerability, however, will be involved in this Series. Vulnerability is a big part of learning and trying new things, and this is a safe place for everyone to learn. Being vulnerable means taking stock of where you are now and mapping out where you want to be. Being vulnerable is scary. But it shouldn’t be lonely: we all have data confessions that we would love to talk about and get help with, if only our scientific culture said that was OK, and if only we knew how to articulate our questions and had someone to ask. This is a place to share our vulnerabilities to ignite real change. Ask questions. Whether it’s a keyboard shortcut or the philosophy of data workflows, ask and let’s talk about it.
2.2.8 Everyone is welcome here
You are all welcome here; please be respectful of one another. We are setting a tone of mutual respect and a safe place for learning, where we assume good intentions and interact with kindness and empathy. Pass it on.
2.3 What’s possible with open data science (demo)
- R for automation and visualizations
- GitHub for collaborating (code, text)
- GitHub for project management
- organize by project, i.e., keep the code and methods for a project in the same parent folder, rather than keeping all the R code you’ve ever written in one giant folder that spans projects
- public & private issues, tagging people on commits, kanban board
Live: fix a tpyo and republish the book/page
2.4 What we’ll learn
2.4.1 Expect that there is a better way
Seeing what’s possible opens up what you expect. There is a bit of a chicken and egg issue here: you need to be exposed to things so you know what’s possible and what skills to develop, but you need to kind of know what to look for so you can absorb what you are exposed to.
2.4.2 Have agency to find it
Break down that “I teach, you learn” model. We are all here to learn and improve. Learning horizontally.
This series is not about micro-managing your science but about providing guidance & structure so that everyone in the lab is not silently struggling to reinvent the wheel and coming up with weird homegrown data approaches.
We will cover what skills you should have and what you should be thinking about, along with some of the tools you can use. We will be building out the Resources page on the website for this purpose. And search the blogs.
2.4.3 Have community to learn with
No more silently struggling & reinventing the wheel & creating weird, homegrown workarounds.
Embrace emerging and established community best practices
2.4.4 Identify what skills and tools you need, map next steps & learn
- Be champions for open data science (in your groups, departments, and communities)
- A more open culture in your group
- dedicated lab meetings to discuss data workflows
- “Seaside chats” (<- this is what we call them at OHI)
- stated code of conduct or lab group philosophy
- beginnings of a lab roadmap of shared data workflows
- A growing community of practice on campus
- study groups / coding clubs (e.g., Eco-Data-Science)
- hacky hours
2.5.1 What would you do in a Seaside Chat?
Example topics from the Ocean Health Index:
- Let’s have READMEs so we know what the heck things are
- Set up Zotero with RMarkdown
- Filepath woes: use RStudio Projects (.Rproj)
- Where to put data – here’s how our server works
- Filepath woes 2: use the new
- Let’s plan a lab “hackathon” to move these .xls files to .csv files we store on GitHub
2.5.2 What would you do in a Study Group?
Example lessons from Eco-Data-Science (all lessons are linked under “previous sessions”)
There will be assignments after each call that should take about two hours over two weeks. They are designed to be done collaboratively during lab “Seaside chats,” weekly meetings to discuss data workflows and establish shared practices. Come prepared to debrief in the following Cohort Call!
2.6 Additional reading
- Practical Computing for Biologists: an introduction to the Terminal/command line and to regular expressions. Chapter 2 alone is incredibly powerful.
- Virtual meetings