Introduction to Git for Open-Source Energy System Modeling

 
Open-Source Energy System Modeling
TU Wien, VU 370.062
 
Dipl.-Ing. Dr. Daniel Huppmann
 
Lecture 2: Hands-on example of working with git
 
Part 1
Working with git version control
 
Daniel Huppmann
 
Open-Source Energy System Modeling, Lecture 1
A quick introduction to version control using git
 
Key differences between git version control and folder synchronization (e.g. Dropbox, Google Drive)
You define the relevant unit or size of a change by making a commit
Adding comments to your commits lets you attach relevant info to your code changes
Branches allow you to switch to a "parallel universe" within a version control repository
It’s a decentralized version control tool that supports offline, parallel work
There is a well-defined routine for merging developments from parallel branches
Several git hosting platforms (e.g., GitHub) provide additional project management tools
User interfaces for code review using pull requests
Issue tracking and discussion, kanban boards, ...
However, keep in mind that git is great for uncompiled code and text with simple mark-up
Use other version control tools for data, presentations, compiled software, ...
Git is so much more than just keeping track of code changes over time
A full git workflow
Git is a decentralized version control system geared for collaboration
 
[Figure: a full git workflow]
“The internet” (e.g. GitHub)
upstream: the remote repository of the official codebase
origin: your remote copy (fork) of the repository
Your computer
local: git clone of the repo
Staging area
Working directory
Commands shown as arrows: fork, clone, fetch, pull, checkout, add, commit, push, pull-request
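The part of the diagram that runs on your computer can be tried out without network access. The following is a minimal sketch, assuming git is installed and available on the PATH. A bare repository in a temporary folder stands in for your fork on GitHub (origin); the helper names git and demo_workflow, the file name and the user identity are made up for illustration. Forking and pull requests happen on the hosting platform and are not covered here.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in the folder `cwd` and return its output."""
    # A throwaway identity is passed per call so the sketch does not
    # depend on (or modify) your global git configuration.
    cmd = ["git", "-c", "user.name=Example User",
           "-c", "user.email=user@example.org",
           "-c", "commit.gpgsign=false", *args]
    done = subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True)
    return done.stdout.strip()


def demo_workflow():
    """Clone, add, commit and push; return the commit subjects on the remote."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        remote = base / "origin.git"   # stand-in for your fork on GitHub
        local = base / "local"         # the clone on your computer
        git("init", "--bare", str(remote), cwd=base)
        git("clone", str(remote), str(local), cwd=base)
        # Edit a file in the working directory ...
        (local / "README.md").write_text("# Demo\n")
        # ... move the change to the staging area ...
        git("add", "README.md", cwd=local)
        # ... record it in the local repository ...
        git("commit", "-m", "Add README", cwd=local)
        # ... and publish the current branch to the remote.
        git("push", "origin", "HEAD", cwd=local)
        return git("log", "--all", "--format=%s", cwd=remote).splitlines()
```

Calling demo_workflow() returns the list of commit subjects that arrived on the stand-in remote, which makes it easy to see that the change travelled from the working directory all the way to origin.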
Branching and merging with git
 
Getting started with branching
Three options to merge the changes from dev into master
There are multiple methods to bring parallel developments back together
 
[Figure: commit graphs of the branches master and dev, one per panel]
1) A merge commit
2) Rebase
3) Squash and merge
Legend: symbol for a commit
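The three options can be compared by counting commits after each one. This is a sketch under stated assumptions, not a recommendation for a particular strategy: it assumes git is installed, builds a small throwaway repository in which dev and the main branch have diverged, and uses made-up helper names (git, commit_file, merge_demo), file names and commit messages.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in `cwd` with a throwaway identity."""
    cmd = ["git", "-c", "user.name=Example User",
           "-c", "user.email=user@example.org",
           "-c", "commit.gpgsign=false", *args]
    done = subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True)
    return done.stdout.strip()


def commit_file(repo, name, message):
    """Create or overwrite a file and commit it."""
    (repo / name).write_text(message + "\n")
    git("add", name, cwd=repo)
    git("commit", "-m", message, cwd=repo)


def merge_demo(option):
    """Merge dev into the main branch.

    Returns (number of commits on the main branch, parents of its last commit).
    """
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git("init", cwd=repo)
        commit_file(repo, "README.md", "Start project")
        # The default branch may be called master or main, so ask git.
        main = git("rev-parse", "--abbrev-ref", "HEAD", cwd=repo)
        git("checkout", "-b", "dev", cwd=repo)
        commit_file(repo, "feature.py", "Add feature")
        commit_file(repo, "test_feature.py", "Add test for feature")
        git("checkout", main, cwd=repo)
        commit_file(repo, "LICENSE", "Add license")  # main moves on in parallel

        if option == "merge":
            # 1) A merge commit with two parents joins both histories.
            git("merge", "--no-ff", "-m", "Merge branch dev", "dev", cwd=repo)
        elif option == "rebase":
            # 2) Replay the dev commits on top of main, then fast-forward.
            git("checkout", "dev", cwd=repo)
            git("rebase", main, cwd=repo)
            git("checkout", main, cwd=repo)
            git("merge", "--ff-only", "dev", cwd=repo)
        elif option == "squash":
            # 3) Stage all dev changes and record them as one new commit.
            git("merge", "--squash", "dev", cwd=repo)
            git("commit", "-m", "Add feature with test", cwd=repo)
        else:
            raise ValueError(f"unknown option: {option}")

        commits = int(git("rev-list", "--count", "HEAD", cwd=repo))
        last = git("rev-list", "--parents", "-n", "1", "HEAD", cwd=repo)
        return commits, len(last.split()) - 1
```

In this setup, merge_demo("merge") keeps all four commits plus a merge commit with two parents, merge_demo("rebase") yields a linear history of four commits, and merge_demo("squash") condenses the two dev commits into a single one.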
Writing good git commit messages
 
Useful recommendations to help you (and your colleagues) keep track of your work
Limit the subject line (summary) to 50 characters
Capitalize the subject line
Do not end the subject line with a period
Use the imperative mood in the subject line
Use the body to explain what and why vs. how
A properly formed Git commit summary should be able to complete the following sentence:
If applied, this commit will [your subject line here]
If applied, this commit will update getting started documentation
If applied, this commit will release version 1.0.0
If applied, this commit will merge pull request #123 from user/branch
Selected items from chris.beams.io/posts/git-commit/
If at the end of the day/week/year, you don’t remember what you did...
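The first three recommendations are mechanical enough to check automatically. The function below is only an illustrative sketch: the name check_subject and the wording of the messages are made up, and the imperative mood and the quality of the body still need a human reader.

```python
def check_subject(subject):
    """Return a list of rule violations for a commit subject line."""
    problems = []
    if len(subject) > 50:
        problems.append("longer than 50 characters")
    if subject[:1].islower():
        problems.append("not capitalized")
    if subject.endswith("."):
        problems.append("ends with a period")
    return problems


print(check_subject("Update getting started documentation"))  # prints []
print(check_subject("fixed a bug."))
# prints ['not capitalized', 'ends with a period']
```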
 
Part 2
Setting up a simple repository with unit tests and continuous integration
 
The first rule of live demos: Never do a live demo. So let’s do a live demo.
 
Daniel Huppmann
 
Open-Source Energy System Modeling, Lecture 2
 
Hands-on exercise: github.com/danielhuppmann/lecture-spring-2021
 
Set up a new public GitHub repository at www.github.com
Update the README (formatting using markdown)
“Clone” the repository to your computer (recommended for novices: gitkraken.com)
Add a license (why not start with Apache 2.0?)
  Add the license statement and the badge to the README
Start developing a little Python function (recommended for novices: anaconda.com)
Add a unit test
Add a gitignore file
Add continuous integration using a new branch
  GitHub Actions to execute unit tests
  stickler-ci to implement linter and code style verification
Create a pull request to execute the CI and merge the new branch into master
Add contributing guidelines, set up templates for pull requests
Create a release
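For the steps "a little Python function" and "a unit test", something as small as the following would do. It is a hypothetical example, not the code from the lecture repository: the function capacity_factor and both tests are invented for illustration. A test runner such as pytest collects functions whose names start with test_, so the same checks can later be executed by the continuous-integration service.

```python
def capacity_factor(energy_mwh, capacity_mw, hours):
    """Return actual generation as a share of the maximum possible generation."""
    if capacity_mw <= 0 or hours <= 0:
        raise ValueError("capacity_mw and hours must be positive")
    return energy_mwh / (capacity_mw * hours)


def test_capacity_factor():
    # A 2 MW plant producing 24 MWh over 24 hours runs at 50% on average.
    assert capacity_factor(24, 2, 24) == 0.5


def test_capacity_factor_rejects_zero_capacity():
    # Invalid input should raise an error instead of returning a number.
    try:
        capacity_factor(10, 0, 24)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")
```

In a real repository the function and the tests would usually live in separate files, with the tests importing the function.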
 
 
Hands-on exercise (Part II)
 
 
If a non-admin user wants to push commits, they have to “fork” the repo (create a copy under their own GitHub user)
Clone the fork to your computer
Start a new branch
Add a new function or extend some feature such that the unit tests fail
Make a pull request to the upstream repository
Fix the code such that unit tests pass
Ask someone else to perform code review
Merge the new development (by an admin)
 
 
Part 3
Some practical considerations and advice
 
 
Time allocation for increasing efficiency through automation
 
[Figure: xkcd comic by Randall Munroe]
 
Is it worth the time to automate repetitive tasks? Probably not really...
 
Good enough scientific programming
 
Data management: save both raw and intermediate forms, create tidy data amenable to analysis
Software: write, organize, and share scripts and programs used in the analysis following best practices
Collaboration: make it easy for existing and new collaborators to understand and contribute to a project
Project organization: organize the digital artefacts of a project to ease discovery and understanding
Manuscripts: write manuscripts with a clear audit trail and minimize manual merging of conflicts
Adapted from Greg Wilson et al. Good enough practices in scientific computing. PLoS Comput. Biol. 13(6), 2017. doi: 10.1371/journal.pcbi.1005510
You don’t have to have a PhD in IT to do decent scientific programming!
In fact, it might actually help...
Good enough scientific programming – Software
Place a brief explanatory comment at the start of every program.
Do not comment and uncomment sections of code to control a program's behaviour.
Decompose programs into functions, and try to keep each function short enough for one screen.
Be ruthless about eliminating duplication.
Always search for well-maintained software libraries that do what you need.
Test libraries before relying on them.
Give functions and variables meaningful names.
Make dependencies and requirements explicit.
Provide a simple example or test data set.
Submit code to a reputable DOI-issuing repository (e.g., Zenodo).
Adapted from Greg Wilson et al. Good enough practices in scientific computing. PLoS Comput. Biol. 13(6), 2017. doi: 10.1371/journal.pcbi.1005510
Your worst collaborator? Yourself from six months ago...
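A few of these recommendations fit into a handful of lines. The sketch below is a made-up example (the function summarize_demand and its arguments are not from the lecture): it starts with a brief explanatory comment, uses meaningful names, and controls optional behaviour with an argument instead of commenting code in and out.

```python
"""Summarize an hourly electricity demand series (illustrative example)."""


def summarize_demand(hourly_demand_mw, include_peak=True):
    """Return total energy and, optionally, peak demand of an hourly series."""
    # With hourly time steps, the sum of MW values equals the energy in MWh.
    summary = {"total_mwh": sum(hourly_demand_mw)}
    # An explicit argument replaces commenting this block in and out.
    if include_peak:
        summary["peak_mw"] = max(hourly_demand_mw)
    return summary


print(summarize_demand([1, 2, 3]))
# prints {'total_mwh': 6, 'peak_mw': 3}
```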
Code style guides
 
Which programming language to use, which other conventions to follow?
If you don’t have a strong preference: follow the community or your room (office) mate!
Some practical guidelines:
Follow a coding etiquette, e.g., Black & PEP 8 for Python, Google’s R style guide
For larger projects, agree on a folder structure and hierarchy early (source data, etc.)
Only change folder structure when it’s really necessary
For more complex code (e.g., packages), use tools such as Sphinx and readthedocs.org to build documentation automatically
Keep in mind...
Code is read more often than it is written
Good code should not need a lot of documentation
Key criteria: readability and consistency with (future) collaborators and yourself!
Programming should be seen as a (not foreign) language
Software releases and semantic versioning
 
Semantic versioning uses a structure like <MAJOR>.<MINOR>.<PATCH>
For a new release (i.e., a published version), you MUST increment...
MAJOR when making incompatible API changes,
MINOR when adding backwards-compatible functionality,
PATCH when making backwards-compatible bug fixes.
Other considerations:
Major version zero (0.y.z) is for initial development. Anything may change at any time.
Version 1.0.0 defines the public API. After that, the rules above must always be followed.
Lower-order version numbers MUST be reset to 0 when incrementing a higher-order one (e.g., a MAJOR change after 1.2.1 gives 2.0.0).
You MAY increment MINOR when substantial new internal features are added.
A pre-release version MAY be denoted by appending a hyphen and an identifier, e.g., 1.0.0-alpha.
 
Adapted from Semantic Versioning 2.0.0, semver.org
If a piece of software is used by multiple people, clear versioning is critical
[Figure: timeline of version numbers]
Initial development: 0.1, 0.1.1, 0.1.2, 0.2, 0.2.1
First release: 1.0
Subsequent versions: 1.0.1, 1.0.2, 1.1, 1.2, 1.2.1, 2.0
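The increment-and-reset rule is easy to express in code. The following is a minimal sketch with a made-up function name (bump); it only handles plain MAJOR.MINOR.PATCH strings and ignores pre-release suffixes such as -alpha.

```python
def bump(version, part):
    """Return the next version after incrementing MAJOR, MINOR or PATCH."""
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"           # reset MINOR and PATCH
    if part == "minor":
        return f"{major}.{minor + 1}.0"     # reset PATCH
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")


print(bump("0.1.1", "patch"))  # prints 0.1.2
print(bump("1.0.2", "minor"))  # prints 1.1.0
print(bump("1.2.1", "major"))  # prints 2.0.0
```

The three calls reproduce steps that appear in the version timeline of this slide.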
Coding etiquette
When you search for my colleague Matthew Gidden on Twitter, the first tweet you find is...
Keep in mind that the internet remembers everything
Social etiquette
 
Collaborative scientific programming is about communication, not code...
It’s the people, stupid!
And don’t be annoyed when, sometimes, some collaborators are stubborn...
 
Keep in mind that discussions via e-mail, chat, pull request comments, code review, etc. lack a lot of the social cues that human interaction is built upon
 
If there are two roughly equivalent ways to do something and a code reviewer suggests that you use the other approach...
Just do it her/his way if there is no good reason not to – out of respect for the reviewer and to avoid getting bogged down in escalating discussions
 
Give credit generously to your collaborators and contributors!
Be kind and respectful in collaboration, code review and comments
 
Homework assignment
 
Start a new GitHub repository, add a license and set up continuous-integration (CI) tools
Add functions or small features from any real-life project relevant to your work or interests
The codebase should include 2-4 functions, 20-40 lines of code including documentation
The repository should work as a “stand-alone” project (i.e., no need for other parts of your project/work that are not part of this repository)
If you need any dependencies/packages, add a simple list in a file requirements.txt and follow the instructions here to make tests pass on Travis (or another, similar tool if you prefer)
Add at least one test per function and make sure that these are executed on CI
If data is necessary to understand the scope of the functions, add a stylized dataset
The README should explain the scope of the project and the purpose of the functions
Invite me as a collaborator to your repository when the project is ready to be reviewed/graded
 
Programming languages: Python (preferred), R, Julia
Invitation to collaborate due by Sunday, April 11, 23:59 (please do not push any commits after)
 
Create a simple repository based on any of your real-life projects
 

Git is a powerful version control tool essential for managing code changes in Open-Source Energy System Modeling. This hands-on example covers key concepts such as branches, merging, and writing good commit messages. Understanding Git workflow, branching strategies, and collaborating via platforms like GitHub are crucial for effective project management in this field.


Uploaded on Aug 19, 2024



Thank you very much for your attention!
 
Dr. Daniel Huppmann
Research Scholar, Energy Program
International Institute for Applied Systems Analysis (IIASA)
Schlossplatz 1, A-2361 Laxenburg, Austria
huppmann@iiasa.ac.at
http://www.iiasa.ac.at/staff/huppmann
 
This presentation is licensed under a Creative Commons Attribution 4.0 International License
